BUG: reset_index of level on a MultiIndex with NaT converts to np.nan #11479

Closed
@emsems

Not sure if this is known already, but I couldn't find any open issues.
It's similar to this closed one:
#10388

Using pandas 0.17 and numpy 1.10.1.

Code to reproduce the issues:

import pandas as pd
from pandas import DataFrame
import numpy as np

idx = np.arange(0, 10)  # could have an NaN?
tstamp = pd.date_range('201507010000', freq='h', periods=10).values
df = DataFrame({'id': idx, 'tstamp': tstamp, 'a': list('abcdefghij')})
df.loc[3, 'tstamp'] = pd.NaT  # one missing timestamp in the future index level

print('without timezone:')
# a: move only the datetime level back into a column
try:
    a = df.set_index(['id', 'tstamp']).reset_index('tstamp')
    print('a works')
except Exception as e:
    print('a fails: %s: %s' % (e.__class__.__name__, e))
# b: reset both levels, one after the other
try:
    b = df.set_index(['id', 'tstamp']).reset_index('tstamp').reset_index('id')
    print('b works')
except Exception as e:
    print('b fails: %s: %s' % (e.__class__.__name__, e))
# c: reset all levels at once
try:
    c = df.set_index(['id', 'tstamp']).reset_index()
    print('c works')
except Exception as e:
    print('c fails: %s: %s' % (e.__class__.__name__, e))
# d: reset only 'id', leaving the datetime level (with NaT) as the index
try:
    d = df.set_index(['id', 'tstamp']).reset_index('id')
    print('d works')
except Exception as e:
    print('d fails: %s: %s' % (e.__class__.__name__, e))

print('with timezone:')
df['tstamp'] = pd.DatetimeIndex(df['tstamp']).tz_localize('Europe/Berlin')
try:
    a = df.set_index(['id', 'tstamp']).reset_index('tstamp')
    print('a works')
except Exception as e:
    print('a fails: %s: %s' % (e.__class__.__name__, e))
try:
    b = df.set_index(['id', 'tstamp']).reset_index('tstamp').reset_index('id')
    print('b works')
except Exception as e:
    print('b fails: %s: %s' % (e.__class__.__name__, e))
try:
    c = df.set_index(['id', 'tstamp']).reset_index()
    print('c works')
except Exception as e:
    print('c fails: %s: %s' % (e.__class__.__name__, e))
try:
    d = df.set_index(['id', 'tstamp']).reset_index('id')
    print('d works')
except Exception as e:
    print('d fails: %s: %s' % (e.__class__.__name__, e))

Output:
without timezone:
a works
b works
c works
d fails: ValueError: Could not convert object to NumPy datetime
with timezone:
a fails: TypeError: data type not understood
b fails: TypeError: data type not understood
c fails: TypeError: data type not understood
d fails: ValueError: Could not convert object to NumPy datetime
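
For comparison, the behaviour I would expect can be written as a small check. This is only a sketch: the helper name `roundtrip_keeps_datetime` is made up for illustration, it uses `pd.api.types.is_datetime64_any_dtype` (which does not exist in 0.17), and it assumes a pandas version where NaT survives the round trip. On the versions above it would raise instead.

```python
import pandas as pd


def roundtrip_keeps_datetime(tz=None):
    """Hypothetical helper: True if NaT in a MultiIndex level survives reset_index."""
    # Small frame with one missing timestamp, mirroring the report above.
    tstamp = pd.to_datetime(
        ['2015-07-01 00:00', '2015-07-01 01:00', '2015-07-01 02:00', None]
    )
    if tz is not None:
        tstamp = tstamp.tz_localize(tz)
    df = pd.DataFrame({'id': range(4), 'tstamp': tstamp, 'a': list('abcd')})
    indexed = df.set_index(['id', 'tstamp'])

    # Case "a": the datetime level goes back to a column and stays datetime-like.
    a = indexed.reset_index('tstamp')
    column_ok = (
        pd.api.types.is_datetime64_any_dtype(a['tstamp'])
        and pd.isnull(a['tstamp'].iloc[3])
    )

    # Case "d": only 'id' is reset, so the remaining index holds the NaT.
    d = indexed.reset_index('id')
    index_ok = isinstance(d.index, pd.DatetimeIndex) and pd.isnull(d.index[3])

    return bool(column_ok and index_ok)
```

Calling `roundtrip_keeps_datetime()` covers the timezone-naive case, and `roundtrip_keeps_datetime('Europe/Berlin')` the timezone-aware one.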

Can you suggest any workaround?
