I am seeing odd behavior with dates being converted to datetime64[ns]
timestamps where that seems inappropriate.
In [272]: file = StringIO("xxyyzz20100101PIE\nxxyyzz20100101GUM\nxxyyww20090101EGG\nfoofoo20080909PIE")
In [273]: df = pd.read_fwf(file, widths=[6,8,3], names=["person_id", "dt", "food"], parse_dates=["dt"])
In [274]: df
Out[274]:
  person_id                   dt food
0    xxyyzz  2010-01-01 00:00:00  PIE
1    xxyyzz  2010-01-01 00:00:00  GUM
2    xxyyww  2009-01-01 00:00:00  EGG
3    foofoo  2008-09-09 00:00:00  PIE
Everything looks good. However:
In [275]: df.dt.value_counts()
Out[275]:
1970-01-17 00:00:00 2
1970-01-18 00:00:00 1
1970-01-25 08:00:00 1
In [276]: df.dt.dtype
Out[276]: dtype('datetime64[ns]')
The column is stored internally as nanoseconds since the epoch, but it usually displays as the dates I want. Then everything gets messed up when we drop down to NumPy land for things like value_counts. What is going on? Is this a bug, or am I making a mistake?
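For comparison, here is a minimal sketch of what these operations look like on a recent pandas release (an assumption: it has not been run against 0.10.1). It builds the same four dates directly with pd.to_datetime instead of read_fwf, shows the int64 nanoseconds-since-epoch values that back a datetime64[ns] column, and checks that value_counts keys the counts by the real dates rather than by 1970-era values.

```python
import pandas as pd

# Same dates as in the report, built directly rather than via read_fwf.
# The explicit astype pins the ns resolution, since newer pandas versions
# may infer a different resolution when parsing strings.
dt = pd.Series(
    pd.to_datetime(["20100101", "20100101", "20090101", "20080909"],
                   format="%Y%m%d").astype("datetime64[ns]"),
    name="dt",
)

# Underneath, each value is an int64 count of nanoseconds since 1970-01-01.
ns = dt.to_numpy().view("i8")
print(ns)

# On a recent pandas the value_counts index keeps the datetime64 dtype,
# so the keys print as the actual dates.
counts = dt.value_counts()
print(counts)
```

If the int64 view prints the expected large integers but value_counts still shows 1970 dates, the problem lies in how value_counts rebuilds its index, not in how the dates were parsed.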
In [280]: pd.__version__
Out[280]: '0.10.1'
In [281]: np.__version__
Out[281]: '1.6.1'