Closed
Description
Code Samples
import pandas as pd
frame = pd.DataFrame({
    'klass': range(5),
    'ts': [
        pd.Timestamp('2017-03-23 08:22:42.173378+01'),
        pd.Timestamp('2017-03-23 08:22:42.178578+01'),
        pd.Timestamp('2017-03-23 08:22:42.173578+01'),
        pd.Timestamp('2017-03-23 08:22:42.178378+01'),
        pd.Timestamp('2017-03-23 08:22:42.163378+01'),
    ],
    'attribute': ['att1', 'att2', 'att3', 'att4', 'att5'],
    'value': ['a', 'b', 'c', 'd', 'd'],
})
# At this point, frame.ts is of dtype datetime64[ns, pytz.FixedOffset(60)]
frame.set_index(['ts', 'klass'], inplace=True)
queried_index = frame.query('value=="d"').index
pivoted_frame = frame.reset_index().pivot_table(index=['klass', 'ts'], columns='attribute', values='value', aggfunc='first')
melted_frame = pd.melt(pivoted_frame.reset_index(), id_vars=['klass', 'ts'], var_name='attribute', value_name='value')
# At this point, melted_frame.ts is of dtype datetime64[ns]
queried_after_melted_index = melted_frame.query('value=="d"').set_index(['ts', 'klass']).index
frame.loc[queried_index] # Works
frame.loc[queried_index] = 'test' # Works
frame.loc[queried_after_melted_index] # Works
frame.loc[queried_after_melted_index] = 'test' # Breaks
The last statement gives:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 140, in __setitem__
indexer = self._get_setitem_indexer(key)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 127, in _get_setitem_indexer
return self._convert_to_indexer(key, is_setter=True)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1230, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: "MultiIndex(levels=[[2017-03-23 07:22:42.163378, 2017-03-23 07:22:42.173378, 2017-03-23 07:22:42.173578, 2017-03-23 07:22:42.178378, 2017-03-23 07:22:42.178578], [0, 1, 2, 3, 4]],\n labels=[[3, 0], [3, 4]],\n names=['ts', 'klass']) not in index"
Problem description
- It is counter-intuitive that an operation changes the dtype of a column unless its documentation explicitly says so. Here, the pivot_table/melt round trip turns ts from datetime64[ns, pytz.FixedOffset(60)] into timezone-naive datetime64[ns].
- It is also counter-intuitive that frame.loc behaves differently when used to read values than when used in an assignment.
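To make the dtype mismatch visible before calling .loc, the levels of the two indexes can be compared directly. The following is only a minimal sketch: mismatched_level_dtypes is a hypothetical helper (not part of pandas), it assumes both indexes use the same level names, and the two small indexes stand in for queried_index and queried_after_melted_index.

```python
import pandas as pd


def mismatched_level_dtypes(left, right):
    """Return {level name: (left dtype, right dtype)} for levels whose dtypes differ.

    Hypothetical helper; assumes both MultiIndexes share the same level names.
    """
    mismatches = {}
    for name in left.names:
        left_dtype = str(left.get_level_values(name).dtype)
        right_dtype = str(right.get_level_values(name).dtype)
        if left_dtype != right_dtype:
            mismatches[name] = (left_dtype, right_dtype)
    return mismatches


# Timezone-aware timestamps, as in frame.ts.
aware = pd.DatetimeIndex([
    pd.Timestamp('2017-03-23 08:22:42.178378+01'),
    pd.Timestamp('2017-03-23 08:22:42.163378+01'),
])
# The same instants as naive UTC values, as they appear in the KeyError.
naive = aware.tz_convert('UTC').tz_localize(None)

left = pd.MultiIndex.from_arrays([aware, [3, 4]], names=['ts', 'klass'])
right = pd.MultiIndex.from_arrays([naive, [3, 4]], names=['ts', 'klass'])

mismatches = mismatched_level_dtypes(left, right)
print(mismatches)  # only the 'ts' level is reported
```

An empty result would mean that the two indexes at least agree on dtypes, so a failing lookup would have to have another cause.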
Expected Output
- melted_frame.ts and frame.ts have the same dtype.
- DataFrame.loc fails in both cases, not just in an assignment, or succeeds in both.
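Until the dtypes agree on their own, one possible workaround is to restore the timezone on the melted column before building the index. This is a sketch under an assumption: the naive values are taken to be UTC, which matches the KeyError above (07:22:42 where the original values were 08:22:42+01). restore_timezone is a hypothetical helper, and the two series stand in for frame.ts and melted_frame.ts.

```python
import pandas as pd


def restore_timezone(naive, reference):
    """Give `naive` the timezone of `reference`.

    Hypothetical helper; assumes the naive values are UTC wall-clock times.
    """
    tz = reference.dt.tz
    return naive.dt.tz_localize('UTC').dt.tz_convert(tz)


# Stand-in for the original, timezone-aware column.
reference = pd.Series([
    pd.Timestamp('2017-03-23 08:22:42.178378+01'),
    pd.Timestamp('2017-03-23 08:22:42.163378+01'),
])
# Stand-in for the column after the round trip: same instants, timezone dropped.
naive = reference.dt.tz_convert('UTC').dt.tz_localize(None)

restored = restore_timezone(naive, reference)
print(restored.dtype)
```

Applied to the example, this would mean assigning restore_timezone(melted_frame.ts, frame.reset_index().ts) back to melted_frame['ts'] before the set_index call. Whether the assignment through .loc then succeeds on pandas 0.19.2 has not been verified here.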
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None