
BUG: melt changes type of tz-aware columns #15785

Closed
@stigviaene

Description


Code Samples

import pandas as pd

frame = pd.DataFrame({
    'klass': range(5),
    'ts': [pd.Timestamp('2017-03-23 08:22:42.173378+01'),
           pd.Timestamp('2017-03-23 08:22:42.178578+01'),
           pd.Timestamp('2017-03-23 08:22:42.173578+01'),
           pd.Timestamp('2017-03-23 08:22:42.178378+01'),
           pd.Timestamp('2017-03-23 08:22:42.163378+01')],
    'attribute': ['att1', 'att2', 'att3', 'att4', 'att5'],
    'value': ['a', 'b', 'c', 'd', 'd'],
})
# At this point, frame.ts is of dtype datetime64[ns, pytz.FixedOffset(60)]
frame.set_index(['ts', 'klass'], inplace=True)
queried_index = frame.query('value=="d"').index
pivoted_frame = frame.reset_index().pivot_table(
    index=['klass', 'ts'], columns='attribute', values='value', aggfunc='first')
melted_frame = pd.melt(pivoted_frame.reset_index(), id_vars=['klass', 'ts'],
                       var_name='attribute', value_name='value')
# At this point, melted_frame.ts is of dtype datetime64[ns]
queried_after_melted_index = melted_frame.query('value=="d"').set_index(['ts', 'klass']).index
frame.loc[queried_index]                        # Works
frame.loc[queried_index] = 'test'               # Works
frame.loc[queried_after_melted_index]           # Works
frame.loc[queried_after_melted_index] = 'test'  # Breaks (raises KeyError)

The last statement gives:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 140, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 127, in _get_setitem_indexer
    return self._convert_to_indexer(key, is_setter=True)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1230, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "MultiIndex(levels=[[2017-03-23 07:22:42.163378, 2017-03-23 07:22:42.173378, 2017-03-23 07:22:42.173578, 2017-03-23 07:22:42.178378, 2017-03-23 07:22:42.178578], [0, 1, 2, 3, 4]],\n           labels=[[3, 0], [3, 4]],\n           names=['ts', 'klass']) not in index"
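The timestamps in the KeyError are one hour earlier than the ones the frame was built with (for example 07:22:42.163378 versus 08:22:42.163378+01). One plausible reading, inferred only from these printed values, is that the melted ts column holds the same instants as naive UTC values, so they no longer match the tz-aware index labels. The sketch below checks that instant equivalence with plain pd.Timestamp operations; it does not reproduce the pandas 0.19.2 code path, and the +01:00 spelling of the offset is a choice made here for clarity.

```python
import pandas as pd

# Tz-aware timestamp as constructed in the report (offset written as +01:00).
aware = pd.Timestamp('2017-03-23 08:22:42.163378+01:00')
# Naive timestamp as printed in the KeyError message.
naive = pd.Timestamp('2017-03-23 07:22:42.163378')

# Read as UTC, the naive value is the same instant as the tz-aware one...
same_instant = naive.tz_localize('UTC') == aware
# ...but on its own it carries no timezone, so it is not the same index label.
print(same_instant, naive.tz)
```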

Problem description

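  • It is counter-intuitive for an operation to change a column's dtype unless its documentation explicitly says so. Here, the tz-aware ts column comes back from pd.melt as naive datetime64[ns].
  • It is also counter-intuitive that frame.loc behaves differently for lookup (frame.loc[key]) than for assignment (frame.loc[key] = value): the same key works in the first case and raises KeyError in the second.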
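A silent dtype change like this can be caught by comparing the id_vars dtypes before and after the reshape. The helper below, changed_dtypes, is hypothetical (not a pandas API) and is only a sketch; the small frame is a trimmed-down stand-in for the one in the report.

```python
import pandas as pd


def changed_dtypes(before: pd.DataFrame, after: pd.DataFrame, columns) -> dict:
    """Hypothetical helper: return {column: (dtype_before, dtype_after)}
    for every listed column whose dtype differs between the two frames."""
    return {
        col: (before[col].dtype, after[col].dtype)
        for col in columns
        if before[col].dtype != after[col].dtype
    }


frame = pd.DataFrame({
    'klass': range(3),
    'ts': pd.to_datetime(['2017-03-23 08:22:42.173378+01:00'] * 3),
    'att1': ['a', 'b', 'c'],
})
melted = pd.melt(frame, id_vars=['klass', 'ts'], var_name='attribute', value_name='value')
# An empty dict means melt kept the id_vars dtypes. With the behaviour reported
# for pandas 0.19.2, 'ts' would be expected to appear (tz-aware -> naive).
print(changed_dtypes(frame, melted, ['klass', 'ts']))
```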

Expected Output

  • melted_frame.ts and frame.ts have the same dtype.
  • DataFrame.loc either succeeds or fails for both lookup and assignment with the same key, instead of failing only on assignment.
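Until melt keeps the dtype itself, one possible workaround is to re-attach the original timezone after reshaping. The function restore_tz below is a hypothetical sketch, not part of pandas. It assumes the dropped-timezone values are naive UTC, which is consistent with the one-hour shift visible in the KeyError, and it leaves the frame unchanged when the timezone is still present.

```python
import pandas as pd


def restore_tz(reshaped: pd.DataFrame, original: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: re-attach original[col]'s timezone to reshaped[col].

    Assumes a dropped timezone leaves naive UTC values (08:22+01 became 07:22
    in the reported KeyError). Returns the input unchanged if there is nothing
    to restore.
    """
    tz = original[col].dt.tz
    if tz is None or reshaped[col].dt.tz is not None:
        return reshaped
    restored = reshaped.copy()
    restored[col] = restored[col].dt.tz_localize('UTC').dt.tz_convert(tz)
    return restored


frame = pd.DataFrame({
    'klass': range(3),
    'ts': pd.to_datetime(['2017-03-23 08:22:42.173378+01:00'] * 3),
    'att1': ['a', 'b', 'c'],
})
# No-op if this pandas version's melt already keeps the timezone.
melted = restore_tz(
    pd.melt(frame, id_vars=['klass', 'ts'], var_name='attribute', value_name='value'),
    frame, 'ts')

# Simulate the reported result explicitly: same instants, timezone dropped.
naive = frame.assign(ts=frame['ts'].dt.tz_convert(None))
restored = restore_tz(naive, frame, 'ts')
print(melted['ts'].dtype, restored['ts'].dtype)
```

With the timezone restored, an index built from the melted frame should again carry the same dtype as frame's ts level, which is the first expected outcome listed above.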

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

Metadata

Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Timezones (timezone data dtype)
