Description
xref #19602
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.25.3'
In [3]: df = pd.DataFrame({'mixed': [1, 2, 'abc', 'def'], 'ints': [100, 200, 300, 400]})
In [4]: df
Out[4]:
  mixed  ints
0     1   100
1     2   200
2   abc   300
3   def   400
In [5]: df.dtypes
Out[5]:
mixed    object
ints      int64
dtype: object
In [6]: df.query('ints < 300').set_index('mixed').index
Out[6]: Int64Index([1, 2], dtype='int64', name='mixed')
In [7]: df.set_index('mixed').query('ints < 300').index
Out[7]: Index([1, 2], dtype='object', name='mixed')
Problem description
In the above, I start with a DataFrame with a column mixed that has both integer and string values.
In statement [6], I query on a different column and then set the index to the column mixed. The resulting index has an int64 dtype rather than preserving the object dtype of the original column.
But in statement [7], I first set the index and then run the query, and the index keeps the object dtype.
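To narrow down where the dtype changes, it may help to compare the dtype of the filtered column with the dtype of the index built from it. The snippet below is a minimal sketch using the same DataFrame as the code sample; the variable names filtered, column_dtype and index_dtype are only illustrative. Filtering rows leaves the column as object, so any change to int64 would be introduced when set_index builds the index. What index_dtype prints may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'mixed': [1, 2, 'abc', 'def'], 'ints': [100, 200, 300, 400]})

# Filter first, as in statement [6].
filtered = df.query('ints < 300')

# The filtered column keeps the object dtype of the original column.
column_dtype = filtered['mixed'].dtype

# The index built from that column is where the dtype may be re-inferred.
index_dtype = filtered.set_index('mixed').index.dtype

print(column_dtype)
print(index_dtype)  # int64 on 0.25.3 per Out[6]; may differ on other versions
```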
This becomes an issue if you do some computation on the queried DataFrame, set mixed as its index, and then want to merge it back into the original DataFrame. The original has mixed as dtype 'O', while the new one has mixed as dtype 'int64'.
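As a possible workaround until the behaviour is settled, the index can be cast back to object explicitly after set_index. This is only a sketch of one approach, not a fix for the underlying inference; the names result and original are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'mixed': [1, 2, 'abc', 'def'], 'ints': [100, 200, 300, 400]})

# Query first, then set the index, as in statement [6].
result = df.query('ints < 300').set_index('mixed')

# Explicitly cast the index back to object so it matches the original column.
result.index = result.index.astype(object)

# The original frame indexed by the same column, for comparison.
original = df.set_index('mixed')

print(result.index.dtype == original.index.dtype)
```

With the explicit cast, both indexes have the object dtype regardless of whether the query ran before or after set_index.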
Expected Output
From statement [6], I would have expected:
Index([1, 2], dtype='object', name='mixed')
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : 2.3.0
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6