Description
Code Sample, a copy-pastable example if possible
import pandas as pd

df = pd.DataFrame(data=[[0, 0, 0]], columns=['level_1', 'level_2', 'payload'])
# make dataframe empty
df = df[ df.payload == -1 ]
# columns are all int64 here
df.info()
#output: level_1 0 non-null int64
#output: level_2 0 non-null int64
#output: payload 0 non-null int64
# set MultiIndex - levels are still int64
df = df.set_index(['level_1','level_2'])
print(str(df.index.levels[0].dtype))
print(str(df.index.levels[1].dtype))
#output: int64
#output: int64
# reset_index - former-levels columns are now float64
df = df.reset_index()
df.info()
#output: level_1 0 non-null float64
#output: level_2 0 non-null float64
#output: payload 0 non-null int64
Problem description
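Calling reset_index() on an empty dataframe with a MultiIndex turns the former index levels into float64 columns, even though the levels themselves were int64.
The dtypes are preserved, instead, if either
a) the index is not a MultiIndex, or
b) the dataframe is not empty.
Condition (b) is a big issue for programs that compute subsets of dataframes that can sometimes
be empty, since downstream code might expect a certain dtype and fail when it finds
a float64 instead.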
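For comparison, here is a minimal sketch of the two cases where the dtypes survive, reusing the frame from the code sample above. The variable names `empty`, `single` and `multi` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(data=[[0, 0, 0]], columns=['level_1', 'level_2', 'payload'])
empty = df[df.payload == -1]

# Case (a): empty dataframe, but a single-level index instead of a MultiIndex
single = empty.set_index('level_1').reset_index()

# Case (b): MultiIndex, but the dataframe is not empty
multi = df.set_index(['level_1', 'level_2']).reset_index()

# In both cases every column is still int64
print(single.dtypes)
print(multi.dtypes)
```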
Real-world scenario: sampling a system (a collection of processes or threads) at regular intervals
and collecting some measures (CPU used, or other resources or figures). This is a very common strategy
in performance investigation software (e.g. see Oracle's v$active_session_history).
Here, the natural index is (sample_time, process_id), with sample_time being datetime64 (a time series).
Even more naturally, we want to compute differences of sample_time, yielding a timedelta64,
and divide them by np.timedelta64(1,'s') to get the elapsed time in seconds. But when the
initial dataframe is empty, we end up dividing float64 / np.timedelta64(1,'s') and get an exception.
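A rough sketch of that scenario on a non-empty frame, where everything works as expected. The column names and sample values are made up for illustration. On the affected version, the same last line is what fails once `samples` is empty, because `sample_time` is no longer datetime64 after reset_index().

```python
import numpy as np
import pandas as pd

# Hypothetical samples: one process observed at three points in time
samples = pd.DataFrame({
    'sample_time': pd.to_datetime(['2018-01-01 00:00:00',
                                   '2018-01-01 00:00:10',
                                   '2018-01-01 00:00:25']),
    'process_id': [1, 1, 1],
    'cpu_used': [0.5, 0.7, 0.2],
})

# Natural index is (sample_time, process_id); reset it later for computations
indexed = samples.set_index(['sample_time', 'process_id'])
flat = indexed.reset_index()

# Differences of datetime64 give timedelta64; dividing by one second
# converts them to elapsed seconds as plain floats
elapsed_s = flat['sample_time'].diff() / np.timedelta64(1, 's')
print(elapsed_s.tolist())
```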
An obvious workaround is to check for empty dataframes after EVERY reset_index()
and coerce the float64 columns back to their correct dtype, but that easily becomes a maintenance/coverage nightmare :O
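One way to at least centralize that workaround is a small wrapper, sketched below. `reset_index_keep_dtypes` is a hypothetical helper, not part of pandas, and it assumes every index level has a name. It has only been thought through for integer levels like the ones in the code sample.

```python
import pandas as pd

def reset_index_keep_dtypes(df):
    """Hypothetical helper: reset_index() that restores level dtypes on empty frames."""
    index = df.index
    # Remember the dtype of each index level before it becomes a column
    if isinstance(index, pd.MultiIndex):
        level_dtypes = {name: level.dtype
                        for name, level in zip(index.names, index.levels)}
    else:
        level_dtypes = {index.name: index.dtype}

    out = df.reset_index()
    if out.empty:
        # Only cast columns whose dtype actually changed
        casts = {name: dtype for name, dtype in level_dtypes.items()
                 if name in out.columns and out[name].dtype != dtype}
        if casts:
            out = out.astype(casts)
    return out

df = pd.DataFrame(data=[[0, 0, 0]], columns=['level_1', 'level_2', 'payload'])
empty = df[df.payload == -1].set_index(['level_1', 'level_2'])
restored = reset_index_keep_dtypes(empty)
print(restored.dtypes)
```

This still means remembering to call the wrapper everywhere instead of reset_index(), so it reduces the problem rather than removing it.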
Expected Output
The columns restored by reset_index() keep the dtype they had before set_index().
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None