Description
xref #14205
The .loc method of a DataFrame with mixed dtypes returns a coerced type even if the resulting selection only contains elements of a single dtype. This happens only when selecting a single row.
I guess this might be intended, because the implementation of loc seems to first look up the row as a single Series (which does the coercion) and only then apply the second (column) indexer.
However, when the column indexer narrows the selection so that the upcast would not have been necessary in the first place, the result can be very surprising and may even cause bugs on the user side if it goes unnoticed (like, "I was sure that column was int64").
>>> import pandas as pd
>>> d = pd.DataFrame(dict(a=[1.23]))
>>> d["b"] = 666 # adding column with int
>>> d.info() # info as expected (column b is int64 - fine)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 2 columns):
a 1 non-null float64
b 1 non-null int64
dtypes: float64(1), int64(1)
memory usage: 24.0 bytes
>>> d.loc[0,"b"] # UNEXPECTED: returning a single float
666.0
>>> d.ix[0, "b"] # OK: returns a single int
666
>>> d.loc[[0], "b"] # OK
0 666
Name: b, dtype: int64
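The guess about the mechanism can be checked in isolation. The following is only a minimal sketch, assuming that a single-row lookup materializes the row as one Series. The variable names (row, col_first, row_first) are illustrative and not taken from the pandas source.

```python
import pandas as pd

# Same shape as the frame in the report: one float64 column, one int64 column.
d = pd.DataFrame({"a": [1.23], "b": [666]})

# A whole row has to fit both values into a single Series,
# so the int value is upcast to the common dtype.
row = d.loc[0]
row_dtype = row.dtype  # float64

# Row first, then column: the value comes out of the upcast row.
row_first = d.loc[0]["b"]

# Column first, then row: the lookup happens on a homogeneous
# int64 Series, so no upcast is involved.
col_first = d["b"].loc[0]

print(row_dtype, type(row_first).__name__, type(col_first).__name__)
```

Selecting the column before the row looks like a usable workaround whenever the original dtype matters.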
Feel free to close if the behavior is intended. Maybe this is a "bug", or maybe a suggested API change. I dunno.
Perhaps related to #10503, #9519, #9269, #11594 ?
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 15.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
pandas: 0.17.0
[...]