DataFrame.loc[] prioritizes columns when setting with missing label #19110

Open
@toobaz

Code Sample, a copy-pastable example if possible

In [1]: import numpy as np; import pandas as pd

In [2]: df = pd.DataFrame(np.arange(16).reshape(4, 4), index=pd.MultiIndex.from_product([[1, 2], ['a', 'b']]), columns=['a', 'b', 'c', 'd'])

In [3]: df.loc[2, 'a'] # select a row: good
Out[3]: 
a     8
b     9
c    10
d    11
Name: (2, a), dtype: int64

In [4]: df.loc[2, 'c'] # select a (part of) col: guessing game, but I understand it is a feature
Out[4]: 
a    10
b    14
Name: c, dtype: int64

In [5]: df.loc[2, 'e'] = -1 # now there is no column: add a row?

In [6]: df # ... nope, still adds a column
Out[6]: 
      a   b   c   d    e
1 a   0   1   2   3  NaN
  b   4   5   6   7  NaN
2 a   8   9  10  11 -1.0
  b  12  13  14  15 -1.0

In [7]: df.loc[3, 'f'] = -2 # what if the row label is entirely missing?

In [8]: df # still adds a row _and_ a col
Out[8]: 
        a     b     c     d    e    f
1 a   0.0   1.0   2.0   3.0  NaN  NaN
  b   4.0   5.0   6.0   7.0  NaN  NaN
2 a   8.0   9.0  10.0  11.0 -1.0  NaN
  b  12.0  13.0  14.0  15.0 -1.0  NaN
3     NaN   NaN   NaN   NaN  NaN -2.0

Problem description

In general, if df.index is a MultiIndex, pandas interprets the syntax df.loc[a, b] as df.loc[(a,b),:].

Out[4] is debatable but understandable: in the absence of the requested row, and in the presence of a column with that name, pandas interprets the call as df.loc[(a,), b].
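For reference, both readings can be spelled out so that nothing is left to guessing. This is only a sketch on the frame from the code sample above; the variable names `row` and `col_part` are mine, and `pd.IndexSlice` is used here as one way (not necessarily the only one) to select by first-level label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=pd.MultiIndex.from_product([[1, 2], ['a', 'b']]),
    columns=['a', 'b', 'c', 'd'],
)

# Row reading: the full key (2, 'a') plus an explicit "all columns" slice.
row = df.loc[(2, 'a'), :]

# Column reading: every row under first-level label 2, then column 'c'.
col_part = df.loc[pd.IndexSlice[2, :], 'c']

print(row.tolist())       # values of row (2, 'a')
print(col_part.tolist())  # values of column 'c' for rows (2, 'a') and (2, 'b')
```

Written this way, the row/column split no longer depends on which labels happen to exist.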

However, there is no reason why In [5] and In [7] should add a column: since priority goes to the index when the labels are present, the same should happen when they are absent.
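In the meantime, a row with a new key can be added without going through the ambiguous df.loc[a, b] = value form at all. The following is a minimal sketch, not a proposed fix: `append_row` is a hypothetical helper of mine (not a pandas API), and it relies on `pd.concat` rather than on setting with enlargement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=pd.MultiIndex.from_product([[1, 2], ['a', 'b']]),
    columns=['a', 'b', 'c', 'd'],
)


def append_row(frame, key, value):
    """Hypothetical helper: return a copy of `frame` with one extra row.

    `key` is a full MultiIndex tuple; `value` fills every existing column.
    """
    new_row = pd.DataFrame(
        [[value] * frame.shape[1]],
        index=pd.MultiIndex.from_tuples([key]),
        columns=frame.columns,
    )
    return pd.concat([frame, new_row])


out = append_row(df, (2, 'e'), -1)
out = append_row(out, (3, 'f'), -2)
print(out)
```

The columns stay a, b, c, d and the two new keys end up in the index, which is the shape shown under Expected Output below (the dtype may differ, since no NaN is introduced here).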

Somewhat related to #17024 .

Expected Output

In [8]: df
Out[8]: 
        a     b     c     d
1 a   0.0   1.0   2.0   3.0
  b   4.0   5.0   6.0   7.0
2 a   8.0   9.0  10.0  11.0
  b  12.0  13.0  14.0  15.0
  e  -1.0  -1.0  -1.0  -1.0
3 f  -2.0  -2.0  -2.0  -2.0

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.23.0.dev0+42.g93033151a
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1


Labels: Bug, Indexing, MultiIndex
