
BUG: One column 2d arrays not coerced to 1d with ArrayManager #44788

Closed
@ivirshup


  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

```python
import os
os.environ["PANDAS_DATA_MANAGER"] = "array"

import pandas as pd, numpy as np

df = pd.DataFrame(index=np.arange(10))
df["foo"] = np.ones((10, 1))
# ValueError: Expected a 1D array, got an array with shape (10, 1)
```


<details>
<summary> Full traceback </summary>

```pytb
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/usr/local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/usr/local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'foo'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _set_item_mgr(self, key, value)
   3750         try:
-> 3751             loc = self._info_axis.get_loc(key)
   3752         except KeyError:

/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 

KeyError: 'foo'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/var/folders/bd/43q20k0n6z15tdfzxvd22r7c0000gn/T/ipykernel_14506/151211532.py in <module>
      5 
      6 df = pd.DataFrame(index=np.arange(10))
----> 7 df["foo"] = np.ones((10, 1))

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3610         else:
   3611             # set column
-> 3612             self._set_item(key, value)
   3613 
   3614     def _setitem_slice(self, key: slice, value):

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3795                     value = np.tile(value, (len(existing_piece.columns), 1)).T
   3796 
-> 3797         self._set_item_mgr(key, value)
   3798 
   3799     def _set_value(

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _set_item_mgr(self, key, value)
   3752         except KeyError:
   3753             # This item wasn't present, just insert at end
-> 3754             self._mgr.insert(len(self._info_axis), key, value)
   3755         else:
   3756             self._iset_item_mgr(loc, value)

/usr/local/lib/python3.9/site-packages/pandas/core/internals/array_manager.py in insert(self, loc, item, value)
    872                 value = value[0, :]  # type: ignore[index]
    873             else:
--> 874                 raise ValueError(
    875                     f"Expected a 1D array, got an array with shape {value.shape}"
    876                 )

ValueError: Expected a 1D array, got an array with shape (10, 1)
```

</details>

Issue Description

Using the ArrayManager, assigning a 2d array with only one non-singleton dimension, e.g. shape (10, 1), to a DataFrame column raises a ValueError.

This does not happen with the BlockManager. I would expect the same assignment to work with both, since the two managers should behave equivalently.
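To make the requested coercion concrete, here is a minimal sketch of the shape handling I have in mind. The traceback shows only a `value[0, :]` branch before the error is raised; the guarding condition is not visible there. `coerce_column_to_1d` is a hypothetical helper written for illustration. It is not pandas code and not necessarily how a fix would be implemented.

```python
import numpy as np


def coerce_column_to_1d(value):
    """Hypothetical helper (not part of pandas): flatten a 2D array that
    holds a single row or a single column into a 1D array."""
    arr = np.asarray(value)
    if arr.ndim == 1:
        # Already 1D, nothing to do
        return arr
    if arr.ndim == 2 and 1 in arr.shape:
        # Shapes (1, n) and (n, 1) both carry exactly one column of data
        return arr.reshape(-1)
    # Reuse the error message seen in the traceback for everything else
    raise ValueError(
        f"Expected a 1D array, got an array with shape {arr.shape}"
    )
```

With this, `coerce_column_to_1d(np.ones((10, 1)))` has shape `(10,)`, while an array of shape `(10, 2)` still raises the same `ValueError`.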

Expected Behavior

The behavior with the block manager is expected. This works fine:

```python
import os
os.environ["PANDAS_DATA_MANAGER"] = "block"

import pandas as pd, numpy as np

df = pd.DataFrame(index=np.arange(10))
df["foo"] = np.ones((10, 1))
```
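As a possible workaround until the two managers agree, the array can be flattened before assignment so that the column value is already 1D. This is a sketch under the assumption that a 1D value is accepted by either manager; I have not verified it against every pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=np.arange(10))
column = np.ones((10, 1))

# Flatten the (10, 1) array to shape (10,) before assigning it as a column
df["foo"] = column.ravel()
```

`column[:, 0]` would do the same job here; `ravel()` is used only because it also handles the `(1, n)` case.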

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5
python           : 3.9.9.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.6.0
Version          : Darwin Kernel Version 20.6.0: Tue Oct 12 18:33:42 PDT 2021; root:xnu-7195.141.8~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8

pandas           : 1.3.4
numpy            : 1.20.3
pytz             : 2021.3
dateutil         : 2.8.2
pip              : 21.3.1
setuptools       : 59.0.1
Cython           : None
pytest           : 6.2.5
hypothesis       : None
sphinx           : 4.1.2
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.3
IPython          : 7.29.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 2021.11.1
fastparquet      : None
gcsfs            : None
matplotlib       : 3.5.0
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.3
sqlalchemy       : None
tables           : 3.6.1
tabulate         : None
xarray           : 0.20.1
xlrd             : 1.2.0
xlwt             : None
numba            : 0.54.1


Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
