BUG: Regression in 1.1.0. "invalid slicing for a 1-ndim ExtensionArray" #37631

Closed
@buhrmann

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pickle
import pandas as pd

s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))
print(s1.dropna())  # works
print(s2.dropna())  # raises AssertionError
0    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-1-c7b1204ccdb7> in <module>
      6 s2 = pickle.loads(pickle.dumps(s1))
      7 print(s1.dropna())
----> 8 print(s2.dropna())

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in dropna(self, axis, inplace, how)
   4883 
   4884         if self._can_hold_na:
-> 4885             result = remove_na_arraylike(self)
   4886             if inplace:
   4887                 self._update_inplace(result)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in remove_na_arraylike(arr)
    564     """
    565     if is_extension_array_dtype(arr):
--> 566         return arr[notna(arr)]
    567     else:
    568         return arr[notna(np.asarray(arr))]

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    904             key = check_bool_indexer(self.index, key)
    905             key = np.asarray(key, dtype=bool)
--> 906             return self._get_values(key)
    907 
    908         return self._get_with(key)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in _get_values(self, indexer)
    966     def _get_values(self, indexer):
    967         try:
--> 968             return self._constructor(self._mgr.get_slice(indexer)).__finalize__(self,)
    969         except ValueError:
    970             # mpl compat if we look up e.g. ser[:, np.newaxis];

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
   1559 
   1560         blk = self._block
-> 1561         array = blk._slice(slobj)
   1562         block = blk.make_block_same_class(array, placement=slice(0, len(array)))
   1563         return type(self)(block, self.index[slobj])

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
   1756             if not isinstance(first, slice):
   1757                 raise AssertionError(
-> 1758                     "invalid slicing for a 1-ndim ExtensionArray", first
   1759                 )
   1760             # GH#32959 only full-slicers along fake-dim0 are valid

AssertionError: ('invalid slicing for a 1-ndim ExtensionArray', array([ True]))

Problem description

It is unclear what changes during the serialization round trip through pickle, but the unpickled Series can no longer be indexed with a boolean mask, which in turn breaks dropna(). The following code exposes the error more directly:

s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))
s2[[True]]

The error occurs, for example, when multiple Series are processed in parallel (which triggers serialization with pickle) and a categorical Series has been filtered down to a single row. With another dtype, or with more than one row, the error is not triggered.
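The conditions above can be probed with a small script. This is only a sketch: the helper name `mask_indexing_survives_pickle` and the set of cases are made up for illustration, and the printed result depends on the installed pandas version, so no output is shown.

```python
import pickle

import pandas as pd


def mask_indexing_survives_pickle(series: pd.Series) -> bool:
    """Return True if an all-True boolean mask still works after a pickle round trip."""
    restored = pickle.loads(pickle.dumps(series))
    try:
        restored[[True] * len(restored)]
    except AssertionError:
        # The failure described in this report on affected pandas versions.
        return False
    return True


base = pd.Series(list("abc"))
cases = {
    "category, 1 row": base.astype("category").iloc[[0]],
    "category, 2 rows": base.astype("category").iloc[[0, 1]],
    "object, 1 row": base.iloc[[0]],
}

# Map each case name to whether mask indexing worked after the round trip.
results = {name: mask_indexing_survives_pickle(s) for name, s in cases.items()}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'AssertionError'}")
```

On an affected version, only the "category, 1 row" case would be expected to fail, if the description above is accurate.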

The regression must have been introduced in version 1.1.0, as the above code works as expected in 1.0.5.

Expected Output

The behaviour of slicing, dropna() etc. should be the same before and after pickling a Series, and independent of the number of rows.
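The expected behaviour could be written as a regression-test-style check along the following lines. This is a sketch rather than an actual pandas test: `check_pickle_invariance` is a hypothetical helper name, and only boolean-mask indexing and dropna() are compared.

```python
import pickle

import pandas as pd
from pandas.testing import assert_series_equal


def check_pickle_invariance(series: pd.Series) -> None:
    """Raise if mask indexing or dropna() differs after a pickle round trip."""
    restored = pickle.loads(pickle.dumps(series))
    mask = [True] * len(series)
    assert_series_equal(series[mask], restored[mask])
    assert_series_equal(series.dropna(), restored.dropna())


if __name__ == "__main__":
    # Multi-row cases, which the report describes as unaffected.
    check_pickle_invariance(pd.Series([1.0, None, 3.0]))
    check_pickle_invariance(pd.Series(list("abc")).astype("category"))
    # The single-row categorical case from this report; on affected
    # versions this call is expected to raise the AssertionError above.
    check_pickle_invariance(pd.Series(list("abc")).astype("category").iloc[[0]])
```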

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d42
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.1.4
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 20.0.2
setuptools : 46.1.1.post20200322
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.46.0

Labels

Categorical (Categorical Data Type), Regression (Functionality that used to work in a prior pandas version)
