BUG: series.reindex(mi) behaves differently for series with Index and MultiIndex #60923

Open
@ssche

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

  • Create a series with Index and a MultiIndex to use for reindexing later
>>> series = pd.Series(
...   [26.7300, 24.2550],
...   index=pd.Index([81, 82], name='a')
... )
>>> series
a
81    26.730
82    24.255
dtype: float64
>>> series.index
Index([81, 82], dtype='int64', name='a')
>>> other_index = pd.MultiIndex(
...   levels=[
...     pd.Index([81, 82], name='a'),
...     pd.Index([np.nan], name='b'),
...     pd.Index([
...       '2018-06-01', '2018-07-01'
...     ], name='c')
...   ],
...   codes=[
...     [0, 0, 1, 1],
...     [0, 0, 0, 0],
...     [0, 1, 0, 1]
...   ],
...   names=['a', 'b', 'c']
... )
>>> other_index
MultiIndex([(81, nan, '2018-06-01'),
            (81, nan, '2018-07-01'),
            (82, nan, '2018-06-01'),
            (82, nan, '2018-07-01')],
           names=['a', 'b', 'c'])
  • Reindex to the MultiIndex (other_index), which extends series.index by two more levels.
  • Unfortunately, the reindex sets all values of the original series to NaN. This can be worked around by first turning series.index into a 1-level MultiIndex.
>>> series.reindex(other_index) # this removes all values of the series
a   b    c         
81  NaN  2018-06-01   NaN
         2018-07-01   NaN
82  NaN  2018-06-01   NaN
         2018-07-01   NaN
dtype: float64
  • Apply to_mi(...) to turn series.index into a 1-level MultiIndex.
  • Rerun reindex on the new series with the MultiIndex; the values are now retained and filled as expected.
>>> def to_mi(series):
...   if isinstance(series.index, pd.MultiIndex):
...     series_mi = series.index
...   else:
...     level_names = [series.index.name]
...     level_values = [series.index]
...     series_mi = pd.MultiIndex.from_arrays(level_values, names=level_names)
...   series_with_mi = pd.Series(series.values, index=series_mi, name=series.name)
...   return series_with_mi
... 
>>> series_mi = to_mi(series)
>>> series_mi
a 
81    26.730
82    24.255
dtype: float64
>>> series_mi.index
MultiIndex([(81,),
            (82,)],
           names=['a'])
>>> series_mi.reindex(other_index)
a   b    c         
81  NaN  2018-06-01    26.730
         2018-07-01    26.730
82  NaN  2018-06-01    24.255
         2018-07-01    24.255
dtype: float64

Issue Description

In the above case, series.reindex(multi_index) turns all series values into NaN when the series has a plain Index. However, when the series index is converted to a 1-level MultiIndex before the reindex, the values are retained and filled as expected.

In my opinion, it shouldn't matter whether a 1-level MultiIndex or an Index is used for a reindex; the outcomes should be the same.
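
The claim that both index types should give the same outcome can be phrased as a small regression-style check. The following is only a sketch: `reindex_is_consistent` is a hypothetical helper name, and the target MultiIndex is built with `from_arrays` for brevity instead of the explicit levels and codes used in the example above. Per this report, the check is expected to come out False on affected versions.

```python
import numpy as np
import pandas as pd

# Data mirroring the report. Assumption: from_arrays is used for brevity;
# the report constructs the MultiIndex from explicit levels and codes.
series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name="a"))
other_index = pd.MultiIndex.from_arrays(
    [
        [81, 81, 82, 82],
        [np.nan] * 4,
        ["2018-06-01", "2018-07-01", "2018-06-01", "2018-07-01"],
    ],
    names=["a", "b", "c"],
)


def reindex_is_consistent(flat, target):
    """Hypothetical helper: do the Index and 1-level MultiIndex paths agree?"""
    # Same data, but carried by a 1-level MultiIndex instead of a plain Index.
    as_mi = flat.copy()
    as_mi.index = pd.MultiIndex.from_arrays([flat.index], names=[flat.index.name])

    from_flat = flat.reindex(target)
    from_mi = as_mi.reindex(target)

    # Series.equals treats NaNs in the same position as equal.
    return bool(from_flat.equals(from_mi))


consistent = reindex_is_consistent(series, other_index)
print("Index and 1-level MultiIndex agree:", consistent)
```

A check like this could serve as the core of a test case once the intended behaviour has been decided.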

As a further discussion point (here or elsewhere), this issue (and others) also raises the question of why a distinction between Index and MultiIndex is necessary (I suspect there are historic reasons). I would imagine that many issues (and much code) would go away if MultiIndex was used exclusively (even for 1-dimensional indices).

Expected Behavior

The levels missing from the series index (compared to other_index) are added, and each value of the original series is repeated for every new entry that shares its value on the common level. series.reindex(other_index) should therefore give the same result as:

>>> series_mi.reindex(other_index)
a   b    c         
81  NaN  2018-06-01    26.730 # from index <81> of `series` (`series_mi`)
         2018-07-01    26.730 # from index <81> of `series` (`series_mi`)
82  NaN  2018-06-01    24.255 # from index <82> of `series` (`series_mi`)
         2018-07-01    24.255 # from index <82> of `series` (`series_mi`)
dtype: float64
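
Until the two code paths agree, the expected result can also be produced without relying on the 1-level MultiIndex behaviour, by aligning explicitly on the shared level. This is a sketch under assumptions, not a proposed fix: `broadcast_to_levels` is a hypothetical helper name, it assumes the series index name matches one level name of the target, and the target is again built with `from_arrays` for brevity.

```python
import numpy as np
import pandas as pd

series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name="a"))
other_index = pd.MultiIndex.from_arrays(
    [
        [81, 81, 82, 82],
        [np.nan] * 4,
        ["2018-06-01", "2018-07-01", "2018-06-01", "2018-07-01"],
    ],
    names=["a", "b", "c"],
)


def broadcast_to_levels(flat, target):
    """Hypothetical helper: expand `flat` to `target` via the shared level."""
    # Labels of the level that has the same name as the flat index.
    shared = target.get_level_values(flat.index.name)
    # Ordinary Index-to-Index reindex; repeated labels repeat the values.
    out = flat.reindex(shared)
    # Attach the full MultiIndex positionally (same length and order as `shared`).
    out.index = target
    return out


expanded = broadcast_to_levels(series, other_index)
print(expanded)
```

Because the alignment uses only the labels of level `a`, the contents of the other levels (including the NaN in `b`) do not affect which values are picked.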

Installed Versions

INSTALLED VERSIONS

commit : 3979e95
python : 3.11.11
python-bits : 64
OS : Linux
OS-release : 6.12.11-200.fc41.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Fri Jan 24 04:59:58 UTC 2025
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8

pandas : 3.0.0.dev0+1909.g3979e954a3.dirty
numpy : 1.26.4
dateutil : 2.9.0.post0
pip : 24.2
Cython : 3.0.11
sphinx : 8.1.3
IPython : 8.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : 1.4.2
fastparquet : 2024.11.0
fsspec : 2025.2.0
html5lib : 1.1
hypothesis : 6.125.2
gcsfs : 2025.2.0
jinja2 : 3.1.5
lxml.etree : 5.3.0
matplotlib : 3.10.0
numba : 0.61.0
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
psycopg2 : 2.9.10
pymysql : 1.4.6
pyarrow : 19.0.0
pyreadstat : 1.2.8
pytest : 8.3.4
python-calamine : None
pytz : 2025.1
pyxlsb : 1.0.10
s3fs : 2025.2.0
scipy : 1.15.1
sqlalchemy : 2.0.38
tables : 3.10.2
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : 2.0.1
xlsxwriter : 3.2.2
zstandard : 0.23.0
tzdata : 2025.1
qtpy : None
pyqt5 : None

Labels

Bug, Index (Related to the Index class or subclasses), MultiIndex