
BUG: Subtracting two series with unordered index and all-nan index produces unexpected result #38439

Closed
@ssche

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> import numpy as np
>>> a_index = pd.MultiIndex.from_tuples([
...   (81.0,  np.nan, '2018-06-01'),
...   (81.0,  np.nan, '2018-07-01'),
...   (82.0,  np.nan, '2018-07-01'),
...   (82.0,  np.nan, '2018-08-01'),
...   (np.nan,np.nan, np.nan)],
...    names=['id', 'sub_ix', 'data']
... )
>>> a_values = [25, 22, 20, 21, np.nan]
>>> b_index = pd.MultiIndex.from_tuples([
...   (81.0, np.nan,  '2018-06-01'),
...   (np.nan, np.nan, np.nan),
...   (81.0, np.nan,  '2018-07-01'),
...   (82.0, np.nan,  '2018-07-01'),
...   (82.0, np.nan,  '2018-08-01')],
...   names=['id', 'sub_ix', 'data']
... )
>>> b_values = [28.28, np.nan, 28.28, 25.25, 25.25]
>>> a = pd.Series(a_values, index=a_index)
>>> b = pd.Series(b_values, index=b_index)


>>> a
id    sub_ix  data      
81.0  NaN     2018-06-01    25.0
              2018-07-01    22.0
82.0  NaN     2018-07-01    20.0
              2018-08-01    21.0
NaN   NaN     NaN            NaN
dtype: float64


>>> b
id    sub_ix  data      
81.0  NaN     2018-06-01    28.28
NaN   NaN     NaN             NaN
81.0  NaN     2018-07-01    28.28
82.0  NaN     2018-07-01    25.25
              2018-08-01    25.25
dtype: float64



>>> a - b
id    sub_ix  data      
81.0  NaN     2018-06-01   -3.28
              2018-07-01     NaN <-- this shouldn't be NaN, the index (81.0, NaN, 2018-07-01) exists in both `a` and `b` (it's just not ordered in `b`)
82.0  NaN     2018-07-01   -8.28 <-- also wrong, expected -5.25
              2018-08-01   -4.25
NaN   NaN     NaN            NaN
dtype: float64
>>> a - b.sort_index()
id    sub_ix  data      
81.0  NaN     2018-06-01   -3.28
              2018-07-01   -6.28 <-- expected value
82.0  NaN     2018-07-01   -5.25
              2018-08-01   -4.25
NaN   NaN     NaN            NaN
dtype: float64

Problem description

When two series share the same index labels but have an all-NaN index row at different positions, the result of an arithmetic operation (+, -, /) is misaligned. The issue can be worked around by sorting both indices first (Series.sort_index). I tried a different example with unordered indices but without the all-NaN index row, and the result is as expected, so unsorted indices alone are not the cause.

>>> a
id    sub_ix  data      
81.0  NaN     2018-07-01    25
              2018-06-01    22
82.0  NaN     2018-07-01    20
              2018-08-01    21
dtype: int64
>>> b
id    sub_ix  data      
81.0  NaN     2018-06-01    1
              2018-07-01    2
82.0  NaN     2018-07-01    3
              2018-08-01    4
dtype: int64
>>> a - b
id    sub_ix  data      
81.0  NaN     2018-06-01    21
              2018-07-01    23
82.0  NaN     2018-07-01    17
              2018-08-01    17
dtype: int64
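
The workaround from the problem description can be sketched as below. `subtract_sorted` is a hypothetical helper name, not part of pandas; it only applies `Series.sort_index` to both operands before subtracting, using the data from the first code sample.

```python
import numpy as np
import pandas as pd


def subtract_sorted(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: sort both indices, then subtract."""
    return left.sort_index() - right.sort_index()


names = ["id", "sub_ix", "data"]

# Same labels in both series; only the position of the all-NaN row differs.
a = pd.Series(
    [25, 22, 20, 21, np.nan],
    index=pd.MultiIndex.from_tuples(
        [
            (81.0, np.nan, "2018-06-01"),
            (81.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-08-01"),
            (np.nan, np.nan, np.nan),
        ],
        names=names,
    ),
)
b = pd.Series(
    [28.28, np.nan, 28.28, 25.25, 25.25],
    index=pd.MultiIndex.from_tuples(
        [
            (81.0, np.nan, "2018-06-01"),
            (np.nan, np.nan, np.nan),
            (81.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-08-01"),
        ],
        names=names,
    ),
)

result = subtract_sorted(a, b)
print(result)
```

For the non-NaN rows this should give the values reported above for `a - b.sort_index()`. Sorting costs an extra pass over each index, so it is only a stopgap on affected versions.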

Expected Output

Operands should be aligned by index label, even when the index contains all-NaN rows.
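
To make "aligned by index label" concrete, here is a minimal reference computation that bypasses pandas alignment and looks up each label explicitly. `label_key` and `reference_subtract` are hypothetical names introduced for illustration, and treating all NaN labels as equal to each other is an assumption that matches the expectation stated in this report, not a description of pandas internals.

```python
import numpy as np
import pandas as pd


def label_key(label):
    """Map an index tuple to a hashable key in which every NaN label is equal."""
    return tuple("NaN" if pd.isna(v) else v for v in label)


def reference_subtract(left: pd.Series, right: pd.Series) -> dict:
    """Subtract by explicit label lookup (illustrative only)."""
    right_by_key = {
        label_key(lbl): val for lbl, val in zip(right.index, right.to_numpy())
    }
    return {
        label_key(lbl): val - right_by_key.get(label_key(lbl), np.nan)
        for lbl, val in zip(left.index, left.to_numpy())
    }


names = ["id", "sub_ix", "data"]
a = pd.Series(
    [25, 22, 20, 21, np.nan],
    index=pd.MultiIndex.from_tuples(
        [
            (81.0, np.nan, "2018-06-01"),
            (81.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-08-01"),
            (np.nan, np.nan, np.nan),
        ],
        names=names,
    ),
)
b = pd.Series(
    [28.28, np.nan, 28.28, 25.25, 25.25],
    index=pd.MultiIndex.from_tuples(
        [
            (81.0, np.nan, "2018-06-01"),
            (np.nan, np.nan, np.nan),
            (81.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-08-01"),
        ],
        names=names,
    ),
)

expected = reference_subtract(a, b)

# Print the reference value next to whatever the installed pandas computes.
actual = a - b
for lbl, val in zip(actual.index, actual.to_numpy()):
    key = label_key(lbl)
    print(key, "pandas:", val, "reference:", expected.get(key))
```

On an affected version the two columns should differ for the rows flagged in the code sample; a regression test could compare them instead of printing.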

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 65f0463
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.9.11-200.fc33.x86_64
Version : #1 SMP Tue Nov 24 18:18:01 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8

pandas : 1.2.0.dev0+1441.g65f0463d3
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.2
Cython : 0.29.21
pytest : 5.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : 2.7.1
odfpy : None
openpyxl : 1.8.6
pandas_gbq : None
pyarrow : 1.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

Labels

  • Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
  • MultiIndex
  • Needs Tests: Unit test(s) needed to prevent regressions
  • Numeric Operations: Arithmetic, Comparison, and Logical operations
  • good first issue
