
BUG: Inconsistent handling of None in indices #40366

Open
@maroth96

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


@jreback

This is an issue I came across while working on #40127.

Most Index objects can be constructed with None values:

pd.Index(['a', 'b', None], dtype='category')
pd.Index([1.0, 2.0, None], dtype='float64')
pd.Index(['2000-01', '2000-02', None], dtype='datetime64[ns]')

But (U)Int64Index cannot accept None values:

>>> pd.Index([1, 2, None], dtype='int64')
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
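
The asymmetry can be illustrated in plain Python. This is only a sketch of the underlying constraint, not how pandas constructs an Index: `coerce` is a hypothetical helper, and the point is that a float can use NaN as a stand-in for a missing value while an integer has no such reserved value.

```python
import math

def coerce(values, kind):
    """Hypothetical helper: convert each value to the requested kind."""
    if kind == "float":
        # float has NaN, so None can be mapped to a "missing" marker
        return [math.nan if v is None else float(v) for v in values]
    if kind == "int":
        # int has no value reserved for "missing"; int(None) raises TypeError
        return [int(v) for v in values]
    raise ValueError(f"unsupported kind: {kind!r}")

floats = coerce([1.0, 2.0, None], "float")
print(floats)  # [1.0, 2.0, nan]

try:
    coerce([1, 2, None], "int")
except TypeError as exc:
    print("TypeError:", exc)
```
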

This seems pretty reasonable. However, inside of a MultiIndex, None values and the int dtype can actually coexist:

>>> pd.MultiIndex.from_arrays([[1, 2, None]]).levels[0]
Int64Index([1, 2], dtype='int64')

So we can construct a DataFrame like so:

>>> df = pd.DataFrame([10, 20, 30], index=pd.MultiIndex.from_arrays([[1, 2, None]]))
      0
1    10
2    20
NaN  30

The dtype of its first and only level is still int64:

>>> df.index.levels[0]
Int64Index([1, 2], dtype='int64')

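This works because a MultiIndex stores each level's labels as integer codes that point into the level, so a None value is encoded as the code -1 instead of being stored in the level itself.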

>>> df.index.codes
FrozenList([[0, 1, -1]])
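
The levels-plus-codes idea can be sketched in plain Python. This is an illustration under simplifying assumptions, not pandas' actual implementation: `encode_level` and `decode_level` are hypothetical names, and only None is treated as missing.

```python
import math

def encode_level(values):
    """Split values into (level, codes); missing entries get code -1."""
    # The level holds only the distinct non-missing values, sorted.
    level = sorted({v for v in values if v is not None})
    position = {v: i for i, v in enumerate(level)}
    codes = [-1 if v is None else position[v] for v in values]
    return level, codes

def decode_level(level, codes):
    """Rebuild the labels from the level; code -1 is rendered as NaN."""
    return [math.nan if c == -1 else level[c] for c in codes]

level, codes = encode_level([1, 2, None])
print(level)  # [1, 2]
print(codes)  # [0, 1, -1]

labels = decode_level(level, codes)
print(labels)  # [1, 2, nan]
```

Because the missing entry lives only in the codes, the level can stay purely integer, which matches the `Int64Index([1, 2], dtype='int64')` shown above.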

More examples of None behavior in various scenarios:

>>> df = pd.DataFrame(np.zeros((3, 3)), columns=pd.MultiIndex.from_arrays([['a', 'b', 'c'], [1, 2, None]]))
     a    b    c
     1    2  NaN
0  0.0  0.0  0.0
1  0.0  0.0  0.0
2  0.0  0.0  0.0
>>> df.stack(0).columns
Int64Index([1, 2], dtype='int64')
>>> df.columns.droplevel(0)
Float64Index([1.0, 2.0, nan], dtype='float64')
>>> pd.Index([1, 2, None])
Index([1, 2, None], dtype='object')

I'm not sure what the best solution is to normalize the treatment of Nones inside indices. So I thought I would raise this issue and see what others think about this.

Labels: Bug, Dtype Conversions, Index
