API/ERR/ENH: Allow MultiIndex.from_tuples to handle NaNs #23578

Closed
@h-vetinari

This is the origin of #23558 and #23677, but IMO worth fixing in its own right:

Trying to create a MultiIndex from a list of tuples that contains bare NaNs (which is what Index.str.partition produces when NaNs are present) yields:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), np.nan, ('d', '', '')])
[...]
TypeError: object of type 'float' has no len()

However, it works without problems when passing a tuple of NaNs instead:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,) * 3, ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
           labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])

In fact, the length of the tuple is irrelevant:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,), ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
           labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])
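
Until bare NaNs are handled, a user-side workaround could look roughly like the sketch below. The helper name `pad_nan_tuples` and its `nlevels` parameter are hypothetical (not part of pandas and not the fix referred to in this issue); the sketch just swaps every bare float NaN for a full-length tuple of NaNs before calling `from_tuples`.

```python
import numpy as np
import pandas as pd


def pad_nan_tuples(entries, nlevels):
    """Hypothetical helper: replace bare NaN entries with NaN tuples."""
    padded = []
    for entry in entries:
        # A bare NaN is a float; real entries are tuples and pass through.
        if isinstance(entry, float) and np.isnan(entry):
            padded.append((np.nan,) * nlevels)
        else:
            padded.append(entry)
    return padded


entries = [('a', 'b', 'c'), np.nan, ('d', '', '')]
mi = pd.MultiIndex.from_tuples(pad_nan_tuples(entries, nlevels=3))
```

Padding to the full number of levels sidesteps any reliance on the short-tuple behaviour shown above.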

Aside from the inconvenience, working around this on the user side is a real problem: setting elements of an array (say, several NaNs at once) to tuples is awkward, because the tuples almost always get interpreted as another axis or as a list-like, which then produces creative errors.
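
To illustrate the kind of error meant here, a minimal sketch with made-up data: assigning a tuple through a boolean mask makes NumPy treat the tuple as a sequence of values to distribute, not as a single element. On the NumPy versions I am aware of this raises a `ValueError` when the lengths differ; assigning position by position stores the tuple as one object.

```python
import numpy as np
import pandas as pd

# Object array with two NaN entries (illustrative data only).
arr = np.array(['a', np.nan, 'b', np.nan], dtype=object)
mask = pd.isna(arr)

nan_tuple = (np.nan,) * 3

try:
    # The 3-tuple is read as three values for two slots, not as one element.
    arr[mask] = nan_tuple
    mask_assignment_failed = False
except ValueError:
    mask_assignment_failed = True

# Assigning one position at a time keeps each tuple as a single object.
for pos in np.flatnonzero(mask):
    arr[pos] = nan_tuple
```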

I think this extra fail-safe should not be controversial. I already have a fix prepared, but it is blocked by #23582 (at least in terms of adding tests).
