DataFrame._init_dict handles columns with nan incorrectly if columns passed separately #16894

Closed
@kernc

Description

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({np.nan: [1, 2]})
>>> df[np.nan]   # As one would arguably expect, nan matches nan
0    1
1    2
Name: nan, dtype: int64

>>> df = pd.DataFrame({np.nan: [1, 2], 2: [2, 3]}, columns=[np.nan, 2])
>>> df   # nan from dict didn't match nan from ensured Float64Index
   NaN  2.0
0  NaN    2
1  NaN    3

Problem description

When a DataFrame is initialized from a dict and columns are passed, the NaN key isn't recognized, so its values aren't retrieved from the dict. The problem is in loops like:

columns = _ensure_index(columns)  # Float64Index
for c in columns:                 # c = np.float64(np.nan), not the same object as np.nan
    if c in data_dict:            # False: nan != nan, and c is not the dict's key object
        ...
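A minimal pure-Python sketch of why that membership test fails, using two float("nan") objects as stand-ins for np.nan (the dict key) and the NaN yielded by iterating the ensured index. This assumes the behavior carries over to NumPy NaNs, which is reasonable since np.float64 subclasses float. A dict finds a key only if the candidate is the same object or compares equal to it; NaN never compares equal, so only the identical object matches.

```python
# Two distinct NaN objects: one stored as a dict key, one produced separately
nan_a = float("nan")  # stands in for np.nan used as the dict key
nan_b = float("nan")  # stands in for the NaN yielded by the ensured index

data = {nan_a: [1, 2], 2: [2, 3]}

# NaN never compares equal, not even to itself...
print(nan_a == nan_a)  # False
# ...so dict membership can only succeed through object identity
print(nan_a in data)   # True: same object as the stored key
print(nan_b in data)   # False: also NaN, but a different object
```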

If columns aren't passed separately, initialization works as expected.

>>> pd.DataFrame({np.nan: [1, 2], 2: [2, 3]})
   NaN  2.0
0    1    2
1    2    3

Consistency would be nice.

Expected Output

>>> df = pd.DataFrame({np.nan: [1, 2], 2: [2, 3]}, columns=[np.nan, 2])
>>> df   # nan from dict matches nan from Float64Index
   NaN  2.0
0    1    2
1    2    3
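One way a loop like the one above could produce this expected output is a NaN-aware lookup. The helper below, lookup_column, is hypothetical and is not pandas' actual fix: it tries a normal dict lookup first and, only if the requested key is NaN, falls back to returning the value stored under any NaN key. The isinstance(k, float) checks also cover np.float64, since it subclasses float.

```python
import math


def lookup_column(data, key):
    """Hypothetical helper: dict lookup where any NaN key matches any other NaN."""
    if key in data:
        return data[key]
    # Fallback only for NaN: scan for a stored key that is also NaN
    if isinstance(key, float) and math.isnan(key):
        for k, v in data.items():
            if isinstance(k, float) and math.isnan(k):
                return v
    raise KeyError(key)


data = {float("nan"): [1, 2], 2: [2, 3]}
# A fresh NaN object, like the one the ensured Float64Index would yield;
# 2.0 already matches the int key 2 because 2.0 == 2 and their hashes agree
columns = [float("nan"), 2.0]
values = [lookup_column(data, c) for c in columns]
print(values)  # [[1, 2], [2, 3]]
```

The linear scan only runs on the NaN fallback path, so ordinary keys keep constant-time dict lookups.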

Output of pd.show_versions()

pandas 0.21.0.dev+225.gb55b1a2fe
