BUG: DataFrame(MaskedRecords) inconsistent with Series behavior #38399

Closed
@jbrockmendel

Example based on tests.frame.test_constructors.TestDataFrameConstructors.test_constructor_mrecarray, which has comment "Ensure mrecarray produces frame identical to dict of masked arrays from GH3479"

import numpy as np
from numpy.ma import mrecords
import pandas as pd
import pandas._testing as tm

arrays = [
    ("float", np.array([1.5, 2.0])),
    ("int", np.array([1, 2])),
    ("str", np.array(["abc", "def"])),
]
for name, arr in arrays[:]:
    arrays.append(
        ("masked1_" + name, np.ma.masked_array(arr, mask=[False, True]))
    )

arrays.append(("masked_all", np.ma.masked_all((2,))))
arrays.append(("masked_none", np.ma.masked_array([1.0, 2.5], mask=False)))

comb = [arrays[0], arrays[1], arrays[3]]
names, data = zip(*comb)
mrecs = mrecords.fromarrays(data, names=names)


result = pd.DataFrame(mrecs)

alt = {k: pd.Series(v) for k, v in comb}
expected = pd.DataFrame(alt, columns=names)

tm.assert_frame_equal(result, expected)   # <-- nope!

>>> result
   float  int  masked1_float
0    1.5    1   1.500000e+00
1    2.0    2   1.000000e+20

>>> expected
   float  int  masked1_float
0    1.5    1            1.5
1    2.0    2            NaN

I would expect these two to match.

Note that alt["masked1_float"]._values is a masked_array, so that is non-lossy (though alt["masked1_float"]'s repr does not make that obvious). expected["masked1_float"]._values is a regular ndarray, so that is lossy.
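For context, the 1.000000e+20 in `result` matches NumPy's default fill value for float masked arrays, which suggests the masked entry is being filled with that default rather than converted to NaN. The following is only a NumPy-level sketch of the two fill behaviors (the variable names are illustrative, and it does not show what the DataFrame constructor does internally):

```python
import numpy as np

# Same shape as the "masked1_float" column: second entry is masked.
masked = np.ma.masked_array([1.5, 2.0], mask=[False, True])

# Filling with the array's own fill_value (the NumPy default for floats).
default_filled = masked.filled()

# Filling with NaN instead, which is what the `expected` frame shows.
nan_filled = masked.filled(np.nan)

print(masked.fill_value)
print(default_filled)
print(nan_filled)
```

If the two constructors are meant to agree, the NaN-filled form is the one that corresponds to the `expected` frame above.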

Labels: API - Consistency, Bug, Constructors