Description
The example below is based on tests.frame.test_constructors.TestDataFrameConstructors.test_constructor_mrecarray, whose comment reads "Ensure mrecarray produces frame identical to dict of masked arrays from GH3479":
import numpy as np
from numpy.ma import mrecords
import pandas as pd
import pandas._testing as tm
arrays = [
    ("float", np.array([1.5, 2.0])),
    ("int", np.array([1, 2])),
    ("str", np.array(["abc", "def"])),
]
for name, arr in arrays[:]:
    arrays.append(
        ("masked1_" + name, np.ma.masked_array(arr, mask=[False, True]))
    )
arrays.append(("masked_all", np.ma.masked_all((2,))))
arrays.append(("masked_none", np.ma.masked_array([1.0, 2.5], mask=False)))
comb = [arrays[0], arrays[1], arrays[3]]
names, data = zip(*comb)
mrecs = mrecords.fromarrays(data, names=names)
result = pd.DataFrame(mrecs)
alt = {k: pd.Series(v) for k, v in comb}
expected = pd.DataFrame(alt, columns=names)
tm.assert_frame_equal(result, expected) # <-- nope!
>>> result
   float  int  masked1_float
0    1.5    1   1.500000e+00
1    2.0    2   1.000000e+20
>>> expected
   float  int  masked1_float
0    1.5    1            1.5
1    2.0    2            NaN
I would expect these two to match.
Note that alt["masked1_float"]._values is a masked_array, so that is non-lossy (though the repr of alt["masked1_float"] does not make that obvious). expected["masked1_float"]._values is a regular ndarray, so that is lossy.
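For what it's worth, the 1.000000e+20 in result matches NumPy's default fill value for float masked arrays. The sketch below only illustrates that NumPy behavior. It does not go through the DataFrame constructor, so it is a plausible explanation for the value rather than a confirmed diagnosis, and the variable names are made up for illustration.

```python
import numpy as np

# Same shape as the "masked1_float" column above: second element is masked.
masked = np.ma.masked_array(np.array([1.5, 2.0]), mask=[False, True])

# With no argument, filled() substitutes the array's fill_value, which
# defaults to 1e20 for float dtypes in NumPy.
default_filled = masked.filled()

# Passing NaN explicitly gives the value the "expected" frame shows.
nan_filled = masked.filled(np.nan)

print(masked.fill_value)
print(default_filled)
print(nan_filled)
```

If the mrecarray path fills masked entries with fill_value while the dict-of-Series path fills them with NaN, that would account for the difference between result and expected.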