Description
I filed #13415, where it was explained that DataFrame(recarray, columns=MultiIndex) reindexes, so only matching columns end up in the resulting frame. I can see how this might be a backward-compatibility constraint. However, I have found a similar but distinct case that still seems broken:
import numpy as np
import pandas as pd
arr = np.zeros(3, [('q', [('x', float), ('y', int)])])
ind = pd.MultiIndex.from_tuples([('q', 'x'), ('q', 'y')])
pd.DataFrame(arr, columns=ind)
This creates an array of three zero-filled records with two nested fields each, but the result is a 3x2 DataFrame of NaNs. Note that the column names match: the NumPy array has a top-level field q with subfields x and y, and so does the MultiIndex. If the top-level name in the MultiIndex is changed to something other than q, the result is an empty DataFrame, which means some correspondence between the input data and the requested columns is recognized. The data is lost nevertheless, leaving NaNs where there should be zeros.
Either the columns are considered non-matching, in which case the result should be an empty DataFrame, or they do match, in which case the result should be a DataFrame with contents from the input array.
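For comparison, here is a minimal sketch of the frame I would expect in the "they do match" case. It sidesteps the constructor's reindexing by pulling each nested field out of the array by hand. This is only an illustration of the expected output, not a proposed fix, and the names data and expected are mine:

```python
import numpy as np
import pandas as pd

# Same inputs as in the report above.
arr = np.zeros(3, [('q', [('x', float), ('y', int)])])
ind = pd.MultiIndex.from_tuples([('q', 'x'), ('q', 'y')])

# Extract each nested field explicitly: arr['q']['x'], arr['q']['y'].
# Iterating over a MultiIndex yields (top, sub) tuples.
data = {(top, sub): arr[top][sub] for top, sub in ind}

# A dict keyed by tuples yields MultiIndex columns.
expected = pd.DataFrame(data)
print(expected)
```

The result has the same column labels as ind and holds the zeros from the input array instead of NaNs.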