xref #13421 with a MultiIndex as the columns
Hi,
I encountered an edge case in DataFrame initialization with something like the following:
In [1]: from pandas import *
In [2]: s = Series(1, name='foo')
In [3]: df = DataFrame(s, columns=['bar'])
In [4]: df
Empty DataFrame
Columns: [bar]
Index: []
This happens in both 0.14.1 and 0.13.1, but it isn't strictly a bug, since the docs don't list Series as a valid type for data=. That said, this casting appears to work whenever .name is None or when .name equals what's passed to columns=, so the failure in this particular case is rather surprising.
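The three cases can be checked side by side. This is a sketch that uses a two-element Series instead of the scalar Series(1, ...) above so that the rows are visible. The report was made against 0.13.1 and 0.14.1; that the mismatched case still yields an empty frame (0 rows, one column bar) in newer releases is an assumption worth re-checking against your installed version.

```python
import pandas as pd

values = [1, 2]

cases = {
    # .name is None: the values are labelled with the requested column.
    "unnamed": pd.Series(values),
    # .name equals the label passed to columns=: the values are kept.
    "matching": pd.Series(values, name="bar"),
    # .name differs from columns=: the case reported above.
    "mismatched": pd.Series(values, name="foo"),
}

# Build each frame exactly as in the report: Series as data=, explicit columns=.
frames = {label: pd.DataFrame(s, columns=["bar"]) for label, s in cases.items()}

for label, frame in frames.items():
    print(label, frame.shape, list(frame.columns))
```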
The mechanism appears to be:
- DataFrame.__init__ promotes the Series' .name to the column name, if it is not None.
- The data columns are then sliced with the list passed to columns=, resulting in an empty data set when the two differ.
- This seems to occur only when a Series is passed directly as data=. I can't get it to occur with [Series, ...] or a dict of Series.
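To make the suspected mechanism concrete, here is a toy model in plain Python. It does not use or reproduce pandas internals; construct_from_series is a hypothetical name, and a column is modelled as a list of values with missing columns filled with None (pandas would fill NaN).

```python
from typing import Dict, List, Optional, Sequence


def construct_from_series(
    values: Sequence[object], name: Optional[str], columns: List[str]
) -> Dict[str, list]:
    """Hypothetical model of DataFrame(series, columns=columns) as described in this issue."""
    if name is None:
        # Unnamed Series: the values are simply labelled with the requested
        # column (modelled for a single requested column only).
        return {columns[0]: list(values)}
    # Step 1: the Series name is promoted to the column label, i.e. {name: values}.
    data = {name: list(values)}
    # Step 2: the requested columns are selected from that mapping. A label
    # that is not present contributes no data, so if nothing matches, no rows survive.
    selected = {col: data[col] for col in columns if col in data}
    n_rows = max((len(v) for v in selected.values()), default=0)
    return {col: selected.get(col, [None] * n_rows) for col in columns}


print(construct_from_series([1, 2], "foo", ["bar"]))
```

The last call mirrors the report: the name foo is not among the requested columns, so the result has a bar column but no rows.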
The options I see are (in order of my personal preference):
- do the implicit rename (this only occurs with a single Series, so there is no ambiguity)
- simply disallow passing a Series as data=
- raise an exception due to the ambiguity
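Until the constructor changes, options 1 and 3 can be approximated on the caller's side. This is only a sketch: frame_from_series and its on_mismatch parameter are hypothetical names, not part of pandas; the helper relies only on Series.rename and Series.to_frame.

```python
import pandas as pd


def frame_from_series(s: pd.Series, columns: list, on_mismatch: str = "rename") -> pd.DataFrame:
    """Hypothetical helper: build a one-column DataFrame from a Series without dropping data.

    on_mismatch="rename" approximates option 1 (implicit rename);
    on_mismatch="raise" approximates option 3 (raise on ambiguity).
    """
    if on_mismatch not in ("rename", "raise"):
        raise ValueError(f"unknown on_mismatch: {on_mismatch!r}")
    if len(columns) != 1:
        raise ValueError("a single Series maps to exactly one column")
    target = columns[0]
    if on_mismatch == "raise" and s.name is not None and s.name != target:
        raise ValueError(f"Series name {s.name!r} does not match columns={list(columns)!r}")
    # Renaming first means there is no label selection left that could miss.
    return s.rename(target).to_frame()


df = frame_from_series(pd.Series([1, 2], name="foo"), ["bar"])
print(df)
```

Renaming before building the frame sidesteps the column selection entirely, which is why the rename path keeps every row.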
I don't think simply documenting this behavior is viable, since this edge case effectively leads to silent data loss.