Regression in casting Series to DataFrame with .name='foo' and columns=['bar'] #7893

Closed
@qwhelan

xref #13421 with a MultiIndex as the columns

Hi,

I encountered an edge case in DataFrame initialization with something like the following:

In [1]: from pandas import *
In [2]: s = Series(1, name='foo')
In [3]: df = DataFrame(s, columns=['bar'])
In [4]: df

Empty DataFrame
Columns: [bar]
Index: []

This happens in both 0.14.1 and 0.13.1. Strictly speaking it isn't a bug, since the docs don't list Series as a valid type for data=. That said, the cast works whenever .name is None or .name equals what's passed to columns=, so the failure in this particular case is rather surprising.

The mechanism appears to be:

  • DataFrame.__init__ promotes the Series' .name to the column name, if it is not None
  • Then, the data columns are sliced with the list passed to columns=, resulting in an empty data set when the two differ.
  • This seems to only occur when a Series is directly passed as data=. I can't get this to occur with [Series, ...] or a dict of Series.
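
To make the two steps above concrete, here is a toy model in plain Python. It is only a sketch of the behaviour as I understand it, not pandas' actual code: build_frame is a hypothetical helper, and dicts and lists stand in for the DataFrame and the Series.

```python
# Toy model of the reported mechanism; plain dicts stand in for pandas objects.
# build_frame is a hypothetical helper and is not part of pandas.

def build_frame(values, name=None, columns=None):
    """Return a {column: values} mapping, mimicking the two steps described above."""
    # Step 1: a non-None name is promoted to the column label.
    # Without a name, fall back to the requested label (or 0), so nothing clashes.
    if name is not None:
        label = name
    elif columns:
        label = columns[0]
    else:
        label = 0
    frame = {label: list(values)}

    # Step 2: keep only the requested columns. A requested label that is
    # not present selects nothing, so the values are silently dropped.
    if columns is not None:
        frame = {col: frame.get(col, []) for col in columns}
    return frame


mismatch = build_frame([1], name="foo", columns=["bar"])  # name and columns differ
match = build_frame([1], name="foo", columns=["foo"])     # name equals columns
unnamed = build_frame([1], name=None, columns=["bar"])    # no name at all
```

In this model only the mismatch case ends up with an empty column, which matches the three cases described above: no name works, a matching name works, and a differing name loses the data.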

The options I see are (in order of my personal preference):

  • do the implicit rename (only occurs with single Series, so no ambiguity)
  • disallow passing a Series as data= altogether
  • throw an exception due to the ambiguity

I don't see just documenting this behavior as being viable, as this edge case effectively leads to data loss.
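
In the meantime, the data loss can be sidestepped by naming the column explicitly instead of relying on columns= to rename it. A minimal sketch, assuming a pandas version that provides Series.to_frame with a name argument (I have not checked how far back that goes); the variable names are just for illustration:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="foo")

# Variant A: let the Series build the frame and name the column explicitly.
df_a = s.to_frame(name="bar")

# Variant B: pass a dict, so the key (not s.name) decides the column label.
df_b = pd.DataFrame({"bar": s})
```

Both variants keep all three values under the column bar, so neither depends on how the constructor resolves a .name / columns= mismatch.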

Labels: Constructors (Series/DataFrame/Index/pd.array constructors), Needs Tests (unit tests needed to prevent regressions), good first issue
