Skip to content

Creating a column with a set replicates the set n times #32582

Closed
@AlexanderNixon

Description

@AlexanderNixon

Code Sample

If we try to define a dataframe using a dictionary containing a set, we get:

pd.DataFrame({'a':{1,2,3}})

       a
0  {1, 2, 3}
1  {1, 2, 3}
2  {1, 2, 3}

Problem description

The set is being replicated n times, n being the length of the actual set.
While defining a column with a set directly might not make a lot of sense given that they are by definition unordered collections, the behaviour in any case seems clearly unexpected.

Expected Output

In the case of a list, in order to obtain a single row containing a list, we would have to define a nested list, such as pd.DataFrame({'a':[[1,2,3]]}).
So similarly, with sets I would expect the same behaviour by defining the row with pd.DataFrame({'a':[{1,2,3}]}).

In the case of a single set, even if the order is not guaranteed to be preserved, I'd see more reasonable the same output that we would obtain with:

pd.DataFrame({'a':[1,2,3]})

   a
0  1
1  2
2  3

So:

pd.DataFrame({'a':{1,2,3}})

   a
0  1
1  2
2  3

Where:

pd.__version__
# '1.0.0'

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugConstructorsSeries/DataFrame/Index/pd.array Constructors

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions