
ENH: Add the observed parameter to get_dummies #60585

Open
@alonme

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The get_dummies function creates a column for every category of a categorical Series, not only for the values that are observed, i.e. the values actually present in the passed DataFrame.

Example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'letter':['a','b','c']})

In [3]: pd.get_dummies(df[df['letter'] == 'a'])
Out[3]:
   letter_a
0         1

In [4]: df['letter'] = df['letter'].astype("category")

In [5]: pd.get_dummies(df[df['letter'] == 'a'])
Out[5]:
   letter_a  letter_b  letter_c
0         1         0         0
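Until such a parameter exists, one way to get only the observed columns is to drop the unused categories before encoding. The sketch below repeats the filter from the example and uses Series.cat.remove_unused_categories() so that get_dummies only sees the category that survived the filter. It is a workaround under the assumption that pruning the categorical dtype is acceptable for your data, not the proposed API.

```python
import pandas as pd

df = pd.DataFrame({"letter": ["a", "b", "c"]})
df["letter"] = df["letter"].astype("category")

# Same filter as in the example: only the row with 'a' survives,
# but the dtype still carries the categories 'a', 'b' and 'c'.
subset = df[df["letter"] == "a"].copy()

# Drop categories that no longer occur in the filtered data, then encode.
subset["letter"] = subset["letter"].cat.remove_unused_categories()
dummies = pd.get_dummies(subset)
print(dummies.columns.tolist())  # ['letter_a']
```

The .copy() keeps the column assignment from writing into a view of df.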

Feature Description

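Add an observed parameter to the get_dummies function that behaves like the parameter of the same name in groupby: with observed=True, dummy columns are created only for categories that actually appear in the data; with observed=False (the current behavior), every category of the dtype gets a column.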
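The proposed semantics can be sketched as a wrapper around the existing function. get_dummies_observed below is a hypothetical helper, not part of pandas; its observed flag mimics the proposal by removing unused categories from categorical inputs before delegating to pd.get_dummies, and it forwards every other keyword argument unchanged. The last lines show groupby's observed flag on the same data for comparison.

```python
import pandas as pd


def get_dummies_observed(data, observed=False, **kwargs):
    # Hypothetical helper: `observed` is the proposed parameter, which
    # pd.get_dummies does not accept today. Other keyword arguments are
    # forwarded to pd.get_dummies unchanged.
    if observed:
        if isinstance(data, pd.Series):
            if isinstance(data.dtype, pd.CategoricalDtype):
                data = data.cat.remove_unused_categories()
        else:
            data = data.copy()
            # Respect columns= if given; otherwise consider every column.
            targets = kwargs.get("columns")
            if targets is None:
                targets = data.columns
            for col in targets:
                if isinstance(data[col].dtype, pd.CategoricalDtype):
                    data[col] = data[col].cat.remove_unused_categories()
    return pd.get_dummies(data, **kwargs)


df = pd.DataFrame({"letter": pd.Categorical(["a", "b", "c"]), "n": [1, 2, 3]})
subset = df[df["letter"] == "a"]

# Current behavior: one dummy column per category in the dtype.
print(get_dummies_observed(subset).columns.tolist())
# Proposed observed=True: only the values present in `subset`.
print(get_dummies_observed(subset, observed=True).columns.tolist())

# groupby's observed flag on the same data: observed=True keeps only the
# category that occurs, observed=False keeps every category.
print(subset.groupby("letter", observed=True)["n"].sum().index.tolist())
print(subset.groupby("letter", observed=False)["n"].sum().index.tolist())
```

A native implementation could avoid the copy by skipping unused categories during encoding; the wrapper just trades that for simplicity.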

Alternative Solutions

  1. Change the behavior to always use the observed values only
  2. Document this behavior so it's clear to users (users can remove the unneeded columns later if they want to)
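Alternative 2 relies on users removing the unneeded columns after encoding. A minimal sketch of that post-hoc step, assuming the categorical column is encoded on its own, keeps only dummy columns that contain at least one truthy value, which are exactly the observed categories.

```python
import pandas as pd

df = pd.DataFrame({"letter": pd.Categorical(["a", "b", "c"])})
subset = df[df["letter"] == "a"]

# Encode as get_dummies does today: one column per category.
dummies = pd.get_dummies(subset["letter"], prefix="letter")

# Keep only columns with at least one truthy value, i.e. the
# categories that actually occur in `subset`.
dummies = dummies.loc[:, dummies.any()]
print(dummies.columns.tolist())  # ['letter_a']
```

Applying .any() to a whole frame returned by pd.get_dummies(df) would also drop pass-through columns that happen to be all zero or all False, which is why the sketch encodes the single column.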

Additional Context

No response

Metadata


Assignees

No one assigned

    Labels

    Categorical (Categorical Data Type), Enhancement, Needs Triage (Issue that has not been reviewed by a pandas team member)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
