Open
Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
The get_dummies function creates a column for every category of a categorical Series, not only for the categories that are observed (that is, actually present) in the passed DataFrame.
Example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'letter':['a','b','c']})
In [3]: pd.get_dummies(df[df['letter'] == 'a'])
Out[3]:
   letter_a
0         1
In [4]: df['letter'] = df['letter'].astype("category")
In [5]: pd.get_dummies(df[df['letter'] == 'a'])
Out[5]:
   letter_a  letter_b  letter_c
0         1         0         0
Feature Description
Add an observed parameter to the get_dummies function, with the same behavior as the parameter of the same name in the groupby functions: when True, only the observed categories produce columns.
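Until such a parameter exists, the requested behavior can be approximated in user code. The sketch below is only an illustration: get_dummies_observed is a hypothetical helper name, not part of pandas, and it assumes that dropping unused categories before encoding is an acceptable stand-in for observed=True.

```python
import pandas as pd


def get_dummies_observed(data: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: emulate a get_dummies(..., observed=True) call."""
    data = data.copy()
    for col in data.select_dtypes(include="category").columns:
        # Drop categories that do not occur in this (possibly filtered) frame.
        data[col] = data[col].cat.remove_unused_categories()
    return pd.get_dummies(data, **kwargs)


df = pd.DataFrame({"letter": ["a", "b", "c"]})
df["letter"] = df["letter"].astype("category")
subset = df[df["letter"] == "a"]

observed_only = get_dummies_observed(subset)   # only a letter_a column
all_categories = pd.get_dummies(subset)        # letter_a, letter_b, letter_c
```

The helper works on a copy, so the categories of the caller's DataFrame are left unchanged.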
Alternative Solutions
- Change the behavior to always use the observed values only
- Document this behavior so it's clear to users (users can remove the unneeded columns later if they want to)
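For the second alternative, removing the unneeded columns afterwards might look like the sketch below. It assumes that every column of the result is an indicator column; a genuine numeric column that happens to be all zeros would be dropped as well.

```python
import pandas as pd

df = pd.DataFrame({"letter": ["a", "b", "c"]})
df["letter"] = df["letter"].astype("category")

dummies = pd.get_dummies(df[df["letter"] == "a"])

# Keep only the indicator columns that have at least one nonzero/True entry.
trimmed = dummies.loc[:, dummies.any(axis=0)]
```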
Additional Context
No response