Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Compared to pd.get_dummies(), Series.str.get_dummies() behaves quite differently and offers much more limited functionality. These inconsistencies are not user-friendly.
Feature Description
- The dtype of the DataFrame returned by Series.str.get_dummies() should be bool, not int64.

      s = pd.Series(list('abca'))
      s.str.get_dummies()

  before:

         a  b  c
      0  1  0  0
      1  0  1  0
      2  0  0  1
      3  1  0  0

  after (same as pd.get_dummies(s)):

             a      b      c
      0   True  False  False
      1  False   True  False
      2  False  False   True
      3   True  False  False
- prefix=, prefix_sep=, dummy_na=, sparse=, and dtype= arguments should be added to Series.str.get_dummies().

      s = pd.Series(['a', 'b', np.nan])
      s.str.get_dummies(prefix="dummy", prefix_sep="=", dummy_na=True, dtype=float)

  after (same as pd.get_dummies(s, prefix="dummy", prefix_sep="=", dummy_na=True, dtype=float)):

         dummy=a  dummy=b  dummy=nan
      0      1.0      0.0        0.0
      1      0.0      1.0        0.0
      2      0.0      0.0        1.0
Note: Among the arguments of pd.get_dummies(), the columns= argument is obviously not needed for Series.str.get_dummies(). Whether Series.str.get_dummies() needs a drop_first= argument is debatable, since Series.str.get_dummies() can yield True in multiple columns, unlike pd.get_dummies().
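To illustrate the point about multiple columns, here is a small sketch using made-up sample data. It assumes the default "|" separator and only uses existing pandas calls; it compares how many indicators are set per row for each function.

```python
import pandas as pd

# Hypothetical sample data: each string may hold several "|"-separated labels.
multi_label = pd.Series(["a|b", "a", "b|c"])
multi = multi_label.str.get_dummies(sep="|")

# A row can have more than one indicator set, so columns are not mutually exclusive.
multi_counts = multi.sum(axis=1).tolist()

# pd.get_dummies() treats each value as a single category: exactly one indicator per row.
single_label = pd.Series(list("abca"))
single = pd.get_dummies(single_label)
single_counts = single.sum(axis=1).tolist()

print(multi_counts)
print(single_counts)
```

Because rows of the Series.str.get_dummies() result do not necessarily sum to one, dropping the first column would not remove a redundant column the way it does for pd.get_dummies(), which is why drop_first= may not carry over cleanly.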
Alternative Solutions
While there are countless ways to obtain a DataFrame with the same result, none of them would bring consistency between the two methods. The only real alternative might be to simply deprecate Series.str.get_dummies().
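As one example of such a workaround, the sketch below wraps the existing Series.str.get_dummies() to approximate the proposed arguments. The helper name str_get_dummies_compat and its signature are hypothetical and not part of pandas; sparse= is left out, and the NaN indicator column is labeled with the string "nan", which is an assumption made for simplicity.

```python
import numpy as np
import pandas as pd


def str_get_dummies_compat(s, sep="|", prefix=None, prefix_sep="_",
                           dummy_na=False, dtype=bool):
    """Hypothetical helper (not a pandas API) approximating the proposal."""
    # Existing behavior: one indicator column per label; missing values give all-zero rows.
    out = s.str.get_dummies(sep=sep)
    if dummy_na:
        # Assumption: label the missing-value indicator with the string "nan".
        out["nan"] = s.isna()
    # Cast every indicator column to the requested dtype (bool by default).
    out = out.astype(dtype)
    if prefix is not None:
        out = out.add_prefix(f"{prefix}{prefix_sep}")
    return out


s = pd.Series(["a", "b", np.nan])
result = str_get_dummies_compat(
    s, prefix="dummy", prefix_sep="=", dummy_na=True, dtype=float
)
print(result)
```

A wrapper like this reproduces the output, but every user would have to carry it around, which is the consistency gap described above.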
Additional Context
No response