Closed
Description
If one uses series.value_counts(dropna=False)
it includes NaN
values, but crosstab(series1, series2, dropna=False)
does not include NaN
values. I think of cross_tab
kind of like 2-dimensional value_counts()
, so this is surprising.
In [31]: x = pd.Series(['a', 'b', 'a', None, 'a'])
In [32]: y = pd.Series(['c', 'd', 'd', 'c', None])
In [33]: y.value_counts(dropna=False)
Out[33]:
d 2
c 2
NaN 1
dtype: int64
In [34]: pd.crosstab(x, y, dropna=False)
Out[34]:
c d
row_0
a 1 1
b 0 1
I believe what crosstab(..., dropna=False)
really does is just include rows/columns that would otherwise have all zeros, but that doesn't seem the same as dropna=False
to me. I would have expected to see a row and column entry for NaN
, something like:
c d NaN
row_0
a 1 1 1
b 0 1 0
NaN 1 0 0
Thoughts on this?
This is on current master (almost 0.17).