Description
Right now, the output shape and dtypes of DataFrame.describe for object columns depend on whether the DataFrame is empty.
Code Sample, a copy-pastable example if possible
In [75]: x = pd.DataFrame({"A": ['a', np.nan, np.nan]})

In [76]: x.describe()
Out[76]:
        A
count   1
unique  1
top     a
freq    1

In [77]: x.iloc[:0].describe()
Out[77]:
        A
count   0
unique  0
Problem description
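This makes the output's shape and dtypes unstable: the non-empty result has four rows (count, unique, top, freq), while the empty result has only two (count, unique).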
Would people prefer that we use np.NaN or None for top and freq in this case? I believe there's no ambiguity, since we drop missing values before computing these statistics, so a missing top can only mean there was nothing to count.
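To make the proposal concrete, here is a minimal pure-Python sketch of the rule: drop missing values, then always report the same four statistics, filling top and freq with NaN when nothing is left. The helper names describe_object and _is_missing are hypothetical, and a plain dict stands in for the pandas result; this is not pandas' actual implementation.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable


def _is_missing(value: Any) -> bool:
    # Hypothetical helper: treat None and float NaN as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def describe_object(values: Iterable[Any]) -> Dict[str, Any]:
    # Drop missing values before computing anything, as describe does.
    present = [v for v in values if not _is_missing(v)]
    counts = Counter(present)
    if counts:
        # most_common(1) returns the single most frequent (value, count) pair.
        top, freq = counts.most_common(1)[0]
    else:
        # Proposed behavior: keep the keys, fill top/freq with NaN.
        top, freq = math.nan, math.nan
    return {"count": len(present), "unique": len(counts), "top": top, "freq": freq}


print(describe_object(["a", math.nan, math.nan]))
print(describe_object([]))
```

Swapping math.nan for None in the empty branch gives the other option raised above; either way, the set of keys no longer depends on whether the input is empty.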
While consistent output would be nice, it's not clear to me what's actually best for users here.