Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html
Documentation problem
The Notes section for describe states the following (emphasis mine):
For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.
Since pandas 2.0 began treating Timestamps as numeric data, as far as I can tell, calling describe on a Series or DataFrame with Timestamp data no longer yields the first or last rows. In fact, the example included in the documentation also shows this behavior:
>>> s = pd.Series([
... np.datetime64("2000-01-01"),
... np.datetime64("2010-01-01"),
... np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object
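The same observation can be checked programmatically. The following is a minimal sketch, assuming pandas 2.0 or later is installed; the variable names (summary, labels, stale_labels) are illustrative and not part of the documentation.

```python
import numpy as np
import pandas as pd

# Same Series as in the documentation example above.
s = pd.Series([
    np.datetime64("2000-01-01"),
    np.datetime64("2010-01-01"),
    np.datetime64("2010-01-01"),
])

summary = s.describe()
labels = list(summary.index)

# Labels that the Notes section promises for timestamps but that are
# expected to be absent on pandas >= 2.0 (assumption: numeric-style summary).
stale_labels = {"first", "last"} & set(labels)

print(labels)
print("stale labels present:", sorted(stale_labels))
```

If the set of stale labels is empty, the summary follows the numeric-style layout (count, mean, min, percentiles, max) rather than the layout described in the Notes section.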
Suggested fix for documentation
Assuming this behavior is intended: remove mention of the first and last rows, and of timestamps as object data, so that the note reads:
For object data (such as strings), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency.
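For comparison, the proposed wording can be checked against a Series of strings. This is a small sketch with made-up sample data; dtype=object is set explicitly so that the example exercises the object-data path regardless of the default string dtype in the installed pandas version.

```python
import pandas as pd

# Hypothetical sample data: "a" appears twice, "b" and "c" once each.
s = pd.Series(["a", "b", "a", "c"], dtype=object)

summary = s.describe()
labels = list(summary.index)

print(labels)
print("top:", summary["top"], "freq:", summary["freq"])
```

Here top should be the most common value and freq its number of occurrences, which matches the wording of the suggested note.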