
DOC: Clarify df.describe() behavior with Timestamp columns #56918

Closed
@sfc-gh-joshi

Description


Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html

Documentation problem

The Notes section for describe states the following:

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

Since pandas 2.0 began treating Timestamps as numeric data, calling describe on a Series or DataFrame with Timestamp data no longer yields first or last rows, as far as I can tell. In fact, the example included in the documentation itself exhibits this behavior:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object
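A minimal standalone check, assuming pandas 2.0 or later, that reruns the documentation example and inspects the labels of the result. It shows that datetime64 data is summarized with count, mean, min, percentiles and max, and that neither first nor last appears.

```python
import numpy as np
import pandas as pd

# Same data as the documentation example quoted above.
s = pd.Series([
    np.datetime64("2000-01-01"),
    np.datetime64("2010-01-01"),
    np.datetime64("2010-01-01"),
])

summary = s.describe()
labels = list(summary.index)
print(labels)  # ['count', 'mean', 'min', '25%', '50%', '75%', 'max']

# Assumes pandas >= 2.0: datetime data is summarized like numeric data,
# so there are no "first"/"last" labels in the result.
print("first" in labels, "last" in labels)  # False False
```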

Suggested fix for documentation

Assuming this behavior is intended: remove the mention of the first and last items, and stop describing timestamps as object data.

For object data (such as strings), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency.
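To illustrate the proposed wording, here is a small sketch using hypothetical string data (the values are made up for the example). It shows the count, unique, top and freq summary that describe produces for strings.

```python
import pandas as pd

# Hypothetical string data, chosen only to illustrate the object-data summary.
fruits = pd.Series(["apple", "banana", "apple", "cherry"])

summary = fruits.describe()
print(summary)

# 4 values, 3 distinct; "apple" is the most common value and appears twice.
print(list(summary.index))  # ['count', 'unique', 'top', 'freq']
```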
