ENH: Allowing plotting for non-numeric and non-date datatypes in DataFrame.hist #53595

Open
@Rylie-W

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When using matplotlib as the backend, pandas.DataFrame.hist() only plots numeric and datetime columns, because hist_frame filters the data with select_dtypes before plotting:

def hist_frame(
    data,
    column=None,
    by=None,
    grid: bool = True,
    xlabelsize: int | None = None,
    xrot=None,
    ylabelsize: int | None = None,
    yrot=None,
    ax=None,
    sharex: bool = False,
    sharey: bool = False,
    figsize: tuple[float, float] | None = None,
    layout=None,
    bins: int = 10,
    legend: bool = False,
    **kwds,
):
    if legend and "label" in kwds:
        raise ValueError("Cannot use both legend and label")
    if by is not None:
        axes = _grouped_hist(
            data,
            column=column,
            by=by,
            ax=ax,
            grid=grid,
            figsize=figsize,
            sharex=sharex,
            sharey=sharey,
            layout=layout,
            bins=bins,
            xlabelsize=xlabelsize,
            xrot=xrot,
            ylabelsize=ylabelsize,
            yrot=yrot,
            legend=legend,
            **kwds,
        )
        return axes
    if column is not None:
        if not isinstance(column, (list, np.ndarray, ABCIndex)):
            column = [column]
        data = data[column]
    # GH32590
    data = data.select_dtypes(
        include=(np.number, "datetime64", "datetimetz"), exclude="timedelta"
    )

In contrast, matplotlib.axes.Axes.hist() and pandas.Series.hist() accept a wider range of data types. One option is to add a numeric_only parameter to DataFrame.hist() that lets users choose whether to plot only the numeric columns or all columns.
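To make the proposal concrete, here is a minimal sketch of how the column filter could depend on a numeric_only flag. The helper name select_hist_columns and the exact behaviour of the non-numeric branch (keeping everything except timedelta columns) are assumptions for illustration only; they are not part of pandas and not necessarily what the draft implementation does.

```python
import numpy as np
import pandas as pd


def select_hist_columns(data: pd.DataFrame, numeric_only: bool = True) -> pd.DataFrame:
    """Hypothetical helper: pick the columns DataFrame.hist would plot."""
    if numeric_only:
        # Same filter as the current hist_frame code (GH32590):
        # numeric and datetime columns only.
        return data.select_dtypes(
            include=(np.number, "datetime64", "datetimetz"), exclude="timedelta"
        )
    # Assumed behaviour for numeric_only=False: keep every column
    # except timedelta ones, which the current code already excludes.
    return data.select_dtypes(exclude="timedelta")


rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "a": rng.normal(size=100),
        "b": rng.normal(size=100) + 100,
        "c": rng.choice(["A", "B"], size=100),
        "d": rng.choice(["C", "D"], size=100),
    }
)

print(list(select_hist_columns(df).columns))
print(list(select_hist_columns(df, numeric_only=False).columns))
```

With numeric_only=True the string columns c and d are dropped, which matches the current behaviour; with numeric_only=False they are kept and would be passed on to the plotting code.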

To reproduce:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    dict(
        a=np.random.normal(size=100),
        b=np.random.normal(size=100) + 100,
        c=np.random.choice(['A', 'B'], size=100),
        d=np.random.choice(['C', 'D'], size=100),
    )
)

df['d'].hist() # <AxesSubplot: title={'center': 'b'}>
df.hist() # array([[<AxesSubplot: title={'center': 'a'}>, <AxesSubplot: title={'center': 'b'}>]], dtype=object)

In addition, the title of the histogram produced by df['d'].hist() is wrong; this could be fixed in my PR for issue #53281.
I have completed a draft implementation and can open a pull request if needed.

Feature Description

Allow DataFrame.hist to plot a wider range of data types.

Alternative Solutions

Haven't found one.

Additional Context

No response
