Description
Pandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import requests
# URL for data from University of California Irvine
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz'
file = 'Metro_Interstate_Traffic_Volume.csv.gz'
content = requests.get(url).content
with open(file, 'wb') as f:
f.write(content)
df = pd.read_csv(file, compression='gzip')
holiday temp rain_1h snow_1h clouds_all weather_main weather_description date_time traffic_volume
0 NaN 288.28 0.0 0.0 40 Clouds scattered clouds 2012-10-02 09:00:00 5545
1 NaN 289.36 0.0 0.0 75 Clouds broken clouds 2012-10-02 10:00:00 4516
2 NaN 289.58 0.0 0.0 90 Clouds overcast clouds 2012-10-02 11:00:00 4767
3 NaN 290.13 0.0 0.0 90 Clouds overcast clouds 2012-10-02 12:00:00 5026
4 NaN 291.14 0.0 0.0 75 Clouds broken clouds 2012-10-02 13:00:00 4918
temp rain_1h snow_1h clouds_all traffic_volume
count 48204.000000 48204.000000 48204.000000 48204.000000 48204.000000
mean 281.205870 0.334264 0.000222 49.362231 3259.818355
std 13.338232 44.789133 0.008168 39.015750 1986.860670
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 272.160000 0.000000 0.000000 1.000000 1193.000000
50% 282.450000 0.000000 0.000000 64.000000 3380.000000
75% 291.806000 0.000000 0.000000 90.000000 4933.000000
max 310.070000 9831.300000 0.510000 100.000000 7280.000000
Plotting
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
# pandas.DataFrame.plot / pandas.DataFrame.plot.hist
df.plot(kind='hist', column='traffic_volume', legend=False, ax=ax1)
ax1.bar_label(ax1.containers[0])
ax1.set_title('pandas.DataFrame.plot does not\nseem to correctly determine df.traffic_volume.max()')
# pandas.Series.plot / pandas.Series.plot.hist
df.traffic_volume.plot(kind='hist', ax=ax2)
_ = ax2.bar_label(ax2.containers[0])
ax2.set_title('pandas.Series.plot produces the expected result')
Issue Description
The plot produced by pandas.DataFrame.plot
does not match the plot produced by pandas.Series.plot
. kind='hist'
for both.
Also see Different results when plotting histogram using DataFrame.plot.hist
and Series.plot.hist
df[['traffic_volume', 'clouds_all']].plot(kind='hist', column='traffic_volume')
produces the expected plot
df[['traffic_volume']].plot(kind='hist', column='traffic_volume')
produces the expected plot
df[['traffic_volume', 'rain_1h']].plot(kind='hist', column='traffic_volume')
produces the incorrect plot.
It seems def _calculate_bins
applies to the entire numeric portion of the DataFrame, even if column='traffic_volume'
is specified.
Expected Behavior
The expected behavior is the plot produced by pandas.Series.plot
or matplotlib.axes.Axes.hist
pandas.DataFrame.hist
and pandas.Series.hist
also both produce the expected plot.
fig, ax = plt.subplots()
ax.hist(df.traffic_volume, bins=10)
_ = ax.bar_label(ax.containers[0])
I expect def _calculate_bins
to calculate only on the specified columns, if column=
is used. The column
parameter here states If passed, will be used to limit data
to a subset of columns.
Installed Versions
INSTALLED VERSIONS
commit : 478d340
python : 3.11.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 167 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.0.0
numpy : 1.24.3
pytz : 2022.7
dateutil : 2.8.2
setuptools : 66.0.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.2
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None