
BUG: DataFrame.rolling(axis=1) operations drop/ignore float16 and float32 columns #41779

Closed
@benchittle

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np

# Generate a 4x6 DataFrame
df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=list("abcdef"))
# Make each column a different data type
df = df.astype({"a":"float16", "b":"float32", "c":"float64", "d":"int8", "e":"int16", "f":"int32"})

print(df)
# Output:
#      a     b     c   d   e   f
# 0   0.0   1.0   2.0   3   4   5
# 1   6.0   7.0   8.0   9  10  11
# 2  12.0  13.0  14.0  15  16  17
# 3  18.0  19.0  20.0  21  22  23
print(df.dtypes)
# Output:
# a    float16
# b    float32
# c    float64
# d       int8
# e      int16
# f      int32
# dtype: object

# Rolling minimum across rows
print(df.rolling(window=2, min_periods=1, axis=1).min())
# Output (note that the float16 and float32 columns, a and b, are missing):
#       c     d     e     f
# 0   2.0   2.0   3.0   4.0
# 1   8.0   8.0   9.0  10.0
# 2  14.0  14.0  15.0  16.0
# 3  20.0  20.0  21.0  22.0

Problem description

Rolling operations along rows (axis=1) appear to silently drop columns of dtype float16 and float32. The same operations along columns (axis=0) work as expected.
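For reference, a window=2, min_periods=1 rolling minimum along rows takes, for each column, the minimum of that column and the one to its left. The first column has only one observation, which min_periods=1 allows. The sketch below computes that reference result directly with NumPy on the same data, so it does not depend on pandas' dtype filtering. The helper `rolling_min_rows` is hypothetical and only handles window=2.

```python
import numpy as np
import pandas as pd


def rolling_min_rows(values: np.ndarray) -> np.ndarray:
    """Hypothetical reference: window=2, min_periods=1 rolling min along axis=1."""
    values = values.astype("float64")
    out = values.copy()
    # Column j (j >= 1) takes the min of columns j-1 and j;
    # column 0 has a single observation and keeps its own value.
    out[:, 1:] = np.minimum(values[:, :-1], values[:, 1:])
    return out


df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=list("abcdef"))
df = df.astype({"a": "float16", "b": "float32", "c": "float64",
                "d": "int8", "e": "int16", "f": "int32"})

# to_numpy() on mixed numeric dtypes yields a single common dtype (float64 here).
expected = pd.DataFrame(rolling_min_rows(df.to_numpy()),
                        index=df.index, columns=df.columns)
print(expected)
```

All six columns survive, and the values match the expected output shown further down in this issue.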

Expected Output

# Convert float16 and float32 columns to float64s as a workaround
df = df.astype({"a":"float64", "b":"float64"})
# Rolling minimum across rows again
print(df.rolling(window=2, min_periods=1, axis=1).min())
# Output:
#       a     b     c     d     e     f
# 0   0.0   0.0   1.0   2.0   3.0   4.0
# 1   6.0   6.0   7.0   8.0   9.0  10.0
# 2  12.0  12.0  13.0  14.0  15.0  16.0
# 3  18.0  18.0  19.0  20.0  21.0  22.0
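Another possible workaround, sketched here as an assumption rather than a confirmed fix, is to transpose, roll along axis=0, and transpose back. Transposing a frame with mixed numeric dtypes upcasts everything to a common dtype (float64 for these columns), so no column is filtered out. This also avoids the axis argument, which later pandas releases deprecate for rolling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=list("abcdef"))
df = df.astype({"a": "float16", "b": "float32", "c": "float64",
                "d": "int8", "e": "int16", "f": "int32"})

# df.T is a homogeneous float64 frame whose rows are the original columns,
# so an ordinary axis=0 roll on it is a row-wise roll on the original data.
result = df.T.rolling(window=2, min_periods=1).min().T
print(result)
```

Like the astype workaround above, this returns float64 for every column and makes a copy of the data.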

Possible Cause

The cause appears to be a change made in #36458, specifically this line.
When passed to select_dtypes, "float" seems to resolve to np.float64 only, not np.float32 or np.float16, so those columns are filtered out before the window is applied. Changing that line to
obj = obj.select_dtypes(include="number", exclude=["timedelta"])
so that all numeric dtypes are included seemed to fix the issue in this case. I can open a PR if there are no objections to this approach.
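The dtype filtering can be checked in isolation with `select_dtypes`. The sketch below assumes the current line selects with `include=["integer", "float"]`; that exact include list is an assumption for illustration. "integer" is the abstract np.integer and matches every integer width, while "float" is narrower. Depending on the pandas version, "float" may or may not also match float32, so only the float16 result is relied on here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=list("abcdef"))
df = df.astype({"a": "float16", "b": "float32", "c": "float64",
                "d": "int8", "e": "int16", "f": "int32"})

# At the NumPy level, the string "float" means float64.
print(np.dtype("float"))  # float64

# Assumed form of the current filter (hypothetical include list).
current = df.select_dtypes(include=["integer", "float"], exclude=["timedelta"])
# Proposed filter: np.number covers every float and integer width.
# timedelta64 is a NumPy integer subclass, hence the explicit exclude.
proposed = df.select_dtypes(include="number", exclude=["timedelta"])

print(list(current.columns))   # float16 column "a" is dropped
print(list(proposed.columns))  # ['a', 'b', 'c', 'd', 'e', 'f']
```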

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 8 Stepping 2, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_Canada.1252

pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.24.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.18.2
xlrd : None
xlwt : None
numba : None

Labels: Bug, Window (rolling, ewma, expanding)