Description
Code Sample, a copy-pastable example if possible
Minimal example:
import pandas as pd
df = pd.DataFrame({"foo": [pd.Timestamp("2020", tz="UTC")]}, dtype="object")
df.apply(lambda col: col.copy()) # raises exception below
Real-life usage:
def filter_dataframe_by_dict(df, filters):
    """
    Filter the specified dataframe to only those rows which match the specified filters

    Parameters
    ----------
    df : pd.DataFrame
    filters : Mapping
        dict, keyed by a subset of `df.columns`

    Returns
    -------
    pd.DataFrame
        Same columns as `df`, including only those rows which match `filters` on all specified values.
    """
    filters = pd.Series(filters, dtype="object")
    mask = df[filters.index].apply(
        # astype("object") calls `copy()` internally, and is necessary to ensure dtype-agnostic
        # comparisons.
        lambda row: row.astype("object").equals(filters), axis="columns"
    )
    return df[mask]
records = pd.DataFrame(columns = ["foo", "bar", "baz"])
records.loc[0] = {"foo": pd.Timestamp("2019", tz="UTC"), "bar": 1, "baz": 6.283}
records.loc[1] = {"foo": pd.Timestamp("2020", tz="UTC"), "bar": 2, "baz": 6.283}
filters = {"foo": pd.Timestamp("2020", tz="UTC")}
filter_dataframe_by_dict(records, filters) # raises below exception
Exception:
Traceback (most recent call last):
  File "pandas_bug.py", line 29, in <module>
    df.apply(lambda col: col.copy())
  File ".venv/lib/python3.6/site-packages/pandas/core/frame.py", line 6875, in apply
    return op.get_result()
  File ".venv/lib/python3.6/site-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File ".venv/lib/python3.6/site-packages/pandas/core/apply.py", line 296, in apply_standard
    values, self.f, axis=self.axis, dummy=dummy, labels=labels
  File "pandas/_libs/reduction.pyx", line 617, in pandas._libs.reduction.compute_reduction
  File "pandas/_libs/reduction.pyx", line 127, in pandas._libs.reduction.Reducer.get_result
  File "pandas_bug.py", line 29, in <lambda>
    df.apply(lambda row: row.copy(), axis="columns")
  File ".venv/lib/python3.6/site-packages/pandas/core/generic.py", line 5810, in copy
    data = self._data.copy(deep=deep)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 794, in copy
    res = self.apply("copy", deep=deep)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 442, in apply
    applied = getattr(b, f)(**kwargs)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 696, in copy
    return self.make_block_same_class(values, ndim=self.ndim)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 281, in make_block_same_class
    return make_block(values, placement=placement, ndim=ndim, klass=type(self))
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 3028, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 1723, in __init__
    values = self._maybe_coerce_values(values)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 2306, in _maybe_coerce_values
    raise ValueError("cannot create a DatetimeTZBlock without a tz")
ValueError: cannot create a DatetimeTZBlock without a tz
Problem description
Essentially, the problem arises when .copy() is called from within an .apply call on a row/column of a DataFrame which:
- has dtype object
- consists solely of tz-aware Timestamp objects
While this seems like a fairly artificial set of conditions, it can indeed occur "organically", as the real-life usage example above is intended to demonstrate. Appending rows to an initially empty DataFrame results in the dtype defaulting to object for all columns, so if the filters passed to filter_dataframe_by_dict happen to consist only of timestamp-valued columns, the error conditions are met.
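To make the two triggering conditions concrete, here is a minimal sketch of a check for them. The helper name is_object_column_of_tz_aware_timestamps is hypothetical (it is not part of pandas), and it only detects the conditions described above; it does not reproduce or fix the error.

```python
import pandas as pd


def is_object_column_of_tz_aware_timestamps(col: pd.Series) -> bool:
    """Hypothetical helper: True if `col` has dtype object and holds only tz-aware Timestamps."""
    # Condition 1: the column must have dtype object (and contain something).
    if col.dtype != object or col.empty:
        return False
    # Condition 2: every element must be a tz-aware pd.Timestamp.
    return all(
        isinstance(value, pd.Timestamp) and value.tzinfo is not None
        for value in col
    )


# Same construction as the minimal example: object dtype, tz-aware Timestamp.
df = pd.DataFrame({"foo": [pd.Timestamp("2020", tz="UTC")]}, dtype="object")
# Counter-example: object dtype, but the Timestamp is tz-naive.
naive = pd.DataFrame({"foo": [pd.Timestamp("2020")]}, dtype="object")

print(is_object_column_of_tz_aware_timestamps(df["foo"]))
print(is_object_column_of_tz_aware_timestamps(naive["foo"]))
```

A check like this could be used to decide whether a given row or column is at risk before passing it to .apply.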
Note that the error only seems to arise in .apply calls; making the same calls on the rows returned by iterrows() or iloc works just fine.
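Based on that observation, one possible workaround is to build the mask from iterrows() instead of .apply. This is only a sketch: the function name filter_dataframe_by_dict_no_apply is hypothetical, the sample frame is constructed with an explicit object dtype rather than by appending rows, and the approach has not been checked against every pandas version.

```python
import pandas as pd


def filter_dataframe_by_dict_no_apply(df, filters):
    """Hypothetical variant of filter_dataframe_by_dict that avoids .apply."""
    filters = pd.Series(filters, dtype="object")
    subset = df[filters.index]
    # Compare each row (obtained via iterrows rather than .apply) to the filters.
    mask = pd.Series(
        [row.astype("object").equals(filters) for _, row in subset.iterrows()],
        index=df.index,
        dtype="bool",
    )
    return df[mask]


# Object-dtype frame with the same values as the records example above.
records = pd.DataFrame(
    {
        "foo": [pd.Timestamp("2019", tz="UTC"), pd.Timestamp("2020", tz="UTC")],
        "bar": [1, 2],
        "baz": [6.283, 6.283],
    },
    dtype="object",
)
filters = {"foo": pd.Timestamp("2020", tz="UTC")}
result = filter_dataframe_by_dict_no_apply(records, filters)
print(result)
```

Iterating row by row is slower than .apply on large frames, so this is a stopgap rather than a replacement.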
This problem only occurs as of the latest pandas 1.0.0 release.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-74-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.0.0
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0
Cython : None
pytest : 5.3.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.0
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.11
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None