Description
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Two different dates, one within the range that pd.Timestamp can handle, the other outside of that range:
import pandas as pd
import datetime
df = pd.DataFrame({'A': ['X', 'Y'],
                   'B': [datetime.datetime(2005, 1, 1, 10, 30, 23, 540000),
                         datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)]})
print(df.groupby('A').B.max())
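pd.Timestamp stores nanoseconds since the Unix epoch in a signed 64-bit integer, so it can only represent dates between pd.Timestamp.min (1677) and pd.Timestamp.max (2262). A minimal sketch of a bounds check for the two example dates; `fits_in_timestamp` is a hypothetical helper, not a pandas API:

```python
import datetime

import pandas as pd

# Nanosecond-resolution bounds of pd.Timestamp (int64 nanoseconds around 1970).
print(pd.Timestamp.min)
print(pd.Timestamp.max)

in_range = datetime.datetime(2005, 1, 1, 10, 30, 23, 540000)
out_of_range = datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)


def fits_in_timestamp(dt):
    """Hypothetical helper: True if dt lies within pd.Timestamp's nanosecond range."""
    return pd.Timestamp.min <= dt <= pd.Timestamp.max


print(fits_in_timestamp(in_range), fits_in_timestamp(out_of_range))
```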
Problem description
pd.Timestamp can't represent a date as far out as the year 3005, so to represent such a date I have to use the datetime.datetime type. Before 1.1.1 (or possibly 1.1.0) this wasn't an issue, but now this code raises an AssertionError:
Traceback (most recent call last):
File "<ipython-input-38-8b8ec5e4e179>", line 5, in <module>
print(df.groupby('A').B.max())
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1558, in max
numeric_only=numeric_only, min_count=min_count, alias="max", npfunc=np.max
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1015, in _agg_general
result = self.aggregate(lambda x: npfunc(x, axis=self.axis))
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\generic.py", line 261, in aggregate
func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1083, in _python_agg_general
result, counts = self.grouper.agg_series(obj, f)
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\ops.py", line 644, in agg_series
return self._aggregate_series_fast(obj, func)
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\ops.py", line 669, in _aggregate_series_fast
result, counts = grouper.get_result()
File "pandas\_libs\reduction.pyx", line 256, in pandas._libs.reduction.SeriesGrouper.get_result
File "pandas\_libs\reduction.pyx", line 74, in pandas._libs.reduction._BaseGrouper._apply_to_group
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1060, in <lambda>
f = lambda x: func(x, *args, **kwargs)
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1015, in <lambda>
result = self.aggregate(lambda x: npfunc(x, axis=self.axis))
File "<__array_function__ internals>", line 6, in amax
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\numpy\core\fromnumeric.py", line 2706, in amax
keepdims=keepdims, initial=initial, where=where)
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\numpy\core\fromnumeric.py", line 85, in _wrapreduction
return reduction(axis=axis, out=out, **passkwargs)
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\generic.py", line 11460, in stat_func
func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\series.py", line 4220, in _reduce
delegate = self._values
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\series.py", line 572, in _values
return self._mgr.internal_values()
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\internals\managers.py", line 1615, in internal_values
return self._block.internal_values()
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\internals\blocks.py", line 2019, in internal_values
return self.array_values()
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\internals\blocks.py", line 2022, in array_values
return self._holder._simple_new(self.values)
File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\arrays\datetimes.py", line 290, in _simple_new
assert values.dtype == "i8"
AssertionError
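The assertion that fails in DatetimeArray._simple_new checks for "i8" (64-bit integer) values, which is the storage behind pandas' datetime64[ns] data; an object array holding datetime.datetime values does not have that dtype. The NumPy snippet below only illustrates the two dtypes involved; it is an assumption-labelled illustration, not a reproduction of pandas' internal code path:

```python
import datetime

import numpy as np

# datetime64[ns] values are int64 nanosecond counts since 1970-01-01,
# so they can be viewed as "i8" without copying.
ns_values = np.array(["2005-01-01T10:30:23.540"], dtype="datetime64[ns]")
as_int = ns_values.view("i8")
print(as_int.dtype, as_int[0])

# An object array only holds references to Python datetime objects:
# its dtype is "O", not "i8", and it has no integer representation.
obj_values = np.array(
    [datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)], dtype=object
)
print(obj_values.dtype)
```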
From testing with a mix of pd.Timestamp and datetime.datetime values, I presume pandas converts the dates it can (the first row in the example) to pd.Timestamp while leaving the others as datetime.datetime, which leads to a mixed-type result column and the assertion error.
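One way to check this hypothesis on a given column is to look at the Python types of the values it actually holds. In the sketch below the mixed column is built by hand with an explicit dtype=object (so pandas performs no datetime inference), and `value_types` is a hypothetical helper, not a pandas API; it does not reproduce the intermediate values pandas creates inside groupby:

```python
import datetime

import pandas as pd


def value_types(series):
    """Hypothetical helper: set of Python type names found in an object-dtype Series."""
    return {type(value).__name__ for value in series}


# Hand-built mixed column: one value already a pd.Timestamp, the other a plain
# datetime.datetime because it lies outside pd.Timestamp's range.
mixed = pd.Series(
    [
        pd.Timestamp("2005-01-01 10:30:23.540"),
        datetime.datetime(3005, 1, 1, 10, 30, 23, 540000),
    ],
    dtype=object,  # explicit object dtype: values are stored as given
)
print(mixed.dtype, value_types(mixed))
```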
Expected Output
Since I'm explicitly working with datetime.datetime values, there should be no implicit conversion to pd.Timestamp unless it is guaranteed that all values are within the range that pd.Timestamp supports.
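For comparison, the expected per-group maxima can be computed without pandas' groupby machinery, using only the standard library. This is a sketch of the expected result for the example data, not a tested workaround for pandas 1.1.1; the `rows` list simply restates columns A and B of the example:

```python
import datetime
from collections import defaultdict

# Columns A and B of the example, as (key, value) pairs.
rows = [
    ("X", datetime.datetime(2005, 1, 1, 10, 30, 23, 540000)),
    ("Y", datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)),
]

# Collect the B values for each key in A.
groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

# Built-in max() compares datetime.datetime objects directly, so no
# conversion to pd.Timestamp (and no nanosecond range limit) is involved.
expected = {key: max(values) for key, values in groups.items()}
for key in sorted(expected):
    print(key, repr(expected[key]))
```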
Output of pd.show_versions()
commit : f2ca0a2
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 50.0.0.post20200830
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.3
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : 0.8.0
fastparquet : 0.4.1
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.1