Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
import pandas as pd
# Example-1:
s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
                                            freq='A',
                                            periods=2))
print(s)
# 2012 1
# 2013 2
# Freq: A-DEC, dtype: int64
print(s.resample('Q', kind='period', convention='start').count())
# 2012Q1 1.0
# 2012Q2 NaN
# 2012Q3 NaN
# 2012Q4 NaN
# 2013Q1 2.0
# 2013Q2 NaN
# 2013Q3 NaN
# 2013Q4 NaN
# Example-2: the result is all NaN, but the data is actually grouped
idx = pd.period_range(start='1/10/2000', periods=2, freq='D')
series = pd.Series(range(2,4), index=idx)
print(series.resample('12H', convention='end').count())
# 2000-01-10 12:00 NaN
# 2000-01-11 00:00 NaN
# 2000-01-11 12:00 NaN
print(series.resample('12H', convention='start').count())
# 2000-01-10 00:00 2.0
# 2000-01-10 12:00 NaN
# 2000-01-11 00:00 3.0
# 2000-01-11 12:00 NaN
# Freq: 12H, dtype: float64
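# Inspect the groups directly.
# Assumption: rs is the resampler from the call above (it was not defined in the original snippet).
rs = series.resample('12H', convention='start')
for key, ss in rs:
    if not ss.empty:
        print(ss)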
# 2000-01-10 2
# Freq: D, dtype: int64
# 2000-01-11 3
# Freq: D, dtype: int64
Problem description
I am trying to upsample a time series whose index is a PeriodIndex and then count the number of records in each time group. However, the output is not the count; it appears to be the value of the only record in each group.
In the second example, with convention='end', the result is all NaN. Switching to convention='start' gives a result similar to that of Example-1. In addition, iterating over the groups shows that the data has been grouped correctly, while the aggregate functions (e.g., count) behave unexpectedly.
Expected Output
The number of samples in each group, as returned by the count() method.
For comparison with Example-1, here is another example that upsamples a time series with Timestamp as the index. It behaves as expected.
s = pd.Series([1, 2], index=pd.date_range('2012-01-01',
                                          freq='A',
                                          periods=2))
print(s)
# 2012-12-31 1
# 2013-12-31 2
# Freq: A-DEC, dtype: int64
print(s.resample('Q', convention='start').count())
# 2012-12-31 1
# 2013-03-31 0
# 2013-06-30 0
# 2013-09-30 0
# 2013-12-31 1
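For reference, the counts I would expect for Example-1 can be sketched by hand, without going through resample at all. This is only an illustration under my own assumptions, not how pandas computes it internally: the names start_quarters, target and expected are made up for this sketch, and 'Y' is used as the alias of the annual frequency 'A'.

```python
import pandas as pd

# Same data as Example-1 ('Y' is used here as the alias of the annual frequency 'A').
s = pd.Series([1, 2], index=pd.period_range('2012', periods=2, freq='Y'))

# Map every yearly period to the quarter in which it starts (convention='start').
start_quarters = s.index.asfreq('Q', how='start')

# All quarterly bins spanned by the original index.
target = pd.period_range(start=s.index[0].asfreq('Q', how='start'),
                         end=s.index[-1].asfreq('Q', how='end'),
                         freq='Q')

# Count the records that fall into each bin; empty bins get 0 rather than NaN.
expected = (pd.Series(1, index=start_quarters)
            .groupby(level=0)
            .count()
            .reindex(target, fill_value=0))
print(expected)
```

With this sketch, 2012Q1 and 2013Q1 each hold one record and the other six quarters hold zero, which matches the shape of the Timestamp-based result above and is what I would expect count() to return for the PeriodIndex case.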
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.7.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.3.1
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.1
setuptools : 49.6.0.post20210108
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.25.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : 1.4.22
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None