Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np


def make_df(size):
    # Annual periods (freq="A", i.e. years ending in December) starting in 2001
    index = pd.period_range(freq="A", start="1/1/2001", periods=size)
    return pd.DataFrame({"x": np.arange(0, size)}, index=index)


df1 = make_df(2)  # 2 records
df2 = make_df(8)  # 8 records

# Convert each Series' PeriodIndex to a DatetimeIndex
res1 = df1.x.to_timestamp()
res2 = df2.x.to_timestamp()

print(f"{df1.x.index=}")
print(f"before: {df1.x.index.freq=}")
print(f"after: {res1.index.freq=}")
print()
print(f"{df2.x.index=}")
print(f"before: {df2.x.index.freq=}")
print(f"after: {res2.index.freq=}")
Issue Description
I'm calling to_timestamp on a Series. Initially, the Series has a PeriodIndex with freq='A'. However, the Series returned from to_timestamp doesn't preserve that freq, and the behavior depends on the Series size: with fewer than 3 records, the resulting Series has freq=None; with 3 or more records, it has freq=YearBegin.
The snippet above creates two DataFrames, one with 2 records and one with 8, and converts column x of each with to_timestamp. It prints the index freq before and after the conversion:
df1.x.index=PeriodIndex(['2001', '2002'], dtype='period[A-DEC]')
before: df1.x.index.freq=<YearEnd: month=12>
after: res1.index.freq=None
df2.x.index=PeriodIndex(['2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008'], dtype='period[A-DEC]')
before: df2.x.index.freq=<YearEnd: month=12>
after: res2.index.freq=<YearBegin: month=1>
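In both cases the original frequency is lost, but in different ways: the 2-record result has no freq at all, while the 8-record result has YearBegin instead of YearEnd.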
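Both symptoms are consistent with to_timestamp inferring the new freq from the converted timestamps rather than deriving it from the PeriodIndex. By default to_timestamp uses how="start", so each annual period maps to January 1st of its year. Frequency inference needs at least 3 values (DatetimeIndex.inferred_freq returns None below that), and when it does succeed on January 1st dates it detects a year-start pattern, hence YearBegin rather than YearEnd. The sketch below checks these points with public pandas API only; it uses freq="Y", the newer spelling of the annual alias "A", and the exact alias string reported by inferred_freq varies with the pandas version.

```python
import pandas as pd

# Annual periods ending in December ("Y" is the newer alias for "A")
small = pd.period_range(start="2001", periods=2, freq="Y").to_timestamp()
large = pd.period_range(start="2001", periods=8, freq="Y").to_timestamp()

# The default how="start" maps each period to January 1st of its year
print(small)
print(large)

# Frequency inference needs at least 3 timestamps; below that it gives None
print(small.inferred_freq)  # None
# With enough points a year-start pattern is detected
# ("AS-JAN" in older pandas, "YS-JAN" from pandas 2.2 on)
print(large.inferred_freq)

# January 1st timestamps lie on a YearBegin offset but not on a YearEnd one
print(pd.offsets.YearBegin().is_on_offset(large[0]))  # True
print(pd.offsets.YearEnd().is_on_offset(large[0]))    # False
```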
Possibly related (not the same):
- BUG: DatetimeIndex.to_period().to_timestamp() forgets freq value #38885
- BUG: Converting period index to datetime index fails when the desired frequency is at the start of a period. #48318
Expected Behavior
Since the PeriodIndex already carries freq information, it should be preserved when converting to a DatetimeIndex, regardless of the Series length.
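Until the conversion itself keeps the freq, one possible user-side workaround is to attach the offset explicitly after calling to_timestamp. The helper below, to_timestamp_keep_freq, is hypothetical (not part of pandas) and assumes annual periods ending in December, whose start timestamps match YearBegin(month=1). Passing freq to the DatetimeIndex constructor makes pandas validate it against the values instead of inferring it, so it also works for fewer than 3 rows.

```python
import numpy as np
import pandas as pd


def to_timestamp_keep_freq(s: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a Series indexed by annual (December
    year-end) periods to timestamps and attach an explicit YearBegin freq."""
    res = s.to_timestamp()  # default how="start": each period maps to January 1st
    # An explicit freq is checked against the values rather than inferred,
    # so the result no longer depends on how many rows the Series has.
    res.index = pd.DatetimeIndex(res.index, freq=pd.offsets.YearBegin())
    return res


for size in (2, 8):
    index = pd.period_range(start="2001", periods=size, freq="Y")
    s = pd.Series(np.arange(size), index=index)
    # Both sizes should now report a YearBegin freq
    print(size, to_timestamp_keep_freq(s).index.freq)
```

For periods ending in another month the matching start offset differs (for example, years ending in June start on July 1st, i.e. YearBegin(month=7)), so this sketch does not cover those cases.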
Installed Versions
pandas : 2.0.0.dev0+1448.gfcb8b809e9
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 66.1.1
pip : 23.0
Cython : None
pytest : 7.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.9.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : None
brotli :
fastparquet : 2023.1.0
fsspec : 2023.1.0
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.1.0
scipy : 1.10.0
snappy :
sqlalchemy : 1.4.46
tables : 3.7.0
tabulate : None
xarray : 2023.1.0
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None