BUG: Period and period_range behaviour is inconsistent. #47622

Open
@jaheba

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas._libs.tslibs import to_offset

# 1
# Freq argument is ignored when using different multiple
hourly = to_offset("H")
p = pd.Period("2020-01-01", freq="24H")
assert pd.Period(p, hourly).freq == to_offset("24H")


# 2
# asfreq shifts value, even when using same frequency
p = pd.Period("2020-01-01", freq="24H")
assert p != p.asfreq(p.freq)

# also, consider this example

dr = pd.date_range("2020", freq="2d", periods=3)

s1 = dr.to_series().asfreq(dr.freq).to_period()
s2 = dr.to_series().to_period().asfreq(dr.freq)

# one would expect s1 and s2 to be the same, but of course not!
>>> s1
2020-01-01   2020-01-01
2020-01-03   2020-01-03
2020-01-05   2020-01-05
Freq: 2D, dtype: datetime64[ns]

>>> s2
2020-01-02   2020-01-01
2020-01-04   2020-01-03
2020-01-06   2020-01-05
Freq: 2D, dtype: datetime64[ns]


# 3
# When providing two periods in period_range, only the start of `end` is taken into consideration
pr = pd.period_range(pd.Period("2020-01-01 00:00", "6H"), pd.Period("2020-01-01 18:00", "6H"), freq="H")
pr[0] == "2020-01-01 0:00"
pr[-1] == "2020-01-01 18:00" # why not 23:00?
len(pr) == 19

# which of course is inconsistent with
pr = pd.period_range(pd.Period("2020Q1", "Q"), pd.Period("2020Q2", "Q"), freq="M")
pr[0] == "2020-03" # why not 2020-01?
pr[-1] == "2020-06"

# which then again behaves differently from
dr = pd.date_range(pd.Timestamp("2020Q1", "Q"), pd.Timestamp("2020Q2", "Q"), freq="M")
dr[0] == "2020-01-31"
dr[-1] == "2020-03-31"

Issue Description

The behaviour of Period and period_range is just very surprising and inconsistent.

It is inconsistent in itself, but also when comparing period_range with date_range.

See also: #47465

Expected Behavior

I naively would expect that a Period represents a time-range. There is a start where the period begins and an end where it ends:

p = pd.Period("2020-01-01", "2d")

Here p represents everything on the first two days of 2020.
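
To make that expectation concrete, here is a minimal sketch that models a period as a half-open interval using only the standard library. `Span` and its `shift` method are hypothetical names invented for this illustration; they are not pandas API and say nothing about how pandas implements `Period`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a period: the interval [start, start + length)."""

    start: datetime
    length: timedelta

    @property
    def end(self) -> datetime:
        # Exclusive upper bound of the interval.
        return self.start + self.length

    def __contains__(self, moment: datetime) -> bool:
        return self.start <= moment < self.end

    def shift(self, n: int) -> "Span":
        # Move by n whole lengths, analogous to `p + n` in the text.
        return Span(self.start + n * self.length, self.length)


p = Span(datetime(2020, 1, 1), timedelta(days=2))
print(datetime(2020, 1, 2, 12) in p)  # inside the first two days of 2020
print(datetime(2020, 1, 3) in p)      # the exclusive end is not part of the span
```

Under this model, shifting by one yields the next two-day block, which is the mental picture the rest of this section relies on.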

If I use period_range, I would expect it to take the entire range of start and end into account:

start = p
end = p + 1
pr = pd.period_range(start, end, freq="2D")
assert pr[-1].end_time == end.end_time
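
The same expectation can be written down without pandas. The sketch below uses a hypothetical helper, `covering_range`, that splits everything from the beginning of `start` to the end of `end` into equal steps. It describes the behaviour I would expect, not what pandas does.

```python
from datetime import datetime, timedelta


def covering_range(start, end, step):
    """Split [start[0], end[1]) into consecutive half-open intervals of length `step`.

    `start` and `end` are (begin, exclusive_end) tuples. Hypothetical helper,
    not part of pandas.
    """
    intervals = []
    cursor = start[0]
    while cursor < end[1]:
        intervals.append((cursor, cursor + step))
        cursor += step
    return intervals


# Two adjacent two-day blocks, mirroring `start = p` and `end = p + 1`.
start = (datetime(2020, 1, 1), datetime(2020, 1, 3))
end = (datetime(2020, 1, 3), datetime(2020, 1, 5))

coarse = covering_range(start, end, timedelta(days=2))
fine = covering_range(start, end, timedelta(days=1))

# In both cases the last interval ends exactly where `end` ends.
print(coarse[-1][1] == end[1], fine[-1][1] == end[1])
```

With this definition, changing the step only changes how finely the range is cut, never how much of it is covered.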

So far so good. Let's try a different frequency:

pr2 = pd.period_range(start, end, freq="D")
assert pr2[-1].end_time == end.end_time # Fails

How naive of me! Of course the second argument is neither inclusive nor exclusive when generating the range, but a happy mix of both:

pr2[-1] == end.asfreq("D", "S") # note how neither using Period nor .asfreq("D") would work

The new range includes everything from start.start_time until pd.Period(end.start_time, "D").end_time, just as one would expect.

The rules are clear now.

So let's just try a different example.

start = pd.Period("2020Q1", "Q")
end = pd.Period("2020Q2", "Q")

pr3 = pd.period_range(start, end, freq="M")

We know: the start of pr3 should be start.start_time and its end should be pd.Period(end.start_time, "M").end_time:

pr3[0] == "2020-03"
pr3[-1] == "2020-06"

🤯
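
For comparison, a possible workaround sketch: pass explicit timestamps instead of periods, so that both edges of the quarters are spelled out. This assumes that `period_range` accepts `Timestamp` values for `start` and `end`, which its documentation lists as valid input. The variable name `expected` is only for illustration.

```python
import pandas as pd

start = pd.Period("2020Q1", "Q")
end = pd.Period("2020Q2", "Q")

# Use the first instant of `start` and the last instant of `end`,
# rather than letting period_range convert the periods itself.
expected = pd.period_range(start.start_time, end.end_time, freq="M")

print(expected[0], expected[-1], len(expected))
```

This should cover the six months from January through June 2020, which is the range I would have expected from the period-based call as well.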
