Regr/period range large value/issue 36430 #36535

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.3.rst
@@ -36,6 +36,7 @@ Fixed regressions
- Fixed regression in :meth:`read_excel` with ``engine="odf"`` caused ``UnboundLocalError`` in some cases where cells had nested child nodes (:issue:`36122`,:issue:`35802`)
- Fixed regression in :class:`DataFrame` and :class:`Series` comparisons between numeric arrays and strings (:issue:`35700`,:issue:`36377`)
- Fixed regression when setting empty :class:`DataFrame` column to a :class:`Series` in preserving name of index in frame (:issue:`36527`)
- Fixed regression in :class:`Period` incorrect value for ordinal over the maximum timestamp (:issue:`36430`)

.. ---------------------------------------------------------------------------

3 changes: 2 additions & 1 deletion pandas/_libs/tslibs/period.pyx
@@ -861,6 +861,7 @@ cdef int64_t get_time_nanos(int freq, int64_t unix_date, int64_t ordinal) nogil:
"""
cdef:
int64_t sub, factor
int64_t nanos_in_day = 24 * 3600 * 10**9

freq = get_freq_group(freq)

@@ -886,7 +887,7 @@ cdef int64_t get_time_nanos(int freq, int64_t unix_date, int64_t ordinal) nogil:
# We must have freq == FR_HR
factor = 10**9 * 3600

sub = ordinal - unix_date * 24 * 3600 * 10**9 / factor
Member

So the trouble is that an overflow occurs somewhere in this expression?

Member

Yeah, I am fairly certain that 24 * 3600 * 10**9 will overflow: the compiler likely treats these literals as plain int, and the product could very well exceed the limits of an int type. Adding the ULL suffix would be ideal, I think.

More details on how decimal literals are assigned types here:
https://stackoverflow.com/a/41407498/621736

Contributor Author

Using uint64_t instead of int64_t works for the given example, but it then fails with an integer overflow for date ranges earlier than the epoch, so we must stick with a signed integer here.
As for how this change works: it just changes the order of operations, so that unix_date is multiplied not by 24 * 3600 * 10**9 but by 24 * 3600 * 10**9 / factor, which is smaller and does not result in an integer overflow (except for values in the really far future for the use case described in the issue, after the year 2*10**15).
So the real fix here is maybe just to add parentheses in the right place; see the new commit shortly.
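
To make the reordering argument concrete, here is a minimal Python sketch that emulates the int64 arithmetic of both forms of the expression. The helpers `wrap_int64`, `c_div`, `sub_old` and `sub_new` are hypothetical names introduced for illustration and are not part of pandas. The sketch assumes that signed overflow wraps around in two's complement (formally undefined behavior in C, though common in practice) and that `unix_date` for an hourly period is `ordinal // 24`.

```python
INT64_MIN = -(2**63)

NANOS_IN_DAY = 24 * 3600 * 10**9   # nanoseconds per day
FACTOR_HOUR = 3600 * 10**9         # nanoseconds per hour (the FR_HR case)


def wrap_int64(value):
    """Emulate two's-complement wraparound of a signed 64-bit integer (assumed)."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def c_div(a, b):
    """Emulate C integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def sub_old(ordinal, unix_date, factor):
    # Old form: unix_date * 24 * 3600 * 10**9 / factor
    # The product is formed first and can exceed the int64 range.
    product = wrap_int64(unix_date * NANOS_IN_DAY)
    return wrap_int64(ordinal - c_div(product, factor))


def sub_new(ordinal, unix_date, factor):
    # New form: unix_date * (nanos_in_day / factor)
    # The multiplier is at most 86400000000000, and only 24 for hourly periods.
    per_day = c_div(NANOS_IN_DAY, factor)
    return wrap_int64(ordinal - wrap_int64(unix_date * per_day))


hour = 5
ordinal = 2562048 + hour      # ordinal used in the new test, freq="1H"
unix_date = ordinal // 24     # assumed: whole days since the epoch

print(sub_old(ordinal, unix_date, FACTOR_HOUR))
print(sub_new(ordinal, unix_date, FACTOR_HOUR))
```

Under these assumptions the new form returns the hour offset within the day, while the old form returns a value far outside the range 0 to 23.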

Contributor

In [64]: 24 * 3600 * 10**9
Out[64]: 86400000000000

In [65]: np.iinfo(np.int64).max
Out[65]: 9223372036854775807

This is a fairly standard number. I agree that it could overflow if multiplied by a large number, but it is OK here.
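
As a rough check of the magnitudes discussed above, the following sketch compares the constant against 32-bit and 64-bit signed limits and computes the largest day count that can be multiplied by it without leaving the int64 range. The variable names are illustrative only, and whether the Cython compiler actually types these literals as a 32-bit int is not verified here.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
NANOS_IN_DAY = 24 * 3600 * 10**9

# The constant alone does not fit a 32-bit int but fits comfortably in int64.
fits_int32 = NANOS_IN_DAY <= INT32_MAX
fits_int64 = NANOS_IN_DAY <= INT64_MAX

# Largest unix_date (days since the epoch) whose product with the
# constant still fits in int64.
max_safe_days = INT64_MAX // NANOS_IN_DAY

# First hourly ordinal that falls on the first day past that limit.
first_overflow_hour_ordinal = (max_safe_days + 1) * 24

print(fits_int32, fits_int64, max_safe_days, first_overflow_hour_ordinal)
```

The last value coincides with the base ordinal 2562048 used in `test_period_large_ordinal`, which is consistent with the overflow appearing only when the constant is multiplied by a sufficiently large `unix_date`.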

sub = ordinal - unix_date * (nanos_in_day / factor)
return sub * factor


7 changes: 7 additions & 0 deletions pandas/tests/scalar/period/test_period.py
@@ -486,6 +486,13 @@ def test_period_cons_combined(self):
with pytest.raises(ValueError, match=msg):
Period("2011-01", freq="1D1W")

@pytest.mark.parametrize("hour", range(24))
def test_period_large_ordinal(self, hour):
# Issue #36430
# Integer overflow for Period over the maximum timestamp
p = pd.Period(ordinal=2562048 + hour, freq="1H")
assert p.hour == hour


class TestPeriodMethods:
def test_round_trip(self):