API / BUG: How do we differentiate between -9223372036854775808 and iNaT? #16674

Open
@gfyoung

From #3707 (at 028188):

```python
from datetime import datetime
from pandas import DataFrame

import numpy as np

max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min

# In exact int64 arithmetic max_int + min_int == -1, so the monthly sum should be -1
df = DataFrame([max_int, min_int], index=[datetime(2013, 1, 1), datetime(2013, 1, 1)])
assert df.resample("M").apply(np.sum)[0][0] == -1
```

```
...
AssertionError
```

The assertion fails because, during the aggregation, `cython_operation` in `core/groupby.py` (via `_is_cython_func` from `core/base.py`) checks whether the data, assuming it is integer, contains any "missing" values both before and after the aggregation. Missing integer values are defined as `iNaT = -9223372036854775808`, which is exactly `min_int`. If any such values are found, the data is automatically cast to float, so `min_int` is treated as missing rather than as a real value and the result is not `-1`.
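To make the failure mode concrete, here is a minimal NumPy-only sketch of sentinel-based missing-value handling. It is not pandas' actual implementation: `INAT` and `sum_with_sentinel` are hypothetical names, and the function only mimics the behavior described above (treat the sentinel as missing, cast to float, skip it).

```python
import numpy as np

# Hypothetical stand-in for pandas' iNaT: the smallest int64 value.
INAT = np.iinfo(np.int64).min


def sum_with_sentinel(values):
    """Hypothetical sketch: sum int64 data, treating INAT as a missing value."""
    arr = np.asarray(values, dtype=np.int64)
    mask = arr == INAT
    if not mask.any():
        # No sentinel present: stay in exact int64 arithmetic.
        return arr.sum()
    # Sentinel present: cast to float so "missing" entries can become NaN,
    # then skip them in the sum.
    as_float = arr.astype(np.float64)
    as_float[mask] = np.nan
    return np.nansum(as_float)


max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min

plain = np.array([max_int, min_int], dtype=np.int64).sum()  # exact int64 sum: -1
sentinel = sum_with_sentinel([max_int, min_int])  # min_int is dropped as "missing"
```

Neither input is genuinely missing, yet the sentinel check changes both the dtype (int64 to float64) and the value of the result.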

This logic is common throughout the codebase, but it is fraught with pitfalls. For example, what if a computation legitimately produces the value -9223372036854775808? And what if the user intends -9223372036854775808 as a real data point?

Unlikely, sure. But reasonable, absolutely.
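The first case, a computation that legitimately produces -9223372036854775808, is easy to trigger. In this sketch (NumPy only; `INAT` is again a hypothetical stand-in for `iNaT`), two ordinary values sum exactly to the sentinel, and a check performed after the aggregation cannot tell the difference.

```python
import numpy as np

# Hypothetical stand-in for pandas' iNaT: the smallest int64 value.
INAT = np.iinfo(np.int64).min

# Two ordinary, non-missing int64 values whose exact sum is -2**63.
values = np.array([-(2**62), -(2**62)], dtype=np.int64)
result = values.sum()  # no overflow: -2**63 fits in int64

# An "after aggregation" sentinel check flags this valid result as missing.
looks_missing = result == INAT
```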

Labels: Bug, Groupby, Missing-data, Needs Tests, Resample