Incorrect result when resampling DatetimeIndex due to hidden cast of ints to floats #3707

Closed
@gerdemb

To summarize: pandas should raise whenever casting ints to floats for a computation would lose precision (everywhere!)

The following case returns the wrong result because the ints are unnecessarily cast to floats.

import numpy as np
from datetime import datetime
from pandas import DataFrame

max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min
assert max_int + min_int == -1
assert DataFrame([max_int,min_int], index=[datetime(2013,1,1),datetime(2013,1,1)]).resample('M', how=np.sum)[0][0] == -1

This is the offending code in groupby.py that does the casting:

class NDFrameGroupBy(GroupBy):
    def _cython_agg_blocks(self, how, numeric_only=True):
        .....
        if is_numeric:
            values = com.ensure_float(values)

Even worse, it then quietly casts the floats back to integers, hiding the problem!

        # see if we can cast the block back to the original dtype
        result = block._try_cast_result(result)
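
The effect of the two casts can be reproduced with NumPy alone. This is only a sketch of the arithmetic, not the pandas code path itself; the variable names are made up for illustration. float64 has a 53-bit significand, so `max_int` (2**63 - 1) rounds up to 2**63 on conversion, the float sum cancels to zero, and the cast back to int64 makes the result look like an ordinary integer.

```python
import numpy as np

max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min

values = np.array([max_int, min_int], dtype=np.int64)

# Step 1: cast to float64. max_int is not representable and rounds to 2**63.
as_float = values.astype(np.float64)
float_sum = as_float.sum()  # 2**63 + (-2**63)

# Step 2: cast the result back to the original dtype, which hides
# the fact that floats were ever involved.
round_trip = np.int64(float_sum)

# For comparison: the same sum done entirely in int64.
int_sum = values.sum()

print(as_float[0] == 2.0 ** 63, float_sum, round_trip, int_sum)
```

The float path ends with an int64 that is off by one from the exact integer sum, and nothing in the returned dtype signals it.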

It should be possible to perform this calculation without any casting. For example, the sum() of a DataFrame returns the correct result:

assert DataFrame([max_int, min_int]).sum()[0] == -1
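
As a rough illustration that a grouped sum does not need floats either, the sketch below accumulates per-group sums directly in int64 using `np.add.at`. The helper `grouped_int_sum` and its parameters are hypothetical and are not part of pandas; the group codes stand in for the resampling bins. It also makes no attempt to detect int64 overflow, which wraps silently.

```python
import numpy as np

def grouped_int_sum(values, group_codes, n_groups):
    """Sum int64 values per group without leaving int64.

    Hypothetical helper for illustration only; not a pandas API.
    """
    values = np.asarray(values, dtype=np.int64)
    group_codes = np.asarray(group_codes, dtype=np.intp)
    out = np.zeros(n_groups, dtype=np.int64)
    # np.add.at does unbuffered in-place addition, so rows that share
    # a group code are all accumulated.
    np.add.at(out, group_codes, values)
    return out

max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min

# Both rows fall in the same month, so they share group code 0.
result = grouped_int_sum([max_int, min_int], [0, 0], n_groups=1)
print(result, result.dtype)
```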

I am working with financial data that needs to stay as int to protect its precision. In my opinion, any operation that casts values to floats to do a calculation but then returns the results as ints is very wrong. At the very least, the results should be returned as floats to show that a cast has been done. Even better would be to perform the calculations with ints whenever possible.
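
One way the "raise on loss of precision" idea could look is a checked cast, sketched below. The function name `ensure_float_checked` and the choice of `ValueError` are assumptions made for this example, not an existing pandas function. The check is deliberately conservative: it rejects any integer with magnitude above 2**53, even though some larger integers (for example 2**60) happen to convert exactly.

```python
import numpy as np

# float64 has a 53-bit significand, so every integer with
# magnitude <= 2**53 converts to float64 exactly.
_MAX_EXACT = 2 ** 53

def ensure_float_checked(values):
    """Cast to float64, raising if an integer might lose precision.

    Hypothetical helper; the name and error type are assumptions.
    """
    values = np.asarray(values)
    if values.dtype.kind == "i":
        wide = values.astype(np.int64)
        lossy = (wide > _MAX_EXACT) | (wide < -_MAX_EXACT)
    elif values.dtype.kind == "u":
        wide = values.astype(np.uint64)
        lossy = wide > _MAX_EXACT
    else:
        # Non-integer input: nothing to check in this sketch.
        lossy = np.zeros(values.shape, dtype=bool)
    if lossy.any():
        raise ValueError("casting integers to float64 may lose precision")
    return values.astype(np.float64)

small = ensure_float_checked([1, 2, 3])  # converts without complaint

try:
    ensure_float_checked(np.array([np.iinfo(np.int64).max], dtype=np.int64))
    raised = False
except ValueError:
    raised = True
```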
