Description
To summarize: pandas should raise when casting ints to floats for computations loses precision (everywhere!)
The wrong result is returned in the following case because of unnecessary casting of ints to floats.
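```python
import numpy as np
from datetime import datetime
from pandas import DataFrame
max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min
assert max_int + min_int == -1
# fails on the affected version (`how=` is the resample API of that release)
assert DataFrame([max_int, min_int], index=[datetime(2013, 1, 1), datetime(2013, 1, 1)]).resample('M', how=np.sum)[0][0] == -1
```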
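Why the cast matters: float64 has a 53-bit significand, so not every int64 value is representable. A quick check with plain Python floats (IEEE 754 doubles on the usual platforms) shows that `max_int` does not survive the conversion:

```python
import numpy as np

max_int = np.iinfo(np.int64).max      # 2**63 - 1
rounded = float(max_int)              # nearest float64 to 2**63 - 1

print(rounded == 2.0 ** 63)           # True: rounded up to 2**63
print(int(rounded) - max_int)         # 1: off by one after the round trip
# 2**53 + 1 is the first positive integer float64 cannot hold exactly.
print(float(2 ** 53 + 1) == 2.0 ** 53)  # True
```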
This is the offending code in groupby.py that does the casting:
```python
class NDFrameGroupBy(GroupBy):
    def _cython_agg_blocks(self, how, numeric_only=True):
        .....
        if is_numeric:
            values = com.ensure_float(values)
```
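A minimal sketch of the "should raise" behavior from the summary, written as a hypothetical stand-in for the `com.ensure_float` call above. The name `ensure_float_strict` and the 2**53 bound check are assumptions for illustration, not pandas API; the check is deliberately conservative.

```python
import numpy as np

# Every integer in [-2**53, 2**53] is exactly representable as a float64.
_MAX_EXACT = 2 ** 53


def ensure_float_strict(values: np.ndarray) -> np.ndarray:
    """Hypothetical variant of ensure_float that raises instead of losing precision.

    Conservative: rejects any integer outside [-2**53, 2**53], even though
    some larger integers (e.g. 2**60) happen to convert exactly.
    """
    if np.issubdtype(values.dtype, np.integer) and values.size:
        # Compare as Python ints so the bound check itself cannot overflow.
        if int(values.max()) > _MAX_EXACT or int(values.min()) < -_MAX_EXACT:
            raise ValueError("casting these ints to float64 may lose precision")
    return values.astype(np.float64)


print(ensure_float_strict(np.array([1, 2, 3])))
try:
    ensure_float_strict(np.array([np.iinfo(np.int64).max, np.iinfo(np.int64).min]))
except ValueError as exc:
    print("refused:", exc)
```

An exact per-element check is possible too, but the bound test keeps the sketch short and never accepts a lossy cast.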
And, even worse, it sneakily casts the floats back to integers, hiding the problem!
```python
# see if we can cast the block back to the original dtype
result = block._try_cast_result(result)
```
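To see how the round trip hides the error, here is the same cast, aggregate, cast-back sequence done by hand in NumPy. This only mimics the path described above; it is an assumption-level illustration, not what `_try_cast_result` does internally.

```python
import numpy as np

values = np.array([np.iinfo(np.int64).max, np.iinfo(np.int64).min], dtype=np.int64)

as_float = values.astype(np.float64)    # max rounds up to 2**63
float_sum = as_float.sum()              # 2**63 + (-2**63) == 0.0
cast_back = float_sum.astype(np.int64)  # 0, typed int64: nothing looks wrong

exact_sum = values.sum()                # -1, computed entirely in int64
print(cast_back, cast_back.dtype, exact_sum)
```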
It should be possible to perform this calculation without any casting. For example, `sum()` on a DataFrame returns the correct result:
```python
assert DataFrame([max_int, min_int]).sum()[0] == -1
```
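A sketch of what "without any casting" could look like for a grouped sum: accumulate straight into an int64 output with `np.add.at`. The function name and the `codes`/`ngroups` layout (each row's group number, as after factorizing the keys) are hypothetical; pandas' actual groupby internals are not shown in this issue.

```python
import numpy as np


def group_sum_int(values: np.ndarray, codes: np.ndarray, ngroups: int) -> np.ndarray:
    """Hypothetical grouped sum that stays in int64 the whole time."""
    out = np.zeros(ngroups, dtype=np.int64)
    # Unbuffered in-place add; repeated codes accumulate correctly.
    # int64 addition wraps on overflow, but the total is still exact
    # whenever the true group sum fits in int64.
    np.add.at(out, codes, values)
    return out


max_int = np.iinfo(np.int64).max
min_int = np.iinfo(np.int64).min
# Both rows fall in the same group (e.g. the same monthly bucket).
sums = group_sum_int(np.array([max_int, min_int], dtype=np.int64), np.array([0, 0]), 1)
print(sums)
```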
I am working with financial data that needs to stay as int to protect its precision. In my opinion, any operation that casts values to floats to do a calculation but then returns the results as ints is very wrong. At the very least, the results should be returned as floats to show that a cast has been done. Even better would be to perform the calculations with ints whenever possible.
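The two options above can be combined into one policy: stay in the integer dtype for reductions that are exact on integers, and for everything else compute in float64 and return the float result unchanged. `reduce_preserving_ints` and its operation table are hypothetical; this is a sketch of the proposal, not existing pandas behavior.

```python
import numpy as np

_REDUCERS = {"sum": np.sum, "min": np.min, "max": np.max, "prod": np.prod,
             "mean": np.mean, "std": np.std}
# Reductions whose integer result is exact whenever it fits in the dtype.
_INT_EXACT = {"sum", "min", "max", "prod"}


def reduce_preserving_ints(values: np.ndarray, how: str):
    """Hypothetical dispatcher that never casts a float result back to int."""
    func = _REDUCERS[how]
    if np.issubdtype(values.dtype, np.integer) and how in _INT_EXACT:
        return func(values)                 # computed in integer arithmetic
    return func(values.astype(np.float64))  # float in, float out


vals = np.array([np.iinfo(np.int64).max, np.iinfo(np.int64).min], dtype=np.int64)
print(reduce_preserving_ints(vals, "sum"))              # -1, stays an integer
print(reduce_preserving_ints(np.array([1, 2]), "mean"))  # 1.5, visibly a float
```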