Skip to content

BUG: groupby.cummin/max changing passed values to nan uncesessarily #15109

Closed
@jreback

Description

@jreback

after #15053 is failing. (puzzled why its not actually failing on master though).

This is local in osx (3.5), so same as the travis test (though that used numpy 1.10.4), its possible that is the cause.

(pandas) bash-3.2$ nosetests pandas/tests/groupby/test_groupby.py  -s -m cummin_cummax
F
======================================================================
FAIL: test_cummin_cummax (test_groupby.TestGroupBy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jreback/pandas/pandas/tests/groupby/test_groupby.py", line 5798, in test_cummin_cummax
    tm.assert_frame_equal(result, expected)
  File "/Users/jreback/pandas/pandas/util/testing.py", line 1313, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "/Users/jreback/pandas/pandas/util/testing.py", line 1154, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/Users/jreback/pandas/pandas/util/testing.py", line 878, in assert_attr_equal
    left_attr, right_attr)
  File "/Users/jreback/pandas/pandas/util/testing.py", line 1018, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes are different

Attribute "dtype" are different
[left]:  float64
[right]: int64

----------------------------------------------------------------------
Ran 1 test in 0.046s

FAILED (failures=1)

because the inf values are == iNaT, and we convert these on exit from the function as we are using these to mark NaT in all int64, not just for datetimelike.

jreback@2af5806 fixes, but is kludgey. I pre-check for iNaT an if so, convert to float.

A better soln is to pass in the is_datetimelike flag so that the check could be done internally in cython. IOW, if is_datetimelike==True, then go ahead and treat iNaT as null, otherwise ignore it. Ideally we would actually be able to return a correctly dtyped array in the first place, but the templating code won't allow this, e.g. we have a 1-1 mapping between in-out and they are done a-priori.

part of this fix could also remove the need for passing in accum as its wholly internal to the group by cython functions (as part of this could remove the fused typed, numeric and replace with templated code).

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugDtype ConversionsUnexpected or buggy dtype conversionsGroupbyMissing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions