TST/BUG in test_categorical.py: test_constructor_unsortable breaks after recent commit #13714

Closed
@pijucha

This is a follow up to #13514 (safe sort of mixed-int arrays).
After merging this commit, test_constructor_unsortable in test_categorical.py breaks.

According to the code there, numpy.sort should be able to sort a mixed int-datetime array under Python 2 with numpy >= 1.10. But it doesn't:

In [3]: arr = np.array([1, 2, datetime.now(), 0, 3], dtype='O')

In [4]: np.sort(arr)
/home/users/piotr/workspace/pandas-pijucha/pandas_dev_python2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Out[4]: array([1, 2, datetime.datetime(2016, 7, 19, 9, 49, 28, 214675), 0, 3], dtype=object)

In [6]: np.__version__
Out[6]: '1.11.0'

IPython probably interferes here, because in plain Python 2.7 I get:

TypeError: can't compare datetime.datetime to int

In the old code in factorize, there was a list comprehension similar to this:

ordered = [np.sort(np.array([e for e in arr if f(e)], dtype=object))
           for f in [lambda x: True, lambda x: False]]

I haven't pinned it down precisely, but it looks as if it sometimes swallowed an exception. (The new code in safe_sort is simpler: it sorts each of the two arrays separately, but still with np.sort.)

It looks to me as though Categorical.from_array(arr, ordered=True) should now always raise, so test_constructor_unsortable in test_categorical.py may need to be rewritten.
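As a rough illustration of what a rewritten test could assert, here is a minimal sketch. It uses the built-in sorted as a stand-in for np.sort and a hypothetical helper, raises_on_sort, that is not part of pandas or numpy. It only shows the expected outcome (a TypeError for mixed int-datetime input), not the actual pandas test.

```python
from datetime import datetime


def raises_on_sort(values):
    """Return True if ordering `values` raises TypeError (hypothetical helper)."""
    try:
        sorted(values)  # stand-in for np.sort on an object array
    except TypeError:
        return True
    return False


# Same shape as the array in the report: ints with one datetime in the middle.
mixed = [1, 2, datetime(2016, 7, 19), 0, 3]

print(raises_on_sort(mixed))      # ints cannot be ordered against a datetime
print(raises_on_sort([3, 1, 2]))  # homogeneous ints sort without error
```

A real test would presumably wrap the Categorical.from_array(arr, ordered=True) call in the same kind of "expect TypeError" check.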


Weird numpy behaviour

I tested numpy behaviour for several versions between 1.7 and 1.11 in python 2.7, both in a script and interactive python (not ipython).

Script

Running the following script:

from datetime import datetime
import numpy as np
import sys

print sys.version
print np.__version__

arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)
arr2 = np.sort(arr)
print arr2

gives for numpy < 1.10:

2.7.11 (default, Mar 30 2016, 15:33:06) 
[GCC 5.3.0 20151204 (release)]
1.9.1
Traceback (most recent call last):
  File "/home/users/piotr/workspace/pandas-tests/unsortable2.py", line 9, in <module>
    arr2 = np.sort(arr)
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

and for numpy >= 1.10:

2.7.11 (default, Mar 30 2016, 15:33:06) 
[GCC 5.3.0 20151204 (release)]
1.11.0
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Traceback (most recent call last):
  File "/home/users/piotr/workspace/pandas-tests/unsortable2.py", line 10, in <module>
    print arr2
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/numeric.py", line 1869, in array_str
    return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 442, in array2string
    elif reduce(product, a.shape) == 0:
TypeError: can't compare datetime.datetime to int

The exception is raised in the line following arr2 = np.sort(arr).

When I remove print arr2 from the script, I get "exception ignored" instead:

2.7.11 (default, Mar 30 2016, 15:33:06) 
[GCC 5.3.0 20151204 (release)]
1.11.0
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Exception TypeError: "can't compare datetime.datetime to int" in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored

Interactive mode

In interactive mode (not IPython):

For numpy < 1.10:

>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> np.__version__
'1.9.1'

>>> arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)

>>> np.sort(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

>>> arr2 = np.sort(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

>>> order = [np.sort(np.array([e for e in arr if f(e)], dtype=object)) for f in [lambda x: True, lambda x: False]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
    a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int

For numpy >= 1.10:

>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> np.__version__
'1.11.0'

>>> arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)

>>> np.sort(arr)
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  a.sort(axis, kind, order)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/numeric.py", line 1807, in array_repr
    ', ', "array(")
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 442, in array2string
    elif reduce(product, a.shape) == 0:
TypeError: can't compare datetime.datetime to int

>>> arr2 = np.sort(arr)
>>> arr2
TypeError: can't compare datetime.datetime to int
>>> print(arr2)
[1 2 datetime.datetime(2016, 7, 20, 0, 24, 6, 50903) 0 3]
>>> arr2
array([1, 2, datetime.datetime(2016, 7, 20, 0, 24, 6, 50903), 0, 3], dtype=object)

>>> order = [np.sort(np.array([e for e in arr if f(e)], dtype=object)) for f in [lambda x: True, lambda x: False]]
>>> order
[array([1, 2, datetime.datetime(2016, 7, 20, 0, 24, 6, 50903), 0, 3], dtype=object), array([], dtype=object)]

(I pasted this literally from a console, line by line, adding only empty lines for clarity. The calls to arr2 puzzle me.)

The behaviour of the list comprehension above may depend on whether np.sort raises on the second array in the list (here, empty).

safe_sort

Calling safe_sort on arr always raises. (But I don't really know why.)

>>> np.__version__
'1.11.0'
>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> pd.__version__
u'0.18.1+221.g8acfad3'

>>> pd.core.algorithms.safe_sort(arr)
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py:223: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  sorter = values.argsort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py", line 227, in safe_sort
    ordered = sort_mixed(values)
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py", line 214, in sort_mixed
    strs = np.sort(values[str_pos])
  File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 824, in sort
    a = asanyarray(a).copy(order="K")
TypeError: can't compare datetime.datetime to int
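To make the grouping step concrete, here is a pure-Python sketch modelled on what the traceback suggests sort_mixed does: strings are sorted separately from everything else. The name sort_mixed_sketch is hypothetical, and built-in sorted stands in for np.sort. The assumption is that a datetime lands in the non-string group together with the ints, so that group is still unsortable. The sketch does not reproduce the delayed point at which the exception surfaces under Python 2.

```python
from datetime import datetime


def sort_mixed_sketch(values):
    """Sort non-strings and strings separately, non-strings first (hypothetical)."""
    strs = [v for v in values if isinstance(v, str)]
    others = [v for v in values if not isinstance(v, str)]
    return sorted(others) + sorted(strs)


# Ints and strings end up in different groups, so this case works.
print(sort_mixed_sketch([3, "b", 1, "a"]))

# A datetime is not a string, so it stays with the ints and the sort fails.
try:
    sort_mixed_sketch([1, 2, datetime(2016, 7, 20), 0, 3])
except TypeError as exc:
    print("TypeError:", exc)
```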

I didn't test much in IPython, but it also (at least sometimes) swallows the exception and returns a partially sorted arr2, while printing a warning (as above).
