Description
This is a follow up to #13514 (safe sort of mixed-int arrays).
After merging this commit, test_constructor_unsortable in test_categorical.py breaks.
According to the code there, numpy.sort should sort a mixed int-datetime array in python2 and numpy >= 1.10. But it doesn't.
In [3]: arr = np.array([1, 2, datetime.now(), 0, 3], dtype='O')
In [4]: np.sort(arr)
/home/users/piotr/workspace/pandas-pijucha/pandas_dev_python2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
a.sort(axis, kind, order)
Out[4]: array([1, 2, datetime.datetime(2016, 7, 19, 9, 49, 28, 214675), 0, 3], dtype=object)
In [6]: np.__version__
Out[6]: '1.11.0'
IPython probably interferes here, because in plain Python 2.7 I get:
TypeError: can't compare datetime.datetime to int
In the old code in factorize, there was a list comprehension similar to this:
ordered = [np.sort(np.array([e for e in arr if f(e)], dtype=object))
for f in [lambda x: True, lambda x: False]]
I haven't pinned it down precisely, but it looks as if it sometimes swallowed an exception. (The new code in safe_sort is simpler: it sorts each of the two arrays separately, but still with np.sort.)
It looks to me that Categorical.from_array(arr, ordered=True) should always raise now. And maybe test_constructor_unsortable from test_categorical.py needs to be rewritten.
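If the test does get rewritten, one option might be to decide sortability with plain Python comparisons instead of relying on what np.sort happens to do. The following is only a minimal sketch under that assumption: is_sortable is a hypothetical helper, not something that exists in pandas or numpy, and it works on a plain list rather than an object array.

```python
from datetime import datetime


def is_sortable(values):
    """Return True if plain Python comparison can order `values`.

    Hypothetical helper for illustration; not part of pandas or numpy.
    """
    try:
        sorted(values)  # comparing int with datetime raises TypeError
    except TypeError:
        return False
    return True


mixed = [1, 2, datetime(2016, 7, 19, 9, 49, 28), 0, 3]
ints_only = [1, 2, 0, 3]

print(is_sortable(mixed))
print(is_sortable(ints_only))
```

Because the comparison happens in pure Python, the TypeError is raised at the point of comparison and is not deferred to a later statement.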
Weird numpy behaviour
I tested numpy behaviour for several versions between 1.7 and 1.11 in python 2.7, both in a script and interactive python (not ipython).
Script
Running the following script:
from datetime import datetime
import numpy as np
import sys
print sys.version
print np.__version__
arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)
arr2 = np.sort(arr)
print arr2
gives for numpy < 1.10:
2.7.11 (default, Mar 30 2016, 15:33:06)
[GCC 5.3.0 20151204 (release)]
1.9.1
Traceback (most recent call last):
File "/home/users/piotr/workspace/pandas-tests/unsortable2.py", line 9, in <module>
arr2 = np.sort(arr)
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int
and for numpy >= 1.10:
2.7.11 (default, Mar 30 2016, 15:33:06)
[GCC 5.3.0 20151204 (release)]
1.11.0
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
a.sort(axis, kind, order)
Traceback (most recent call last):
File "/home/users/piotr/workspace/pandas-tests/unsortable2.py", line 10, in <module>
print arr2
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/numeric.py", line 1869, in array_str
return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 442, in array2string
elif reduce(product, a.shape) == 0:
TypeError: can't compare datetime.datetime to int
The exception is raised in the line following arr2 = np.sort(arr).
When I remove print arr2 from the script, I'm getting "exception ignored":
2.7.11 (default, Mar 30 2016, 15:33:06)
[GCC 5.3.0 20151204 (release)]
1.11.0
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
a.sort(axis, kind, order)
Exception TypeError: "can't compare datetime.datetime to int" in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored
Interactive mode
In the interactive mode (not Ipython):
For numpy < 1.10:
>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> np.__version__
'1.9.1'
>>> arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)
>>> np.sort(arr)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int
>>> arr2 = np.sort(arr)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int
>>> order = [np.sort(np.array([e for e in arr if f(e)], dtype=object)) for f in [lambda x: True, lambda x: False]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 791, in sort
a.sort(axis, kind, order)
TypeError: can't compare datetime.datetime to int
For numpy >= 1.10:
>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> np.__version__
'1.11.0'
>>> arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object)
>>> np.sort(arr)
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:825: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
a.sort(axis, kind, order)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/numeric.py", line 1807, in array_repr
', ', "array(")
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 442, in array2string
elif reduce(product, a.shape) == 0:
TypeError: can't compare datetime.datetime to int
>>> arr2 = np.sort(arr)
>>> arr2
TypeError: can't compare datetime.datetime to int
>>> print(arr2)
[1 2 datetime.datetime(2016, 7, 20, 0, 24, 6, 50903) 0 3]
>>> arr2
array([1, 2, datetime.datetime(2016, 7, 20, 0, 24, 6, 50903), 0, 3], dtype=object)
>>> order = [np.sort(np.array([e for e in arr if f(e)], dtype=object)) for f in [lambda x: True, lambda x: False]]
>>> order
[array([1, 2, datetime.datetime(2016, 7, 20, 0, 24, 6, 50903), 0, 3], dtype=object), array([], dtype=object)]
(I pasted this literally from a console, line by line, adding only empty lines for clarity. The calls to arr2 puzzle me.)
The behaviour of the above list comprehension may depend on whether np.sort raises or not on the second array in the list (here, empty).
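To make the two groups in that comprehension concrete, here is a small sketch that uses plain Python lists and sorted() as stand-ins for the object arrays and np.sort. It is an illustration only: it does not reproduce numpy's swallowed-exception behaviour, it just shows which group contains the unsortable mix and which one is empty.

```python
from datetime import datetime

arr = [1, 2, datetime(2016, 7, 20, 0, 24, 6), 0, 3]

# Same predicates as in the comprehension above: the first keeps
# every element, the second keeps none.
predicates = [lambda x: True, lambda x: False]
groups = [[e for e in arr if f(e)] for f in predicates]

# Try to sort each group with plain Python comparison and record
# what happens (sorted() stands in for np.sort here).
outcomes = []
for group in groups:
    try:
        sorted(group)
        outcomes.append("sorted")
    except TypeError:
        outcomes.append("TypeError")

print([len(g) for g in groups])
print(outcomes)
```

With plain comparisons the first group fails immediately and the empty second group sorts trivially, so any difference seen with np.sort comes from how the error is reported, not from how the elements are grouped.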
safe_sort
Calling safe_sort on arr always raises. (But I don't really know why.)
>>> np.__version__
'1.11.0'
>>> sys.version
'2.7.11 (default, Mar 30 2016, 15:33:06) \n[GCC 5.3.0 20151204 (release)]'
>>> pd.__version__
u'0.18.1+221.g8acfad3'
>>> pd.core.algorithms.safe_sort(arr)
/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py:223: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
sorter = values.argsort()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py", line 227, in safe_sort
ordered = sort_mixed(values)
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/pandas-0.18.1+221.g8acfad3-py2.7-linux-x86_64.egg/pandas/core/algorithms.py", line 214, in sort_mixed
strs = np.sort(values[str_pos])
File "/home/users/piotr/workspace/numpy/numpy_dev_py2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 824, in sort
a = asanyarray(a).copy(order="K")
TypeError: can't compare datetime.datetime to int
I didn't test much in IPython, but it also (at least sometimes) swallows the exception and returns a partially sorted arr2, though it prints a warning (as above).
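The traceback above goes through sort_mixed and values[str_pos], which suggests the helper splits the values into strings and non-strings and sorts each part on its own. The sketch below mimics that idea with plain lists, under that assumption only: sort_mixed_sketch is a hypothetical name, it uses Python 3 str instead of the Python 2 string types, and sorted() instead of np.sort.

```python
from datetime import datetime


def sort_mixed_sketch(values):
    """Sort non-strings and strings separately, non-strings first.

    Hypothetical stand-in for illustration; not the pandas implementation.
    """
    non_strs = [v for v in values if not isinstance(v, str)]
    strs = [v for v in values if isinstance(v, str)]
    return sorted(non_strs) + sorted(strs)


# Ints and strings are separable, so each part sorts cleanly.
print(sort_mixed_sketch([3, "b", 1, "a"]))

# A datetime is not a string, so it lands in the same part as the
# ints, and sorting that part hits the int/datetime comparison.
try:
    sort_mixed_sketch([1, 2, datetime(2016, 7, 20), 0, 3])
except TypeError as exc:
    print("TypeError:", exc)
```

If the real helper groups values the same way, splitting on string-ness alone cannot rescue an int-datetime mix, which would be consistent with safe_sort raising for arr.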