API/REGR: (re-)allow neg/pos unary operation on object dtype #21590

jorisvandenbossche · 2018-06-22T07:54:40Z

This is an easy fix to simply be more forgiveable and just try the operation:

For object dtype, this is IMO the better way: simply let the scalar values decide if it can do those operations or not.
The scalar values (eg strings) already raise an informative error message. Eg for a series with string you get TypeError: bad operand type for unary -: 'str' for unary negative operation.
But with two caveats to discuss:
- For datetime64 data, the error message is not as user friendly as our custom one: "TypeError: ufunc 'negative' did not contain a loop with signature matching types dtype('<M8[ns]') dtype('<M8[ns]')"
- For the unary positive operation, numpy has apparently different rules: it does allow datetime64 data and returns itself (the same for object string data). Because of this, the current tests fail (it expects TypeError for +s_string)

Details with examples

In [14]: s_string = pd.Series(['a', 'b'])

In [15]: -s_string
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-91308cf04ad2> in <module>()
----> 1 -s_string

~/scipy/pandas/pandas/core/generic.py in __neg__(self)
   1119             arr = operator.inv(values)
   1120         else:
-> 1121             arr = operator.neg(values)
   1122         return self.__array_wrap__(arr)
   1123 

TypeError: bad operand type for unary -: 'str'

In [16]: +s_string
Out[16]: 
0    a
1    b
dtype: object

In [17]: s_datetime = pd.Series(pd.date_range('2012', periods=3))

In [18]: -s_datetime
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-b460860ba74c> in <module>()
----> 1 -s_datetime

~/scipy/pandas/pandas/core/generic.py in __neg__(self)
   1119             arr = operator.inv(values)
   1120         else:
-> 1121             arr = operator.neg(values)
   1122         return self.__array_wrap__(arr)
   1123 

TypeError: ufunc 'negative' did not contain a loop with signature matching types dtype('<M8[ns]') dtype('<M8[ns]')

In [19]: +s_datetime
Out[19]: 
0   2012-01-01
1   2012-01-02
2   2012-01-03
dtype: datetime64[ns]

In [20]: 

In [20]: from decimal import Decimal
    ...: s_decimal = pd.Series([Decimal(1)])

In [21]: -s_decimal
Out[21]: 
0    -1
dtype: object

In [22]: +s_decimal
Out[22]: 
0    1
dtype: object

jorisvandenbossche · 2018-06-22T07:56:59Z

cc @deniederhut

For the unary positive, if we care I can check for datetime64 and raise a TypeError for that (although we can also follow numpy's example). For the object dtype (eg strings), that's not really possible since it is exactly the idea to allow it for those types in an object column that allow it.

jreback · 2018-06-22T23:06:05Z

code change looks good (though you simply may need to add is_object_type to the check as we are strict on it).

jbrockmendel · 2018-06-23T04:11:04Z

+1. Thoughts on applying the same pass-through approach to Index.__sub__?

jorisvandenbossche · 2018-06-25T07:52:52Z

though you simply may need to add is_object_type to the check as we are strict on it

Do you mean to only specifically call operator.neg(values) for object dtype (as we do for numeric), so it still fails for datetime64?

jreback · 2018-06-25T22:40:21Z

Do you mean to only specifically call operator.neg(values) for object dtype (as we do for numeric), so it still fails for datetime64?

I meant more just to extend the if from the original code, like

        elif (is_numeric_dtype(values) or is_timedelta64_dtype(values) or is_object_type(values)):
            arr = operator.neg(values)
       else:
            raise TypeError

jreback · 2018-06-27T10:18:26Z

@jorisvandenbossche I pushed a fix to make this work

jorisvandenbossche · 2018-06-27T10:22:37Z

yeah, sorry for the delay, getting to it today

I am fine with the is_object_dtype check (to explicitly not allow the datetime64), but personally I don't think we should infer anything if it is object. The error message is already fine in case of __neg__, and I don't really care about +s_string working (it is consistent with numpy). So I would rather do what you posted above in #21590 (comment)

jreback · 2018-06-27T10:31:39Z

I am fine with the is_object_dtype check (to explicitly not allow the datetime64), but personally I don't think we should infer anything if it is object. The error message is already fine in case of neg, and I don't really care about +s_string working (it is consistent with numpy). So I would rather do what you posted above in #21590 (comment)

I tried that and it failed your tests :> So maybe they are too strict. I didn't dive too deep here.

jreback · 2018-06-27T10:33:33Z

in particular this (on master)

In [9]: df = pd.DataFrame({'A': ['a', 'b']})

In [10]: + df
Out[10]: 
   A
0  a
1  b

In [11]: - df
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-42e20adaf419> in <module>()
----> 1 - df

~/pandas/pandas/core/generic.py in __neg__(self)
   1125               not is_string_like_dtype(np.array(values))):#lib.infer_dtype(values) not in ['string', 'bytes', 'unicode']):
   1126             # explicity allow object dtypes that are not strings, gh-21380
-> 1127             arr = operator.neg(values)
   1128         else:
   1129             raise TypeError("Unary negative expects numeric dtype, not {}"

TypeError: bad operand type for unary -: 'str'

jorisvandenbossche · 2018-06-27T11:36:11Z

Yes, not inferring strings means that we need to update the test (but only for __pos__), because +s_strings would now work while it fails on master.
I didn't do that initially because I first wanted to get an idea if you were OK with allowing something like +s_strings.

codecov · 2018-06-27T12:06:12Z

Codecov Report

❗ No coverage uploaded for pull request base (master@c45bb0b). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #21590   +/-   ##
=========================================
  Coverage          ?    91.9%           
=========================================
  Files             ?      154           
  Lines             ?    49555           
  Branches          ?        0           
=========================================
  Hits              ?    45542           
  Misses            ?     4013           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.27% <100%> (?)`
#single	`42.03% <0%> (?)`

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.21% <100%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c45bb0b...453b9bd. Read the comment docs.

jorisvandenbossche · 2018-06-27T21:29:49Z

Thoughts on applying the same pass-through approach to Index.sub?

Yes, I think that is a good idea, but let's leave that for a separate PR / 0.24.0

jreback · 2018-06-28T00:24:11Z

seems numpy allows this which I find odd.

In [1]: arr = np.array(['foo'])

In [2]: +arr
Out[2]: array(['foo'], dtype='<U3')

In [3]: arr = np.array(['foo'], dtype=object)

In [4]: +arr                                 
Out[4]: array(['foo'], dtype=object)

ok with matching numpy (for pos only).

jorisvandenbossche · 2018-06-28T12:49:29Z

I think the policy of numpy regarding + is simply to return the original object regardless whether it would otherwise support the notion of negative/positive or addition.

ok with matching numpy (for pos only).

Yes (for object dtype now, for datetime we still raise), and for neg our behaviour was already consistent with numpy before.

jreback · 2018-06-29T00:38:43Z

thank you sir!

…dev#21590) (cherry picked from commit 8cb6be0)

(cherry picked from commit 8cb6be0)

…dev#21590)

API/REGR: (re-)allow neg/pos unary operation on object dtype

37f7826

jorisvandenbossche added Dtype Conversions Unexpected or buggy dtype conversions Regression Functionality that used to work in a prior pandas version labels Jun 22, 2018

jorisvandenbossche added this to the 0.23.2 milestone Jun 22, 2018

jreback added 2 commits June 27, 2018 05:59

Merge branch 'master' into PR_TOOL_MERGE_PR_21590

92ba5c6

allow non-string dtypes

5068ebb

simplify object dtype check

8a81dee

jorisvandenbossche and others added 2 commits June 28, 2018 14:51

fix unused import

1dcd4cf

Merge branch 'master' into PR_TOOL_MERGE_PR_21590

453b9bd

jreback merged commit 8cb6be0 into pandas-dev:master Jun 29, 2018

jorisvandenbossche deleted the unary-op-object branch June 29, 2018 08:36

jorisvandenbossche added Needs Backport and removed Needs Backport labels Jun 29, 2018

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018

API/REGR: (re-)allow neg/pos unary operation on object dtype (pandas-…

ee93f61

…dev#21590) (cherry picked from commit 8cb6be0)

jorisvandenbossche added a commit that referenced this pull request Jul 5, 2018

API/REGR: (re-)allow neg/pos unary operation on object dtype (#21590)

2c00914

(cherry picked from commit 8cb6be0)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

API/REGR: (re-)allow neg/pos unary operation on object dtype (pandas-…

2bee20f

…dev#21590)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API/REGR: (re-)allow neg/pos unary operation on object dtype #21590

API/REGR: (re-)allow neg/pos unary operation on object dtype #21590

jorisvandenbossche commented Jun 22, 2018

jorisvandenbossche commented Jun 22, 2018

jreback commented Jun 22, 2018

jbrockmendel commented Jun 23, 2018

jorisvandenbossche commented Jun 25, 2018

jreback commented Jun 25, 2018

jreback commented Jun 27, 2018

jorisvandenbossche commented Jun 27, 2018

jreback commented Jun 27, 2018

jreback commented Jun 27, 2018

jorisvandenbossche commented Jun 27, 2018

codecov bot commented Jun 27, 2018 •

edited

Loading

jorisvandenbossche commented Jun 27, 2018

jreback commented Jun 28, 2018

jorisvandenbossche commented Jun 28, 2018

jreback commented Jun 29, 2018

API/REGR: (re-)allow neg/pos unary operation on object dtype #21590

API/REGR: (re-)allow neg/pos unary operation on object dtype #21590

Conversation

jorisvandenbossche commented Jun 22, 2018

jorisvandenbossche commented Jun 22, 2018

jreback commented Jun 22, 2018

jbrockmendel commented Jun 23, 2018

jorisvandenbossche commented Jun 25, 2018

jreback commented Jun 25, 2018

jreback commented Jun 27, 2018

jorisvandenbossche commented Jun 27, 2018

jreback commented Jun 27, 2018

jreback commented Jun 27, 2018

jorisvandenbossche commented Jun 27, 2018

codecov bot commented Jun 27, 2018 • edited Loading

Codecov Report

jorisvandenbossche commented Jun 27, 2018

jreback commented Jun 28, 2018

jorisvandenbossche commented Jun 28, 2018

jreback commented Jun 29, 2018

codecov bot commented Jun 27, 2018 •

edited

Loading