BUG: Issues with DatetimeTZ values in where and combine_first (#21469 + #21546) #21660

Liam3851 · 2018-06-27T20:20:46Z

closes #21469: Series (but not DataFrame) combine_first() loses timezone information
closes #21546: Where method does not properly handle values with datetimes with TZ
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

CC: @jreback. Replacement for PR #21544; fixes for #21469 (Series.combine_first fails on tz values input) and #21546 (where fails on tz values input). Series.combine_first now delegates to where for the bulk of its operations.
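As a rough illustration of the behavior this PR targets, the sketch below assumes a pandas version that includes the fix (0.24.0 or later). The variable names and dates are made up for the example. When both operands are tz-aware with the same timezone, both where and combine_first are expected to keep the timezone:

```python
import pandas as pd

tz = "America/New_York"

# Two tz-aware Series with partially overlapping indexes (illustrative data).
s1 = pd.Series(pd.date_range("2015-01-01", periods=3, tz=tz))
s2 = pd.Series(pd.date_range("2016-05-14", periods=3, tz=tz), index=[2, 3, 4])

# combine_first keeps s1 where it has values and fills the rest from s2.
combined = s1.combine_first(s2)
print(combined.dtype)

# where with a tz-aware `other` of the same timezone.
mask = pd.Series([True, False, True])
other = pd.Series(pd.date_range("2016-05-14", periods=3, tz=tz))
replaced = s1.where(mask, other)
print(replaced.dtype)
```

In both cases the result should stay tz-aware rather than falling back to naive timestamps or raw integers.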

Liam3851 · 2018-06-27T22:49:51Z

One note: in the old code, calling combine_first on a datetime-valued Series with a string-valued Series would return a datetime-valued Series:

ser = pd.Series(pd.date_range('20110101', '20110102'))
mask = np.array([True, False])
ser.where(mask).combine_first(pd.Series(['20120101', '20120202']))
# result: 
# 0   2011-01-01
# 1   2012-02-02
# dtype: datetime64[ns]

There was a unit test to this effect, so I kept the behavior.

However, combining a datetime-valued Series with an int-valued Series did not promote:

ser = pd.Series(pd.date_range('20110101', '20110102'))  
mask = np.array([True, False])                          
ser.where(mask).combine_first(pd.Series([1, 2]))        
# result:
# 0    2011-01-01 00:00:00
# 1                      2
# dtype: object

There was no unit test to this effect, however, and it seems inconsistent with the string-promoting behavior above.

A side effect of my current implementation is that calling combine_first on a datetime-valued Series with an int-valued Series now produces a datetime-valued Series:

ser = pd.Series(pd.date_range('20110101', '20110102'))  
mask = np.array([True, False])                          
ser.where(mask).combine_first(pd.Series([1, 2]))        
# result: 
# 0   2011-01-01 00:00:00.000000000
# 1   1970-01-01 00:00:00.000000002
# dtype: datetime64[ns]

Arguably, combine_first shouldn't be in the business of type promotion at all, in either the string or the integer case; I'd just leave the result as object.

Thoughts? Should we:

promote on both strings and ints
promote on strings but not on ints
leave combining datetimes with either strings or ints as object dtype?
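Whichever option is chosen, a caller can sidestep the implicit promotion by converting explicitly before combining. The following is only a sketch of that idea, not part of the patch; the names `promoted` and `as_object` are made up for the example:

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range("20110101", "20110102"))
masked = ser.where(np.array([True, False]))  # second value becomes NaT

strings = pd.Series(["20120101", "20120202"])
ints = pd.Series([1, 2])

# Explicit promotion: parse the strings first, then combine datetime with datetime.
promoted = masked.combine_first(pd.to_datetime(strings))

# Explicit non-promotion: cast both sides to object so nothing is coerced.
as_object = masked.astype(object).combine_first(ints.astype(object))

print(promoted)
print(as_object)
```

With explicit conversion the result dtype no longer depends on how combine_first treats mixed inputs.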

codecov · 2018-06-27T23:48:57Z

Codecov Report

❗ No coverage uploaded for pull request base (master@823478c).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #21660   +/-   ##
=========================================
  Coverage          ?    91.9%           
=========================================
  Files             ?      154           
  Lines             ?    49651           
  Branches          ?        0           
=========================================
  Hits              ?    45633           
  Misses            ?     4018           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.28% <100%> (?)`
#single	`41.9% <20%> (?)`

Impacted Files	Coverage Δ
pandas/core/internals.py	`95.59% <100%> (ø)`
pandas/core/series.py	`94.2% <100%> (ø)`
pandas/core/common.py	`92.19% <100%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 823478c...c97b951.

jreback

can you add a whatsnew note in 0.24.0

jreback · 2018-06-29T00:29:26Z

pandas/tests/frame/test_indexing.py

@@ -2936,6 +2936,20 @@ def test_where_callable(self):
        tm.assert_frame_equal(result,
                              (df + 2).where((df + 2) > 8, (df + 2) + 10))

+    def test_where_tz_values(self):


can you use the tz_naive_fixture here
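For readers unfamiliar with the request: tz_naive_fixture is a fixture in the pandas test suite that runs a test once per timezone, including the naive (None) case. Its actual parameter list is not shown in this thread, so the sketch below uses a hypothetical stand-in list, `TZ_CASES`, and a plain loop instead of pytest to show the shape of such a test:

```python
import pandas as pd

# Hypothetical stand-in for the fixture's parameters (assumption, not the real list).
TZ_CASES = [None, "UTC", "America/New_York"]


def check_where_tz_values(tz):
    """Run the same where() check for one timezone (None means tz-naive)."""
    ser1 = pd.Series(pd.DatetimeIndex(["20150101", "20150102", "20150103"], tz=tz))
    ser2 = pd.Series(pd.DatetimeIndex(["20160514", "20160515", "20160516"], tz=tz))
    mask = pd.Series([True, True, False])

    result = ser1.where(mask, ser2)
    expected = pd.Series(
        pd.DatetimeIndex(["20150101", "20150102", "20160516"], tz=tz)
    )
    pd.testing.assert_series_equal(result, expected)


for tz in TZ_CASES:
    check_where_tz_values(tz)
```

In the real test suite the loop would be replaced by taking the fixture as a test argument, so each timezone is reported as a separate test case.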

jreback · 2018-06-29T00:30:21Z

pandas/tests/series/indexing/test_boolean.py

@@ -551,6 +551,19 @@ def test_where_datetime_conversion():
    assert_series_equal(rs, expected)


+def test_where_dt_tz_values():


can you use the tz_naive_fixture here

jreback · 2018-06-29T00:30:33Z

pandas/tests/series/test_combine_concat.py

@@ -170,6 +170,19 @@ def get_result_type(dtype, dtype2):
                                    ]).dtype
                assert result.kind == expected

+    def test_combine_first_dt_tz_values(self):


can you use the tz_naive_fixture here

jreback · 2018-06-29T00:30:50Z

pandas/tests/indexing/test_coercion.py

@@ -580,12 +580,11 @@ def test_where_series_datetime64(self, fill_val, exp_dtype):
        values = pd.Series(pd.date_range(fill_val, periods=4))
        if fill_val.tz:
            exp = pd.Series([pd.Timestamp('2011-01-01'),
-                             pd.Timestamp('2012-01-02 05:00'),


what changed here?

The previous behavior was treating other for DatetimeTZ as ints (see the writeup for #21546). As a result, it was getting incorrectly merged with the naive datetime Series as naive UTC timestamps. Fixing the behavior of tz-aware other also fixed this, so the result of combine_first is now a promotion of the whole Series to object (as was desired per the deleted xfail/TODO).
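A small sketch of the post-fix behavior described here, assuming a pandas version that includes this change (the data is illustrative): filling a tz-naive datetime Series with a tz-aware value through where should give an object-dtype Series that keeps both kinds of timestamp intact.

```python
import pandas as pd

naive = pd.Series(pd.date_range("2011-01-01", periods=4))
cond = pd.Series([True, False, True, False])
fill_val = pd.Timestamp("2012-01-01", tz="America/New_York")

# Naive values are kept where cond is True; the tz-aware value fills the rest.
result = naive.where(cond, fill_val)

print(result.dtype)
print(result.iloc[0], "|", result.iloc[1])
```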

jreback · 2018-06-29T00:31:41Z

pandas/tests/frame/test_indexing.py

+        mask = DataFrame(True, index=df1.index, columns=df2.columns)
+        mask.iloc[3:] = False
+        result = df1.where(mask, df2)
+        dts3 = DatetimeIndex(['20150101', '20150102', '20150103',


ideally just construct what you can w/o using temporaries.

ideally call this expected

jreback · 2018-06-29T00:31:54Z

pandas/tests/series/indexing/test_boolean.py

+    dts2 = pd.date_range('20150103', '20150107', tz='America/New_York')
+    df2 = pd.DataFrame({'date': dts2})
+    result = df1.date.where(df1.date < df1.date[3], df2.date)
+    exp_vals = pd.DatetimeIndex(['20150101', '20150102', '20150103',


same comment as above

jreback · 2018-06-29T00:31:59Z

pandas/tests/series/test_combine_concat.py

+        dts2 = pd.date_range('20160514', '20160518', tz='America/New_York')
+        df2 = pd.DataFrame({'date': dts2}, index=range(3, 8))
+        result = df1.date.combine_first(df2.date)
+        exp_vals = pd.DatetimeIndex(['20150101', '20150102', '20150103',


jreback · 2018-06-29T00:32:13Z

pandas/core/series.py

@@ -2303,10 +2305,12 @@ def combine_first(self, other):
        new_index = self.index.union(other.index)
        this = self.reindex(new_index, copy=False)
        other = other.reindex(new_index, copy=False)
+        if is_datetimelike(this) and not is_datetimelike(other):
+            other = to_datetime(other)
+
        # TODO: do we need name?


is this comment still relevant?

Comment no longer relevant, I'll remove.

What do you think of datetime coercion of other? As I noted in my (too long) comment above, using to_datetime coerces ints as well as strings. The old code used the Series constructor, which coerced strings but not ints. Alternatively, we could remove the datetimelike coercion entirely.
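The coercion in question can be seen in isolation. This sketch only calls pd.to_datetime directly and does not reproduce the combine_first code path: strings are parsed as dates, while integers are interpreted as nanoseconds since the Unix epoch by default.

```python
import pandas as pd

# Strings are parsed as calendar dates.
from_strings = pd.to_datetime(pd.Series(["20120101", "20120202"]))

# Integers are read as offsets from 1970-01-01 (nanoseconds by default).
from_ints = pd.to_datetime(pd.Series([1, 2]))

print(from_strings)
print(from_ints)
```

This is why an int-valued other shows up as timestamps a few nanoseconds after 1970-01-01 once to_datetime is applied to it.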

jreback · 2018-06-29T00:32:55Z

pandas/core/internals.py

        cond = getattr(cond, 'values', cond)

        # If the default broadcasting would go in the wrong direction, then
        # explicitly reshape other instead
        if getattr(other, 'ndim', 0) >= 1:
            if values.ndim - 1 == other.ndim and axis == 1:
                other = other.reshape(tuple(other.shape + (1, )))
+            elif transpose and values.ndim == self.ndim - 1:


we really need this?

Without lines 1487-1488, cond and values have opposite dimensions, which is the root cause of (the DataFrame part of) #21544.

The issue is that DatetimeTZBlock._try_coerce_args returns values as a 2-D array. I've updated _try_coerce_args so that other is also a 2-D array for consistency, which fixes part of the issue.

However, we don't run _try_coerce_args until the interior of func at 1499. Thus lines 1476-1477 (values = values.T) are a no-op given a DatetimeTZBlock (since at that point values is 1-D). other._values is also a 1-D DatetimeIndex. So to make cond consistent with the values and other we get out of _try_coerce_args, it needs to be transposed.

An alternative would be changing DatetimeTZBlock._try_coerce_args to return 1-D arrays for values and values_mask (changing 2875-2877). But if I do that, it breaks setitem.
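The shape mismatch can be illustrated with plain NumPy. This is a simplified sketch and not the pandas internals themselves: the shapes (1, n) for the coerced values and (n, 1) for the mask are assumptions chosen to mirror the description above.

```python
import numpy as np

n = 4
# Stand-ins for the coerced 2-D values and other of a single-column block.
values = np.arange(n).reshape(1, n)
other = (np.arange(n) + 100).reshape(1, n)

# A column-shaped mask, as it might arrive from a one-column DataFrame.
cond = np.array([[True], [True], [False], [False]])

# Broadcasting (n, 1) against (1, n) silently produces an (n, n) result.
wrong = np.where(cond, values, other)

# Transposing the mask aligns it with values and other.
right = np.where(cond.T, values, other)

print(wrong.shape)
print(right)
```

NumPy raises no error for the mismatched case, which is why the orientation of cond has to be handled explicitly.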

jreback · 2018-06-29T00:33:03Z

pandas/core/common.py

@@ -410,19 +410,6 @@ def _apply_if_callable(maybe_callable, obj, **kwargs):
    return maybe_callable


-def _where_compat(mask, arr1, arr2):


jreback · 2018-07-02T23:54:36Z

thanks @Liam3851

nice patch! keep em coming!

Liam3851 added 4 commits June 27, 2018 15:09

Minimal modifications that pass new tests and fix pandas-dev#21469 + p…

ac2f836

…andas-dev#21546.

Remove unused import.

770a22e

Remove explicit DatetimeTZBlock check.

04a0538

Update test_coercion; xfailing case now fixed.

6e6c619

gfyoung added the Bug, Indexing, Timedelta, MultiIndex, and Timezones labels and removed the Timedelta label Jun 27, 2018

gfyoung requested a review from jreback June 27, 2018 23:07

Remove orphaned private _where_compat method.

8c022d9

Liam3851 added 2 commits June 27, 2018 21:09

Remove newly-unused import

96a59f6

Merge remote-tracking branch 'upstream/master' into where_tz

deda82c

jreback requested changes Jun 29, 2018

Liam3851 and others added 7 commits June 29, 2018 09:51

Remove no-longer-relevant comment and no-op code line.

71a4fd4

Clean up unit tests.

7c14643

Fix unneeded imports.

3d072c5

Add whatsnew.

f812e78

Merge remote-tracking branch 'upstream/master' into where_tz

f177ec5

Merge branch 'master' into PR_TOOL_MERGE_PR_21660

09fe18f

doc

c97b951

jreback added this to the 0.24.0 milestone Jul 2, 2018

jreback approved these changes Jul 2, 2018

jreback merged commit 8b2070a into pandas-dev:master Jul 2, 2018

mroeschke mentioned this pull request Jul 28, 2018

BUG: Series.cumin/cummax fails with datetime64[ns, tz] dtype #15553

Closed

4 tasks

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: Issues with DatetimeTZ values in where and combine_first (pandas…

a537353

…-dev#21469 + pandas-dev#21546) (pandas-dev#21660)
