BUG: Fix concat series loss of timezone #24027


Merged Dec 5, 2018 (25 commits). Changes shown from the first 14 commits.
d80afa0
BUG: Fix concat series loss of timezone
jakezimmer Nov 30, 2018
f4b751d
Merge branch 'master' of https://github.com/pandas-dev/pandas
evangelinehl Dec 1, 2018
159c4e6
Fixed naming error for is_datetimetz since this function is no longer…
evangelinehl Dec 1, 2018
2450097
Attempted to use _concat_compat to rectify the timezone bug
jakezimmer Dec 1, 2018
9cb20c4
Merge remote-tracking branch 'origin/master'
jakezimmer Dec 1, 2018
7f9dd52
Attempt to fix tz error with concat compat instead of union
jakezimmer Dec 1, 2018
6cb2022
changing behavior to be based on tz
jakezimmer Dec 1, 2018
a4da449
Attempting to fix differing dimensions bug
evangelinehl Dec 2, 2018
fe83e6d
Another attempt to fix dimensions bug
evangelinehl Dec 2, 2018
2cbb533
Just trying to test different versions here
evangelinehl Dec 2, 2018
f527dcc
Trying to fix dimensions bug now that Travis CI is passing but others…
evangelinehl Dec 2, 2018
583ce49
tests failed so changing it back to when travis ci succeeded
evangelinehl Dec 2, 2018
01a2c10
Changing it back because we're trying to figure out if concat_compat …
evangelinehl Dec 2, 2018
683dccf
Reverting back to version when all tests passed
evangelinehl Dec 3, 2018
857c6be
Restored blank lines
evangelinehl Dec 3, 2018
64da4c0
Added test case for the new tz output
evangelinehl Dec 3, 2018
9e699e4
Fixed style issues
evangelinehl Dec 3, 2018
64182c5
Fixed the whitespace issue in linting
jakezimmer Dec 3, 2018
b630d58
Merge branch 'master' into PR_TOOL_MERGE_PR_24027
jreback Dec 4, 2018
c7dcdb4
fix up
jreback Dec 4, 2018
165689e
updated whatsnew (v0.24.0) to reflect changes
jakezimmer Dec 4, 2018
43b2e2a
Merge branch 'master' into master
jakezimmer Dec 4, 2018
634c736
no changes since @jreback's fix up commit
jakezimmer Dec 5, 2018
0b86ef9
Update v0.24.0.rst
jakezimmer Dec 5, 2018
1867b3a
removed trailing whitespace
jakezimmer Dec 5, 2018
36 changes: 3 additions & 33 deletions pandas/core/dtypes/concat.py
@@ -23,7 +23,6 @@ def get_dtype_kinds(l):
Parameters
----------
l : list of arrays

Returns
-------
a set of kinds that exist in this list of arrays
@@ -107,12 +106,10 @@ def _concat_compat(to_concat, axis=0):
'normalized' dtypes (in that for example, if it's object, then it is a
non-datetimelike and provide a combined dtype for the resulting array that
preserves the overall dtype if possible)

Parameters
----------
to_concat : array of arrays
axis : axis to provide concatenation

Returns
-------
a single array, preserving the combined dtypes
@@ -177,14 +174,12 @@ def is_nonempty(x):
def _concat_categorical(to_concat, axis=0):
"""Concatenate an object/categorical array of arrays, each of which is a
single dtype

Parameters
----------
to_concat : array of arrays
axis : int
Axis to provide concatenation in the current implementation this is
always 0, e.g. we only have 1D categoricals

Returns
-------
Categorical
@@ -193,7 +188,8 @@ def _concat_categorical(to_concat, axis=0):

def _concat_asobject(to_concat):
to_concat = [x.get_values() if is_categorical_dtype(x.dtype)
else np.asarray(x).ravel() for x in to_concat]
else np.asarray(x).ravel() if not is_datetime64tz_dtype(x)
Contributor:
this is not the way
_concat_compat already handles this

Contributor:

I didn’t realize we do call _concat_compat from here... I think we’ve already lost the tz by the time we call it.

Contributor Author:

So do these if checks even need to be there? I'm not sure what this means for the change we had here.

Contributor Author:

@jreback after closer examination, I'm not sure if _concat_compat actually already handles this. If the type is also a category, it simply calls _concat_categorical which then summons _concat_asobject. I think that therefore the change to the code should happen inside _concat_categorical or _concat_asobject.

Contributor Author:

@jreback calling _concat_compat before the list is modified like this causes _concat_compat to just call _concat_categorical again and back and forth until they reach the maximum recursion depth. We have to modify the array somehow before that happens.
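The mutual recursion described above can be sketched with stand-in functions (hypothetical names, not pandas internals): a dispatcher routes any input containing a categorical to a specialized helper, and that helper must coerce its inputs before calling the dispatcher again, or the two functions bounce the same arrays back and forth until the recursion limit.

```python
class FakeCategorical:
    """Hypothetical stand-in for a pandas Categorical (not pandas code)."""
    def __init__(self, values):
        self.values = list(values)

    def astype_object(self):
        # Coercing to a plain list of objects is what breaks the cycle.
        return list(self.values)


def concat_compat(to_concat):
    # Dispatcher: routes inputs containing a categorical to the
    # specialized helper, otherwise concatenates directly.
    if any(isinstance(x, FakeCategorical) for x in to_concat):
        return concat_categorical(to_concat)
    return [v for x in to_concat for v in x]


def concat_categorical(to_concat):
    # Coerce BEFORE re-entering concat_compat; handing the originals back
    # unchanged would bounce between the two functions until RecursionError.
    coerced = [x.astype_object() if isinstance(x, FakeCategorical) else list(x)
               for x in to_concat]
    return concat_compat(coerced)


result = concat_categorical([FakeCategorical([1, 2]), ["a"]])  # [1, 2, 'a']
```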

Contributor:

I guess I'm not sure why we aren't doing

to_concat = [np.asarray(x.astype(object)) for x in to_concat]

The function name is _concat_asobject, so let's convert to object and concat them. Can you see if that works?

Please write the test first though, and ensure that the test fails with master. Then implement the fix and make sure it works.
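Following the test-first advice, a regression test for this bug might look like the sketch below (names are illustrative; it assumes the intended behavior is an object-dtype result that keeps each Timestamp's timezone):

```python
import pandas as pd
import pandas.testing as tm


def test_concat_categorical_tz():
    # Mixing a tz-aware datetime Series with a categorical Series should
    # fall back to object dtype while preserving each Timestamp's tz,
    # rather than coercing the datetimes to tz-naive values.
    a = pd.Series(pd.date_range("2017-01-01", periods=2, tz="US/Pacific"))
    b = pd.Series(["a", "b"], dtype="category")
    result = pd.concat([a, b], ignore_index=True)
    expected = pd.Series([pd.Timestamp("2017-01-01", tz="US/Pacific"),
                          pd.Timestamp("2017-01-02", tz="US/Pacific"),
                          "a", "b"])
    tm.assert_series_equal(result, expected)
```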

Contributor Author (@evangelinehl, Dec 3, 2018):

I believe I tried this previously but the build fails (I think that was the dimensions bug we had before) @TomAugspurger

Contributor:

Seems like we have some inconsistencies in concat that's breaking this approach. IIRC there's an outstanding PR fixing some of these up.

I was going by the loose rule of "different dtypes means the result type is object". But things like

In [1]: import pandas as pd; import numpy as np; import pickle

In [2]: a = pd.Series([1, 2], dtype='category')

In [3]: b = pd.Series([1, None], dtype='category')

In [4]: pd.concat([a, b], ignore_index=True)
Out[4]:
0    1.0
1    2.0
2    1.0
3    NaN
dtype: float64

mess that up. Not sure what's best here...
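The inconsistency being described can be sketched as follows: under the loose rule, categoricals with different underlying dtypes coerce to object on concat, but the identical-integer-categories-plus-NaN case in the session above slips through to float64 instead (that second behavior has varied across pandas versions, so it is shown without an asserted dtype):

```python
import pandas as pd

# Different underlying category dtypes: the loose rule applies and the
# concatenated result comes back as object.
a = pd.Series(["x"], dtype="category")
b = pd.Series([1], dtype="category")
result = pd.concat([a, b], ignore_index=True)  # object dtype

# Identical integer values, but one Series carries a missing value:
# historically this came back as float64 (per the session above), not
# object, which is the inconsistency that broke the astype(object) fix.
c = pd.Series([1, 2], dtype="category")
d = pd.Series([1, None], dtype="category")
mixed = pd.concat([c, d], ignore_index=True)
```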

else np.asarray(x.astype(object)) for x in to_concat]
res = _concat_compat(to_concat)
if axis == 1:
return res.reshape(1, len(res))
@@ -221,9 +217,7 @@ def union_categoricals(to_union, sort_categories=False, ignore_order=False):
"""
Combine list-like of Categorical-like, unioning categories. All
categories must have the same dtype.

.. versionadded:: 0.19.0

Parameters
----------
to_union : list-like of Categorical, CategoricalIndex,
@@ -234,13 +228,10 @@ def union_categoricals(to_union, sort_categories=False, ignore_order=False):
ignore_order : boolean, default False
If true, the ordered attribute of the Categoricals will be ignored.
Results in an unordered categorical.

.. versionadded:: 0.20.0

Returns
-------
result : Categorical

Raises
------
TypeError
@@ -250,69 +241,52 @@ def union_categoricals(to_union, sort_categories=False, ignore_order=False):
- sort_categories=True and Categoricals are ordered
ValueError
Empty list of categoricals passed

Notes
-----

To learn more about categories, see `link
<http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning>`__

Examples
--------

>>> from pandas.api.types import union_categoricals

If you want to combine categoricals that do not necessarily have
the same categories, `union_categoricals` will combine a list-like
of categoricals. The new categories will be the union of the
categories being combined.

>>> a = pd.Categorical(["b", "c"])
>>> b = pd.Categorical(["a", "b"])
>>> union_categoricals([a, b])
[b, c, a, b]
Categories (3, object): [b, c, a]

By default, the resulting categories will be ordered as they appear
in the `categories` of the data. If you want the categories to be
lexsorted, use `sort_categories=True` argument.

>>> union_categoricals([a, b], sort_categories=True)
[b, c, a, b]
Categories (3, object): [a, b, c]

`union_categoricals` also works with the case of combining two
categoricals of the same categories and order information (e.g. what
you could also `append` for).

>>> a = pd.Categorical(["a", "b"], ordered=True)
>>> b = pd.Categorical(["a", "b", "a"], ordered=True)
>>> union_categoricals([a, b])
[a, b, a, b, a]
Categories (2, object): [a < b]

Raises `TypeError` because the categories are ordered and not identical.

>>> a = pd.Categorical(["a", "b"], ordered=True)
>>> b = pd.Categorical(["a", "b", "c"], ordered=True)
>>> union_categoricals([a, b])
TypeError: to union ordered Categoricals, all categories must be the same

New in version 0.20.0

Ordered categoricals with different categories or orderings can be
combined by using the `ignore_ordered=True` argument.

>>> a = pd.Categorical(["a", "b", "c"], ordered=True)
>>> b = pd.Categorical(["c", "b", "a"], ordered=True)
>>> union_categoricals([a, b], ignore_order=True)
[a, b, c, c, b, a]
Categories (3, object): [a, b, c]

`union_categoricals` also works with a `CategoricalIndex`, or `Series`
containing categorical data, but note that the resulting array will
always be a plain `Categorical`

>>> a = pd.Series(["b", "c"], dtype='category')
>>> b = pd.Series(["a", "b"], dtype='category')
>>> union_categoricals([a, b])
@@ -403,13 +377,11 @@ def _concat_datetime(to_concat, axis=0, typs=None):
"""
provide concatenation of an datetimelike array of arrays each of which is a
single M8[ns], datetimet64[ns, tz] or m8[ns] dtype

Parameters
----------
to_concat : array of arrays
axis : axis to provide concatenation
typs : set of to_concat dtypes

Returns
-------
a single array, preserving the combined dtypes
@@ -509,13 +481,11 @@ def _concat_sparse(to_concat, axis=0, typs=None):
"""
provide concatenation of an sparse/dense array of arrays each of which is a
single dtype

Parameters
----------
to_concat : array of arrays
axis : axis to provide concatenation
typs : set of to_concat dtypes

Returns
-------
a single array, preserving the combined dtypes
@@ -579,4 +549,4 @@ def _concat_rangeindex_same_dtype(indexes):

# Here all "indexes" had 0 length, i.e. were empty.
# In this case return an empty range index.
return RangeIndex(0, 0)
return RangeIndex(0, 0)