# BUG: Fix concat series loss of timezone #24027
Changes from 14 commits
```diff
@@ -23,7 +23,6 @@ def get_dtype_kinds(l):
     Parameters
     ----------
     l : list of arrays
-
     Returns
     -------
     a set of kinds that exist in this list of arrays
```
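The docstring above says `get_dtype_kinds` returns "a set of kinds that exist in this list of arrays". As a loose, hypothetical analogy (not pandas' internal code, which also recognizes richer kinds such as categoricals and tz-aware datetimes), NumPy's one-character `dtype.kind` codes give the same flavor of summary:

```python
import numpy as np

# Illustrative sketch only: collect the dtype "kind" code of each array,
# e.g. 'i' for integers, 'f' for floats, 'O' for object.
arrays = [np.array([1, 2]), np.array([1.5]), np.array(["a"], dtype=object)]
kinds = {a.dtype.kind for a in arrays}
print(kinds)  # a set containing 'i', 'f' and 'O'
```

A single-element kind set suggests the concatenation result can keep that dtype; a mixed set forces a common (often wider) dtype.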
```diff
@@ -107,12 +106,10 @@ def _concat_compat(to_concat, axis=0):
     'normalized' dtypes (in that for example, if it's object, then it is a
     non-datetimelike and provide a combined dtype for the resulting array that
     preserves the overall dtype if possible)
-
     Parameters
     ----------
     to_concat : array of arrays
     axis : axis to provide concatenation
-
     Returns
     -------
     a single array, preserving the combined dtypes
```
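The "preserves the overall dtype if possible" behavior of `_concat_compat` is observable through the public `pd.concat` API. A minimal check:

```python
import pandas as pd

# Same dtype on both sides is preserved
same = pd.concat([pd.Series([1, 2]), pd.Series([3])], ignore_index=True)
print(same.dtype)  # int64

# Mixed int/float upcasts to the common numeric dtype, not to object
mixed = pd.concat([pd.Series([1]), pd.Series([1.5])], ignore_index=True)
print(mixed.dtype)  # float64
```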
```diff
@@ -177,14 +174,12 @@ def is_nonempty(x):
 def _concat_categorical(to_concat, axis=0):
     """Concatenate an object/categorical array of arrays, each of which is a
     single dtype
-
     Parameters
     ----------
     to_concat : array of arrays
     axis : int
         Axis to provide concatenation in the current implementation this is
         always 0, e.g. we only have 1D categoricals
-
     Returns
     -------
     Categorical
```
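At the `pd.concat` level, the categorical path behaves roughly as follows (a sketch of the documented behavior, assuming a recent pandas release): identical category dtypes survive concatenation, while differing categories fall back to object dtype via the object path discussed below.

```python
import pandas as pd

# Identical categories: the result stays categorical
a = pd.Series(["x", "y"], dtype="category")
b = pd.Series(["y", "x"], dtype="category")
print(pd.concat([a, b], ignore_index=True).dtype)  # category

# Different categories: pandas falls back to object dtype
c = pd.Series(["x"], dtype="category")
d = pd.Series(["z"], dtype="category")
print(pd.concat([c, d], ignore_index=True).dtype)  # object
```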
```diff
@@ -193,7 +188,8 @@ def _concat_categorical(to_concat, axis=0):

     def _concat_asobject(to_concat):
         to_concat = [x.get_values() if is_categorical_dtype(x.dtype)
-                     else np.asarray(x).ravel() for x in to_concat]
+                     else np.asarray(x).ravel() if not is_datetime64tz_dtype(x)
+                     else np.asarray(x.astype(object)) for x in to_concat]
         res = _concat_compat(to_concat)
         if axis == 1:
             return res.reshape(1, len(res))
```

Review discussion on this hunk:

> this is not the way

> I didn't realize we do call `_concat_compat` from here... I think we've already lost the tz by the time we call it.

> So do these `if` checks not even need to be there? Not sure what this means for the change we had here.

> @jreback after closer examination, I'm not sure if `_concat_compat` actually already handles this. If the type is also a category, it simply calls `_concat_categorical`, which then summons `_concat_asobject`. I think that therefore the change to the code should happen inside `_concat_categorical` or `_concat_asobject`.

> @jreback calling `_concat_compat` before the list is modified like this causes `_concat_compat` to just call `_concat_categorical` again, and back and forth until they reach the maximum recursion depth. We have to modify the array somehow before that happens.

> I guess I'm not sure why we aren't doing `to_concat = [np.asarray(x.astype(object)) for x in to_concat]`. The function name is ... Please write the test first, though, and ensure that the test fails with master. Then implement the fix and make sure it works.

> I believe I tried this previously but the build fails (I think that was the dimensions bug we had before) @TomAugspurger

> Seems like we have some inconsistencies in ... I was going by the loose rule of "different dtypes means the result type is object". But things like
>
> ```python
> In [1]: import pandas as pd; import numpy as np; import pickle
>
> In [2]: a = pd.Series([1, 2], dtype='category')
>
> In [3]: b = pd.Series([1, None], dtype='category')
>
> In [4]: pd.concat([a, b], ignore_index=True)
> Out[4]:
> 0    1.0
> 1    2.0
> 2    1.0
> 3    NaN
> dtype: float64
> ```
>
> mess that up. Not sure what's best here...
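The infinite-recursion hazard raised in the review can be sketched in isolation. The function names below are hypothetical stand-ins for `_concat_compat` and `_concat_categorical`, not pandas code: if the categorical handler passes the unmodified input straight back to the dispatcher, the two functions bounce between each other until Python's recursion limit is hit.

```python
# Hypothetical sketch of the mutual-recursion bug described above.
def concat_compat(arrays):
    # dispatcher: categorical input is routed to the categorical handler
    return concat_categorical(arrays)

def concat_categorical(arrays):
    # BUG: nothing was converted before re-dispatching, so we go straight
    # back to the dispatcher with the same input
    return concat_compat(arrays)

try:
    concat_compat(["dummy"])
except RecursionError:
    print("hit the maximum recursion depth, as the reviewer warned")
```

Hence the fix must convert the arrays (e.g. to object dtype) before handing them back to `_concat_compat`.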
@@ -221,9 +217,7 @@ def union_categoricals(to_union, sort_categories=False, ignore_order=False): | |
""" | ||
Combine list-like of Categorical-like, unioning categories. All | ||
categories must have the same dtype. | ||
|
||
.. versionadded:: 0.19.0 | ||
|
||
Parameters | ||
---------- | ||
to_union : list-like of Categorical, CategoricalIndex, | ||
|
@@ -234,13 +228,10 @@ def union_categoricals(to_union, sort_categories=False, ignore_order=False): | |
ignore_order : boolean, default False | ||
If true, the ordered attribute of the Categoricals will be ignored. | ||
Results in an unordered categorical. | ||
|
||
.. versionadded:: 0.20.0 | ||
|
||
Returns | ||
------- | ||
result : Categorical | ||
|
||
Raises | ||
------ | ||
TypeError | ||
|
@@ -250,69 +241,52 @@ def union_categoricals(to_union, sort_categories=False, ignore_order=False): | |
- sort_categories=True and Categoricals are ordered | ||
ValueError | ||
Empty list of categoricals passed | ||
|
||
Notes | ||
----- | ||
|
||
To learn more about categories, see `link | ||
<http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning>`__ | ||
|
||
Examples | ||
-------- | ||
|
||
>>> from pandas.api.types import union_categoricals | ||
|
||
If you want to combine categoricals that do not necessarily have | ||
the same categories, `union_categoricals` will combine a list-like | ||
of categoricals. The new categories will be the union of the | ||
categories being combined. | ||
|
||
>>> a = pd.Categorical(["b", "c"]) | ||
>>> b = pd.Categorical(["a", "b"]) | ||
>>> union_categoricals([a, b]) | ||
[b, c, a, b] | ||
Categories (3, object): [b, c, a] | ||
|
||
By default, the resulting categories will be ordered as they appear | ||
in the `categories` of the data. If you want the categories to be | ||
lexsorted, use `sort_categories=True` argument. | ||
|
||
>>> union_categoricals([a, b], sort_categories=True) | ||
[b, c, a, b] | ||
Categories (3, object): [a, b, c] | ||
|
||
`union_categoricals` also works with the case of combining two | ||
categoricals of the same categories and order information (e.g. what | ||
you could also `append` for). | ||
|
||
>>> a = pd.Categorical(["a", "b"], ordered=True) | ||
>>> b = pd.Categorical(["a", "b", "a"], ordered=True) | ||
>>> union_categoricals([a, b]) | ||
[a, b, a, b, a] | ||
Categories (2, object): [a < b] | ||
|
||
Raises `TypeError` because the categories are ordered and not identical. | ||
|
||
>>> a = pd.Categorical(["a", "b"], ordered=True) | ||
>>> b = pd.Categorical(["a", "b", "c"], ordered=True) | ||
>>> union_categoricals([a, b]) | ||
TypeError: to union ordered Categoricals, all categories must be the same | ||
|
||
New in version 0.20.0 | ||
|
||
Ordered categoricals with different categories or orderings can be | ||
combined by using the `ignore_ordered=True` argument. | ||
|
||
>>> a = pd.Categorical(["a", "b", "c"], ordered=True) | ||
>>> b = pd.Categorical(["c", "b", "a"], ordered=True) | ||
>>> union_categoricals([a, b], ignore_order=True) | ||
[a, b, c, c, b, a] | ||
Categories (3, object): [a, b, c] | ||
|
||
`union_categoricals` also works with a `CategoricalIndex`, or `Series` | ||
containing categorical data, but note that the resulting array will | ||
always be a plain `Categorical` | ||
|
||
>>> a = pd.Series(["b", "c"], dtype='category') | ||
>>> b = pd.Series(["a", "b"], dtype='category') | ||
>>> union_categoricals([a, b]) | ||
|
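The first docstring example above can be run directly to confirm the stated outputs: the result's categories come out in order of appearance, not sorted.

```python
import pandas as pd
from pandas.api.types import union_categoricals

a = pd.Categorical(["b", "c"])
b = pd.Categorical(["a", "b"])
result = union_categoricals([a, b])
print(list(result))             # ['b', 'c', 'a', 'b']
print(list(result.categories))  # ['b', 'c', 'a']
```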
@@ -403,13 +377,11 @@ def _concat_datetime(to_concat, axis=0, typs=None): | |
""" | ||
provide concatenation of an datetimelike array of arrays each of which is a | ||
single M8[ns], datetimet64[ns, tz] or m8[ns] dtype | ||
|
||
Parameters | ||
---------- | ||
to_concat : array of arrays | ||
axis : axis to provide concatenation | ||
typs : set of to_concat dtypes | ||
|
||
Returns | ||
------- | ||
a single array, preserving the combined dtypes | ||
|
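The datetimelike path is where this PR's headline behavior lives: concatenating tz-aware Series should keep the timezone-aware dtype rather than dropping to naive datetimes or object. A quick check of the intended behavior (assuming a pandas release that includes this fix):

```python
import pandas as pd

a = pd.Series(pd.to_datetime(["2018-01-01"]).tz_localize("US/Eastern"))
b = pd.Series(pd.to_datetime(["2018-02-01"]).tz_localize("US/Eastern"))
result = pd.concat([a, b], ignore_index=True)
print(result.dtype)  # datetime64[ns, US/Eastern]
```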
@@ -509,13 +481,11 @@ def _concat_sparse(to_concat, axis=0, typs=None): | |
""" | ||
provide concatenation of an sparse/dense array of arrays each of which is a | ||
single dtype | ||
|
||
Parameters | ||
---------- | ||
to_concat : array of arrays | ||
axis : axis to provide concatenation | ||
typs : set of to_concat dtypes | ||
|
||
Returns | ||
------- | ||
a single array, preserving the combined dtypes | ||
|
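The sparse path serves the same goal for sparse inputs: concatenation should keep a Sparse dtype rather than densifying. A sketch using the public `pd.arrays.SparseArray` API of modern pandas (the internal function above predates that API):

```python
import pandas as pd

s1 = pd.Series(pd.arrays.SparseArray([0, 0, 1]))
s2 = pd.Series(pd.arrays.SparseArray([0, 2, 0]))
result = pd.concat([s1, s2], ignore_index=True)
print(result.dtype)  # a Sparse dtype, e.g. Sparse[int64, 0]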
```diff
@@ -579,4 +549,4 @@ def _concat_rangeindex_same_dtype(indexes):

     # Here all "indexes" had 0 length, i.e. were empty.
     # In this case return an empty range index.
-    return RangeIndex(0, 0)
+    return RangeIndex(0, 0)
```

(The final hunk is a whitespace-only change to the last line, most likely adding the missing newline at end of file.)
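For context on what `_concat_rangeindex_same_dtype` enables: appending consecutive `RangeIndex` objects can stay a `RangeIndex` instead of materializing all the values, and appending only empty ranges yields `RangeIndex(0, 0)`, matching the return statement above. A small check (assuming a recent pandas release):

```python
import pandas as pd

# Consecutive ranges combine into one range
joined = pd.RangeIndex(0, 3).append(pd.RangeIndex(3, 6))
print(type(joined).__name__, list(joined))  # RangeIndex [0, 1, 2, 3, 4, 5]

# All-empty inputs give an empty index
empty = pd.RangeIndex(0, 0).append(pd.RangeIndex(0, 0))
print(list(empty))  # []
```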