BUG: Coerce to object for mixed concat #20799

Merged
merged 4 commits into from
Apr 24, 2018
9 changes: 8 additions & 1 deletion pandas/core/dtypes/concat.py
@@ -8,6 +8,7 @@
from pandas.core.dtypes.common import (
is_categorical_dtype,
is_sparse,
is_extension_array_dtype,
is_datetimetz,
is_datetime64_dtype,
is_timedelta64_dtype,
@@ -173,6 +174,10 @@ def is_nonempty(x):
elif 'sparse' in typs:
return _concat_sparse(to_concat, axis=axis, typs=typs)

extensions = [is_extension_array_dtype(x) for x in to_concat]
if any(extensions):
to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat]
Contributor

Hmm, this is not correct.

What about Categorical, which is an EA? What about DatetimeIndex (DTI), which is not? Do they hit this path?

You need much more comprehensive tests here. I don't think you need to convert to object for internal EA types (only external ones).

Member

Both the categorical and datetime cases are already filtered out a couple of lines above and use the special-cased _concat_categorical and _concat_datetime functions. So the only case left here is actual external EAs, for which the only option is converting to object.
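
The ordering described in this reply can be sketched with public pandas APIs. This is an illustration only, not the code in pandas/core/dtypes/concat.py: pick_concat_path and concat_with_object_fallback are hypothetical names, the relative order of the categorical and datetime checks is an assumption, and a nullable Int64 array stands in for an external EA.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_extension_array_dtype


def pick_concat_path(arrays):
    """Hypothetical dispatcher: name the branch a mixed concat would take."""
    dtypes = [a.dtype for a in arrays]
    # Special-cased internal types are checked first ...
    if any(isinstance(dt, pd.CategoricalDtype) for dt in dtypes):
        return "categorical"
    if any(is_datetime64_any_dtype(dt) for dt in dtypes):
        return "datetime"
    # ... so only the remaining extension arrays reach the object fallback.
    if any(is_extension_array_dtype(dt) for dt in dtypes):
        return "object"
    return "numpy"


def concat_with_object_fallback(arrays):
    """Hypothetical helper: coerce every input to object, then concatenate."""
    return np.concatenate([np.asarray(a, dtype=object) for a in arrays])


ints = np.array([1, 2, 3])
cat = pd.Categorical(["a", "b", "c"])
dates = pd.date_range("2018-01-01", periods=3)
ext = pd.array([1, 2, 3], dtype="Int64")  # stand-in for an external EA

# One path per pairing with a plain integer array.
paths = [pick_concat_path([x, ints]) for x in (cat, dates, ext, ints)]
combined = concat_with_object_fallback([ext, ints])
```

Because the extension check comes last, categorical and datetime inputs never reach the object coercion, which is the point made above.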

Contributor

Ahh, OK.

Some extra comments would be helpful. Also, a test for when you have those plus extension types in a frame would be nice.
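
A frame-level check along the lines requested here might look like the following minimal sketch. It assumes a nullable Int64 column as a stand-in for an external EA and pairs it with an explicitly object-typed column, since newer pandas versions may find a common extension dtype for compatible numeric inputs instead of falling back to object.

```python
import pandas as pd

# Stand-in for a frame backed by an extension array (assumption: Int64).
df_ext = pd.DataFrame({"A": pd.array([1, 2, 3], dtype="Int64")})
# A plain object column, so the only common dtype is object.
df_obj = pd.DataFrame({"A": ["a", "b", "c"]}, dtype=object)
frames = [df_ext, df_obj]

result = pd.concat(frames)
expected = pd.concat([f.astype(object) for f in frames])

# Mixed concat should match concatenating the object-cast frames.
pd.testing.assert_frame_equal(result, expected)
```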


if not nonempty:
# we have all empties, but may need to coerce the result dtype to
# object if we have non-numeric type operands (numpy would otherwise
@@ -210,7 +215,7 @@ def _concat_categorical(to_concat, axis=0):

def _concat_asobject(to_concat):
to_concat = [x.get_values() if is_categorical_dtype(x.dtype)
- else x.ravel() for x in to_concat]
+ else np.asarray(x).ravel() for x in to_concat]
res = _concat_compat(to_concat)
if axis == 1:
return res.reshape(1, len(res))
@@ -548,6 +553,8 @@ def convert_sparse(x, axis):
        # coerce to native type
        if isinstance(x, SparseArray):
            x = x.get_values()
        else:
            x = np.asarray(x)
        x = x.ravel()
        if axis > 0:
            x = np.atleast_2d(x)
23 changes: 23 additions & 0 deletions pandas/tests/extension/base/reshaping.py
@@ -41,6 +41,29 @@ def test_concat_all_na_block(self, data_missing, in_frame):
        expected = pd.Series(data_missing.take([1, 1, 0, 0]))
        self.assert_series_equal(result, expected)

    def test_concat_mixed_dtypes(self, data):
        # https://github.com/pandas-dev/pandas/issues/20762
        df1 = pd.DataFrame({'A': data[:3]})
        df2 = pd.DataFrame({"A": [1, 2, 3]})
        df3 = pd.DataFrame({"A": ['a', 'b', 'c']}).astype('category')
        df4 = pd.DataFrame({"A": pd.SparseArray([1, 2, 3])})
        dfs = [df1, df2, df3, df4]

        # dataframes
        result = pd.concat(dfs)
        expected = pd.concat([x.astype(object) for x in dfs])
        self.assert_frame_equal(result, expected)

        # series
        result = pd.concat([x['A'] for x in dfs])
        expected = pd.concat([x['A'].astype(object) for x in dfs])
        self.assert_series_equal(result, expected)

        # simple test for just EA and one other
        result = pd.concat([df1, df2])
        expected = pd.concat([df1.astype('object'), df2.astype('object')])
        self.assert_frame_equal(result, expected)

    def test_align(self, data, na_value):
        a = data[:3]
        b = data[2:5]