-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Coerce to object for mixed concat #20799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Coerce to object for mixed concat #20799
Conversation
pandas/core/dtypes/concat.py
Outdated
@@ -173,6 +174,10 @@ def is_nonempty(x): | |||
elif 'sparse' in typs: | |||
return _concat_sparse(to_concat, axis=axis, typs=typs) | |||
|
|||
extensions = [is_extension_array_dtype(x) for x in to_concat] | |||
if any(extensions) and not all(extensions): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have an actual case where the "and not all(extensions)" is hit?
Because if that is the case, I also think the np.concatenate(to_concat, axis=axis)
will fail as it has the wrong dimensions to both work at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhmm I suppose I don't have an example, because the all extension dtype case should never get here, right? It should be going down the is_uniform_join_units
path. I'll change it, to remove the all(extensions)
condition, so that it makes sense locally.
Codecov Report
@@ Coverage Diff @@
## master #20799 +/- ##
==========================================
- Coverage 91.85% 91.85% -0.01%
==========================================
Files 153 153
Lines 49310 49322 +12
==========================================
+ Hits 45292 45303 +11
- Misses 4018 4019 +1
Continue to review full report at Codecov.
|
Thanks! |
@@ -173,6 +174,10 @@ def is_nonempty(x): | |||
elif 'sparse' in typs: | |||
return _concat_sparse(to_concat, axis=axis, typs=typs) | |||
|
|||
extensions = [is_extension_array_dtype(x) for x in to_concat] | |||
if any(extensions): | |||
to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this is not correct
what about categorical? which is EA
what about DTI which is not?
do they hit this path ?
you need much more comprehensive tests here
I don’t think you need to convert to object for internal EA types (only external ones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both categorical and datetimes cases are already filtered out before (couple of lines above) and use the _concat_categorical
and _concat_datetime
special cased functions. So the only case that is left here are actual external EAs, for which the only option is converting to object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh ok
some extra comments would be helpful
also test when u have those plus extension types in a frame would be nice
git diff upstream/master -u -- "*.py" | flake8 --diff