BUG/TST: Fix infer_dtype for Period array-likes and general ExtensionArrays #37367
Changes from 6 commits
@@ -1073,6 +1073,7 @@ _TYPE_MAP = {
     "M": "datetime64",
     "timedelta64[ns]": "timedelta64",
     "m": "timedelta64",
+    "period": "period",
     "interval": "interval",
 }
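The new "period" entry is only reached through the base-name fallback added to _try_infer_map below, because a period dtype's name is parametrized by its frequency. A quick illustration of the dtype attributes involved (assuming current pandas behaviour):

import pandas as pd

dtype = pd.PeriodDtype("D")
print(dtype.name)  # "period[D]" -- parametrized, so not itself a _TYPE_MAP key
print(dtype.kind)  # "O", the generic object kind
# Splitting the name at "[" yields "period", which the new entry maps to "period".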
@@ -1226,9 +1227,14 @@ cdef object _try_infer_map(object dtype):
         object val
         str attr
     for attr in ["name", "kind", "base"]:
-        val = getattr(dtype, attr)
+        val = getattr(dtype, attr, None)
         if val in _TYPE_MAP:
             return _TYPE_MAP[val]
+        # also check base name for parametrized dtypes (eg period[D])
+        if isinstance(val, str):
+            val = val.split("[")[0]
+            if val in _TYPE_MAP:
+                return _TYPE_MAP[val]
     return None

Review discussion on the base-name parsing:
- can we explicitly add the Period[D] types to the type_map (there aren't that many)
- Going from the …
- if we were to get that specific, at some point it makes more sense to just return a dtype object
- right here why can't we try to convert to an EA type using …
- Because we already have a …
- Do we want to import those in lib.pyx? (might be fine, eg there are already imports from tslibs.period, but at the moment only …) If that's OK, I am fine with changing it. I don't necessarily find it less "hacky" than the current solution, but I just want some solution that is acceptable for all.
- do we not already import all of the scalar EA types? +1 on using the existing machinery
- 100% agree on both points
- not ideal, but I don't think it will harm anything ATM
- Exactly. You are missing the entire point of the dtype abstraction: you avoid parsing strings in the first place. I will be blocking this until/unless a good solution is done.
- lib.pyx doesn't know anything about EAs. It only imports helper functions like …
- different than what?
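To make the string parsing concrete, here is a rough pure-Python sketch of the lookup (standalone toy code; the real _TYPE_MAP and _try_infer_map live in pandas/_libs/lib.pyx as Cython, and the map below is only an excerpt):

import pandas as pd

_TYPE_MAP = {
    "M": "datetime64",
    "m": "timedelta64",
    "period": "period",
    "interval": "interval",
}

def try_infer_map(dtype):
    # Mirror of the Cython helper: check name/kind/base against the map, then
    # fall back to the part of the name before "[" for parametrized dtypes.
    for attr in ["name", "kind", "base"]:
        val = getattr(dtype, attr, None)
        if val in _TYPE_MAP:
            return _TYPE_MAP[val]
        if isinstance(val, str):
            val = val.split("[")[0]
            if val in _TYPE_MAP:
                return _TYPE_MAP[val]
    return None

print(try_infer_map(pd.PeriodDtype("D")))        # period
print(try_infer_map(pd.IntervalDtype("int64")))  # interval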
@@ -1267,6 +1273,7 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
     - time
     - period
     - mixed
+    - unknown-array

     Raises
     ------

@@ -1279,6 +1286,9 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
       specialized
     - 'mixed-integer-float' are floats and integers
     - 'mixed-integer' are integers mixed with non-integers
+    - 'unknown-array' is the catchall for something that *is* an array (has
+      a dtype attribute), but has a dtype unknown to pandas (e.g. external
+      extension array)

     Examples
     --------
@@ -1347,12 +1357,10 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
         # e.g. categoricals
         dtype = value.dtype
         if not isinstance(dtype, np.dtype):
-            value = _try_infer_map(value.dtype)
-            if value is not None:
-                return value
-
-            # its ndarray-like but we can't handle
-            raise ValueError(f"cannot infer type for {type(value)}")
+            inferred = _try_infer_map(value.dtype)
+            if inferred is not None:
+                return inferred
+            return "unknown-array"

         # Unwrap Series/Index
         values = np.asarray(value)
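The user-visible effect of that branch, sketched assuming a pandas build that includes this change (the ThirdParty* classes are made-up stand-ins for an external extension array, not real pandas API):

import pandas as pd
from pandas.api.types import infer_dtype

# Period-dtype array-likes now resolve through _try_infer_map instead of
# raising "cannot infer type for ...".
parr = pd.array(pd.period_range("2020-01-01", periods=3, freq="D"))
print(infer_dtype(parr, skipna=True))  # period

# An array-like whose dtype pandas does not recognise falls into the new
# "unknown-array" bucket rather than raising ValueError.
class ThirdPartyDtype:
    name = "my-dtype"          # hypothetical dtype name, not in _TYPE_MAP

class ThirdPartyArray:
    dtype = ThirdPartyDtype()  # has a dtype attribute, but not an np.dtype

print(infer_dtype(ThirdPartyArray(), skipna=True))  # unknown-array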
@@ -208,11 +208,7 @@ def _validate(data):
         if isinstance(values.dtype, StringDtype):
             return "string"

-        try:
-            inferred_dtype = lib.infer_dtype(values, skipna=True)
-        except ValueError:
-            # GH#27571 mostly occurs with ExtensionArray
-            inferred_dtype = None
+        inferred_dtype = lib.infer_dtype(values, skipna=True)

         if inferred_dtype not in allowed_types:
             raise AttributeError("Can only use .str accessor with string values!")

Inline review comment on the removed try/except: there's another case almost identical to this in dtypes.cast.convert_dtypes.
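A sketch of why the try/except became unnecessary, assuming a pandas build with this change: infer_dtype no longer raises for extension-array inputs, so _validate can rely on the returned category, and the user still gets the usual error for non-string data.

import pandas as pd

s = pd.Series(pd.period_range("2020-01-01", periods=3, freq="D"))
try:
    s.str.upper()  # period data is not a valid .str target
except AttributeError as err:
    print(err)  # Can only use .str accessor with string values!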