BUG: identity checking NA in map incorrect #58392

Closed
Commits
193 commits
68b6c7c
Remove cast to numpy for series supporting NA as na_value in map func…
droussea2001 Apr 21, 2024
dcc8dab
Add test for map operation applied on series supporting NA as na_value
droussea2001 Apr 21, 2024
19215b7
Adapt test_map test to take into account series containing pd.NA
droussea2001 Apr 23, 2024
b916372
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Apr 23, 2024
539bf7e
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Apr 26, 2024
616620c
Add an entry in Conversion section (issue 57390)
droussea2001 Apr 26, 2024
8473e73
Correct whatsnew order with pre commit
droussea2001 Apr 26, 2024
1d49ac0
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 May 6, 2024
a5bf510
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 May 16, 2024
a17d8b5
Add the possibility to process pd.NA values
droussea2001 May 27, 2024
1f2965c
Add the possibility to process pd.NA values
droussea2001 May 27, 2024
70c2b8a
Remove test ambiguity with pd.NA processing
droussea2001 May 27, 2024
584c1ca
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 May 27, 2024
c7fe27b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 27, 2024
492d167
Code clean up
droussea2001 May 27, 2024
557bce1
Merge branch 'BUG-57390/Identity-checking-NA-in-map-incorrect' of htt…
droussea2001 May 27, 2024
49596c9
Limit NA management to BooleanArray, FloatingArray and IntegerArray t…
droussea2001 May 29, 2024
5eb85ff
Try to correct BaseMaskedArray cast error detected by mypy
droussea2001 May 29, 2024
4f5cfe5
Correct typo error: missing else condition
droussea2001 May 29, 2024
32ceaa3
Try to correct mypy error with mask parameter in map_infer
droussea2001 May 30, 2024
816a0d2
Code clean up: simplify map_infer calls
droussea2001 May 31, 2024
d2acb4a
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 May 31, 2024
7cb37df
Correct values input type for map_infer_mask
droussea2001 May 31, 2024
5d7ad8b
Remove unnecessary cast with to_numpy before map_array call
droussea2001 May 31, 2024
b9631a3
Manage ExtensionArray and convert to nullable dtype
droussea2001 Jun 11, 2024
6c85b64
Add convert_to_nullable_dtype to map_infer (used in maybe_convert_obj…
droussea2001 Jun 11, 2024
c84932f
Add convert_to_nullable_dtype to map_infer_mask (used in maybe_conver…
droussea2001 Jun 12, 2024
421e779
Conversion to numpy object is not necessary anymore
droussea2001 Jun 12, 2024
25c2b90
Tests results are verified as ExtensionArray
droussea2001 Jun 12, 2024
23c48d2
Tests was extended to Int64, Float64 and boolean
droussea2001 Jun 12, 2024
d498b54
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 Jun 12, 2024
a206c94
convert to nullable dtype only if there are nullable value
droussea2001 Jun 12, 2024
b36b581
Manage date and time dtype pyarrow as object
droussea2001 Jun 19, 2024
6375701
Manage pyarrow string
droussea2001 Jun 19, 2024
eda7702
Manage pyarrow string
droussea2001 Jun 19, 2024
9ab6602
Manage BasedMaskedArray
droussea2001 Jun 19, 2024
0da8920
Test directly ExtensionArray
droussea2001 Jun 19, 2024
e8bce29
pyarrow data keep their original type if possible
droussea2001 Jun 19, 2024
14e8973
if map return only pd.NA values their type is double pyarrow
droussea2001 Jun 19, 2024
d0d8ea2
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 Jun 19, 2024
5e3ad28
Add storage to map_infer_mask
droussea2001 Jun 23, 2024
c9dd068
Add storage to map_infer_mask
droussea2001 Jun 23, 2024
d1b6a28
Add empty dict as NA value for JSONArray extension
droussea2001 Jun 23, 2024
3ccc4fd
Add storage parameter to map_infer_mask
droussea2001 Jun 23, 2024
996d99a
Cast result to an extension array
droussea2001 Jun 23, 2024
a60b23a
Cast result to a NumpyExtensionArray an extension array
droussea2001 Jun 23, 2024
505bdec
Cast result to an extension array
droussea2001 Jun 23, 2024
17f46c2
Remove dtype test
droussea2001 Jun 23, 2024
528c6ab
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 Jun 23, 2024
fa9a2f2
Take into account UserDict in checknull
droussea2001 Jun 24, 2024
92ed4ef
Take into na_value in in map_infer_mask
droussea2001 Jun 24, 2024
ff28d74
Manage IntervalDtype
droussea2001 Jun 27, 2024
ee088d4
Manage ArrowDType int64
droussea2001 Jun 27, 2024
2f84261
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 Jun 27, 2024
259b423
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Jun 27, 2024
0067edf
Correct error in empty mapper management
droussea2001 Jun 27, 2024
ac6d324
Merge branch 'BUG-57390/Identity-checking-NA-in-map-incorrect' of htt…
droussea2001 Jun 27, 2024
547662d
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 Jun 27, 2024
e94997a
Manage IntervalDtype
droussea2001 Jul 2, 2024
f93dc66
Try to manage date with pyarrow
droussea2001 Jul 14, 2024
963f99a
Manage timedelta, datetimetz and date
droussea2001 Jul 25, 2024
e92152e
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Jul 25, 2024
f8deed6
pylint fix
droussea2001 Jul 25, 2024
b90af02
Code simplification
droussea2001 Jul 25, 2024
26a6fb7
Correct values initialization problem
droussea2001 Jul 25, 2024
fa46a96
Manage pyarrow and python storage
droussea2001 Jul 26, 2024
d6264e6
Manage pyarrow and python storage in map dict like
droussea2001 Jul 28, 2024
237926d
Correct wrong default storage type
droussea2001 Jul 28, 2024
247e9d8
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Jul 30, 2024
0fc4b60
Add convert_non_numeric as map_infer_mask parameter
droussea2001 Aug 1, 2024
d60b7e9
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Aug 1, 2024
6b5c8db
pyarrow data are sent to map_infer as iterator
droussea2001 Aug 2, 2024
8578b1e
Add method _maybe_convert_pyarrow_objects
droussea2001 Aug 2, 2024
3fb8b0d
Remove check_dtype
droussea2001 Aug 2, 2024
b7de292
Code simplification
droussea2001 Aug 2, 2024
a42048f
Manage default storage value
droussea2001 Aug 2, 2024
4c61857
ord(x) return a TypeError if x is a pyarrow.lib.LargeStringScalar
droussea2001 Aug 2, 2024
2d92818
Manage str.encode for pyarrow.lib.LargeStringScalar
droussea2001 Aug 3, 2024
56f8f16
Manage string convertible to nullable dtype
droussea2001 Aug 4, 2024
88a54f7
Manage Based masked dtype
droussea2001 Aug 5, 2024
6dbbf13
Code clean up
droussea2001 Aug 5, 2024
72eca60
Code simplification
droussea2001 Aug 7, 2024
d8a70b4
Manage pyarrow string
droussea2001 Aug 7, 2024
ec16f75
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Aug 7, 2024
69269d7
Manage json and decimal extension array
droussea2001 Aug 9, 2024
d7d0614
Manage na_value in python string
droussea2001 Aug 9, 2024
0f242b2
Cast to BasedMasked is limited to array containing one type
droussea2001 Aug 9, 2024
d293ce6
numpy dtype is extracted from the identified types in object
droussea2001 Aug 12, 2024
d19fb2c
Correct typo in exception
droussea2001 Aug 12, 2024
9363be6
Correct typo in based masked array conversion
droussea2001 Aug 12, 2024
58de9ac
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Aug 12, 2024
434cb7e
Integrate map evolution without to_numpy() conversion
droussea2001 Apr 21, 2024
adc8493
Add test for map operation applied on series supporting NA as na_value
droussea2001 Apr 21, 2024
83d7093
Adapt test_map test to take into account series containing pd.NA
droussea2001 Apr 23, 2024
df473d7
Add an entry in Conversion section (issue 57390)
droussea2001 Apr 26, 2024
4cb8fcf
Add the possibility to process pd.NA values
droussea2001 May 27, 2024
20dd8e1
Add the possibility to process pd.NA values
droussea2001 May 27, 2024
45bc299
Remove test ambiguity with pd.NA processing
droussea2001 May 27, 2024
e38dac0
Code clean up
droussea2001 May 27, 2024
5fb2d6c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 27, 2024
a92fb75
Limit NA management to BooleanArray, FloatingArray and IntegerArray t…
droussea2001 May 29, 2024
d8005ed
Try to correct BaseMaskedArray cast error detected by mypy
droussea2001 May 29, 2024
9900710
Correct typo error: missing else condition
droussea2001 May 29, 2024
dfdf6ed
Try to correct mypy error with mask parameter in map_infer
droussea2001 May 30, 2024
451f055
Code clean up: simplify map_infer calls
droussea2001 May 31, 2024
a836ad1
Correct values input type for map_infer_mask
droussea2001 May 31, 2024
01a8d88
Manage ExtensionArray and convert to nullable dtype
droussea2001 Jun 11, 2024
0f3bfaa
Add convert_to_nullable_dtype to map_infer (used in maybe_convert_obj…
droussea2001 Jun 11, 2024
1221069
Add convert_to_nullable_dtype to map_infer_mask (used in maybe_conver…
droussea2001 Jun 12, 2024
d70be9d
Conversion to numpy object is not necessary anymore
droussea2001 Jun 12, 2024
b77da73
Tests results are verified as ExtensionArray
droussea2001 Jun 12, 2024
ad8494a
Tests was extended to Int64, Float64 and boolean
droussea2001 Jun 12, 2024
0ad45c7
convert to nullable dtype only if there are nullable value
droussea2001 Jun 12, 2024
f73e7b6
Manage date and time dtype pyarrow as object
droussea2001 Jun 19, 2024
972957c
Manage pyarrow string
droussea2001 Jun 19, 2024
4f6fd09
Manage pyarrow string
droussea2001 Jun 19, 2024
59d4c3e
Manage BasedMaskedArray
droussea2001 Jun 19, 2024
b8f8e23
Test directly ExtensionArray
droussea2001 Jun 19, 2024
41c13f3
pyarrow data keep their original type if possible
droussea2001 Jun 19, 2024
f87ee61
if map return only pd.NA values their type is double pyarrow
droussea2001 Jun 19, 2024
48c2dd5
Add storage to map_infer_mask
droussea2001 Jun 23, 2024
d5aeef2
Add storage to map_infer_mask
droussea2001 Jun 23, 2024
9cd640f
Add empty dict as NA value for JSONArray extension
droussea2001 Jun 23, 2024
05c01e6
Add storage parameter to map_infer_mask
droussea2001 Jun 23, 2024
1244406
Cast result to an extension array
droussea2001 Jun 23, 2024
47e3c24
Cast result to a NumpyExtensionArray an extension array
droussea2001 Jun 23, 2024
18ae900
Cast result to an extension array
droussea2001 Jun 23, 2024
1df0396
Remove dtype test
droussea2001 Jun 23, 2024
20de040
Take into account UserDict in checknull
droussea2001 Jun 24, 2024
7798eee
Take into na_value in in map_infer_mask
droussea2001 Jun 24, 2024
a88295f
Manage IntervalDtype
droussea2001 Jun 27, 2024
e09f878
Manage ArrowDType int64
droussea2001 Jun 27, 2024
55b8992
Correct error in empty mapper management
droussea2001 Jun 27, 2024
f7d875c
Manage IntervalDtype
droussea2001 Jul 2, 2024
4f7574f
Try to manage date with pyarrow
droussea2001 Jul 14, 2024
77965c3
Manage timedelta, datetimetz and date
droussea2001 Jul 25, 2024
a01b55e
pylint fix
droussea2001 Jul 25, 2024
f5efabc
Code simplification
droussea2001 Jul 25, 2024
d8c0219
Correct values initialization problem
droussea2001 Jul 25, 2024
a89373c
Manage pyarrow and python storage
droussea2001 Jul 26, 2024
89b872f
Manage pyarrow and python storage in map dict like
droussea2001 Jul 28, 2024
d9f9319
Correct wrong default storage type
droussea2001 Jul 28, 2024
038cfb8
Add convert_non_numeric as map_infer_mask parameter
droussea2001 Aug 1, 2024
a885b7a
pyarrow data are sent to map_infer as iterator
droussea2001 Aug 2, 2024
d344841
Add method _maybe_convert_pyarrow_objects
droussea2001 Aug 2, 2024
0b56b9c
Remove check_dtype
droussea2001 Aug 2, 2024
93bb8d7
Code simplification
droussea2001 Aug 2, 2024
bc04ec7
Manage default storage value
droussea2001 Aug 2, 2024
a03357e
ord(x) return a TypeError if x is a pyarrow.lib.LargeStringScalar
droussea2001 Aug 2, 2024
646a85d
Manage str.encode for pyarrow.lib.LargeStringScalar
droussea2001 Aug 3, 2024
b4adcad
Manage string convertible to nullable dtype
droussea2001 Aug 4, 2024
efc2600
Manage Based masked dtype
droussea2001 Aug 5, 2024
d4b5396
Code clean up
droussea2001 Aug 5, 2024
5c1c726
Code simplification
droussea2001 Aug 7, 2024
7c6fdb2
Manage pyarrow string
droussea2001 Aug 7, 2024
8f18e41
Manage json and decimal extension array
droussea2001 Aug 9, 2024
1945ce6
Manage na_value in python string
droussea2001 Aug 9, 2024
e2f2482
Cast to BasedMasked is limited to array containing one type
droussea2001 Aug 9, 2024
a5d3b74
numpy dtype is extracted from the identified types in object
droussea2001 Aug 12, 2024
e6d9f48
Correct typo in exception
droussea2001 Aug 12, 2024
3447e1a
Correct typo in based masked array conversion
droussea2001 Aug 12, 2024
6f0beb6
Remove check_dtype filter for tests
droussea2001 Aug 12, 2024
d6ae469
Resolve merge
droussea2001 Aug 12, 2024
d47c9b6
Resolve merge
droussea2001 Aug 19, 2024
a9cce25
Resolve merge
droussea2001 Aug 19, 2024
c673338
Resolve merge problem
droussea2001 Aug 19, 2024
ebcce09
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Aug 19, 2024
43b6b5a
take into account unsigned type
droussea2001 Aug 19, 2024
dcab913
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Aug 23, 2024
2801cc0
manage native python type and numpy scalar type
droussea2001 Aug 23, 2024
04df901
correct merge problem
droussea2001 Aug 23, 2024
7bbc88b
take into account NaT value
droussea2001 Aug 24, 2024
3d3e473
code clean up: pyarrow management simplification in maybe_convert_obj…
droussea2001 Aug 24, 2024
183b6e6
code clean up: pyarrow management simplification in maybe_convert_obj…
droussea2001 Aug 24, 2024
8c3050c
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Aug 31, 2024
25a9d35
BooleanArray does not support pd.NaT
droussea2001 Aug 31, 2024
c10a244
code clean up
droussea2001 Aug 31, 2024
ebb8d26
code clean up: _convert_to_pyarrow simplification
droussea2001 Aug 31, 2024
53a885c
Code refacto and clean up
droussea2001 Sep 1, 2024
acfe152
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Sep 9, 2024
f84cd8b
Code clean up (restore iterator in map_infer_mask
droussea2001 Sep 9, 2024
3784b5e
Code simplification
droussea2001 Sep 9, 2024
f1fb54e
Correct pyarrow cast explanation in comment
droussea2001 Sep 9, 2024
e54dcdb
Code simplification and new comments about scalar type interpretation
droussea2001 Sep 9, 2024
ba7d37d
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 Sep 25, 2024
1ab81c0
Code simplification
droussea2001 Sep 25, 2024
985598b
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Oct 11, 2024
c23f65b
Remove unnecessary test
droussea2001 Oct 11, 2024
d1a8190
static check correction
droussea2001 Oct 11, 2024
f7f6578
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Oct 31, 2024
39fd9bc
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Nov 4, 2024
7f45b06
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Nov 11, 2024
651d37f
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 Dec 1, 2024
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -647,6 +647,8 @@ Numeric

 Conversion
 ^^^^^^^^^^
+- Bug in :meth:`BaseMaskedArray.map` was casting ``pd.NA`` to ``np.nan``. (:issue:`57390`)
+- Bug in :meth:`BaseMaskedArray.map` was casting ``pd.NA`` to ``np.nan``. (:issue:`57390`)
 - Bug in :meth:`DataFrame.astype` not casting ``values`` for Arrow-based dictionary dtype correctly (:issue:`58479`)
 - Bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
 - Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
20 changes: 20 additions & 0 deletions pandas/_libs/lib.pyi
@@ -74,6 +74,10 @@ def map_infer(
 *,
 convert: Literal[False],
 ignore_na: bool = ...,
+mask: npt.NDArray[np.bool_] | None = ...,
+na_value: Any = ...,
+convert_to_nullable_dtype: bool = ...,
+storage: str | None = ...,
 ) -> np.ndarray: ...
 @overload
 def map_infer(
@@ -82,6 +86,10 @@ def map_infer(
 *,
 convert: bool = ...,
 ignore_na: bool = ...,
+mask: npt.NDArray[np.bool_] | None = ...,
+na_value: Any = ...,
+convert_to_nullable_dtype: bool = ...,
+storage: str | None = ...,
 ) -> ArrayLike: ...
 @overload
 def maybe_convert_objects(
@@ -93,6 +101,8 @@ def maybe_convert_objects(
 convert_non_numeric: Literal[False] = ...,
 convert_to_nullable_dtype: Literal[False] = ...,
 dtype_if_all_nat: DtypeObj | None = ...,
+storage: str | None = ...,
+na_value: Any = ...,
 ) -> npt.NDArray[np.object_ | np.number]: ...
 @overload
 def maybe_convert_objects(
@@ -104,6 +114,8 @@ def maybe_convert_objects(
 convert_non_numeric: bool = ...,
 convert_to_nullable_dtype: Literal[True] = ...,
 dtype_if_all_nat: DtypeObj | None = ...,
+storage: str | None = ...,
+na_value: Any = ...,
 ) -> ArrayLike: ...
 @overload
 def maybe_convert_objects(
@@ -115,6 +127,8 @@ def maybe_convert_objects(
 convert_non_numeric: bool = ...,
 convert_to_nullable_dtype: bool = ...,
 dtype_if_all_nat: DtypeObj | None = ...,
+storage: str | None = ...,
+na_value: Any = ...,
 ) -> ArrayLike: ...
 @overload
 def maybe_convert_numeric(
@@ -178,6 +192,9 @@ def map_infer_mask(
 convert: Literal[False],
 na_value: Any = ...,
 dtype: np.dtype = ...,
+convert_to_nullable_dtype: bool = ...,
+convert_non_numeric: bool = ...,
+storage: str | None = ...,
 ) -> np.ndarray: ...
 @overload
 def map_infer_mask(
@@ -188,6 +205,9 @@ def map_infer_mask(
 convert: bool = ...,
 na_value: Any = ...,
 dtype: np.dtype = ...,
+convert_to_nullable_dtype: bool = ...,
+convert_non_numeric: bool = ...,
+storage: str | None = ...,
 ) -> ArrayLike: ...
 def indices_fast(
 index: npt.NDArray[np.intp],
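The stubs above add `mask` and `na_value` to `map_infer`, while `map_infer_mask` already accepted `na_value`. Going by the docstrings in the lib.pyx diff, the two use it differently: in `map_infer` it is the *input* passed to `f` at masked positions, whereas in `map_infer_mask` it is the *result* stored there (defaulting to the original input). The pure-Python sketch below contrasts the two. `map_infer_like`, `map_infer_mask_like`, `NO_DEFAULT` and `MISSING` are hypothetical names, not the Cython implementation.

```python
NO_DEFAULT = object()  # hypothetical stand-in for pandas' no_default sentinel
MISSING = object()  # hypothetical missing-value marker passed as na_value


def map_infer_like(values, f, mask=None, na_value=None):
    """Masked positions call f(na_value), mirroring the new map_infer branch."""
    out = []
    for i, v in enumerate(values):
        if mask is not None and na_value is not None and mask[i]:
            out.append(f(na_value))
        else:
            out.append(f(v))
    return out


def map_infer_mask_like(values, f, mask, na_value=NO_DEFAULT):
    """Masked positions skip f and store na_value (or keep the input value)."""
    out = []
    for v, m in zip(values, mask):
        if m:
            out.append(v if na_value is NO_DEFAULT else na_value)
        else:
            out.append(f(v))
    return out


def describe(x):
    return "missing" if x is MISSING else x * 10


values = [1, None, 3]
mask = [False, True, False]
print(map_infer_like(values, describe, mask, MISSING))  # [10, 'missing', 30]
print(map_infer_mask_like(values, describe, mask, "NA"))  # [10, 'NA', 30]
print(map_infer_mask_like(values, describe, mask))  # [10, None, 30]
```

As in the diff, `map_infer` only takes the masked branch when `na_value is not None`, so passing `na_value=None` falls back to calling `f` on the stored value.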
151 changes: 145 additions & 6 deletions pandas/_libs/lib.pyx
@@ -2525,6 +2525,80 @@ def maybe_convert_numeric(
 return (ints, None)


+@cython.boundscheck(False)
+@cython.wraparound(False)
+def _convert_to_pyarrow(
+ndarray[object] objects,
+ndarray[uint8_t] mask,
+object na_value=None) -> "ArrayLike":
+from pandas.core.dtypes.dtypes import ArrowDtype
+
+from pandas.core.arrays.string_ import StringDtype
+
+# pa.array does not support pd.NA as na_value, so masked entries are
+# replaced by None, which PyArrow stores as null
+objects[mask] = None
+pa_array = pa.array(objects)
+
+# PyArrow large strings map to StringDtype (not ArrowDtype)
+if pa.types.is_large_string(pa_array.type):
+dtype = StringDtype(storage="pyarrow", na_value=na_value)
+else:
+dtype = ArrowDtype(pa_array.type)
+return dtype.construct_array_type()._from_sequence(pa_array, dtype=dtype)
+
+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+def _convert_to_based_masked(
+ndarray[object] objects,
+object numpy_dtype) -> "ArrayLike":
+from pandas.core.dtypes.dtypes import BaseMaskedDtype
+
+from pandas.core.construction import array as pd_array
+
+dtype = BaseMaskedDtype.from_numpy_dtype(numpy_dtype)
+return pd_array(objects, dtype=dtype)
+
+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+def _maybe_get_numpy_dtype(Seen seen, object scalar_type):
+# NumPy scalar type
+if issubclass(scalar_type, np.generic):
+return np.dtype(scalar_type)
+# Native Python type
+elif seen.bool_:
+return np.dtype(bool)
+elif seen.uint_:
+return np.dtype(np.uint)
+elif seen.int_ or seen.sint_:
+return np.dtype(int)
+elif seen.float_:
+return np.dtype(float)
+else:
+return None
+
+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+def _maybe_get_based_masked_scalar_numpy_dtype(
+val_types,
+seen,
+convert_to_nullable_dtype):
+# With no type or more than one type, we cannot build a base masked array
+if not val_types or len(val_types) > 1:
+return None
+
+numpy_dtype = _maybe_get_numpy_dtype(seen, val_types.pop())
+if (
+numpy_dtype and numpy_dtype.kind in "biuf"
+and convert_to_nullable_dtype):
+return numpy_dtype
+else:
+return None
+
+
 @cython.boundscheck(False)
 @cython.wraparound(False)
 def maybe_convert_objects(ndarray[object] objects,
@@ -2534,7 +2608,9 @@ def maybe_convert_objects(ndarray[object] objects,
 bint convert_numeric=True, # NB: different default!
 bint convert_to_nullable_dtype=False,
 bint convert_non_numeric=False,
-object dtype_if_all_nat=None) -> "ArrayLike":
+object dtype_if_all_nat=None,
+str storage=None,
+object na_value=None) -> "ArrayLike":
 """
 Type inference function-- convert object array to proper dtype

@@ -2557,6 +2633,8 @@ def maybe_convert_objects(ndarray[object] objects,
 Whether to convert datetime, timedelta, period, interval types.
 dtype_if_all_nat : np.dtype, ExtensionDtype, or None, default None
 Dtype to cast to if we have all-NaT.
+storage : {None, "python", "pyarrow", "pyarrow_numpy"}, default None
+Storage backend for the result.

 Returns
 -------
@@ -2592,9 +2670,14 @@ def maybe_convert_objects(ndarray[object] objects,
 uints = cnp.PyArray_EMPTY(1, objects.shape, cnp.NPY_UINT64, 0)
 bools = cnp.PyArray_EMPTY(1, objects.shape, cnp.NPY_UINT8, 0)
 mask = np.full(n, False)
+val = None
+val_types = set()

 for i in range(n):
 val = objects[i]
+if not checknull(val):
+val_types.add(type(val))
+
 if itemsize_max != -1:
 itemsize = get_itemsize(val)
 if itemsize > itemsize_max or itemsize == -1:
@@ -2728,6 +2811,17 @@ def maybe_convert_objects(ndarray[object] objects,
 seen.object_ = True
 break

+if storage == "pyarrow":
+return _convert_to_pyarrow(objects, mask, na_value)
+
+based_masked_scalar_numpy_dtype = _maybe_get_based_masked_scalar_numpy_dtype(
+val_types,
+seen,
+convert_to_nullable_dtype)
+
+if based_masked_scalar_numpy_dtype:
+return _convert_to_based_masked(objects, based_masked_scalar_numpy_dtype)
+
 # we try to coerce datetime w/tz but must all have the same tz
 if seen.datetimetz_:
 if is_datetime_with_singletz_array(objects):
@@ -2791,6 +2885,12 @@ def maybe_convert_objects(ndarray[object] objects,
 dtype = StringDtype(na_value=np.nan)
 return dtype.construct_array_type()._from_sequence(objects, dtype=dtype)

+elif storage == "python":
+from pandas.core.arrays.string_ import StringDtype
+
+dtype = StringDtype(storage=storage, na_value=na_value)
+return dtype.construct_array_type()._from_sequence(objects, dtype=dtype)
+
 seen.object_ = True
 elif seen.interval_:
 if is_interval_array(objects):
@@ -2944,7 +3044,10 @@ def map_infer_mask(
 *,
 bint convert=True,
 object na_value=no_default,
-cnp.dtype dtype=np.dtype(object)
+bint convert_to_nullable_dtype=False,
+convert_non_numeric=False,
+cnp.dtype dtype=np.dtype(object),
+str storage=None,
 ) -> "ArrayLike":
 """
 Substitute for np.vectorize with pandas-friendly dtype inference.
@@ -2960,8 +3063,12 @@ def map_infer_mask(
 na_value : Any, optional
 The result value to use for masked values. By default, the
 input value is used.
+convert_non_numeric : bool, default False
+Whether to convert datetime, timedelta, period, interval types.
 dtype : numpy.dtype
 The numpy dtype to use for the result ndarray.
+storage : {None, "python", "pyarrow", "pyarrow_numpy"}, default None
+Storage backend for the result.

 Returns
 -------
@@ -2997,15 +3104,29 @@ def map_infer_mask(
 PyArray_ITER_NEXT(result_it)

 if convert:
-return maybe_convert_objects(result)
+return maybe_convert_objects(
+result,
+convert_to_nullable_dtype=convert_to_nullable_dtype,
+convert_non_numeric=convert_non_numeric,
+storage=storage,
+na_value=na_value,
+)
 else:
 return result


 @cython.boundscheck(False)
 @cython.wraparound(False)
 def map_infer(
-ndarray arr, object f, *, bint convert=True, bint ignore_na=False
+ndarray arr,
+object f,
+*,
+bint convert=True,
+bint ignore_na=False,
+const uint8_t[:] mask=None,
+object na_value=None,
+bint convert_to_nullable_dtype=False,
+str storage=None,
 ) -> "ArrayLike":
 """
 Substitute for np.vectorize with pandas-friendly dtype inference.
@@ -3017,6 +3138,15 @@ def map_infer(
 convert : bint
 ignore_na : bint
 If True, NA values will not have f applied
+mask : ndarray, optional
+uint8 ndarray marking the positions where `f` is applied to na_value.
+na_value : Any, optional
+The value passed to `f` in place of each masked element.
+convert_to_nullable_dtype : bool, default False
+If the non-missing results share a single boolean, integer or float type,
+whether to return the matching masked array (e.g. IntegerArray).
+storage : {None, "python", "pyarrow", "pyarrow_numpy"}, default None
+Storage backend for the result.

 Returns
 -------
@@ -3033,7 +3163,10 @@ def map_infer(
 if ignore_na and checknull(arr[i]):
 result[i] = arr[i]
 continue
-val = f(arr[i])
+elif mask is not None and na_value is not None and mask[i]:
+val = f(na_value)
+else:
+val = f(arr[i])

 if cnp.PyArray_IsZeroDim(val):
 # unbox 0-dim arrays, GH#690
@@ -3042,7 +3175,13 @@ def map_infer(
 result[i] = val

 if convert:
-return maybe_convert_objects(result)
+return maybe_convert_objects(
+result,
+convert_to_nullable_dtype=convert_to_nullable_dtype,
+convert_non_numeric=True,
+storage=storage,
+na_value=na_value,
+)
 else:
 return result
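Taken together, the `maybe_convert_objects` hunks add an early decision step: with `storage="pyarrow"` the result goes through `_convert_to_pyarrow`; otherwise, if every non-missing value shares one type whose NumPy kind is bool, integer or float and `convert_to_nullable_dtype` is set, it becomes a masked array; anything else falls through to the existing inference. The sketch below shows that order in plain Python. `choose_result_kind` and the type-to-dtype-name table are hypothetical, missing-value detection is reduced to `None`/NaN, and NumPy scalar types and unsigned integers (which the real helpers handle) are ignored.

```python
import math

# Hypothetical table: single Python scalar type -> pandas nullable dtype name.
# Covers only native bool, int and float, a subset of the "biuf" kind check.
_NULLABLE_BY_TYPE = {bool: "boolean", int: "Int64", float: "Float64"}


def _is_missing(val):
    # Simplified checknull: only None and float NaN count as missing here.
    return val is None or (isinstance(val, float) and math.isnan(val))


def choose_result_kind(values, *, storage=None, convert_to_nullable_dtype=False):
    """Sketch of the decision order added to maybe_convert_objects."""
    # 1. pyarrow storage short-circuits everything else.
    if storage == "pyarrow":
        return "pyarrow"
    # 2. Collect the exact types of non-missing values (the new val_types set).
    val_types = {type(v) for v in values if not _is_missing(v)}
    # 3. Exactly one bool/int/float type -> masked (nullable) array.
    if convert_to_nullable_dtype and len(val_types) == 1:
        dtype_name = _NULLABLE_BY_TYPE.get(next(iter(val_types)))
        if dtype_name is not None:
            return dtype_name
    # 4. Anything else falls through to the pre-existing inference.
    return "existing inference"


print(choose_result_kind([1, None, 3], convert_to_nullable_dtype=True))  # Int64
print(choose_result_kind([True, None], convert_to_nullable_dtype=True))  # boolean
print(choose_result_kind([1, 2.5], convert_to_nullable_dtype=True))  # existing inference
print(choose_result_kind(["a", None], storage="pyarrow"))  # pyarrow
```

Keying on the exact `type(v)`, as the diff does with `type(val)`, keeps `True` from being counted as an `int` even though `bool` subclasses `int`.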

10 changes: 9 additions & 1 deletion pandas/_libs/missing.pyx
@@ -1,3 +1,4 @@
+from collections import UserDict
 from decimal import Decimal
 import numbers
 from sys import maxsize
@@ -148,6 +149,7 @@ cpdef bint checknull(object val):
 - np.timedelta64 representation of NaT
 - NA
 - Decimal("NaN")
+- {} empty dict or UserDict

 Parameters
 ----------
@@ -157,7 +159,12 @@ cpdef bint checknull(object val):
 -------
 bool
 """
-if val is None or val is NaT or val is C_NA:
+if (
+val is None
+or val is NaT
+or val is C_NA
+or (isinstance(val, (dict, UserDict)) and not val)
+):
 return True
 elif util.is_float_object(val) or util.is_complex_object(val):
 if val != val:
@@ -191,6 +198,7 @@ cpdef ndarray[uint8_t] isnaobj(ndarray arr):
 - np.timedelta64 representation of NaT
 - NA
 - Decimal("NaN")
+- {} empty dict or UserDict

 Parameters
 ----------
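The missing.pyx change makes `checknull` (and, going by its updated docstring, `isnaobj`) treat an empty `dict` or `UserDict` as missing; the commit history ties this to using `{}` as the NA value of the JSONArray extension. The sketch below is a simplified pure-Python reading of the patched check. `checknull_sketch` is a hypothetical name, and the `NaT`/`NA` singletons and several other cases handled by the real function are omitted.

```python
from collections import UserDict
from decimal import Decimal


def checknull_sketch(val):
    """Simplified pure-Python reading of the patched checknull."""
    if val is None:
        return True
    # Added in this diff: an empty dict or UserDict counts as missing.
    # UserDict is not a dict subclass, hence the explicit tuple.
    if isinstance(val, (dict, UserDict)) and not val:
        return True
    if isinstance(val, (float, complex)):
        # NaN is the only value that compares unequal to itself.
        return val != val
    if isinstance(val, Decimal):
        return val.is_nan()
    return False


print(checknull_sketch({}))  # True
print(checknull_sketch(UserDict()))  # True
print(checknull_sketch({"a": 1}))  # False
print(checknull_sketch(float("nan")))  # True
```

The check is on the value, not on the array type, so with this change any empty dict stored in an object column would also count as missing, not only those inside a JSONArray.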