BUG: identity checking NA in map incorrect #58392
Closed: droussea2001 wants to merge 193 commits into pandas-dev:main from droussea2001:BUG-57390/Identity-checking-NA-in-map-incorrect
68b6c7c
Remove cast to numpy for series supporting NA as na_value in map func…
droussea2001 dcc8dab
Add test for map operation applied on series supporting NA as na_value
droussea2001 19215b7
Adapt test_map test to take into account series containing pd.NA
droussea2001 b916372
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 539bf7e
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 616620c
Add an entry in Conversion section (issue 57390)
droussea2001 8473e73
Correct whatsnew order with pre commit
droussea2001 1d49ac0
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 a5bf510
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 a17d8b5
Add the possibility to process pd.NA values
droussea2001 1f2965c
Add the possibility to process pd.NA values
droussea2001 70c2b8a
Remove test ambiguity with pd.NA processing
droussea2001 584c1ca
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 c7fe27b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 492d167
Code clean up
droussea2001 557bce1
Merge branch 'BUG-57390/Identity-checking-NA-in-map-incorrect' of htt…
droussea2001 49596c9
Limit NA management to BooleanArray, FloatingArray and IntegerArray t…
droussea2001 5eb85ff
Try to correct BaseMaskedArray cast error detected by mypy
droussea2001 4f5cfe5
Correct typo error: missing else condition
droussea2001 32ceaa3
Try to correct mypy error with mask parameter in map_infer
droussea2001 816a0d2
Code clean up: simplify map_infer calls
droussea2001 d2acb4a
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 7cb37df
Correct values input type for map_infer_mask
droussea2001 5d7ad8b
Remove unnecessary cast with to_numpy before map_array call
droussea2001 b9631a3
Manage ExtensionArray and convert to nullable dtype
droussea2001 6c85b64
Add convert_to_nullable_dtype to map_infer (used in maybe_convert_obj…
droussea2001 c84932f
Add convert_to_nullable_dtype to map_infer_mask (used in maybe_conver…
droussea2001 421e779
Conversion to numpy object is not necessary anymore
droussea2001 25c2b90
Tests results are verified as ExtensionArray
droussea2001 23c48d2
Tests was extended to Int64, Float64 and boolean
droussea2001 d498b54
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 a206c94
convert to nullable dtype only if there are nullable value
droussea2001 b36b581
Manage date and time dtype pyarrow as object
droussea2001 6375701
Manage pyarrow string
droussea2001 eda7702
Manage pyarrow string
droussea2001 9ab6602
Manage BasedMaskedArray
droussea2001 0da8920
Test directly ExtensionArray
droussea2001 e8bce29
pyarrow data keep their original type if possible
droussea2001 14e8973
if map return only pd.NA values their type is double pyarrow
droussea2001 d0d8ea2
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 5e3ad28
Add storage to map_infer_mask
droussea2001 c9dd068
Add storage to map_infer_mask
droussea2001 d1b6a28
Add empty dict as NA value for JSONArray extension
droussea2001 3ccc4fd
Add storage parameter to map_infer_mask
droussea2001 996d99a
Cast result to an extension array
droussea2001 a60b23a
Cast result to a NumpyExtensionArray an extension array
droussea2001 505bdec
Cast result to an extension array
droussea2001 17f46c2
Remove dtype test
droussea2001 528c6ab
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 fa9a2f2
Take into account UserDict in checknull
droussea2001 92ed4ef
Take into na_value in in map_infer_mask
droussea2001 ff28d74
Manage IntervalDtype
droussea2001 ee088d4
Manage ArrowDType int64
droussea2001 2f84261
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 259b423
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 0067edf
Correct error in empty mapper management
droussea2001 ac6d324
Merge branch 'BUG-57390/Identity-checking-NA-in-map-incorrect' of htt…
droussea2001 547662d
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 e94997a
Manage IntervalDtype
droussea2001 f93dc66
Try to manage date with pyarrow
droussea2001 963f99a
Manage timedelta, datetimetz and date
droussea2001 e92152e
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 f8deed6
pylint fix
droussea2001 b90af02
Code simplification
droussea2001 26a6fb7
Correct values initialization problem
droussea2001 fa46a96
Manage pyarrow and python storage
droussea2001 d6264e6
Manage pyarrow and python storage in map dict like
droussea2001 237926d
Correct wrong default storage type
droussea2001 247e9d8
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 0fc4b60
Add convert_non_numeric as map_infer_mask parameter
droussea2001 d60b7e9
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 6b5c8db
pyarrow data are sent to map_infer as iterator
droussea2001 8578b1e
Add method _maybe_convert_pyarrow_objects
droussea2001 3fb8b0d
Remove check_dtype
droussea2001 b7de292
Code simplification
droussea2001 a42048f
Manage default storage value
droussea2001 4c61857
ord(x) return a TypeError if x is a pyarrow.lib.LargeStringScalar
droussea2001 2d92818
Manage str.encode for pyarrow.lib.LargeStringScalar
droussea2001 56f8f16
Manage string convertible to nullable dtype
droussea2001 88a54f7
Manage Based masked dtype
droussea2001 6dbbf13
Code clean up
droussea2001 72eca60
Code simplification
droussea2001 d8a70b4
Manage pyarrow string
droussea2001 ec16f75
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 69269d7
Manage json and decimal extension array
droussea2001 d7d0614
Manage na_value in python string
droussea2001 0f242b2
Cast to BasedMasked is limited to array containing one type
droussea2001 d293ce6
numpy dtype is extracted from the identified types in object
droussea2001 d19fb2c
Correct typo in exception
droussea2001 9363be6
Correct typo in based masked array conversion
droussea2001 58de9ac
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 434cb7e
Integrate map evolution without to_numpy() conversion
droussea2001 adc8493
Add test for map operation applied on series supporting NA as na_value
droussea2001 83d7093
Adapt test_map test to take into account series containing pd.NA
droussea2001 df473d7
Add an entry in Conversion section (issue 57390)
droussea2001 4cb8fcf
Add the possibility to process pd.NA values
droussea2001 20dd8e1
Add the possibility to process pd.NA values
droussea2001 45bc299
Remove test ambiguity with pd.NA processing
droussea2001 e38dac0
Code clean up
droussea2001 5fb2d6c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] a92fb75
Limit NA management to BooleanArray, FloatingArray and IntegerArray t…
droussea2001 d8005ed
Try to correct BaseMaskedArray cast error detected by mypy
droussea2001 9900710
Correct typo error: missing else condition
droussea2001 dfdf6ed
Try to correct mypy error with mask parameter in map_infer
droussea2001 451f055
Code clean up: simplify map_infer calls
droussea2001 a836ad1
Correct values input type for map_infer_mask
droussea2001 01a8d88
Manage ExtensionArray and convert to nullable dtype
droussea2001 0f3bfaa
Add convert_to_nullable_dtype to map_infer (used in maybe_convert_obj…
droussea2001 1221069
Add convert_to_nullable_dtype to map_infer_mask (used in maybe_conver…
droussea2001 d70be9d
Conversion to numpy object is not necessary anymore
droussea2001 b77da73
Tests results are verified as ExtensionArray
droussea2001 ad8494a
Tests was extended to Int64, Float64 and boolean
droussea2001 0ad45c7
convert to nullable dtype only if there are nullable value
droussea2001 f73e7b6
Manage date and time dtype pyarrow as object
droussea2001 972957c
Manage pyarrow string
droussea2001 4f6fd09
Manage pyarrow string
droussea2001 59d4c3e
Manage BasedMaskedArray
droussea2001 b8f8e23
Test directly ExtensionArray
droussea2001 41c13f3
pyarrow data keep their original type if possible
droussea2001 f87ee61
if map return only pd.NA values their type is double pyarrow
droussea2001 48c2dd5
Add storage to map_infer_mask
droussea2001 d5aeef2
Add storage to map_infer_mask
droussea2001 9cd640f
Add empty dict as NA value for JSONArray extension
droussea2001 05c01e6
Add storage parameter to map_infer_mask
droussea2001 1244406
Cast result to an extension array
droussea2001 47e3c24
Cast result to a NumpyExtensionArray an extension array
droussea2001 18ae900
Cast result to an extension array
droussea2001 1df0396
Remove dtype test
droussea2001 20de040
Take into account UserDict in checknull
droussea2001 7798eee
Take into na_value in in map_infer_mask
droussea2001 a88295f
Manage IntervalDtype
droussea2001 e09f878
Manage ArrowDType int64
droussea2001 55b8992
Correct error in empty mapper management
droussea2001 f7d875c
Manage IntervalDtype
droussea2001 4f7574f
Try to manage date with pyarrow
droussea2001 77965c3
Manage timedelta, datetimetz and date
droussea2001 a01b55e
pylint fix
droussea2001 f5efabc
Code simplification
droussea2001 d8c0219
Correct values initialization problem
droussea2001 a89373c
Manage pyarrow and python storage
droussea2001 89b872f
Manage pyarrow and python storage in map dict like
droussea2001 d9f9319
Correct wrong default storage type
droussea2001 038cfb8
Add convert_non_numeric as map_infer_mask parameter
droussea2001 a885b7a
pyarrow data are sent to map_infer as iterator
droussea2001 d344841
Add method _maybe_convert_pyarrow_objects
droussea2001 0b56b9c
Remove check_dtype
droussea2001 93bb8d7
Code simplification
droussea2001 bc04ec7
Manage default storage value
droussea2001 a03357e
ord(x) return a TypeError if x is a pyarrow.lib.LargeStringScalar
droussea2001 646a85d
Manage str.encode for pyarrow.lib.LargeStringScalar
droussea2001 b4adcad
Manage string convertible to nullable dtype
droussea2001 efc2600
Manage Based masked dtype
droussea2001 d4b5396
Code clean up
droussea2001 5c1c726
Code simplification
droussea2001 7c6fdb2
Manage pyarrow string
droussea2001 8f18e41
Manage json and decimal extension array
droussea2001 1945ce6
Manage na_value in python string
droussea2001 e2f2482
Cast to BasedMasked is limited to array containing one type
droussea2001 a5d3b74
numpy dtype is extracted from the identified types in object
droussea2001 e6d9f48
Correct typo in exception
droussea2001 3447e1a
Correct typo in based masked array conversion
droussea2001 6f0beb6
Remove check_dtype filter for tests
droussea2001 d6ae469
Resolve merge
droussea2001 d47c9b6
Resolve merge
droussea2001 a9cce25
Resolve merge
droussea2001 c673338
Resolve merge problem
droussea2001 ebcce09
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 43b6b5a
take into account unsigned type
droussea2001 dcab913
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 2801cc0
manage native python type and numpy scalar type
droussea2001 04df901
correct merge problem
droussea2001 7bbc88b
take into account NaT value
droussea2001 3d3e473
code clean up: pyarrow management simplification in maybe_convert_obj…
droussea2001 183b6e6
code clean up: pyarrow management simplification in maybe_convert_obj…
droussea2001 8c3050c
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 25a9d35
BooleanArray does not support pd.NaT
droussea2001 c10a244
code clean up
droussea2001 ebb8d26
code clean up: _convert_to_pyarrow simplification
droussea2001 53a885c
Code refacto and clean up
droussea2001 acfe152
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 f84cd8b
Code clean up (restore iterator in map_infer_mask
droussea2001 3784b5e
Code simplification
droussea2001 f1fb54e
Correct pyarrow cast explanation in comment
droussea2001 e54dcdb
Code simplification and new comments about scalar type interpretation
droussea2001 ba7d37d
Merge remote-tracking branch 'upstream/main' into BUG-57390/Identity-…
droussea2001 1ab81c0
Code simplification
droussea2001 985598b
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 c23f65b
Remove unnecessary test
droussea2001 d1a8190
static check correction
droussea2001 f7f6578
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 39fd9bc
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 7f45b06
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001 651d37f
Merge branch 'main' into BUG-57390/Identity-checking-NA-in-map-incorrect
droussea2001
Generally I think this section would be easier to read if you just assign the proper `mask` and `na_value` to variables before making a single `lib.map_infer` call at the end.
Though I'm surprised we don't have something more generic to cover this - does every function handle this on its own? Seems like a common pattern
OK! I tried to separate the `mask` and `na_value` initialization from the `lib.map_infer` call at the end.
As for using something more generic, I experimented with the `_ensure_data` function, but if we have a `BaseMaskedArray` of int64, for example, it is cast to float64 after calling `np.asarray(values)`.
I will explore the code to see if I can find something useful. :-)
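For readers following the thread, the restructuring discussed above might look roughly like the sketch below. It is only an illustration of the "settle `mask` and `na_value` first, then make one mapping call" shape. It is not the code from this PR and it does not call the real `lib.map_infer`. The names `masked_map`, `map_array_sketch`, `use_na_sentinel` and the `NA` sentinel are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical sentinel standing in for pd.NA (assumption, not the pandas object).
NA = object()


def masked_map(data, func, mask, na_value):
    """Hypothetical stand-in for a masked mapping routine.

    Applies func to every unmasked element and writes na_value
    wherever mask is True.
    """
    out = np.empty(len(data), dtype=object)
    for i in range(len(data)):
        out[i] = na_value if mask[i] else func(data[i])
    return out


def map_array_sketch(data, func, mask=None, use_na_sentinel=False):
    # Step 1: decide on `mask` and `na_value` up front.
    if mask is None:
        # No explicit mask: treat float NaN entries as missing.
        mask = [isinstance(x, float) and x != x for x in data]
    na_value = NA if use_na_sentinel else np.nan

    # Step 2: one mapping call at the end, whatever the input kind was.
    return masked_map(data, func, mask, na_value)


# Masked-style input: missing positions come from an explicit mask.
masked_result = map_array_sketch(
    [1, 2, 3], lambda x: x * 10, mask=[False, True, False], use_na_sentinel=True
)

# Plain input: missing positions are inferred from NaN.
plain_result = map_array_sketch([1.0, float("nan"), 3.0], lambda x: x + 1)
```

The point of the layout is that every branch only chooses values for two variables, so the mapping call itself appears once and is easier to review.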
I think @rhshadrach runs into this pattern in groupby a lot - he may know more
It's not clear to me what pattern you're referring to.