Skip to content

Bug: Fix Inconsistent Behavior for Series.searchsorted #49706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jan 6, 2023

Conversation

yqyqyq-W
Copy link
Contributor

@yqyqyq-W yqyqyq-W commented Nov 15, 2022

@yqyqyq-W yqyqyq-W changed the title Bug: Fix Inconsistent Behavior for Series.searchsorted Bug: Fix Inconsistent Behavior for Series.searchsorted #49620 Nov 15, 2022
@yqyqyq-W yqyqyq-W changed the title Bug: Fix Inconsistent Behavior for Series.searchsorted #49620 Bug: Fix Inconsistent Behavior for Series.searchsorted Nov 15, 2022
@yqyqyq-W yqyqyq-W marked this pull request as ready for review November 15, 2022 18:54
@@ -1270,6 +1271,13 @@ def searchsorted(
sorter: NumpySorter = None,
) -> npt.NDArray[np.intp] | np.intp:

if isinstance(value, pd.DataFrame):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check with ABCDataFrame

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the usage of ABCDataFrame. Does that mean I should use isinstance(value, ABCDataFrame) instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. at the top do from pandas.core.dtypes.generic import ABCDataFrame

@jbrockmendel
Copy link
Member

needs test

@yqyqyq-W yqyqyq-W marked this pull request as draft November 16, 2022 01:38
@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas Series Series data structure labels Nov 16, 2022
@yqyqyq-W yqyqyq-W marked this pull request as ready for review November 16, 2022 19:39
@@ -1270,6 +1270,13 @@ def searchsorted(
sorter: NumpySorter = None,
) -> npt.NDArray[np.intp] | np.intp:

if isinstance(value, ABCDataFrame):
msg = (
"Value must be array-like or scalar, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add "1D" here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added 1-D. In fact, n-D object except DataFrame works well with this function. As long as we don't call pd.array(n-d DataFrame).

@jbrockmendel
Copy link
Member

there's another subtle bug you've uncovered

>>> df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
>>> pd.array(df)
<StringArray>
[
['[1 3]', '[1 3]'],
['[2 4]', '[2 4]']
]
Shape: (2, 2), dtype: string

pd.array(df) should probably raise, maybe return pd.array(df._values). Definitely not this StringArray.

@@ -65,3 +67,10 @@ def test_searchsorted_sorter(self):
res = ser.searchsorted([0, 3], sorter=np.argsort(ser))
exp = np.array([0, 2], dtype=np.intp)
tm.assert_numpy_array_equal(res, exp)

def test_searchsorted_Dataframe_fail(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you de-capitalize "DataFrame" here, and add a comment # GH#49620 on the nex tline

@yqyqyq-W yqyqyq-W marked this pull request as draft November 21, 2022 03:39
@yqyqyq-W
Copy link
Contributor Author

yqyqyq-W commented Nov 21, 2022

That's exactly the real problem. The reason why DataFrame should be forbidden here is that type infer in pd.array doesn't work on n-D DataFrame. But such a fix is far beyond my ability.

@yqyqyq-W yqyqyq-W marked this pull request as ready for review November 21, 2022 06:18
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 22, 2022
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jan 4, 2023
@jbrockmendel
Copy link
Member

@mroeschke i think this might be fine as-is; i can do a separate PR to fix pd.array

@jbrockmendel jbrockmendel reopened this Jan 5, 2023
@mroeschke mroeschke added this to the 2.0 milestone Jan 6, 2023
@mroeschke mroeschke merged commit c29736a into pandas-dev:main Jan 6, 2023
@mroeschke
Copy link
Member

Thanks @yqyqyq-W

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Series Series data structure Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series.searchsorted(...) fails with Timestamp DataFrames
3 participants