pd.cut returning incorrect output in some cases

(The output below is from pandas master.)

```python
import numpy as np
import pandas as pd

arr = np.arange(10).astype(object)
arr[::2] = np.nan

print(arr)
# [nan 1 nan 3 nan 5 nan 7 nan 9]

result = pd.cut(arr, 2)

print(result)
# [NaN, (0.992, 5.0], NaN, (0.992, 5.0], NaN, (0.992, 5.0], NaN, (0.992, 5.0], NaN, (0.992, 5.0]]
# Categories (2, interval[float64]): [(0.992, 5.0] < (5.0, 9.0]]

print(result.unique())
# [NaN, (0.992, 5.0]]
# Categories (1, interval[float64]): [(0.992, 5.0]]
```

Using `cut` on an object-dtype array that contains missing values seems to return the wrong intervals in some cases. In the example above, 7 and 9 should fall in `(5.0, 9.0]`, but only the first interval appears in the result. So far I've only been able to reproduce this when the NaN values are evenly spaced, which is strange.

It looks like the problem comes from `searchsorted` in `numpy`, which gives different results for object and float dtype:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5], dtype=object)
arr[::2] = np.nan

print(arr)
# [nan 2 nan 4 nan]

bins = np.array([1, 3, 5])

# Object dtype: 2 and 4 both get position 1, and NaN gets position 0 (incorrect)
bins.searchsorted(arr)
# array([0, 1, 0, 1, 0])

# Float dtype: 2 and 4 get different positions, and NaN sorts to the end (correct)
bins.searchsorted(arr.astype(float))
# array([3, 1, 3, 2, 3])

np.__version__
# '1.17.5'
```
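
As a possible workaround in the meantime, casting to float before calling `cut` seems to sidestep the object-dtype path. This is only a minimal sketch, and it assumes the object array holds nothing but numbers and NaN, so that `astype(float)` is lossless:

```python
import numpy as np
import pandas as pd

arr = np.arange(10).astype(object)
arr[::2] = np.nan

# Assumption: arr contains only numbers and NaN, so the float cast loses nothing.
result = pd.cut(arr.astype(float), 2)

# Codes index into the categories; -1 marks a missing value.
print(result.codes.tolist())
# [-1, 0, -1, 0, -1, 0, -1, 1, -1, 1]
```

With the cast, 7 and 9 land in the second interval as expected.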

pd.cut returning incorrect output in some cases #31586
