Skip to content

BUG: Can only compare identically-labeled Series objects (string vs. object) #61099

Open
@wahsmail

Description

@wahsmail

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s2.index = s2.index.astype('string')

s1 < s2  # fails

s1, s2 = s1.align(s2)
s1 < s2  # also fails

s1 = s1.reindex(s2.index)
s1 < s2  # succeeds

Issue Description

When a series (or dataframe) with otherwise identical indices are compared, but the indexes are technically dtype(object) and dtype(string), element-wise comparison fails. In the debugger, it looks like the ExtensionArray StringArray.equals is False when comparing to a python list of strings, causing Series._indexed_same to return False.

Expected Behavior

Ideally the string and object dtype would be comparable. This in-between state for Pandas dtypes has been quite awkward, with some libraries porting over to numpy-nullable / pyarrow dtype backends, but the Pandas library defaults not using them yet.

Installed Versions

Replace this line with the output of pd.show_versions()

Metadata

Metadata

Labels

BugNeeds DiscussionRequires discussion from core team before further actionStringsString extension data type and string data

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions