BUG: pandas 1.2.0 and Pyarrow [0.16.0, 1.0.0) are incompatible for some column types #38801

Closed
@ADraginda

Description

#35259 added optional importing from pyarrow. The currently listed minimum version of pyarrow is 0.15.1, and the logic in that PR guards against importing attributes from pyarrow.compute because that module is not available in 0.15.1.

Problem: pyarrow added the compute module in 0.16.0, but the attributes pandas imports from it are not available in that module until 1.0.0.

Therefore, with pandas 1.2.0, merging a DataFrame (with an Array String column?) behaves as follows, depending on the installed pyarrow version:

| Pyarrow Version | Behavior |
| --- | --- |
| not installed | works |
| < 0.15.1 | not supported |
| [0.15.1, 0.16.0) | works (pyarrow.compute.equal not used?) |
| [0.16.0, 1.0.0) | ArgumentError |
| [1.0.0, 2.0.0 (latest)] | works (using pyarrow.compute.equal) |
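
To illustrate the gap in the [0.16.0, 1.0.0) range, here is a minimal sketch that separates "the compute module cannot be imported" from "the module imports but lacks the comparison kernels". The helper `load_compute_kernels`, the `REQUIRED_KERNELS` list, and the two stand-in modules are hypothetical names made up for this example. They are not pandas or pyarrow API, and the stand-ins only imitate the two situations so the code runs without any particular pyarrow version installed.

```python
import importlib
import sys
import types

# Kernel names this sketch assumes pandas wants from pyarrow.compute
# (an illustrative list, not taken from the pandas source).
REQUIRED_KERNELS = (
    "equal", "not_equal", "less", "greater", "less_equal", "greater_equal",
)


def load_compute_kernels(module_name="pyarrow.compute", required=REQUIRED_KERNELS):
    """Return {name: function} if the module imports and has every name, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # No compute module at all (or pyarrow is not installed).
        return None
    kernels = {}
    for name in required:
        func = getattr(module, name, None)
        if func is None:
            # The module exists but a kernel is missing: the problematic case.
            return None
        kernels[name] = func
    return kernels


# Stand-in modules so the sketch runs without pyarrow.
partial = types.ModuleType("fake_compute_partial")  # compute module, no kernels
sys.modules[partial.__name__] = partial

full = types.ModuleType("fake_compute_full")  # compute module with every kernel
for kernel_name in REQUIRED_KERNELS:
    setattr(full, kernel_name, lambda left, right: None)
sys.modules[full.__name__] = full

print(load_compute_kernels("fake_compute_partial"))       # None
print(sorted(load_compute_kernels("fake_compute_full")))  # the six kernel names
```

A check like this treats a partially populated compute module the same as a missing one, which is the behavior the table needs for versions below 1.0.0.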

Adding another try/except around the comparison functions in pandas string_arrow.py will change the table of functionality to:

| Pyarrow Version | Behavior |
| --- | --- |
| not installed | works |
| < 0.15.1 | not supported |
| [0.15.1, 1.0.0) | works (pyarrow.compute.equal not used?) |
| [1.0.0, 2.0.0 (latest)] | works (using pyarrow.compute.equal) |
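
The proposed guard could look roughly like the sketch below. This is not the actual pandas patch. It assumes a missing kernel surfaces as an `AttributeError` when the comparison table is built, which the issue does not confirm, and `compare_strings` and `PY_CMP_FUNCS` are hypothetical names used only for illustration. When the kernels are unavailable, the sketch falls back to an elementwise Python comparison that propagates missing values.

```python
import operator

# Guard both failure modes: no pyarrow / no compute module (ImportError) and
# a compute module without the kernels (assumed to raise AttributeError).
try:
    import pyarrow as pa
    import pyarrow.compute as pc

    ARROW_CMP_FUNCS = {
        "eq": pc.equal,
        "ne": pc.not_equal,
        "lt": pc.less,
        "gt": pc.greater,
        "le": pc.less_equal,
        "ge": pc.greater_equal,
    }
except (ImportError, AttributeError):
    pa = None
    ARROW_CMP_FUNCS = {}

# Pure-Python fallbacks keyed by the same operator names.
PY_CMP_FUNCS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "gt": operator.gt,
    "le": operator.le,
    "ge": operator.ge,
}


def compare_strings(op_name, left, right):
    """Compare two equal-length lists of str/None elementwise; None propagates."""
    kernel = ARROW_CMP_FUNCS.get(op_name)
    if kernel is not None:
        # Arrow path: only taken when every kernel imported successfully.
        result = kernel(
            pa.array(left, type=pa.string()),
            pa.array(right, type=pa.string()),
        )
        return result.to_pylist()
    # Fallback path: plain Python comparison, skipping missing values.
    py_op = PY_CMP_FUNCS[op_name]
    return [
        None if a is None or b is None else py_op(a, b)
        for a, b in zip(left, right)
    ]


print(compare_strings("eq", ["a", "b", None], ["a", "c", "d"]))
```

Either path returns the same list of booleans with `None` for missing values, so callers do not need to know which pyarrow version is installed.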

Labels: Bug, Dependencies, ExtensionArray, Regression, Strings
