replace _msgpack with _pyarrow #28944

Closed
@endremborza

After learning about the future of read_msgpack/to_msgpack and on-the-wire transmission of pandas objects in #28388 and #28388, my understanding is that there will be no fast way to do the following:

>>> buf = df.to_XY()
>>> type(buf)
<class 'bytes'>
>>> df2 = pd.read_XY(buf)
>>> df.equals(df2)
True

I think many users would welcome pyarrow (or simply arrow) as the XY above, replacing the current msgpack functions. This could be done with a few thin wrappers over pyarrow's pandas serialization functions. Testing and maintenance might seem daunting, but as far as I can tell, the Arrow project is quite committed to maintaining pandas compatibility.

However, if this is too much, a documentation update should at the very least address this, because it is currently hard to find the closest equivalent of the example at the top.

I would be happy to open a PR for either option; just pick one. If the first option seems viable, we should also determine how much testing is necessary.

By the way, my current method of replicating the above behavior is:

>>> import pyarrow as pa
>>> buf = pa.serialize_pandas(df).to_pybytes()
>>> type(buf)
<class 'bytes'>
>>> df2 = pa.deserialize_pandas(buf)
>>> df.equals(df2)
True
