Support large strings in interchange protocol #56772
Thanks for working on this!
If I run
import pandas as pd

df = pd.Series([], name="a", dtype="large_string[pyarrow]").to_frame()
dfi = df.__dataframe__()
result = pd.api.interchange.from_dataframe(dfi)
print(dfi.__dataframe__().get_column_by_name('a').get_buffers()['data'])
then I get
(PandasBuffer({'bufsize': 0, 'ptr': 94739763740952, 'device': 'CPU'}), (<DtypeKind.STRING: 21>, 8, 'u', '='))
I think it should be 'U' at the end?
I think you just need to update pandas/pandas/core/interchange/column.py, lines 303 to 309 in d2f05c2:
# Define the dtype for the returned buffer
dtype = (
    DtypeKind.STRING,
    8,
    ArrowCTypes.STRING,
    Endianness.NATIVE,
)  # note: currently only support native endianness
That may also fix #56754
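To illustrate the suggestion, here is a minimal standalone sketch of choosing the buffer format string from the column dtype. It is not the pandas implementation: `BufferDtype` and `string_buffer_dtype` are hypothetical names made up for this example, and the enum is a stand-in that mirrors only the value visible in the printed output. The format strings themselves come from the Arrow C data interface, where 'u' denotes a UTF-8 string and 'U' a large UTF-8 string.

```python
from enum import IntEnum
from typing import NamedTuple


class DtypeKind(IntEnum):
    # Stand-in for the interchange enum; 21 matches the printed output above.
    STRING = 21


# Arrow C data interface format strings for UTF-8 data.
ARROW_STRING = "u"        # utf-8 string (32-bit offsets)
ARROW_LARGE_STRING = "U"  # large utf-8 string (64-bit offsets)
NATIVE_ENDIAN = "="


class BufferDtype(NamedTuple):
    """Hypothetical named view of the 4-tuple returned next to the buffer."""

    kind: DtypeKind
    bit_width: int
    format_str: str
    endianness: str


def string_buffer_dtype(dtype_name: str) -> BufferDtype:
    """Hypothetical helper: pick the format string from the dtype name."""
    is_large = dtype_name.startswith("large_string")
    fmt = ARROW_LARGE_STRING if is_large else ARROW_STRING
    return BufferDtype(DtypeKind.STRING, 8, fmt, NATIVE_ENDIAN)


print(string_buffer_dtype("large_string[pyarrow]"))
print(string_buffer_dtype("string[pyarrow]"))
```

With a rule like this, a `large_string[pyarrow]` column would report 'U' and a regular string column would keep 'u'. How the actual patch derives the value inside column.py may differ.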
@MarcoGorelli Does your commit fix the issue you detected earlier? If yes, we could backport to 2.2
thanks @phofl !
Thanks to you :)
…nge protocol) (#56795) Backport PR #56772: Support large strings in interchange protocol Co-authored-by: Patrick Hoefler <[email protected]>
* Support large strings in interchange protocol
* Update test_impl.py
* fixup buffer dtype, add todo
* add whatsnew
---------
Co-authored-by: MarcoGorelli <[email protected]>
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
cc @MarcoGorelli