DOC: Add PyArrow user guide #51371
Nice, this'll be useful - I've left a really minor comment
doc/source/user_guide/pyarrow.rst
df

If you already have an :external+pyarrow:py:class:`pyarrow.Array` or :external+pyarrow:py:class:`pyarrow.ChunkedArray`,
you can pass it into :class:`.arrays.ArrowExtensionArray` to construct the associated :class:`Series`, :class:`Index:`
Suggested change:
- you can pass it into :class:`.arrays.ArrowExtensionArray` to construct the associated :class:`Series`, :class:`Index:`
+ you can pass it into :class:`.arrays.ArrowExtensionArray` to construct the associated :class:`Series`, :class:`Index`
small comment, otherwise lgtm
doc/source/user_guide/pyarrow.rst
df

By default, these functions, and all other IO reader functions, return NumPy-backed data. These readers can return
PyArrow-backed data by specifying ``use_nullable_dtypes`` with the global configuration option ``"mode.dtype_backend"``
Could you reference the global option to set ``use_nullable_dtypes`` as well?
I find this sentence a bit confusing; it seems like we're first suggesting to use ``mode.dtype_backend='use_nullable_dtypes'``. Maybe we can write it like "by specifying the parameter ``use_nullable_dtypes`` *and* the global configuration ...", or something else that helps understand this better.
Added a section about the global option and clarified that the parameter and global option need to be used together
Really nice, I added a couple of suggestions, but great addition.
Really nice, looks perfect!
doc/source/user_guide/pyarrow.rst
.. ipython:: python

    pd.set_option("mode.dtype_backend", "pyarrow")
    pd.options.mode.nullable_dtypes = True
I think this has a side effect; I guess we need to reset it to False after this chapter is over (the CI failure seems to be caused by this).
Will convert to a code block instead
That works as well, thx
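Another way to keep an option change from leaking into later chapters is to scope it with ``pd.option_context``. The sketch below uses ``display.max_rows`` purely as a stand-in, since the options in the diff above may not exist in every pandas version:

```python
import pandas as pd

# Record the current value so the restore can be checked afterwards.
before = pd.get_option("display.max_rows")

# The option is changed only inside the ``with`` block.
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

# On exit, pandas restores the previous value automatically.
after = pd.get_option("display.max_rows")
```

A plain code block, as chosen here, avoids the problem entirely because the docs build never executes it.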
No description provided.