DOC: Add PyArrow user guide #51371

Merged
merged 5 commits into from
Feb 15, 2023

Conversation

mroeschke
Member

No description provided.

@mroeschke mroeschke added Docs Arrow pyarrow functionality labels Feb 14, 2023
Member

@MarcoGorelli MarcoGorelli left a comment

Nice, this'll be useful - I've left a really minor comment

df

If you already have an :external+pyarrow:py:class:`pyarrow.Array` or :external+pyarrow:py:class:`pyarrow.ChunkedArray`,
you can pass it into :class:`.arrays.ArrowExtensionArray` to construct the associated :class:`Series`, :class:`Index:`
Member

Suggested change
- you can pass it into :class:`.arrays.ArrowExtensionArray` to construct the associated :class:`Series`, :class:`Index:`
+ you can pass it into :class:`.arrays.ArrowExtensionArray` to construct the associated :class:`Series`, :class:`Index`

Member

@phofl phofl left a comment

small comment, otherwise lgtm

df

By default, these functions, and all other IO reader functions, return NumPy-backed data. These readers can return
PyArrow-backed data by specifying ``use_nullable_dtypes`` with the global configuration option ``"mode.dtype_backend"``
Member

Could you reference the global option to set use_nullable_dtypes as well?

Member

I find this sentence a bit confusing; it seems like we're first suggesting to use mode.dtype_backend='use_nullable_dtypes'. Maybe we can write it like "by specifying the parameter use_nullable_dtypes *and* the global configuration ...", or something else that helps understand this better.

Member Author

Added a section about the global option and clarified that the parameter and global option need to be used together

Member

@datapythonista datapythonista left a comment

Really nice, I added a couple of suggestions, but great addition.

@mroeschke mroeschke added this to the 2.0 milestone Feb 14, 2023
Member

@datapythonista datapythonista left a comment

Really nice, looks perfect!

.. ipython:: python

    pd.set_option("mode.dtype_backend", "pyarrow")
    pd.options.mode.nullable_dtypes = True
Member

I think this has a side effect. I guess we need to reset it to False after this chapter is over (the CI failure seems to be caused by this).

Member Author

Will convert to a code block instead

Member

That works as well, thx
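
The side effect discussed in this thread comes from executed doc examples changing global pandas options. A minimal sketch of the two ways to contain that, restoring the value by hand or scoping it with `pd.option_context`, is shown below. It uses `display.max_rows` purely as a stand-in option, since the options named in the diff may not exist in every pandas version.

```python
import pandas as pd

# Stand-in option for illustration; not the option from the PR.
option = "display.max_rows"
original = pd.get_option(option)

# set_option changes global state until it is explicitly restored.
pd.set_option(option, 5)
leaked = pd.get_option(option)
pd.set_option(option, original)

# option_context restores the previous value when the block exits.
with pd.option_context(option, 5):
    inside = pd.get_option(option)
after = pd.get_option(option)

print(leaked, inside, after == original)
```

Converting the example to a non-executed code block, as was done in the PR, sidesteps the issue entirely because nothing runs during the docs build.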

@mroeschke mroeschke merged commit 6d5e78e into pandas-dev:main Feb 15, 2023
@mroeschke mroeschke deleted the doc/pyarrow/userguide branch February 15, 2023 19:54
Labels
Arrow pyarrow functionality Docs
4 participants