ENH: nullable Float32/64 ExtensionArray #34307
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -254,6 +254,66 @@ If needed you can adjust the bins with the argument ``offset`` (a Timedelta) tha | |
|
||
For a full example, see: :ref:`timeseries.adjust-the-start-of-the-bins`. | ||
|
||
.. _whatsnew_110.floating: | ||
|
||
Experimental nullable data types for float data | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
We've added :class:`Float32Dtype` / :class:`Float64Dtype` and :class:`~arrays.FloatingArray`. | ||
These are extension data types dedicated to floating point data that can hold the | ||
``pd.NA`` missing value indicator (:issue:`32265`, :issue:`34307`). | ||
|
||
While the default float data type already supports missing values using ``np.nan``, | ||
these new data types use ``pd.NA`` (and its corresponding behaviour) as the missing | ||
value indicator, in line with the already existing nullable :ref:`integer <integer_na>` | ||
and :ref:`boolean <boolean>` data types. | ||
|
||
One example where the behaviour of ``np.nan`` and ``pd.NA`` is different is | ||
comparison operations: | ||
|
||
.. code-block:: python | ||
|
||
|
||
# the default numpy float64 dtype | ||
>>> s1 = pd.Series([1.5, None]) | ||
>>> s1 | ||
0 1.5 | ||
1 NaN | ||
dtype: float64 | ||
|
||
>>> s1 > 1 | ||
0 True | ||
1 False | ||
dtype: bool | ||
|
||
# the new nullable float64 dtype | ||
|
||
>>> s2 = pd.Series([1.5, None], dtype="Float64") | ||
>>> s2 | ||
0 1.5 | ||
1 <NA> | ||
dtype: Float64 | ||
|
||
>>> s2 > 1 | ||
0 True | ||
1 <NA> | ||
dtype: boolean | ||
|
||
See the :ref:`missing_data.NA` doc section for more details on the behaviour | ||
when using the ``pd.NA`` missing value indicator. | ||
|
||
As shown above, the dtype can be specified using the "Float64" or "Float32" | ||
string (capitalized to distinguish it from the default "float64" data type). | ||
Alternatively, you can also use the dtype object: | ||
|
||
.. ipython:: python | ||
|
||
pd.Series([1.5, None], dtype=pd.Float32Dtype()) | ||
|
||
.. warning:: | ||
|
||
Experimental: the new floating data types are currently experimental, and their | ||
behaviour or API may still change without warning. In particular, the behaviour | ||
regarding NaN (distinct from NA missing values) is subject to change. | ||
Review comment: Is there a doc section we can link to?
Reply: Not yet; we can probably put more or less the same content as in this whatsnew somewhere in the user guide.
Reply: IMO, we should consolidate https://pandas.pydata.org/pandas-docs/dev/user_guide/integer_na.html and https://pandas.pydata.org/pandas-docs/dev/user_guide/boolean.html into a single "nullable data types" page and add it there (not in this PR).
Reply: Indeed, that has been on my personal to-do list, but I never got to it. In general we should have a page about data types (and can still have some subpages if needed). |
||
|
||
fsspec now used for filesystem handling | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
|
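For readers who want to try the behaviour described in the whatsnew entry above, here is a minimal sketch. It assumes a pandas version that ships the nullable `Float64` / `Float32` dtypes; the variable names are illustrative only, and since the feature is marked experimental the details may differ between versions.

```python
import pandas as pd

# Default NumPy-backed float64: the missing value is stored as NaN.
s_default = pd.Series([1.5, None])

# Nullable float dtypes: selected by the capitalized string or the dtype object.
s_nullable = pd.Series([1.5, None], dtype="Float64")
s_float32 = pd.Series([1.5, None], dtype=pd.Float32Dtype())

# Comparisons: NaN compares as False, while pd.NA propagates as <NA>.
gt_default = s_default > 1    # dtype: bool
gt_nullable = s_nullable > 1  # dtype: boolean (nullable)

# To use the nullable result as a filter, decide explicitly how to treat <NA>.
# Here missing comparisons are treated as "not selected".
mask = gt_nullable.fillna(False)
selected = s_nullable[mask]

print(gt_default.tolist())
print(gt_nullable.isna().tolist())
print(selected.tolist())
```

Filling the mask explicitly keeps the treatment of missing comparison results visible in the code rather than relying on indexing defaults.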
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -606,10 +606,11 @@ def logical_method(self, other): | |
@classmethod | ||
def _create_comparison_method(cls, op): | ||
def cmp_method(self, other): | ||
-from pandas.arrays import IntegerArray | ||
+from pandas.arrays import IntegerArray, FloatingArray | ||
|
||
if isinstance( | ||
-other, (ABCDataFrame, ABCSeries, ABCIndexClass, IntegerArray) | ||
+other, | ||
+(ABCDataFrame, ABCSeries, ABCIndexClass, IntegerArray, FloatingArray), | ||
Review comment: I think we should create a superclass NumericArray for Integer / Floating. |
||
): | ||
# Rely on pandas to unbox and dispatch to us. | ||
return NotImplemented | ||
|
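The `return NotImplemented` in the hunk above relies on Python's reflected-operator protocol: when the left operand declines a comparison, Python tries the right operand's mirrored method. The sketch below illustrates that mechanism only. `ToyArray` and `ToyContainer` are hypothetical stand-ins, not pandas classes, and the real unboxing logic in pandas is more involved.

```python
class ToyArray:
    """Hypothetical stand-in for an extension array."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, other):
        if isinstance(other, ToyContainer):
            # Decline: let the container unbox itself and dispatch back to us.
            return NotImplemented
        if isinstance(other, ToyArray):
            return [a > b for a, b in zip(self.values, other.values)]
        # Fall back to comparing every element against a scalar.
        return [a > other for a in self.values]


class ToyContainer:
    """Hypothetical stand-in for a container such as a Series."""

    def __init__(self, array):
        self.array = array

    def __lt__(self, other):
        # Reached as the reflected operation of `other > self`.
        # Unbox the wrapped array, redo the comparison, and re-wrap the result.
        return ToyContainer(other > self.array)


left = ToyArray([1, 5, 3])
right = ToyContainer(ToyArray([2, 2, 2]))

# ToyArray.__gt__ returns NotImplemented, so Python calls ToyContainer.__lt__.
result = left > right
print(type(result).__name__, result.array)
```

Declining in the array keeps the alignment and wrapping logic in one place, the container, which is the design the code comment "Rely on pandas to unbox and dispatch to us" points at.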