
Commit c20e795

Merge branch 'main' into 50040-add-math-mode-formatter-escape=latex-part2

2 parents: 2320fb3 + c8ea34c

165 files changed: 2189 additions, 2091 deletions


.github/actions/build_pandas/action.yml

Lines changed: 2 additions & 4 deletions
@@ -16,7 +16,5 @@ runs:
         python -m pip install -e . --no-build-isolation --no-use-pep517 --no-index
       shell: bash -el {0}
       env:
-        # Cannot use parallel compilation on Windows, see https://github.com/pandas-dev/pandas/issues/30873
-        # GH 47305: Parallel build causes flaky ImportError: /home/runner/work/pandas/pandas/pandas/_libs/tslibs/timestamps.cpython-38-x86_64-linux-gnu.so: undefined symbol: pandas_datetime_to_datetimestruct
-        N_JOBS: 1
-        #N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}
+        # https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
+        N_JOBS: ${{ runner.os == 'macOS' && 3 || 2 }}

.github/actions/setup-conda/action.yml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ runs:
       environment-name: ${{ inputs.environment-name }}
       extra-specs: ${{ inputs.extra-specs }}
       channels: conda-forge
-      channel-priority: ${{ runner.os == 'macOS' && 'flexible' || 'strict' }}
+      channel-priority: 'strict'
       condarc-file: ci/condarc.yml
       cache-env: true
       cache-downloads: true

.github/workflows/32-bit-linux.yml

Lines changed: 0 additions & 2 deletions
@@ -5,12 +5,10 @@ on:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
   pull_request:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
     paths-ignore:
       - "doc/**"

.github/workflows/code-checks.yml

Lines changed: 0 additions & 2 deletions
@@ -5,12 +5,10 @@ on:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
   pull_request:
     branches:
       - main
       - 2.0.x
-      - 1.5.x

 env:
   ENV_FILE: environment.yml

.github/workflows/docbuild-and-upload.yml

Lines changed: 0 additions & 2 deletions
@@ -5,14 +5,12 @@ on:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
     tags:
       - '*'
   pull_request:
     branches:
       - main
       - 2.0.x
-      - 1.5.x

 env:
   ENV_FILE: environment.yml

.github/workflows/macos-windows.yml

Lines changed: 0 additions & 2 deletions
@@ -5,12 +5,10 @@ on:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
   pull_request:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
     paths-ignore:
       - "doc/**"
       - "web/**"

.github/workflows/package-checks.yml

Lines changed: 0 additions & 2 deletions
@@ -5,12 +5,10 @@ on:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
   pull_request:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
     types: [ labeled, opened, synchronize, reopened ]

 permissions:

.github/workflows/python-dev.yml

Lines changed: 5 additions & 5 deletions
@@ -23,13 +23,13 @@ name: Python Dev
 on:
   push:
     branches:
-#      - main
-#      - 1.5.x
+      - main
+      - 2.0.x
       - None
   pull_request:
     branches:
-#      - main
-#      - 1.5.x
+      - main
+      - 2.0.x
       - None
   paths-ignore:
     - "doc/**"
@@ -47,7 +47,7 @@ permissions:

 jobs:
   build:
-#    if: false  # Uncomment this to freeze the workflow, comment it to unfreeze
+    if: false  # Uncomment this to freeze the workflow, comment it to unfreeze
     runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false

.github/workflows/sdist.yml

Lines changed: 0 additions & 2 deletions
@@ -5,12 +5,10 @@ on:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
   pull_request:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
     types: [labeled, opened, synchronize, reopened]
     paths-ignore:
       - "doc/**"

.github/workflows/ubuntu.yml

Lines changed: 0 additions & 2 deletions
@@ -5,12 +5,10 @@ on:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
   pull_request:
     branches:
       - main
       - 2.0.x
-      - 1.5.x
     paths-ignore:
       - "doc/**"
       - "web/**"

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ repos:
     rev: v0.0.253
     hooks:
       - id: ruff
+        args: [--exit-non-zero-on-fix]
   - repo: https://github.com/jendrikseipp/vulture
     rev: 'v2.7'
     hooks:

MANIFEST.in

Lines changed: 0 additions & 2 deletions
@@ -58,5 +58,3 @@ prune pandas/tests/io/parser/data
 # Selectively re-add *.cxx files that were excluded above
 graft pandas/_libs/src
 graft pandas/_libs/tslibs/src
-include pandas/_libs/pd_parser.h
-include pandas/_libs/pd_parser.c

asv_bench/benchmarks/algorithms.py

Lines changed: 6 additions & 1 deletion
@@ -23,6 +23,7 @@ class Factorize:
             "uint",
             "float",
             "object",
+            "object_str",
             "datetime64[ns]",
             "datetime64[ns, tz]",
             "Int64",
@@ -46,7 +47,8 @@ def setup(self, unique, sort, dtype):
             "int": pd.Index(np.arange(N), dtype="int64"),
             "uint": pd.Index(np.arange(N), dtype="uint64"),
             "float": pd.Index(np.random.randn(N), dtype="float64"),
-            "object": string_index,
+            "object_str": string_index,
+            "object": pd.Index(np.arange(N), dtype="object"),
             "datetime64[ns]": pd.date_range("2011-01-01", freq="H", periods=N),
             "datetime64[ns, tz]": pd.date_range(
                 "2011-01-01", freq="H", periods=N, tz="Asia/Tokyo"
@@ -62,6 +64,9 @@ def setup(self, unique, sort, dtype):
     def time_factorize(self, unique, sort, dtype):
         pd.factorize(self.data, sort=sort)

+    def peakmem_factorize(self, unique, sort, dtype):
+        pd.factorize(self.data, sort=sort)
+

 class Duplicated:
     params = [
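The benchmark addition above relies on asv's method-name prefixes: methods named `time_*` are timed, while `peakmem_*` methods have their peak memory usage tracked, so the same workload body appears under both prefixes. A minimal, pandas-free sketch of that pattern (the `FactorizeLike` class and its `_run` stand-in for `pd.factorize` are illustrative, not part of the pandas benchmark suite):

```python
# Toy benchmark class following asv's naming convention: asv discovers
# methods by prefix, so one workload can be both timed (time_*) and
# memory-profiled (peakmem_*) without duplicating the setup logic.
class FactorizeLike:
    params = [[True, False]]
    param_names = ["sort"]

    def setup(self, sort):
        self.data = [3, 1, 2, 1, 3, 2]

    def _run(self, sort):
        # Stand-in for pd.factorize(self.data, sort=sort): map values
        # to integer codes plus the array of unique values.
        if sort:
            uniques = sorted(set(self.data))
        else:
            uniques = list(dict.fromkeys(self.data))  # first-seen order
        codes = [uniques.index(v) for v in self.data]
        return codes, uniques

    def time_factorize(self, sort):
        self._run(sort)

    def peakmem_factorize(self, sort):
        self._run(sort)


bench = FactorizeLike()
bench.setup(sort=True)
codes, uniques = bench._run(sort=True)
print(codes, uniques)  # [2, 0, 1, 0, 2, 1] [1, 2, 3]
```

Keeping `setup` separate means asv excludes data construction from both the timing and the memory measurement.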

doc/source/development/contributing_environment.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ locally before pushing your changes. It's recommended to also install the :ref:`
 Step 1: install a C compiler
 ----------------------------

-How to do this will depend on your platform. If you choose to user ``Docker``
+How to do this will depend on your platform. If you choose to use ``Docker``
 in the next step, then you can skip this step.

 **Windows**

doc/source/development/extending.rst

Lines changed: 46 additions & 0 deletions
@@ -488,3 +488,49 @@ registers the default "matplotlib" backend as follows.

 More information on how to implement a third-party plotting backend can be found at
 https://github.com/pandas-dev/pandas/blob/main/pandas/plotting/__init__.py#L1.
+
+.. _extending.pandas_priority:
+
+Arithmetic with 3rd party types
+-------------------------------
+
+In order to control how arithmetic works between a custom type and a pandas type,
+implement ``__pandas_priority__``. Similar to numpy's ``__array_priority__``
+semantics, arithmetic methods on :class:`DataFrame`, :class:`Series`, and :class:`Index`
+objects will delegate to ``other``, if it has an attribute ``__pandas_priority__`` with a higher value.
+
+By default, pandas objects try to operate with other objects, even if they are not types known to pandas:
+
+.. code-block:: python
+
+   >>> pd.Series([1, 2]) + [10, 20]
+   0    11
+   1    22
+   dtype: int64
+
+In the example above, if ``[10, 20]`` was a custom type that can be understood as a list, pandas objects will still operate with it in the same way.
+
+In some cases, it is useful to delegate the operation to the other type. For example, suppose I implement a
+custom list object, and I want the result of adding my custom list to a pandas :class:`Series` to be an instance of my list
+and not a :class:`Series` as seen in the previous example. This is now possible by defining the ``__pandas_priority__`` attribute
+of my custom list, and setting it to a higher value than the priority of the pandas objects I want to operate with.
+
+The ``__pandas_priority__`` of :class:`DataFrame`, :class:`Series`, and :class:`Index` are ``4000``, ``3000``, and ``2000`` respectively. The base ``ExtensionArray.__pandas_priority__`` is ``1000``.
+
+.. code-block:: python
+
+   class CustomList(list):
+       __pandas_priority__ = 5000
+
+       def __radd__(self, other):
+           # return `self` and not the addition for simplicity
+           return self
+
+   custom = CustomList()
+   series = pd.Series([1, 2, 3])
+
+   # Series refuses to add custom, since it's an unknown type with higher priority
+   assert series.__add__(custom) is NotImplemented
+
+   # This will cause the custom class `__radd__` being used instead
+   assert series + custom is custom
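The priority handshake documented above ultimately rests on Python's binary-operator protocol: when the left operand's ``__add__`` returns ``NotImplemented``, Python falls back to the right operand's ``__radd__``. A pandas-free sketch of that mechanism (the ``SeriesLike`` class is a toy stand-in, not a pandas type; only the ``3000`` priority value is taken from the docs above):

```python
# A "Series-like" left operand that consults the other object's priority
# before operating, mirroring how pandas checks __pandas_priority__.
class SeriesLike:
    __pandas_priority__ = 3000  # value the docs above give for Series

    def __init__(self, values):
        self.values = values

    def __add__(self, other):
        # Defer to `other` if it declares a higher priority: returning
        # NotImplemented makes Python try other.__radd__(self) next.
        if getattr(other, "__pandas_priority__", 0) > self.__pandas_priority__:
            return NotImplemented
        return SeriesLike([v + o for v, o in zip(self.values, other)])


class CustomList(list):
    __pandas_priority__ = 5000

    def __radd__(self, other):
        return self  # keep the custom type in charge of the result


series = SeriesLike([1, 2, 3])
custom = CustomList([10, 20, 30])

assert series.__add__(custom) is NotImplemented  # left side stands down
assert (series + custom) is custom               # Python fell back to __radd__
assert (series + [10, 20, 30]).values == [11, 22, 33]  # plain list: no priority
```

The `getattr(..., 0)` default is what makes ordinary objects (plain lists, arrays without the attribute) keep the default left-operand behaviour.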

doc/source/user_guide/integer_na.rst

Lines changed: 5 additions & 5 deletions
@@ -32,7 +32,7 @@ implemented within pandas.
    arr = pd.array([1, 2, None], dtype=pd.Int64Dtype())
    arr

-Or the string alias ``"Int64"`` (note the capital ``"I"``, to differentiate from
+Or the string alias ``"Int64"`` (note the capital ``"I"``) to differentiate from
 NumPy's ``'int64'`` dtype:

 .. ipython:: python
@@ -67,7 +67,7 @@ with the dtype.
    pd.array([1, 2])

 For backwards-compatibility, :class:`Series` infers these as either
-integer or float dtype
+integer or float dtype.

 .. ipython:: python

@@ -101,7 +101,7 @@ dtype if needed.
    # comparison
    s == 1

-   # indexing
+   # slicing operation
    s.iloc[1:3]

    # operate with other dtypes
@@ -110,7 +110,7 @@ dtype if needed.
    # coerce when needed
    s + 0.01

-These dtypes can operate as part of ``DataFrame``.
+These dtypes can operate as part of a ``DataFrame``.

 .. ipython:: python
@@ -119,7 +119,7 @@ These dtypes can operate as part of ``DataFrame``.
    df.dtypes


-These dtypes can be merged & reshaped & casted.
+These dtypes can be merged, reshaped & casted.

 .. ipython:: python
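The nullable-integer behaviour documented in the diff above can be exercised directly; a minimal sketch, assuming pandas (any version >= 1.0, which introduced ``pd.NA``) is installed:

```python
import pandas as pd

# The "Int64" string alias (capital I) selects the nullable extension
# dtype; NumPy's lowercase "int64" cannot hold missing values at all.
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)           # Int64
print(s.isna().tolist())  # [False, False, True]

# Arithmetic propagates pd.NA and keeps the integer dtype, rather than
# coercing the whole column to float NaN as NumPy-backed Series would.
print((s + 1).dtype)     # Int64
```

Slicing, as in the `s.iloc[1:3]` example from the docs, likewise preserves the `Int64` dtype.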

doc/source/user_guide/io.rst

Lines changed: 10 additions & 7 deletions
@@ -170,12 +170,15 @@ dtype : Type name or dict of column -> type, default ``None``
     the default determines the dtype of the columns which are not explicitly
     listed.

-use_nullable_dtypes : bool = False
-    Whether or not to use nullable dtypes as default when reading data. If
-    set to True, nullable dtypes are used for all dtypes that have a nullable
-    implementation, even if no nulls are present.
+dtype_backend : {"numpy_nullable", "pyarrow"}, defaults to NumPy-backed DataFrames
+    Which dtype_backend to use: nullable dtypes are used for all dtypes that
+    have a nullable implementation when "numpy_nullable" is set, and pyarrow
+    is used for all dtypes if "pyarrow" is set.

-    .. versionadded:: 2.0
+    The dtype_backends are still experimental.
+
+    .. versionadded:: 2.0

 engine : {``'c'``, ``'python'``, ``'pyarrow'``}
     Parser engine to use. The C and pyarrow engines are faster, while the python engine
@@ -475,7 +478,7 @@ worth trying.

    os.remove("foo.csv")

-Setting ``use_nullable_dtypes=True`` will result in nullable dtypes for every column.
+Setting ``dtype_backend="numpy_nullable"`` will result in nullable dtypes for every column.

 .. ipython:: python
@@ -484,7 +487,7 @@ Setting ``dtype_backend="numpy_nullable"`` will result in nullable dtypes for every column.
    3,4.5,False,b,6,7.5,True,a,12-31-2019,
    """

-   df = pd.read_csv(StringIO(data), use_nullable_dtypes=True, parse_dates=["i"])
+   df = pd.read_csv(StringIO(data), dtype_backend="numpy_nullable", parse_dates=["i"])
    df
    df.dtypes
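The renamed keyword can be tried on a tiny inline CSV; a minimal sketch, assuming pandas >= 2.0 (the release where ``dtype_backend`` replaced ``use_nullable_dtypes``) is installed. The column names and data here are illustrative, not taken from the docs:

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2.5,True\n3,,False\n"

# dtype_backend="numpy_nullable" maps every column to its masked
# extension dtype (Int64, Float64, boolean) instead of NumPy dtypes,
# even for columns that contain no missing values.
df = pd.read_csv(StringIO(data), dtype_backend="numpy_nullable")
print(df.dtypes)
```

The missing value in column `b` becomes `pd.NA` rather than forcing the integer-looking column `a` to float.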

doc/source/user_guide/pyarrow.rst

Lines changed: 4 additions & 14 deletions
@@ -145,8 +145,8 @@ functions provide an ``engine`` keyword that can dispatch to PyArrow to accelera
    df

 By default, these functions and all other IO reader functions return NumPy-backed data. These readers can return
-PyArrow-backed data by specifying the parameter ``use_nullable_dtypes=True`` **and** the global configuration option ``"mode.dtype_backend"``
-set to ``"pyarrow"``. A reader does not need to set ``engine="pyarrow"`` to necessarily return PyArrow-backed data.
+PyArrow-backed data by specifying the parameter ``dtype_backend="pyarrow"``. A reader does not need to set
+``engine="pyarrow"`` to necessarily return PyArrow-backed data.

 .. ipython:: python
@@ -155,20 +155,10 @@ set to ``"pyarrow"``. A reader does not need to set ``engine="pyarrow"`` to nece
    1,2.5,True,a,,,,,
    3,4.5,False,b,6,7.5,True,a,
    """)
-   with pd.option_context("mode.dtype_backend", "pyarrow"):
-       df_pyarrow = pd.read_csv(data, use_nullable_dtypes=True)
+   df_pyarrow = pd.read_csv(data, dtype_backend="pyarrow")
    df_pyarrow.dtypes

-To simplify specifying ``use_nullable_dtypes=True`` in several functions, you can set a global option ``nullable_dtypes``
-to ``True``. You will still need to set the global configuration option ``"mode.dtype_backend"`` to ``pyarrow``.
-
-.. code-block:: ipython
-
-   In [1]: pd.set_option("mode.dtype_backend", "pyarrow")
-
-   In [2]: pd.options.mode.nullable_dtypes = True
-
-Several non-IO reader functions can also use the ``"mode.dtype_backend"`` option to return PyArrow-backed data including:
+Several non-IO reader functions can also use the ``dtype_backend`` argument to return PyArrow-backed data including:

 * :func:`to_numeric`
 * :meth:`DataFrame.convert_dtypes`
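The PyArrow backend additionally requires the ``pyarrow`` package at runtime; a hedged sketch, assuming pandas >= 2.0, that guards the import so it still runs where pyarrow is absent (the inline CSV is illustrative):

```python
from io import StringIO

import pandas as pd

data = "x,y\n1,a\n2,\n"

# dtype_backend="pyarrow" needs the optional pyarrow dependency; fall
# back to the default NumPy-backed read when it is not installed.
try:
    import pyarrow  # noqa: F401

    df = pd.read_csv(StringIO(data), dtype_backend="pyarrow")
    # Columns carry ArrowDtype, e.g. int64[pyarrow] / string[pyarrow]
except ImportError:
    df = pd.read_csv(StringIO(data))  # default NumPy-backed dtypes

print(df.dtypes)
```

Either branch yields the same shape and values; only the backing dtype system differs, which is the point of the keyword.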
