Commit 5896e01

Merge branch 'main' into raise-on-parse-int-overflow
2 parents 485dcfc + ddf2541 commit 5896e01

79 files changed: +1136 -359 lines changed

.circleci/setup_env.sh

Lines changed: 12 additions & 54 deletions
@@ -1,47 +1,16 @@
 #!/bin/bash -e

-# edit the locale file if needed
-if [[ "$(uname)" == "Linux" && -n "$LC_ALL" ]]; then
-    echo "Adding locale to the first line of pandas/__init__.py"
-    rm -f pandas/__init__.pyc
-    SEDC="3iimport locale\nlocale.setlocale(locale.LC_ALL, '$LC_ALL')\n"
-    sed -i "$SEDC" pandas/__init__.py
-
-    echo "[head -4 pandas/__init__.py]"
-    head -4 pandas/__init__.py
-    echo
-fi
+echo "Install Mambaforge"
+MAMBA_URL="https://github.com/conda-forge/miniforge/releases/download/4.14.0-0/Mambaforge-4.14.0-0-Linux-aarch64.sh"
+echo "Downloading $MAMBA_URL"
+wget -q $MAMBA_URL -O minimamba.sh
+chmod +x minimamba.sh

+MAMBA_DIR="$HOME/miniconda3"
+rm -rf $MAMBA_DIR
+./minimamba.sh -b -p $MAMBA_DIR

-MINICONDA_DIR=/usr/local/miniconda
-if [ -e $MINICONDA_DIR ] && [ "$BITS32" != yes ]; then
-    echo "Found Miniconda installation at $MINICONDA_DIR"
-else
-    echo "Install Miniconda"
-    DEFAULT_CONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest"
-    if [[ "$(uname -m)" == 'aarch64' ]]; then
-        CONDA_URL="https://github.com/conda-forge/miniforge/releases/download/4.10.1-4/Miniforge3-4.10.1-4-Linux-aarch64.sh"
-    elif [[ "$(uname)" == 'Linux' ]]; then
-        if [[ "$BITS32" == "yes" ]]; then
-            CONDA_URL="$DEFAULT_CONDA_URL-Linux-x86.sh"
-        else
-            CONDA_URL="$DEFAULT_CONDA_URL-Linux-x86_64.sh"
-        fi
-    elif [[ "$(uname)" == 'Darwin' ]]; then
-        CONDA_URL="$DEFAULT_CONDA_URL-MacOSX-x86_64.sh"
-    else
-        echo "OS $(uname) not supported"
-        exit 1
-    fi
-    echo "Downloading $CONDA_URL"
-    wget -q $CONDA_URL -O miniconda.sh
-    chmod +x miniconda.sh
-
-    MINICONDA_DIR="$HOME/miniconda3"
-    rm -rf $MINICONDA_DIR
-    ./miniconda.sh -b -p $MINICONDA_DIR
-fi
-export PATH=$MINICONDA_DIR/bin:$PATH
+export PATH=$MAMBA_DIR/bin:$PATH

 echo
 echo "which conda"
@@ -51,7 +20,7 @@ echo
 echo "update conda"
 conda config --set ssl_verify false
 conda config --set quiet true --set always_yes true --set changeps1 false
-conda install -y -c conda-forge -n base 'mamba>=0.21.2' pip setuptools
+mamba install -y -c conda-forge -n base pip setuptools

 echo "conda info -a"
 conda info -a
@@ -70,11 +39,6 @@ time mamba env update -n pandas-dev --file="${ENV_FILE}"
 echo "conda list -n pandas-dev"
 conda list -n pandas-dev

-if [[ "$BITS32" == "yes" ]]; then
-    # activate 32-bit compiler
-    export CONDA_BUILD=1
-fi
-
 echo "activate pandas-dev"
 source activate pandas-dev

@@ -90,15 +54,9 @@ if pip list | grep -q ^pandas; then
     pip uninstall -y pandas || true
 fi

-if [ "$(conda list -f qt --json)" != [] ]; then
-    echo
-    echo "remove qt"
-    echo "causes problems with the clipboard, we use xsel for that"
-    conda remove qt -y --force || true
-fi
-
 echo "Build extensions"
-python setup.py build_ext -q -j3
+# GH 47305: Parallel build can cause flaky ImportError from pandas/_libs/tslibs
+python setup.py build_ext -q -j1

 echo "Install pandas"
 python -m pip install --no-build-isolation --no-use-pep517 -e .

.github/workflows/docbuild-and-upload.yml

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ jobs:
         echo "${{ secrets.server_ssh_key }}" > ~/.ssh/id_rsa
         chmod 600 ~/.ssh/id_rsa
         echo "${{ secrets.server_ip }} ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBE1Kkopomm7FHG5enATf7SgnpICZ4W2bw+Ho+afqin+w7sMcrsa0je7sbztFAV8YchDkiBKnWTG4cRT+KZgZCaY=" > ~/.ssh/known_hosts
-      if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+      if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/'))

     - name: Copy cheatsheets into site directory
       run: cp doc/cheatsheet/Pandas_Cheat_Sheet* web/build/
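
Reviewer note: the updated `if:` condition lets the step run on pushes to `main` and, newly, on pushes of any tag. A minimal Python sketch of that boolean logic follows. The function name `step_should_run` is hypothetical and exists only for illustration, and unlike GitHub's expression syntax this sketch compares strings case-sensitively.

```python
def step_should_run(event_name: str, ref: str) -> bool:
    """Sketch of the updated workflow condition (hypothetical helper).

    Mirrors:
        github.event_name == 'push'
        && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/'))

    Assumption: comparisons here are case-sensitive, which is stricter
    than GitHub's own expression evaluation.
    """
    is_main = ref == "refs/heads/main"
    is_tag = ref.startswith("refs/tags/")
    return event_name == "push" and (is_main or is_tag)


if __name__ == "__main__":
    # A tag push now qualifies; a pull request event still does not.
    print(step_should_run("push", "refs/tags/v1.5.0"))
    print(step_should_run("pull_request", "refs/heads/main"))
```

Before this change only the `refs/heads/main` branch of the condition existed, so a tag push would have skipped the step.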

asv_bench/benchmarks/groupby.py

Lines changed: 39 additions & 0 deletions
@@ -560,6 +560,45 @@ def time_frame_agg(self, dtype, method):
         self.df.groupby("key").agg(method)


+class GroupByCythonAggEaDtypes:
+    """
+    Benchmarks specifically targeting our cython aggregation algorithms
+    (using a big enough dataframe with simple key, so a large part of the
+    time is actually spent in the grouped aggregation).
+    """
+
+    param_names = ["dtype", "method"]
+    params = [
+        ["Float64", "Int64", "Int32"],
+        [
+            "sum",
+            "prod",
+            "min",
+            "max",
+            "mean",
+            "median",
+            "var",
+            "first",
+            "last",
+            "any",
+            "all",
+        ],
+    ]
+
+    def setup(self, dtype, method):
+        N = 1_000_000
+        df = DataFrame(
+            np.random.randint(0, high=100, size=(N, 10)),
+            columns=list("abcdefghij"),
+            dtype=dtype,
+        )
+        df["key"] = np.random.randint(0, 100, size=N)
+        self.df = df
+
+    def time_frame_agg(self, dtype, method):
+        self.df.groupby("key").agg(method)
+
+
 class Cumulative:
     param_names = ["dtype", "method"]
     params = [
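
Reviewer note: asv runs a parameterized benchmark once for every combination of the `params` lists, so `GroupByCythonAggEaDtypes` expands to 3 dtypes times 11 methods. The sketch below only enumerates that grid with the standard library; `expand_params` is a hypothetical helper, not part of asv.

```python
from itertools import product

# Values copied from the params attribute of GroupByCythonAggEaDtypes.
DTYPES = ["Float64", "Int64", "Int32"]
METHODS = [
    "sum", "prod", "min", "max", "mean", "median",
    "var", "first", "last", "any", "all",
]


def expand_params(*param_lists):
    """Return every (dtype, method, ...) combination as a list of tuples.

    Hypothetical helper: it imitates how a parameter grid is formed,
    not how asv implements it internally.
    """
    return list(product(*param_lists))


if __name__ == "__main__":
    grid = expand_params(DTYPES, METHODS)
    print(len(grid))   # 33 combinations, each with its own setup() call
    print(grid[0])     # ('Float64', 'sum')
```

Each combination builds a fresh one-million-row frame in `setup`, which is worth keeping in mind when estimating the runtime this class adds to the suite.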

asv_bench/benchmarks/multiindex_object.py

Lines changed: 29 additions & 0 deletions
@@ -3,9 +3,11 @@
 import numpy as np

 from pandas import (
+    NA,
     DataFrame,
     MultiIndex,
     RangeIndex,
+    Series,
     date_range,
 )
1113

@@ -255,4 +257,31 @@ def time_operation(self, index_structure, dtype, method):
         getattr(self.left, method)(self.right)


+class Unique:
+    params = [
+        (("Int64", NA), ("int64", 0)),
+    ]
+    param_names = ["dtype_val"]
+
+    def setup(self, dtype_val):
+        level = Series(
+            [1, 2, dtype_val[1], dtype_val[1]] + list(range(1_000_000)),
+            dtype=dtype_val[0],
+        )
+        self.midx = MultiIndex.from_arrays([level, level])
+
+        level_dups = Series(
+            [1, 2, dtype_val[1], dtype_val[1]] + list(range(500_000)) * 2,
+            dtype=dtype_val[0],
+        )
+
+        self.midx_dups = MultiIndex.from_arrays([level_dups, level_dups])
+
+    def time_unique(self, dtype_val):
+        self.midx.unique()
+
+    def time_unique_dups(self, dtype_val):
+        self.midx_dups.unique()
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
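
Reviewer note: the two inputs built in `Unique.setup` differ mainly in how many repeated values they carry. The sketch below reproduces the list construction at a small size using plain Python lists, with `None` standing in for `pandas.NA`. Both `build_level` and `summarize` are hypothetical helpers, and `n = 10` is an assumption chosen to keep the output readable.

```python
def build_level(fill, n, repeats=1):
    """Build level values the way Unique.setup does, at a reduced size.

    fill plays the role of dtype_val[1] (NA or 0 in the benchmark);
    here None is used as a stand-in for pandas.NA.
    """
    return [1, 2, fill, fill] + list(range(n)) * repeats


def summarize(values):
    """Return (total length, number of distinct values)."""
    return len(values), len(set(values))


if __name__ == "__main__":
    n = 10
    for fill in (None, 0):
        mostly_unique = build_level(fill, n)             # like self.midx
        with_dups = build_level(fill, n // 2, repeats=2)  # like self.midx_dups
        print(fill, summarize(mostly_unique), summarize(with_dups))
```

Note that even the "unique" input has a few repeats: the leading `1`, `2`, and the doubled fill value overlap with the range (or with each other), so `time_unique` still exercises de-duplication on a handful of rows.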

ci/deps/actions-310.yaml

Lines changed: 1 addition & 2 deletions
@@ -19,7 +19,6 @@ dependencies:
   - pytz

   # optional dependencies
-  - aiobotocore<2.0.0
   - beautifulsoup4
   - blosc
   - bottleneck
@@ -44,7 +43,7 @@ dependencies:
   - pyreadstat
   - python-snappy
   - pyxlsb
-  - s3fs>=2021.05.0
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/actions-38-downstream_compat.yaml

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs>=2021.05.0
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/actions-38-minimum_versions.yaml

Lines changed: 3 additions & 3 deletions
@@ -26,10 +26,10 @@ dependencies:
   - bottleneck=1.3.2
   - brotlipy=0.7.0
   - fastparquet=0.4.0
-  - fsspec=2021.05.0
+  - fsspec=2021.07.0
   - html5lib=1.1
   - hypothesis=6.13.0
-  - gcsfs=2021.05.0
+  - gcsfs=2021.07.0
   - jinja2=3.0.0
   - lxml=4.6.3
   - matplotlib=3.3.2
@@ -45,7 +45,7 @@ dependencies:
   - pytables=3.6.1
   - python-snappy=0.6.0
   - pyxlsb=1.0.8
-  - s3fs=2021.05.0
+  - s3fs=2021.08.0
   - scipy=1.7.1
   - sqlalchemy=1.4.16
   - tabulate=0.8.9

ci/deps/actions-38.yaml

Lines changed: 1 addition & 2 deletions
@@ -19,7 +19,6 @@ dependencies:
   - pytz

   # optional dependencies
-  - aiobotocore<2.0.0
   - beautifulsoup4
   - blosc
   - bottleneck
@@ -44,7 +43,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs>=2021.05.0
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/actions-39.yaml

Lines changed: 1 addition & 2 deletions
@@ -19,7 +19,6 @@ dependencies:
   - pytz

   # optional dependencies
-  - aiobotocore<2.0.0
   - beautifulsoup4
   - blosc
   - bottleneck
@@ -44,7 +43,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs>=2021.05.0
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/circle-38-arm64.yaml

Lines changed: 1 addition & 2 deletions
@@ -19,7 +19,6 @@ dependencies:
   - pytz

   # optional dependencies
-  - aiobotocore<2.0.0
   - beautifulsoup4
   - blosc
   - bottleneck
@@ -45,7 +44,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs>=2021.05.0
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

doc/source/development/contributing.rst

Lines changed: 19 additions & 27 deletions
@@ -194,30 +194,10 @@ Doing 'git status' again should give something like::
     # modified: /relative/path/to/file-you-added.py
     #

-Finally, commit your changes to your local repository with an explanatory message. pandas
-uses a convention for commit message prefixes and layout. Here are
-some common prefixes along with general guidelines for when to use them:
+Finally, commit your changes to your local repository with an explanatory commit
+message::

-* ENH: Enhancement, new functionality
-* BUG: Bug fix
-* DOC: Additions/updates to documentation
-* TST: Additions/updates to tests
-* BLD: Updates to the build process/scripts
-* PERF: Performance improvement
-* TYP: Type annotations
-* CLN: Code cleanup
-
-The following defines how a commit message should be structured. Please reference the
-relevant GitHub issues in your commit message using GH1234 or #1234. Either style
-is fine, but the former is generally preferred:
-
-* a subject line with ``< 80`` chars.
-* One blank line.
-* Optionally, a commit message body.
-
-Now you can commit your changes in your local repository::
-
-    git commit -m
+    git commit -m "your commit message goes here"

 .. _contributing.push-code:
223203

@@ -262,16 +242,28 @@ double check your branch changes against the branch it was based on:
 Finally, make the pull request
 ------------------------------

-If everything looks good, you are ready to make a pull request. A pull request is how
+If everything looks good, you are ready to make a pull request. A pull request is how
 code from a local repository becomes available to the GitHub community and can be looked
-at and eventually merged into the main version. This pull request and its associated
+at and eventually merged into the main version. This pull request and its associated
 changes will eventually be committed to the main branch and available in the next
-release. To submit a pull request:
+release. To submit a pull request:

 #. Navigate to your repository on GitHub
-#. Click on the ``Pull Request`` button
+#. Click on the ``Compare & pull request`` button
 #. You can then click on ``Commits`` and ``Files Changed`` to make sure everything looks
    okay one last time
+#. Write a descriptive title that includes prefixes. pandas uses a convention for title
+   prefixes. Here are some common ones along with general guidelines for when to use them:
+
+    * ENH: Enhancement, new functionality
+    * BUG: Bug fix
+    * DOC: Additions/updates to documentation
+    * TST: Additions/updates to tests
+    * BLD: Updates to the build process/scripts
+    * PERF: Performance improvement
+    * TYP: Type annotations
+    * CLN: Code cleanup
+
 #. Write a description of your changes in the ``Preview Discussion`` tab
 #. Click ``Send Pull Request``.
277269

doc/source/development/contributing_codebase.rst

Lines changed: 5 additions & 0 deletions
@@ -75,6 +75,11 @@ If you want to run checks on all recently committed files on upstream/main you c

 without needing to have done ``pre-commit install`` beforehand.

+.. note::
+
+    You may want to periodically run ``pre-commit gc``, to clean up repos
+    which are no longer used.
+
 .. note::

     If you have conflicting installations of ``virtualenv``, then you may get an
