
Commit 55ca1f1

Merge branch 'master' of https://github.com/pandas-dev/pandas into perf-fillna

2 parents: f60d43e + 9f6a91a


47 files changed: +557 / -175 lines

.github/workflows/sdist.yml

Lines changed: 64 additions & 0 deletions

@@ -0,0 +1,64 @@
+name: sdist
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+    branches:
+      - master
+      - 1.2.x
+      - 1.3.x
+    paths-ignore:
+      - "doc/**"
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    timeout-minutes: 60
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.7", "3.8", "3.9"]
+
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+
+      - name: Set up Python
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip setuptools wheel
+
+          # GH 39416
+          pip install numpy
+
+      - name: Build pandas sdist
+        run: |
+          pip list
+          python setup.py sdist --formats=gztar
+
+      - uses: conda-incubator/setup-miniconda@v2
+        with:
+          activate-environment: pandas-sdist
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install pandas from sdist
+        run: |
+          conda list
+          python -m pip install dist/*.gz
+
+      - name: Import pandas
+        run: |
+          cd ..
+          conda list
+          python -c "import pandas; pandas.show_versions();"

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions

@@ -9,11 +9,11 @@ repos:
     -   id: absolufy-imports
         files: ^pandas/
 -   repo: https://github.com/python/black
-    rev: 21.5b2
+    rev: 21.6b0
     hooks:
     -   id: black
 -   repo: https://github.com/codespell-project/codespell
-    rev: v2.0.0
+    rev: v2.1.0
     hooks:
     -   id: codespell
         types_or: [python, rst, markdown]
@@ -53,16 +53,16 @@ repos:
         types: [text]
         args: [--append-config=flake8/cython-template.cfg]
 -   repo: https://github.com/PyCQA/isort
-    rev: 5.8.0
+    rev: 5.9.0
     hooks:
     -   id: isort
 -   repo: https://github.com/asottile/pyupgrade
-    rev: v2.18.3
+    rev: v2.19.4
     hooks:
     -   id: pyupgrade
         args: [--py37-plus]
 -   repo: https://github.com/pre-commit/pygrep-hooks
-    rev: v1.8.0
+    rev: v1.9.0
     hooks:
     -   id: rst-backticks
     -   id: rst-directive-colons

asv_bench/benchmarks/algos/isin.py

Lines changed: 10 additions & 0 deletions

@@ -325,3 +325,13 @@ def setup(self, dtype, series_type):

     def time_isin(self, dtypes, series_type):
         self.series.isin(self.values)
+
+
+class IsInWithLongTupples:
+    def setup(self):
+        t = tuple(range(1000))
+        self.series = Series([t] * 1000)
+        self.values = [t]
+
+    def time_isin(self):
+        self.series.isin(self.values)
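The new benchmark times membership tests where every row holds a long tuple. A minimal pure-Python sketch of the operation it exercises, with a plain `set` standing in for the pandas-internal hash table (an assumption for illustration, not the actual implementation):

```python
# Stand-in for Series([t] * 1000).isin([t]): hash the long tuple once,
# then probe each row against the hashed values.
t = tuple(range(1000))
rows = [t] * 1000          # every row holds the same 1000-element tuple
values = {t}               # set membership mirrors the hash-table probe
mask = [row in values for row in rows]
assert all(mask) and len(mask) == 1000
```

The cost being measured is dominated by hashing and comparing 1000-element tuples, which is why an identity fast path in the comparison function (added elsewhere in this commit) matters.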

doc/source/user_guide/indexing.rst

Lines changed: 4 additions & 5 deletions

@@ -1523,18 +1523,17 @@ Looking up values by index/column labels
 ----------------------------------------

 Sometimes you want to extract a set of values given a sequence of row labels
-and column labels, this can be achieved by ``DataFrame.melt`` combined by filtering the corresponding
-rows with ``DataFrame.loc``. For instance:
+and column labels, this can be achieved by ``pandas.factorize`` and NumPy indexing.
+For instance:

 .. ipython:: python

    df = pd.DataFrame({'col': ["A", "A", "B", "B"],
                       'A': [80, 23, np.nan, 22],
                       'B': [80, 55, 76, 67]})
    df
-   melt = df.melt('col')
-   melt = melt.loc[melt['col'] == melt['variable'], 'value']
-   melt.reset_index(drop=True)
+   idx, cols = pd.factorize(df['col'])
+   df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

 Formerly this could be achieved with the dedicated ``DataFrame.lookup`` method
 which was deprecated in version 1.2.0.
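The recipe in the updated docs pairs ``pd.factorize`` with NumPy fancy indexing. A dependency-free sketch of the same lookup, with hypothetical pure-Python loops standing in for ``pd.factorize`` and the row-wise NumPy take:

```python
data = {"col": ["A", "A", "B", "B"],
        "A": [80, 23, None, 22],
        "B": [80, 55, 76, 67]}

# factorize: encode each label in 'col' as an integer code plus the
# list of unique labels, mirroring idx, cols = pd.factorize(df['col'])
uniques = []
codes = []
for label in data["col"]:
    if label not in uniques:
        uniques.append(label)
    codes.append(uniques.index(label))

# for each row i, pick the value from the column named by its code,
# mirroring df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
result = [data[uniques[c]][i] for i, c in enumerate(codes)]
assert result == [80, 23, 76, 67]
```

Row 0 and 1 read from column "A", rows 2 and 3 from column "B", matching what the vectorized version produces.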

doc/source/whatsnew/v1.2.5.rst

Lines changed: 7 additions & 27 deletions

@@ -1,7 +1,7 @@
 .. _whatsnew_125:

-What's new in 1.2.5 (May ??, 2021)
-----------------------------------
+What's new in 1.2.5 (June 22, 2021)
+-----------------------------------

 These are the changes in pandas 1.2.5. See :ref:`release` for a full changelog
 including other versions of pandas.
@@ -14,32 +14,12 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
-- Regression in :func:`concat` between two :class:`DataFrames` where one has an :class:`Index` that is all-None and the other is :class:`DatetimeIndex` incorrectly raising (:issue:`40841`)
+- Fixed regression in :func:`concat` between two :class:`DataFrame` where one has an :class:`Index` that is all-None and the other is :class:`DatetimeIndex` incorrectly raising (:issue:`40841`)
 - Fixed regression in :meth:`DataFrame.sum` and :meth:`DataFrame.prod` when ``min_count`` and ``numeric_only`` are both given (:issue:`41074`)
-- Regression in :func:`read_csv` when using ``memory_map=True`` with an non-UTF8 encoding (:issue:`40986`)
-- Regression in :meth:`DataFrame.replace` and :meth:`Series.replace` when the values to replace is a NumPy float array (:issue:`40371`)
-- Regression in :func:`ExcelFile` when a corrupt file is opened but not closed (:issue:`41778`)
-
-.. ---------------------------------------------------------------------------
-
-
-.. _whatsnew_125.bug_fixes:
-
-Bug fixes
-~~~~~~~~~
-
--
--
-
-.. ---------------------------------------------------------------------------
-
-.. _whatsnew_125.other:
-
-Other
-~~~~~
-
--
--
+- Fixed regression in :func:`read_csv` when using ``memory_map=True`` with an non-UTF8 encoding (:issue:`40986`)
+- Fixed regression in :meth:`DataFrame.replace` and :meth:`Series.replace` when the values to replace is a NumPy float array (:issue:`40371`)
+- Fixed regression in :func:`ExcelFile` when a corrupt file is opened but not closed (:issue:`41778`)
+- Fixed regression in :meth:`DataFrame.astype` with ``dtype=str`` failing to convert ``NaN`` in categorical columns (:issue:`41797`)

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.3.0.rst

Lines changed: 3 additions & 0 deletions

@@ -269,12 +269,14 @@ Other enhancements
 - :meth:`read_csv` and :meth:`read_json` expose the argument ``encoding_errors`` to control how encoding errors are handled (:issue:`39450`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` use Kleene logic with nullable data types (:issue:`37506`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` return a ``BooleanDtype`` for columns with nullable data types (:issue:`33449`)
+- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising with ``object`` data containing ``pd.NA`` even when ``skipna=True`` (:issue:`37501`)
 - :meth:`.GroupBy.rank` now supports object-dtype data (:issue:`38278`)
 - Constructing a :class:`DataFrame` or :class:`Series` with the ``data`` argument being a Python iterable that is *not* a NumPy ``ndarray`` consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when ``data`` is a NumPy ``ndarray`` (:issue:`40908`)
 - Add keyword ``sort`` to :func:`pivot_table` to allow non-sorting of the result (:issue:`39143`)
 - Add keyword ``dropna`` to :meth:`DataFrame.value_counts` to allow counting rows that include ``NA`` values (:issue:`41325`)
 - :meth:`Series.replace` will now cast results to ``PeriodDtype`` where possible instead of ``object`` dtype (:issue:`41526`)
 - Improved error message in ``corr`` and ``cov`` methods on :class:`.Rolling`, :class:`.Expanding`, and :class:`.ExponentialMovingWindow` when ``other`` is not a :class:`DataFrame` or :class:`Series` (:issue:`41741`)
+- :meth:`DataFrame.explode` now supports exploding multiple columns. Its ``column`` argument now also accepts a list of str or tuples for exploding on multiple columns at the same time (:issue:`39240`)

 .. ---------------------------------------------------------------------------

@@ -914,6 +916,7 @@ Datetimelike
 - Bug in constructing a :class:`DataFrame` or :class:`Series` with mismatched ``datetime64`` data and ``timedelta64`` dtype, or vice-versa, failing to raise a ``TypeError`` (:issue:`38575`, :issue:`38764`, :issue:`38792`)
 - Bug in constructing a :class:`Series` or :class:`DataFrame` with a ``datetime`` object out of bounds for ``datetime64[ns]`` dtype or a ``timedelta`` object out of bounds for ``timedelta64[ns]`` dtype (:issue:`38792`, :issue:`38965`)
 - Bug in :meth:`DatetimeIndex.intersection`, :meth:`DatetimeIndex.symmetric_difference`, :meth:`PeriodIndex.intersection`, :meth:`PeriodIndex.symmetric_difference` always returning object-dtype when operating with :class:`CategoricalIndex` (:issue:`38741`)
+- Bug in :meth:`DatetimeIndex.intersection` giving incorrect results with non-Tick frequencies with ``n != 1`` (:issue:`42104`)
 - Bug in :meth:`Series.where` incorrectly casting ``datetime64`` values to ``int64`` (:issue:`37682`)
 - Bug in :class:`Categorical` incorrectly typecasting ``datetime`` object to ``Timestamp`` (:issue:`38878`)
 - Bug in comparisons between :class:`Timestamp` object and ``datetime64`` objects just outside the implementation bounds for nanosecond ``datetime64`` (:issue:`39221`)

doc/source/whatsnew/v1.4.0.rst

Lines changed: 1 addition & 1 deletion

@@ -96,7 +96,7 @@ Other API changes

 Deprecations
 ~~~~~~~~~~~~
--
+- Deprecated :meth:`Index.is_type_compatible` (:issue:`42113`)
 -

 .. ---------------------------------------------------------------------------

pandas/_libs/hashtable.pyi

Lines changed: 2 additions & 0 deletions

@@ -228,3 +228,5 @@ def ismember(
     arr: np.ndarray,
     values: np.ndarray,
 ) -> np.ndarray: ...  # np.ndarray[bool]
+def object_hash(obj) -> int: ...
+def objects_are_equal(a, b) -> bool: ...

pandas/_libs/hashtable.pyx

Lines changed: 10 additions & 0 deletions

@@ -34,6 +34,8 @@ from pandas._libs.khash cimport (
     are_equivalent_khcomplex64_t,
     are_equivalent_khcomplex128_t,
     kh_needed_n_buckets,
+    kh_python_hash_equal,
+    kh_python_hash_func,
     kh_str_t,
     khcomplex64_t,
     khcomplex128_t,
@@ -46,6 +48,14 @@ def get_hashtable_trace_domain():
     return KHASH_TRACE_DOMAIN


+def object_hash(obj):
+    return kh_python_hash_func(obj)
+
+
+def objects_are_equal(a, b):
+    return kh_python_hash_equal(a, b)
+
+
 cdef int64_t NPY_NAT = util.get_nat()
 SIZE_HINT_LIMIT = (1 << 20) + 7

pandas/_libs/khash.pxd

Lines changed: 3 additions & 0 deletions

@@ -41,6 +41,9 @@ cdef extern from "khash_python.h":
     bint are_equivalent_float32_t \
         "kh_floats_hash_equal" (float32_t a, float32_t b) nogil

+    uint32_t kh_python_hash_func(object key)
+    bint kh_python_hash_equal(object a, object b)
+
     ctypedef struct kh_pymap_t:
         khuint_t n_buckets, size, n_occupied, upper_bound
         uint32_t *flags
uint32_t *flags

pandas/_libs/src/klib/khash_python.h

Lines changed: 5 additions & 2 deletions

@@ -226,6 +226,9 @@ int PANDAS_INLINE tupleobject_cmp(PyTupleObject* a, PyTupleObject* b){


 int PANDAS_INLINE pyobject_cmp(PyObject* a, PyObject* b) {
+    if (a == b) {
+        return 1;
+    }
     if (Py_TYPE(a) == Py_TYPE(b)) {
         // special handling for some built-in types which could have NaNs
         // as we would like to have them equivalent, but the usual
@@ -284,7 +287,7 @@ Py_hash_t PANDAS_INLINE complexobject_hash(PyComplexObject* key) {
 }


-khint32_t PANDAS_INLINE kh_python_hash_func(PyObject* key);
+khuint32_t PANDAS_INLINE kh_python_hash_func(PyObject* key);

 //we could use any hashing algorithm, this is the original CPython's for tuples

@@ -325,7 +328,7 @@ Py_hash_t PANDAS_INLINE tupleobject_hash(PyTupleObject* key) {
 }


-khint32_t PANDAS_INLINE kh_python_hash_func(PyObject* key) {
+khuint32_t PANDAS_INLINE kh_python_hash_func(PyObject* key) {
     Py_hash_t hash;
     // For PyObject_Hash holds:
     // hash(0.0) == 0 == hash(-0.0)
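The identity check added to `pyobject_cmp` short-circuits before any potentially expensive comparison, and it is also what makes a `NaN` object compare equal to itself inside the hash table. The same idea in Python terms (a sketch, not the C code):

```python
def objects_equivalent(a, b):
    # the new `if (a == b) return 1;` in C compares pointers, i.e.
    # Python identity, before falling back to value equality
    if a is b:
        return True
    return a == b

nan = float("nan")
assert objects_equivalent(nan, nan)   # same object: identity wins
assert not (nan == nan)               # value equality alone says False
```

For long tuples this also skips an element-by-element comparison whenever the exact same tuple object appears on both sides, which is the case the new `isin` benchmark stresses.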

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 8 additions & 1 deletion

@@ -129,6 +129,13 @@ cdef inline object create_timestamp_from_ts(int64_t value,
     return ts_base


+def _unpickle_timestamp(value, freq, tz):
+    # GH#41949 dont warn on unpickle if we have a freq
+    ts = Timestamp(value, tz=tz)
+    ts._set_freq(freq)
+    return ts
+
+
 # ----------------------------------------------------------------------

 def integer_op_not_supported(obj):
@@ -725,7 +732,7 @@ cdef class _Timestamp(ABCTimestamp):

     def __reduce__(self):
         object_state = self.value, self._freq, self.tzinfo
-        return (Timestamp, object_state)
+        return (_unpickle_timestamp, object_state)

     # -----------------------------------------------------------------
     # Rendering Methods
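Pointing `__reduce__` at a module-level helper instead of the class itself lets unpickling take a different path than normal construction (here, restoring `freq` without going through the deprecation-warning path). A generic sketch of the pattern with a hypothetical `Stamp` class:

```python
import pickle

def _unpickle_stamp(value, freq):
    # reconstruct without the argument that would warn in __init__,
    # then restore the attribute directly -- as _unpickle_timestamp does
    ts = Stamp(value)
    ts.freq = freq
    return ts

class Stamp:
    def __init__(self, value, freq=None):
        self.value = value
        self.freq = freq

    def __reduce__(self):
        # pickle will call _unpickle_stamp(value, freq) on load
        return (_unpickle_stamp, (self.value, self.freq))

rt = pickle.loads(pickle.dumps(Stamp(42, freq="D")))
assert (rt.value, rt.freq) == (42, "D")
```

The helper must live at module level so pickle can import it by name when deserializing.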

pandas/_typing.py

Lines changed: 1 addition & 0 deletions

@@ -122,6 +122,7 @@
 JSONSerializable = Optional[Union[PythonScalar, List, Dict]]
 Frequency = Union[str, "DateOffset"]
 Axes = Collection[Any]
+RandomState = Union[int, ArrayLike, np.random.Generator, np.random.RandomState]

 # dtypes
 NpDtype = Union[str, np.dtype]

pandas/core/algorithms.py

Lines changed: 5 additions & 1 deletion

@@ -140,7 +140,11 @@ def _ensure_data(values: ArrayLike) -> tuple[np.ndarray, DtypeObj]:
             return np.asarray(values).view("uint8"), values.dtype
         else:
             # i.e. all-bool Categorical, BooleanArray
-            return np.asarray(values).astype("uint8", copy=False), values.dtype
+            try:
+                return np.asarray(values).astype("uint8", copy=False), values.dtype
+            except TypeError:
+                # GH#42107 we have pd.NAs present
+                return np.asarray(values), values.dtype

     elif is_integer_dtype(values.dtype):
         return np.asarray(values), values.dtype
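The change wraps the boolean-to-`uint8` cast so that data containing missing values falls back to object dtype instead of raising. A pure-Python sketch of that control flow, with `int()` standing in for `astype("uint8")` and `None` standing in for `pd.NA` (both stand-ins are assumptions for illustration):

```python
def ensure_bool_data(values):
    try:
        # fast path: every element casts cleanly to an integer 0/1
        return [int(v) for v in values]
    except TypeError:
        # GH#42107-style fallback: missing values present, keep objects
        return list(values)

assert ensure_bool_data([True, False, True]) == [1, 0, 1]
assert ensure_bool_data([True, None, False]) == [True, None, False]
```

The try/except keeps the common all-boolean case on the cheap path while only paying for the fallback when the cast actually fails.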

pandas/core/arrays/categorical.py

Lines changed: 5 additions & 1 deletion

@@ -26,6 +26,7 @@
     NaT,
     algos as libalgos,
     hashtable as htable,
+    lib,
 )
 from pandas._libs.arrays import NDArrayBacked
 from pandas._libs.lib import no_default
@@ -523,14 +524,17 @@ def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
         try:
             new_cats = np.asarray(self.categories)
             new_cats = new_cats.astype(dtype=dtype, copy=copy)
+            fill_value = lib.item_from_zerodim(np.array(np.nan).astype(dtype))
         except (
             TypeError,  # downstream error msg for CategoricalIndex is misleading
             ValueError,
         ):
             msg = f"Cannot cast {self.categories.dtype} dtype to {dtype}"
             raise ValueError(msg)

-        result = take_nd(new_cats, ensure_platform_int(self._codes))
+        result = take_nd(
+            new_cats, ensure_platform_int(self._codes), fill_value=fill_value
+        )

        return result
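The fix passes an explicit `fill_value` (`NaN` cast to the target dtype) so that `-1` codes, the sentinel for missing categories, materialize as a proper missing value. A simplified sketch of a take-with-fill helper (a hypothetical stand-in for `take_nd`, not the real function):

```python
def take_with_fill(categories, codes, fill_value):
    # a -1 code marks a missing entry; without an explicit fill_value,
    # plain indexing would silently grab categories[-1] (the last one)
    return [fill_value if code == -1 else categories[code]
            for code in codes]

# casting NaN to str first is what makes the missing entry come out
# as the string "nan" rather than the last category
assert take_with_fill(["a", "b"], [0, 1, -1, 0], "nan") == ["a", "b", "nan", "a"]
```

This is the behavior referenced by the v1.2.5 note about `DataFrame.astype` with `dtype=str` failing to convert `NaN` in categorical columns (:issue:`41797`).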

pandas/core/arrays/sparse/array.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1448,7 +1448,7 @@ def __array_ufunc__(self, ufunc: np.ufunc, method: str, *inputs, **kwargs):
14481448
sp_values, self.sp_index, SparseDtype(sp_values.dtype, fill_value)
14491449
)
14501450

1451-
result = getattr(ufunc, method)(*[np.asarray(x) for x in inputs], **kwargs)
1451+
result = getattr(ufunc, method)(*(np.asarray(x) for x in inputs), **kwargs)
14521452
if out:
14531453
if len(out) == 1:
14541454
out = out[0]

pandas/core/arrays/sparse/scipy_sparse.py

Lines changed: 1 addition & 1 deletion

@@ -58,7 +58,7 @@ def _get_label_to_i_dict(labels, sort_labels=False):
     return {k: i for i, k in enumerate(labels)}


 def _get_index_subset_to_coord_dict(index, subset, sort_labels=False):
-    ilabels = list(zip(*[index._get_level_values(i) for i in subset]))
+    ilabels = list(zip(*(index._get_level_values(i) for i in subset)))
     labels_to_i = _get_label_to_i_dict(ilabels, sort_labels=sort_labels)
     labels_to_i = Series(labels_to_i)
     if len(subset) > 1:
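Both one-line changes in the two sparse files replace a list comprehension with a generator expression where the result is immediately unpacked into a call, avoiding the construction of a named intermediate list. A small sketch of the equivalence:

```python
rows = [(1, "a"), (2, "b"), (3, "c")]

# list comprehension: builds a throwaway list before zip sees it
cols_from_list = list(zip(*[row for row in rows]))
# generator expression: consumed during unpacking, so no separate
# list object is materialized first
cols_from_gen = list(zip(*(row for row in rows)))

assert cols_from_list == cols_from_gen == [(1, 2, 3), ("a", "b", "c")]
```

The results are identical; the generator form just skips one allocation, a common micro-cleanup flagged by linters such as flake8-comprehensions.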
