Commit 7fa241b

auderson committed
Merge remote-tracking branch 'upstream/main' into roll_var_remove_floating_point_artifacts
2 parents ca1ee1a + 32999a1 commit 7fa241b

37 files changed, +516 -237 lines changed

.pre-commit-config.yaml

Lines changed: 7 additions & 0 deletions
@@ -176,6 +176,13 @@ repos:
       files: ^pandas/core/
       exclude: ^pandas/core/api\.py$
       types: [python]
+    - id: use-io-common-urlopen
+      name: Use pandas.io.common.urlopen instead of urllib.request.urlopen
+      language: python
+      entry: python scripts/use_io_common_urlopen.py
+      files: ^pandas/
+      exclude: ^pandas/tests/
+      types: [python]
     - id: no-bool-in-core-generic
       name: Use bool_t instead of bool in pandas/core/generic.py
       entry: python scripts/no_bool_in_generic.py
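
The diff registers the hook but does not show scripts/use_io_common_urlopen.py itself. The following is only a rough sketch of how such a check could be written with the standard-library `ast` module; the function name `find_urlopen_uses` and the exact detection rules are assumptions, not the script's real contents.

```python
import ast

# Hypothetical constants for this sketch; the real script is not shown in the diff.
BANNED_MODULE = "urllib.request"
BANNED_NAME = "urlopen"


def find_urlopen_uses(source: str) -> list:
    """Return sorted line numbers that import or reference urllib.request.urlopen."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Case 1: `from urllib.request import urlopen`
        if isinstance(node, ast.ImportFrom) and node.module == BANNED_MODULE:
            if any(alias.name == BANNED_NAME for alias in node.names):
                hits.append(node.lineno)
        # Case 2: the attribute chain `urllib.request.urlopen`
        elif isinstance(node, ast.Attribute) and node.attr == BANNED_NAME:
            inner = node.value
            if (
                isinstance(inner, ast.Attribute)
                and inner.attr == "request"
                and isinstance(inner.value, ast.Name)
                and inner.value.id == "urllib"
            ):
                hits.append(node.lineno)
    return sorted(hits)
```

A pre-commit entry point would typically run a function like this over each file passed on the command line and exit non-zero when any hits are found.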

LICENSES/KLIB_LICENSE

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+The MIT License
+
+Copyright (c) 2008- Attractive Chaos <[email protected]>
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

MANIFEST.in

Lines changed: 2 additions & 4 deletions
@@ -1,4 +1,5 @@
 include RELEASE.md
+include versioneer.py

 graft doc
 prune doc/build
@@ -54,9 +55,6 @@ global-exclude *.pxi
 # exclude the whole directory to avoid running related tests in sdist
 prune pandas/tests/io/parser/data

-include versioneer.py
-include pandas/_version.py
-include pandas/io/formats/templates/*.tpl
-
+# Selectively re-add *.cxx files that were excluded above
 graft pandas/_libs/src
 graft pandas/_libs/tslibs/src

doc/source/development/code_style.rst

Lines changed: 0 additions & 31 deletions
This file was deleted.

doc/source/development/contributing_codebase.rst

Lines changed: 2 additions & 3 deletions
@@ -37,15 +37,14 @@ In addition to ``./ci/code_checks.sh``, some extra checks are run by
 ``pre-commit`` - see :ref:`here <contributing.pre-commit>` for how to
 run them.

-Additional standards are outlined on the :ref:`pandas code style guide <code_style>`.
-
 .. _contributing.pre-commit:

 Pre-commit
 ----------

 Additionally, :ref:`Continuous Integration <contributing.ci>` will run code formatting checks
-like ``black``, ``flake8``, ``isort``, and ``cpplint`` and more using `pre-commit hooks <https://pre-commit.com/>`_
+like ``black``, ``flake8`` (including a `pandas-dev-flaker <https://github.com/pandas-dev/pandas-dev-flaker>`_ plugin),
+``isort``, and ``cpplint`` and more using `pre-commit hooks <https://pre-commit.com/>`_
 Any warnings from these checks will cause the :ref:`Continuous Integration <contributing.ci>` to fail; therefore,
 it is helpful to run the check yourself before submitting code. This
 can be done by installing ``pre-commit``::

doc/source/development/index.rst

Lines changed: 0 additions & 1 deletion
@@ -16,7 +16,6 @@ Development
     contributing_environment
     contributing_documentation
     contributing_codebase
-    code_style
     maintaining
     internals
     test_writing

doc/source/whatsnew/v0.13.0.rst

Lines changed: 1 addition & 1 deletion
@@ -664,7 +664,7 @@ Enhancements
    other = pd.DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
    mask = dfi.isin(other)
    mask
-   dfi[mask.any(1)]
+   dfi[mask.any(axis=1)]

 - ``Series`` now supports a ``to_frame`` method to convert it to a single-column DataFrame (:issue:`5164`)
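
The switch from `mask.any(1)` to `mask.any(axis=1)` goes with the deprecation of positional arguments noted in the v1.5.0 whatsnew. As a loose illustration of that pattern (not the decorator pandas uses), a keyword-only deprecation can be sketched in plain Python; `deprecate_positional` and the toy `Table` class are hypothetical names introduced only for this example.

```python
import functools
import warnings


def deprecate_positional(func):
    """Hypothetical helper: warn when anything besides `self` is passed positionally."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            warnings.warn(
                f"Passing arguments to {func.__name__} positionally is "
                "deprecated; use keywords instead.",
                FutureWarning,
                stacklevel=2,
            )
        return func(self, *args, **kwargs)

    return wrapper


class Table:
    """Toy stand-in for a 2-D boolean container; this is not a pandas class."""

    def __init__(self, rows):
        self.rows = rows

    @deprecate_positional
    def any(self, axis=0):
        if axis == 1:
            # Reduce across each row.
            return [any(row) for row in self.rows]
        # Reduce down each column.
        return [any(col) for col in zip(*self.rows)]
```

With this sketch, `Table(rows).any(axis=1)` is silent, while `Table(rows).any(1)` still works but emits a `FutureWarning`.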

doc/source/whatsnew/v1.4.3.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Fixed regression in :meth:`DataFrame.nsmallest` led to wrong results when ``np.nan`` in the sorting column (:issue:`46589`)
 - Fixed regression in :func:`read_fwf` raising ``ValueError`` when ``widths`` was specified with ``usecols`` (:issue:`46580`)
 -

doc/source/whatsnew/v1.5.0.rst

Lines changed: 3 additions & 0 deletions
@@ -431,6 +431,7 @@ Other Deprecations
 - Deprecated behavior of method :meth:`DataFrame.quantile`, attribute ``numeric_only`` will default False. Including datetime/timedelta columns in the result (:issue:`7308`).
 - Deprecated :attr:`Timedelta.freq` and :attr:`Timedelta.is_populated` (:issue:`46430`)
 - Deprecated :attr:`Timedelta.delta` (:issue:`46476`)
+- Deprecated passing arguments as positional in :meth:`DataFrame.any` and :meth:`Series.any` (:issue:`44802`)
 - Deprecated the ``closed`` argument in :meth:`interval_range` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`)
 - Deprecated the methods :meth:`DataFrame.mad`, :meth:`Series.mad`, and the corresponding groupby methods (:issue:`11787`)

@@ -500,6 +501,7 @@ Conversion
 - Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`)
 - Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`)
 - Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`)
+- Bug in metaclass of generic abstract dtypes causing :meth:`DataFrame.apply` and :meth:`Series.apply` to raise for the built-in function ``type`` (:issue:`46684`)

 Strings
 ^^^^^^^
@@ -565,6 +567,7 @@ I/O
 - Bug in :func:`read_csv` not respecting a specified converter to index columns in all cases (:issue:`40589`)
 - Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`)
 - Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`)
+- Bug in :func:`read_html` where elements surrounding ``<br>`` were joined without a space between them (:issue:`29528`)
 - Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`)
 - Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`)
 - Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`)
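
The ``read_html`` entry above concerns text on either side of a ``<br>`` being glued together. The behaviour it describes can be illustrated with the standard-library `html.parser`. This is only a sketch of the idea, and `CellTextParser` and `cell_text` are hypothetical names; it is not taken from the pandas parsers.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of an HTML fragment, treating <br> as a separator."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Emit a space where the line break was, so neighbours are not fused.
        if tag == "br":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def cell_text(fragment: str) -> str:
    parser = CellTextParser()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    # Collapse runs of whitespace so "a <br> b" and "a<br>b" give the same result.
    return " ".join("".join(parser.parts).split())
```

Without the `handle_starttag` branch, `"first<br>second"` would come out as `"firstsecond"`, which is the kind of joining the entry describes.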

pandas/_libs/algos.pxd

Lines changed: 11 additions & 1 deletion
@@ -1,4 +1,7 @@
-from pandas._libs.dtypes cimport numeric_t
+from pandas._libs.dtypes cimport (
+    numeric_object_t,
+    numeric_t,
+)


 cdef numeric_t kth_smallest_c(numeric_t* arr, Py_ssize_t k, Py_ssize_t n) nogil
@@ -10,3 +13,10 @@ cdef enum TiebreakEnumType:
     TIEBREAK_FIRST
     TIEBREAK_FIRST_DESCENDING
     TIEBREAK_DENSE
+
+
+cdef numeric_object_t get_rank_nan_fill_val(
+    bint rank_nans_highest,
+    numeric_object_t val,
+    bint is_datetimelike=*,
+)

pandas/_libs/algos.pyx

Lines changed: 10 additions & 3 deletions
@@ -822,13 +822,17 @@ def is_monotonic(ndarray[numeric_object_t, ndim=1] arr, bint timelike):

 cdef numeric_object_t get_rank_nan_fill_val(
     bint rank_nans_highest,
-    numeric_object_t[:] _=None
+    numeric_object_t val,
+    bint is_datetimelike=False,
 ):
     """
     Return the value we'll use to represent missing values when sorting depending
     on if we'd like missing values to end up at the top/bottom. (The second parameter
     is unused, but needed for fused type specialization)
     """
+    if numeric_object_t is int64_t and is_datetimelike and not rank_nans_highest:
+        return NPY_NAT + 1
+
     if rank_nans_highest:
         if numeric_object_t is object:
             return Infinity()
@@ -854,6 +858,9 @@ cdef numeric_object_t get_rank_nan_fill_val(
         if numeric_object_t is object:
             return NegInfinity()
         elif numeric_object_t is int64_t:
+            # Note(jbrockmendel) 2022-03-15 for reasons unknown, using util.INT64_MIN
+            # instead of NPY_NAT here causes build warnings and failure in
+            # test_cummax_i8_at_implementation_bound
             return NPY_NAT
         elif numeric_object_t is int32_t:
             return util.INT32_MIN
@@ -975,7 +982,7 @@ def rank_1d(
     # will flip the ordering to still end up with lowest rank.
     # Symmetric logic applies to `na_option == 'bottom'`
     nans_rank_highest = ascending ^ (na_option == 'top')
-    nan_fill_val = get_rank_nan_fill_val[numeric_object_t](nans_rank_highest)
+    nan_fill_val = get_rank_nan_fill_val(nans_rank_highest, <numeric_object_t>0)
     if nans_rank_highest:
         order = [masked_vals, mask]
     else:
@@ -1335,7 +1342,7 @@ def rank_2d(

     nans_rank_highest = ascending ^ (na_option == 'top')
     if check_mask:
-        nan_fill_val = get_rank_nan_fill_val[numeric_object_t](nans_rank_highest)
+        nan_fill_val = get_rank_nan_fill_val(nans_rank_highest, <numeric_object_t>0)

         if numeric_object_t is object:
             mask = missing.isnaobj2d(values).view(np.uint8)
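
The selection logic in `get_rank_nan_fill_val` can be mirrored in plain Python for a few dtypes. In this sketch the `kind` string is a hypothetical stand-in for the Cython fused type. The "highest" values for int64, int32 and float64 are assumed by symmetry with the "lowest" branch, since those lines are not visible in the hunks above.

```python
import math

NPY_NAT = -(2**63)  # int64 minimum, which is also the NaT sentinel
INT64_MAX = 2**63 - 1
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def rank_nan_fill_val(kind, rank_nans_highest, is_datetimelike=False):
    """Sketch of the fill-value choice; `kind` replaces the fused-type dispatch."""
    if kind == "int64" and is_datetimelike and not rank_nans_highest:
        # Shown in the diff: datetimelike int64 data uses NPY_NAT + 1 as the
        # lowest fill value, presumably because NPY_NAT itself means "missing".
        return NPY_NAT + 1
    if kind == "int64":
        # INT64_MAX is an assumption; NPY_NAT for the low end is from the diff.
        return INT64_MAX if rank_nans_highest else NPY_NAT
    if kind == "int32":
        # INT32_MAX is an assumption; INT32_MIN is from the diff.
        return INT32_MAX if rank_nans_highest else INT32_MIN
    if kind == "float64":
        # Both float values are assumptions for illustration.
        return math.inf if rank_nans_highest else -math.inf
    raise NotImplementedError(kind)
```

The new `val` parameter in the Cython signature carries no data; it exists so that the fused type can be inferred from the call, which is why callers pass `<numeric_object_t>0`.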

pandas/_libs/groupby.pyx

Lines changed: 12 additions & 29 deletions
@@ -31,7 +31,10 @@ from numpy.math cimport NAN
 cnp.import_array()

 from pandas._libs cimport util
-from pandas._libs.algos cimport kth_smallest_c
+from pandas._libs.algos cimport (
+    get_rank_nan_fill_val,
+    kth_smallest_c,
+)

 from pandas._libs.algos import (
     ensure_platform_int,
@@ -989,36 +992,16 @@ cdef inline bint _treat_as_na(numeric_object_t val, bint is_datetimelike) nogil:
         return False


-cdef numeric_t _get_min_or_max(numeric_t val, bint compute_max, bint is_datetimelike):
+cdef numeric_object_t _get_min_or_max(numeric_object_t val, bint compute_max, bint is_datetimelike):
     """
-    Find either the min or the max supported by numeric_t; 'val' is a placeholder
-    to effectively make numeric_t an argument.
+    Find either the min or the max supported by numeric_object_t; 'val' is a
+    placeholder to effectively make numeric_object_t an argument.
     """
-    if numeric_t is int64_t:
-        if compute_max and is_datetimelike:
-            return -_int64_max
-        # Note(jbrockmendel) 2022-03-15 for reasons unknown, using util.INT64_MIN
-        # instead of NPY_NAT here causes build warnings and failure in
-        # test_cummax_i8_at_implementation_bound
-        return NPY_NAT if compute_max else util.INT64_MAX
-    elif numeric_t is int32_t:
-        return util.INT32_MIN if compute_max else util.INT32_MAX
-    elif numeric_t is int16_t:
-        return util.INT16_MIN if compute_max else util.INT16_MAX
-    elif numeric_t is int8_t:
-        return util.INT8_MIN if compute_max else util.INT8_MAX
-
-    elif numeric_t is uint64_t:
-        return 0 if compute_max else util.UINT64_MAX
-    elif numeric_t is uint32_t:
-        return 0 if compute_max else util.UINT32_MAX
-    elif numeric_t is uint16_t:
-        return 0 if compute_max else util.UINT16_MAX
-    elif numeric_t is uint8_t:
-        return 0 if compute_max else util.UINT8_MAX
-
-    else:
-        return -np.inf if compute_max else np.inf
+    return get_rank_nan_fill_val(
+        not compute_max,
+        val=val,
+        is_datetimelike=is_datetimelike,
+    )


 cdef numeric_t _get_na_val(numeric_t val, bint is_datetimelike):
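
The removed branches show why `not compute_max` is passed on: a running maximum needs to start from the smallest representable value, and a running minimum from the largest. The sketch below shows how such a seed is typically used for int64 data. `cummax_int64` is a hypothetical helper, and its handling of missing entries is an assumption for illustration rather than the groupby kernel's actual logic.

```python
INT64_MIN = -(2**63)  # also the NaT sentinel for datetimelike int64 data


def cummax_int64(values, is_datetimelike=False):
    """Running maximum over int64-like values, seeded with the smallest value."""
    # For datetimelike data INT64_MIN means "missing", so the seed is one
    # above it; this matches the removed `-_int64_max` branch.
    acc = INT64_MIN + 1 if is_datetimelike else INT64_MIN
    out = []
    for v in values:
        if is_datetimelike and v == INT64_MIN:
            # Assumption: a missing entry is passed through and does not
            # affect the running maximum.
            out.append(INT64_MIN)
            continue
        acc = max(acc, v)
        out.append(acc)
    return out
```

Delegating to `get_rank_nan_fill_val` keeps one table of per-dtype extremes instead of two copies that can drift apart.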

pandas/_libs/tslibs/conversion.pyx

Lines changed: 22 additions & 8 deletions
@@ -2,6 +2,7 @@ import cython
 import numpy as np

 cimport numpy as cnp
+from cpython.object cimport PyObject
 from numpy cimport (
     int32_t,
     int64_t,
@@ -273,7 +274,8 @@ def ensure_timedelta64ns(arr: ndarray, copy: bool = True):

 @cython.boundscheck(False)
 @cython.wraparound(False)
-def datetime_to_datetime64(ndarray[object] values):
+def datetime_to_datetime64(ndarray values):
+    # ndarray[object], but can't declare object without ndim
     """
     Convert ndarray of datetime-like objects to int64 array representing
     nanosecond timestamps.
@@ -288,20 +290,27 @@ def datetime_to_datetime64(ndarray[object] values):
     inferred_tz : tzinfo or None
     """
     cdef:
-        Py_ssize_t i, n = len(values)
+        Py_ssize_t i, n = values.size
         object val
-        int64_t[:] iresult
+        int64_t ival
+        ndarray iresult  # int64_t, but can't declare that without specifying ndim
         npy_datetimestruct dts
         _TSObject _ts
         bint found_naive = False
         tzinfo inferred_tz = None

-    result = np.empty(n, dtype='M8[ns]')
+        cnp.broadcast mi
+
+    result = np.empty((<object>values).shape, dtype='M8[ns]')
     iresult = result.view('i8')
+
+    mi = cnp.PyArray_MultiIterNew2(iresult, values)
     for i in range(n):
-        val = values[i]
+        # Analogous to: val = values[i]
+        val = <object>(<PyObject**>cnp.PyArray_MultiIter_DATA(mi, 1))[0]
+
         if checknull_with_nat(val):
-            iresult[i] = NPY_NAT
+            ival = NPY_NAT
         elif PyDateTime_Check(val):
             if val.tzinfo is not None:
                 if found_naive:
@@ -314,18 +323,23 @@ def datetime_to_datetime64(ndarray[object] values):
                     inferred_tz = val.tzinfo

                 _ts = convert_datetime_to_tsobject(val, None)
-                iresult[i] = _ts.value
+                ival = _ts.value
                 check_dts_bounds(&_ts.dts)
             else:
                 found_naive = True
                 if inferred_tz is not None:
                     raise ValueError('Cannot mix tz-aware with '
                                      'tz-naive values')
-                iresult[i] = pydatetime_to_dt64(val, &dts)
+                ival = pydatetime_to_dt64(val, &dts)
                 check_dts_bounds(&dts)
         else:
             raise TypeError(f'Unrecognized value type: {type(val)}')

+        # Analogous to: iresult[i] = ival
+        (<int64_t*>cnp.PyArray_MultiIter_DATA(mi, 0))[0] = ival
+
+        cnp.PyArray_MultiIter_NEXT(mi)
+
     return result, inferred_tz

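
The control flow of `datetime_to_datetime64` (compute one `ival` per element, then store it) can be followed in a flat pure-Python sketch. `to_ns` is a hypothetical name, `None` stands in for any null value, and the sketch deliberately omits the bounds checks and the handling of differing time zones that the Cython function performs in lines not shown in the diff.

```python
from datetime import datetime, timezone

NAT = -(2**63)  # int64 sentinel for a missing timestamp
EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(values):
    """Convert a flat list of datetime/None to (int64 nanoseconds, inferred tz)."""
    found_naive = False
    inferred_tz = None
    out = []
    for val in values:
        if val is None:
            ival = NAT
        elif isinstance(val, datetime):
            if val.tzinfo is not None:
                if found_naive:
                    raise ValueError("Cannot mix tz-aware with tz-naive values")
                if inferred_tz is None:
                    inferred_tz = val.tzinfo
                delta = val - EPOCH_UTC  # aware values become UTC timestamps
            else:
                found_naive = True
                if inferred_tz is not None:
                    raise ValueError("Cannot mix tz-aware with tz-naive values")
                delta = val - EPOCH_NAIVE  # naive values are wall times
            # Integer arithmetic avoids float rounding for large timestamps.
            ival = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
        else:
            raise TypeError(f"Unrecognized value type: {type(val)}")
        # Analogous to the single store at the end of the Cython loop body.
        out.append(ival)
    return out, inferred_tz
```

Funnelling every branch through one `ival` assignment is what lets the Cython version write through the multi-iterator at a single point, so the same loop works for arrays of any dimension.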

pandas/core/algorithms.py

Lines changed: 5 additions & 1 deletion
@@ -1181,7 +1181,6 @@ def compute(self, method: str) -> Series:
             arr = arr[::-1]

         nbase = n
-        findex = len(self.obj)
         narr = len(arr)
         n = min(n, narr)

@@ -1194,6 +1193,11 @@ def compute(self, method: str) -> Series:
         if self.keep != "all":
             inds = inds[:n]
             findex = nbase
+        else:
+            if len(inds) < nbase and len(nan_index) + len(inds) >= nbase:
+                findex = len(nan_index) + len(inds)
+            else:
+                findex = len(inds)

         if self.keep == "last":
             # reverse indices
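
The `findex` bookkeeping decides how many rows survive once the non-missing selections and the missing entries are combined. The intended outcome for the simplest case can be sketched in plain Python. This is a semantic illustration, assuming `keep="first"` and that NaN entries only fill slots when too few real values exist; `nsmallest` here is a hypothetical function, not the pandas implementation.

```python
import math


def nsmallest(values, n):
    """Return (position, value) pairs for the n smallest values, NaN last."""
    non_nan = [(i, v) for i, v in enumerate(values) if not math.isnan(v)]
    nan_rows = [(i, v) for i, v in enumerate(values) if math.isnan(v)]
    # sorted() is stable, so the first occurrence wins among ties.
    ordered = sorted(non_nan, key=lambda pair: pair[1])
    # NaN entries are appended after all real values, then the result is cut
    # to n rows (the role `findex = nbase` plays in the diff).
    return (ordered + nan_rows)[:n]
```

For `keep="all"` the diff instead sizes the result from `len(inds)`, extending it by `len(nan_index)` only when the real values alone cannot reach `nbase` rows.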

pandas/core/arrays/datetimes.py

Lines changed: 1 addition & 2 deletions
@@ -2247,10 +2247,9 @@ def objects_to_datetime64ns(
         result = result.reshape(data.shape, order=order)
     except ValueError as err:
         try:
-            values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
+            values, tz_parsed = conversion.datetime_to_datetime64(data)
             # If tzaware, these values represent unix timestamps, so we
             # return them as i8 to distinguish from wall times
-            values = values.reshape(data.shape, order=order)
             return values.view("i8"), tz_parsed
         except (ValueError, TypeError):
             raise err
