
Commit 138c787

Merge branch 'pandas-dev:main' into main

2 parents: c166b8d + f759c33

32 files changed: +302, -231 lines

.github/workflows/scorecards.yml

Lines changed: 0 additions & 54 deletions
This file was deleted.

README.md

Lines changed: 0 additions & 1 deletion
@@ -11,7 +11,6 @@
 [![Package Status](https://img.shields.io/pypi/status/pandas.svg)](https://pypi.org/project/pandas/)
 [![License](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/main/LICENSE)
 [![Coverage](https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=main)](https://codecov.io/gh/pandas-dev/pandas)
-[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/pandas-dev/pandas/badge)](https://api.securityscorecards.dev/projects/github.com/pandas-dev/pandas)
 [![Downloads](https://static.pepy.tech/personalized-badge/pandas?period=month&units=international_system&left_color=black&right_color=orange&left_text=PyPI%20downloads%20per%20month)](https://pepy.tech/project/pandas)
 [![Slack](https://img.shields.io/badge/join_Slack-information-brightgreen.svg?logo=slack)](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack)
 [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org)

doc/source/user_guide/io.rst

Lines changed: 0 additions & 15 deletions
@@ -5790,21 +5790,6 @@ Specifying this will return an iterator through chunks of the query result:
     for chunk in pd.read_sql_query("SELECT * FROM data_chunks", engine, chunksize=5):
         print(chunk)

-You can also run a plain query without creating a ``DataFrame`` with
-:func:`~pandas.io.sql.execute`. This is useful for queries that don't return values,
-such as INSERT. This is functionally equivalent to calling ``execute`` on the
-SQLAlchemy engine or db connection object. Again, you must use the SQL syntax
-variant appropriate for your database.
-
-.. code-block:: python
-
-   from pandas.io import sql
-
-   sql.execute("SELECT * FROM table_name", engine)
-   sql.execute(
-       "INSERT INTO table_name VALUES(?, ?, ?)", engine, params=[("id", 1, 12.2, True)]
-   )
-

 Engine connection examples
 ''''''''''''''''''''''''''
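The deleted passage removes ``pandas.io.sql.execute`` without showing its replacement in this hunk. As a hedged sketch (not taken from the pandas docs), plain statements such as INSERT can be run by calling ``execute`` directly on the database driver or a SQLAlchemy connection; here with the stdlib ``sqlite3`` module and an illustrative table name:

```python
import sqlite3

# Sketch of the usual substitute for the removed pandas.io.sql.execute:
# call execute() directly on the DB-API (or SQLAlchemy) connection.
# Table and column names here are illustrative, not from the source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id TEXT, a INTEGER, b REAL)")
# Parameterized INSERT -- the "query that doesn't return values" use case
conn.execute("INSERT INTO table_name VALUES (?, ?, ?)", ("id", 1, 12.2))
conn.commit()
rows = conn.execute("SELECT * FROM table_name").fetchall()
conn.close()
```

As the removed text noted, the placeholder syntax (``?`` here) varies by database driver.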

doc/source/whatsnew/v1.5.3.rst

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ Fixed regressions
 - Enforced reversion of ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
 - Fixed regression in :meth:`SeriesGroupBy.apply` setting a ``name`` attribute on the result if the result was a :class:`DataFrame` (:issue:`49907`)
 - Fixed performance regression in setting with the :meth:`~DataFrame.at` indexer (:issue:`49771`)
+- Fixed regression in :func:`to_datetime` raising ``ValueError`` when parsing array of ``float`` containing ``np.nan`` (:issue:`50237`)
 -

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v2.0.0.rst

Lines changed: 15 additions & 7 deletions
@@ -30,25 +30,31 @@ sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (

 .. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_nullable_backend:

-Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Configuration option, ``mode.nullable_backend``, to return pyarrow-backed dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The ``use_nullable_dtypes`` keyword argument has been expanded to the following functions to enable automatic conversion to nullable dtypes (:issue:`36712`)

 * :func:`read_csv`
 * :func:`read_excel`
 * :func:`read_sql`

-Additionally a new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
+Additionally a new global configuration, ``mode.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
 to select the nullable dtypes implementation.

 * :func:`read_csv` (with ``engine="pyarrow"``)
 * :func:`read_excel`
 * :func:`read_parquet`
 * :func:`read_orc`

-By default, ``io.nullable_backend`` is set to ``"pandas"`` to return existing, numpy-backed nullable dtypes, but it can also
-be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (:issue:`48957`).
+
+And the following methods will also utilize the ``mode.nullable_backend`` option.
+
+* :meth:`DataFrame.convert_dtypes`
+* :meth:`Series.convert_dtypes`
+
+By default, ``mode.nullable_backend`` is set to ``"pandas"`` to return existing, numpy-backed nullable dtypes, but it can also
+be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (:issue:`48957`, :issue:`49997`).

 .. ipython:: python

@@ -57,12 +63,12 @@ be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (
     1,2.5,True,a,,,,,
     3,4.5,False,b,6,7.5,True,a,
     """)
-    with pd.option_context("io.nullable_backend", "pandas"):
+    with pd.option_context("mode.nullable_backend", "pandas"):
         df = pd.read_csv(data, use_nullable_dtypes=True)
     df.dtypes

     data.seek(0)
-    with pd.option_context("io.nullable_backend", "pyarrow"):
+    with pd.option_context("mode.nullable_backend", "pyarrow"):
         df_pyarrow = pd.read_csv(data, use_nullable_dtypes=True, engine="pyarrow")
     df_pyarrow.dtypes

@@ -470,6 +476,7 @@ Other API changes
 - :func:`read_stata` with parameter ``index_col`` set to ``None`` (the default) will now set the index on the returned :class:`DataFrame` to a :class:`RangeIndex` instead of a :class:`Int64Index` (:issue:`49745`)
 - Changed behavior of :class:`Index`, :class:`Series`, and :class:`DataFrame` arithmetic methods when working with object-dtypes, the results no longer do type inference on the result of the array operations, use ``result.infer_objects()`` to do type inference on the result (:issue:`49999`)
 - Changed behavior of :class:`Index` constructor with an object-dtype ``numpy.ndarray`` containing all-``bool`` values or all-complex values, this will now retain object dtype, consistent with the :class:`Series` behavior (:issue:`49594`)
+- Changed behavior of :class:`Series` and :class:`DataFrame` constructors when given an integer dtype and floating-point data that is not round numbers, this now raises ``ValueError`` instead of silently retaining the float dtype; do ``Series(data)`` or ``DataFrame(data)`` to get the old behavior, and ``Series(data).astype(dtype)`` or ``DataFrame(data).astype(dtype)`` to get the specified dtype (:issue:`49599`)
 - Changed behavior of :meth:`DataFrame.shift` with ``axis=1``, an integer ``fill_value``, and homogeneous datetime-like dtype, this now fills new columns with integer dtypes instead of casting to datetimelike (:issue:`49842`)
 - Files are now closed when encountering an exception in :func:`read_json` (:issue:`49921`)
 - Changed behavior of :func:`read_csv`, :func:`read_json` & :func:`read_fwf`, where the index will now always be a :class:`RangeIndex`, when no index is specified. Previously the index would be a :class:`Index` with dtype ``object`` if the new DataFrame/Series has length 0 (:issue:`49572`)

@@ -775,6 +782,7 @@ Datetimelike
 - Bug in ``pandas.tseries.holiday.Holiday`` where a half-open date interval causes inconsistent return types from :meth:`USFederalHolidayCalendar.holidays` (:issue:`49075`)
 - Bug in rendering :class:`DatetimeIndex` and :class:`Series` and :class:`DataFrame` with timezone-aware dtypes with ``dateutil`` or ``zoneinfo`` timezones near daylight-savings transitions (:issue:`49684`)
 - Bug in :func:`to_datetime` was raising ``ValueError`` when parsing :class:`Timestamp`, ``datetime.datetime``, ``datetime.date``, or ``np.datetime64`` objects when non-ISO8601 ``format`` was passed (:issue:`49298`, :issue:`50036`)
+- Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`)
 -

 Timedelta
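The new "Other API changes" entry for the constructors (:issue:`49599`) can be illustrated with a pure-Python sketch of the rule, since the actual enforcement lives inside pandas' ``sanitize_array``/``_try_cast`` (see the ``pandas/core/construction.py`` hunk below in this commit); the function name here is hypothetical:

```python
def strict_int_cast(values):
    # Sketch of the stricter 2.0 rule: float data that is not round
    # numbers can no longer be silently coerced when an integer dtype
    # is requested -- it raises ValueError instead. (Pre-2.0, Series
    # would silently keep the float dtype in this case.)
    out = []
    for v in values:
        if v != int(v):
            raise ValueError(f"Trying to coerce float values to integers: {v!r}")
        out.append(int(v))
    return out
```

Per the whatsnew entry, the old lossy behavior is still reachable explicitly via ``Series(data).astype(dtype)``.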

pandas/_libs/tslibs/parsing.pyx

Lines changed: 0 additions & 24 deletions
@@ -85,12 +85,6 @@ class DateParseError(ValueError):
 _DEFAULT_DATETIME = datetime(1, 1, 1).replace(hour=0, minute=0,
                                               second=0, microsecond=0)

-PARSING_WARNING_MSG = (
-    "Parsing dates in {format} format when dayfirst={dayfirst} was specified. "
-    "This may lead to inconsistently parsed dates! Specify a format "
-    "to ensure consistent parsing."
-)
-
 cdef:
     set _not_datelike_strings = {"a", "A", "m", "M", "p", "P", "t", "T"}

@@ -203,28 +197,10 @@ cdef object _parse_delimited_date(str date_string, bint dayfirst):
         # date_string can't be converted to date, above format
         return None, None

-    swapped_day_and_month = False
     if 1 <= month <= MAX_DAYS_IN_MONTH and 1 <= day <= MAX_DAYS_IN_MONTH \
             and (month <= MAX_MONTH or day <= MAX_MONTH):
         if (month > MAX_MONTH or (day <= MAX_MONTH and dayfirst)) and can_swap:
             day, month = month, day
-            swapped_day_and_month = True
-    if dayfirst and not swapped_day_and_month:
-        warnings.warn(
-            PARSING_WARNING_MSG.format(
-                format="MM/DD/YYYY",
-                dayfirst="True",
-            ),
-            stacklevel=find_stack_level(),
-        )
-    elif not dayfirst and swapped_day_and_month:
-        warnings.warn(
-            PARSING_WARNING_MSG.format(
-                format="DD/MM/YYYY",
-                dayfirst="False (the default)",
-            ),
-            stacklevel=find_stack_level(),
-        )
     # In Python <= 3.6.0 there is no range checking for invalid dates
     # in C api, thus we call faster C version for 3.6.1 or newer
     return datetime_new(year, month, day, 0, 0, 0, 0, None), reso
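With the warning branches deleted (the non-actionable ``UserWarning`` of :issue:`50232`), what survives in ``_parse_delimited_date`` is the silent day/month swap. A pure-Python paraphrase of that remaining condition (constants mirror the Cython module; the function name is illustrative):

```python
MAX_MONTH = 12          # mirrors the constants used in parsing.pyx
MAX_DAYS_IN_MONTH = 31

def maybe_swap(day, month, dayfirst, can_swap=True):
    # Paraphrase of the swap logic left after this commit: swap when
    # the "month" slot cannot be a month, or when dayfirst is set and
    # the "day" slot could itself be a month. No warning is emitted.
    if (1 <= month <= MAX_DAYS_IN_MONTH and 1 <= day <= MAX_DAYS_IN_MONTH
            and (month <= MAX_MONTH or day <= MAX_MONTH)):
        if (month > MAX_MONTH or (day <= MAX_MONTH and dayfirst)) and can_swap:
            day, month = month, day
    return day, month
```

So "13/1" still parses as January 13 regardless of ``dayfirst``, and "2/1" flips to January 2 only when ``dayfirst=True``.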

pandas/_libs/tslibs/strptime.pyx

Lines changed: 11 additions & 1 deletion
@@ -42,7 +42,11 @@ from pandas._libs.tslibs.np_datetime cimport (
     pydatetime_to_dt64,
 )
 from pandas._libs.tslibs.timestamps cimport _Timestamp
-from pandas._libs.util cimport is_datetime64_object
+from pandas._libs.util cimport (
+    is_datetime64_object,
+    is_float_object,
+    is_integer_object,
+)

 cnp.import_array()

@@ -185,6 +189,12 @@ def array_strptime(
         elif is_datetime64_object(val):
             iresult[i] = get_datetime64_nanos(val, NPY_FR_ns)
             continue
+        elif (
+            (is_integer_object(val) or is_float_object(val))
+            and (val != val or val == NPY_NAT)
+        ):
+            iresult[i] = NPY_NAT
+            continue
         else:
             val = str(val)
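The new branch is the :issue:`50237` fix: a numeric NaN or the NaT sentinel is stored as ``NPY_NAT`` directly instead of falling through to ``str(val)`` and failing to match the format. A pure-Python analogue of the guard (the helper name is illustrative; ``NPY_NAT`` is numpy's int64 minimum):

```python
NPY_NAT = -(2 ** 63)  # numpy's NaT sentinel, np.iinfo("int64").min

def is_missing_numeric(val):
    # Analogue of the new guard in array_strptime: an int or float that
    # is NaN (val != val is only true for NaN) or equals the NaT
    # sentinel maps straight to NPY_NAT, skipping string parsing.
    if isinstance(val, bool) or not isinstance(val, (int, float)):
        return False
    return val != val or val == NPY_NAT
```

This is why ``to_datetime`` on a float array containing ``np.nan`` with a ``format`` no longer raises ``ValueError``.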

pandas/core/config_init.py

Lines changed: 12 additions & 14 deletions
@@ -539,13 +539,25 @@ def use_inf_as_na_cb(key) -> None:
     The default storage for StringDtype.
 """

+nullable_backend_doc = """
+: string
+    The nullable dtype implementation to return.
+    Available options: 'pandas', 'pyarrow', the default is 'pandas'.
+"""
+
 with cf.config_prefix("mode"):
     cf.register_option(
         "string_storage",
         "python",
         string_storage_doc,
         validator=is_one_of_factory(["python", "pyarrow"]),
     )
+    cf.register_option(
+        "nullable_backend",
+        "pandas",
+        nullable_backend_doc,
+        validator=is_one_of_factory(["pandas", "pyarrow"]),
+    )

 # Set up the io.excel specific reader configuration.
 reader_engine_doc = """

@@ -673,20 +685,6 @@ def use_inf_as_na_cb(key) -> None:
     validator=is_one_of_factory(["auto", "sqlalchemy"]),
 )

-io_nullable_backend_doc = """
-: string
-    The nullable dtype implementation to return when ``use_nullable_dtypes=True``.
-    Available options: 'pandas', 'pyarrow', the default is 'pandas'.
-"""
-
-with cf.config_prefix("io.nullable_backend"):
-    cf.register_option(
-        "io_nullable_backend",
-        "pandas",
-        io_nullable_backend_doc,
-        validator=is_one_of_factory(["pandas", "pyarrow"]),
-    )
-
 # --------
 # Plotting
 # ---------
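The relocated option reuses the same ``is_one_of_factory`` validator as ``mode.string_storage``. A simplified sketch of how such a one-of validator behaves (this is a paraphrase for illustration, not pandas' actual helper, and the ``validate_backend`` name is hypothetical):

```python
def is_one_of_factory(allowed):
    # Simplified sketch: return a validator callable that raises
    # ValueError for any candidate value outside `allowed`, and
    # returns None (accepts silently) otherwise.
    def inner(value):
        if value not in allowed:
            raise ValueError(f"Value must be one of {list(allowed)}")
    return inner

validate_backend = is_one_of_factory(["pandas", "pyarrow"])
validate_backend("pyarrow")  # accepted
```

The registry then calls this validator whenever ``mode.nullable_backend`` is set, so a typo like ``"arrow"`` fails at set-time rather than at read-time.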

pandas/core/construction.py

Lines changed: 3 additions & 52 deletions
@@ -27,7 +27,6 @@
     DtypeObj,
     T,
 )
-from pandas.errors import IntCastingNaNError

 from pandas.core.dtypes.base import (
     ExtensionDtype,

@@ -46,7 +45,6 @@
     is_datetime64_ns_dtype,
     is_dtype_equal,
     is_extension_array_dtype,
-    is_float_dtype,
     is_integer_dtype,
     is_list_like,
     is_object_dtype,

@@ -503,7 +501,6 @@ def sanitize_array(
     copy: bool = False,
     *,
     allow_2d: bool = False,
-    strict_ints: bool = False,
 ) -> ArrayLike:
     """
     Sanitize input data to an ndarray or ExtensionArray, copy if specified,

@@ -517,8 +514,6 @@ def sanitize_array(
     copy : bool, default False
     allow_2d : bool, default False
         If False, raise if we have a 2D Arraylike.
-    strict_ints : bool, default False
-        If False, silently ignore failures to cast float data to int dtype.

     Returns
     -------

@@ -571,32 +566,7 @@ def sanitize_array(
     if isinstance(data, np.matrix):
         data = data.A

-    if dtype is not None and is_float_dtype(data.dtype) and is_integer_dtype(dtype):
-        # possibility of nan -> garbage
-        try:
-            # GH 47391 numpy > 1.24 will raise a RuntimeError for nan -> int
-            # casting aligning with IntCastingNaNError below
-            with np.errstate(invalid="ignore"):
-                # GH#15832: Check if we are requesting a numeric dtype and
-                # that we can convert the data to the requested dtype.
-                subarr = maybe_cast_to_integer_array(data, dtype)
-
-        except IntCastingNaNError:
-            raise
-        except ValueError:
-            # Pre-2.0, we would have different behavior for Series vs DataFrame.
-            # DataFrame would call np.array(data, dtype=dtype, copy=copy),
-            # which would cast to the integer dtype even if the cast is lossy.
-            # See GH#40110.
-            if strict_ints:
-                raise
-
-            # We ignore the dtype arg and return floating values,
-            # e.g. test_constructor_floating_data_int_dtype
-            # TODO: where is the discussion that documents the reason for this?
-            subarr = np.array(data, copy=copy)
-
-    elif dtype is None:
+    if dtype is None:
         subarr = data
         if data.dtype == object:
             subarr = maybe_infer_to_datetimelike(data)

@@ -629,27 +599,8 @@ def sanitize_array(
         subarr = np.array([], dtype=np.float64)

     elif dtype is not None:
-        try:
-            subarr = _try_cast(data, dtype, copy)
-        except ValueError:
-            if is_integer_dtype(dtype):
-                if strict_ints:
-                    raise
-                casted = np.array(data, copy=False)
-                if casted.dtype.kind == "f":
-                    # GH#40110 match the behavior we have if we passed
-                    # a ndarray[float] to begin with
-                    return sanitize_array(
-                        casted,
-                        index,
-                        dtype,
-                        copy=False,
-                        allow_2d=allow_2d,
-                    )
-                else:
-                    raise
-            else:
-                raise
+        subarr = _try_cast(data, dtype, copy)
+
     else:
         subarr = maybe_convert_platform(data)
         if subarr.dtype == object:
