BUG: Behavior with fallback between raise and coerce #46071 #47745
Changes from 7 commits
0d320e9
df08fd1
7c69de8
2ab558a
78e6bc7
3da1585
b53d897
ef1f736
8867090
7347c78
3b08c71
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,11 +3,13 @@ | |
""" | ||
import operator | ||
|
||
from dateutil.parser._parser import ParserError | ||
import numpy as np | ||
import pytest | ||
|
||
from pandas._libs.tslibs import tz_compare | ||
from pandas._libs.tslibs.dtypes import NpyDatetimeUnit | ||
from pandas.errors import OutOfBoundsDatetime | ||
|
||
from pandas.core.dtypes.dtypes import DatetimeTZDtype | ||
|
||
|
@@ -639,3 +641,40 @@ def test_tz_localize_t2d(self): | |
|
||
roundtrip = expected.tz_localize("US/Pacific") | ||
tm.assert_datetime_array_equal(roundtrip, dta) | ||
|
||
@pytest.mark.parametrize( | ||
"error", | ||
["coerce", "raise"], | ||
) | ||
def test_coerce_fallback(self, error): | ||
# GH#46071 | ||
datapythonista marked this conversation as resolved.
|
||
s = pd.Series(["6/30/2025", "1 27 2024"]) | ||
expected = pd.Series( | ||
[pd.Timestamp("2025-06-30 00:00:00"), pd.Timestamp("2024-01-27 00:00:00")] | ||
) | ||
|
||
result = pd.to_datetime(s, errors=error, infer_datetime_format=True) | ||
|
||
if error == "coerce": | ||
assert result[1] is not pd.NaT | ||
|
||
tm.assert_series_equal(expected, result) | ||
|
||
expected2 = pd.Series([pd.Timestamp("2000-01-01 00:00:00"), pd.NaT]) | ||
|
||
es1 = pd.Series(["1/1/2000", "7/12/1200"]) | ||
es2 = pd.Series(["1/1/2000", "Hello"]) | ||
Maybe instead of |
||
|
||
if error == "coerce": | ||
Thanks for the fixes @srotondo. This test is a bit complex and not very easy to read. I think we could make things more readable if we, for example, split this into different tests. Something like:

What do you think? Then each test would be quite easy to read. Or maybe parametrize the input and the two outputs (coerce and raise). This would avoid a bit of repeated code and be quite concise, but probably not as readable. See what makes sense to you, but it would be good to make things as simple as possible, as the pandas codebase is already huge and too complex.

@datapythonista thank you for your feedback on the tests. Because the cases are all closely related, I think it will be easier to understand as one test, but I did simplify some of the test and added more comments to clarify parts of it. |
||
eres1 = pd.to_datetime(es1, errors=error, infer_datetime_format=True) | ||
eres2 = pd.to_datetime(es2, errors=error, infer_datetime_format=True) | ||
tm.assert_series_equal(expected2, eres1) | ||
tm.assert_series_equal(expected2, eres2) | ||
else: | ||
with pytest.raises( | ||
OutOfBoundsDatetime, match="Out of bounds nanosecond timestamp" | ||
): | ||
pd.to_datetime(es1, errors=error, infer_datetime_format=True) | ||
|
||
with pytest.raises(ParserError, match="Unknown string format: Hello"): | ||
pd.to_datetime(es2, errors=error, infer_datetime_format=True) |
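For readers following the test above, here is a minimal standard-library sketch of the behavior it checks. This is a toy model, not the pandas implementation: `CANDIDATE_FORMATS`, `_guess_format`, `_parse_one`, and `parse_dates` are hypothetical names, and `None` stands in for `pd.NaT`. The idea is that a format guessed from the first element is tried on every element, and when it does not fit, parsing falls back to element-by-element handling, so `errors="coerce"` and `errors="raise"` agree on parseable input and differ only on unparseable strings.

```python
from datetime import datetime

# Hypothetical candidate formats for this toy model; pandas infers formats differently.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%m %d %Y", "%Y-%m-%d"]


def _guess_format(text):
    """Return the first candidate format that parses `text`, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return fmt
        except ValueError:
            continue
    return None


def _parse_one(text):
    """Parse one string with any candidate format; raise ValueError if none fit."""
    fmt = _guess_format(text)
    if fmt is None:
        raise ValueError(f"Unknown string format: {text}")
    return datetime.strptime(text, fmt)


def parse_dates(values, errors="raise"):
    """Toy analogue of to_datetime with format inference and fallback."""
    if errors not in ("raise", "coerce"):
        raise ValueError("errors must be 'raise' or 'coerce'")

    # Fast path: apply the format guessed from the first element to everything.
    fmt = _guess_format(values[0]) if values else None
    if fmt is not None:
        try:
            return [datetime.strptime(v, fmt) for v in values]
        except ValueError:
            pass  # the guessed format did not fit every element; fall back

    # Fallback: parse element by element, in both error modes.
    results = []
    for text in values:
        try:
            results.append(_parse_one(text))
        except ValueError:
            if errors == "raise":
                raise
            results.append(None)  # stand-in for pd.NaT
    return results
```

With this sketch, `parse_dates(["6/30/2025", "1 27 2024"], errors="coerce")` parses both entries instead of coercing the second to a missing value, while `parse_dates(["1/1/2000", "Hello"], errors="raise")` raises a `ValueError`. The out-of-bounds case from the test (`"7/12/1200"`) is not modeled, because Python's `datetime` accepts that year.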