Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import dateutil.tz
import zoneinfo
utc0 = pd.Timestamp('2023-11-05T08:30:00Z')
utc1 = pd.Timestamp('2023-11-05T09:30:00Z')
tz = dateutil.tz.gettz('US/Pacific')
assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=0, tz=tz) == utc0
assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=1, tz=tz) == utc1
tz = zoneinfo.ZoneInfo('US/Pacific')
assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=0, tz=tz) == utc0
assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=1, tz=tz) == utc1
tz = 'US/Pacific'
assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=0, tz=tz) == utc0
assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=1, tz=tz) == utc1
Issue Description
The fold argument to the Timestamp constructor appears to be ignored when tz is provided as a string, but works as expected for the corresponding dateutil.tz or zoneinfo objects.
On the current development branch, I get an AmbiguousTimeError on the last two asserts:
---------------------------------------------------------------------------
AmbiguousTimeError                        Traceback (most recent call last)
Cell In[1], line 17
     14 assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=1, tz=tz) == utc1
     16 tz = 'US/Pacific'
---> 17 assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=0, tz=tz) == utc0
     18 assert pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=1, tz=tz) == utc1
File timestamps.pyx:1882, in pandas._libs.tslibs.timestamps.Timestamp.__new__()
File conversion.pyx:328, in pandas._libs.tslibs.conversion.convert_to_tsobject()
File conversion.pyx:399, in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()
File conversion.pyx:658, in pandas._libs.tslibs.conversion._localize_pydatetime()
File ~/venv/lib/python3.11/site-packages/pytz/tzinfo.py:366, in DstTzInfo.localize(self, dt, is_dst)
360 # If we get this far, we have multiple possible timezones - this
361 # is an ambiguous case occurring during the end-of-DST transition.
362
363 # If told to be strict, raise an exception since we have an
364 # ambiguous case
365 if is_dst is None:
--> 366 raise AmbiguousTimeError(dt)
368 # Filter out the possiblilities that don't match the requested
369 # is_dst
370 filtered_possible_loc_dt = [
371 p for p in possible_loc_dt if bool(p.tzinfo._dst) == is_dst
372 ]
AmbiguousTimeError: 2023-11-05 01:30:00
This behavior is at least better than in the current release (2.1.2), which fails with an AssertionError because
pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30, fold=0, tz=tz)
returns the incorrect timestamp Timestamp('2023-11-05 01:30:00-0800', tz='US/Pacific')
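For comparison, the two intended instants can be reached with a string timezone by building a naive Timestamp and localizing it explicitly. This is only a sketch of a possible workaround, not a fix for the constructor; it assumes that `Timestamp.tz_localize` with a boolean `ambiguous` flag (True meaning the DST reading) behaves as documented on the installed pandas version.

```python
import pandas as pd

# The two UTC instants that map to 01:30 local time on 2023-11-05.
utc0 = pd.Timestamp('2023-11-05T08:30:00Z')
utc1 = pd.Timestamp('2023-11-05T09:30:00Z')

# Build the wall-clock time without a timezone first.
naive = pd.Timestamp(year=2023, month=11, day=5, hour=1, minute=30)

# ambiguous=True selects the DST reading (PDT, UTC-7), i.e. the first
# occurrence of 01:30, which corresponds to fold=0.
first = naive.tz_localize('US/Pacific', ambiguous=True)

# ambiguous=False selects the standard-time reading (PST, UTC-8), i.e. the
# second occurrence of 01:30, which corresponds to fold=1.
second = naive.tz_localize('US/Pacific', ambiguous=False)

assert first == utc0
assert second == utc1
```

The timezone is still passed as a string here, so the ambiguity is resolved through `ambiguous` rather than through `fold`.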
Expected Behavior
I would expect an ambiguous timestamp with 'fold' provided to be interpreted the same way when the timezone is given as a string (e.g. tz='US/Pacific') as when it is given as the equivalent zoneinfo or dateutil.tz timezone. I noticed that the 'fold' argument is not permitted when using a pytz timezone, but at least in that case a descriptive error is raised.
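The fold semantics I am relying on can be illustrated with the standard library alone. This is a minimal sketch that does not touch pandas; it assumes the system (or the tzdata package) provides the zone, and it uses 'America/Los_Angeles', the canonical name for the zone that 'US/Pacific' links to, because some systems ship the legacy alias separately.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Canonical name for the zone that the 'US/Pacific' alias points to.
tz = ZoneInfo('America/Los_Angeles')

# 01:30 occurs twice on 2023-11-05; fold picks which occurrence is meant.
first = datetime(2023, 11, 5, 1, 30, fold=0, tzinfo=tz)   # before the clocks go back
second = datetime(2023, 11, 5, 1, 30, fold=1, tzinfo=tz)  # after the clocks go back

# Convert to UTC before comparing: two aware datetimes that share the same
# tzinfo object are compared by wall-clock fields, and fold is ignored.
first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)

assert first_utc == datetime(2023, 11, 5, 8, 30, tzinfo=timezone.utc)
assert second_utc == datetime(2023, 11, 5, 9, 30, tzinfo=timezone.utc)
```

These are the same two UTC instants as utc0 and utc1 in the reproducible example, which is the result I would expect from the string form of tz as well.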
Installed Versions
INSTALLED VERSIONS
commit : b2d9ec1
python : 3.11.5.final.0
python-bits : 64
OS : Linux
OS-release : 6.6.1-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Wed, 08 Nov 2023 16:05:38 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.utf8
LOCALE : en_US.UTF-8
pandas : 2.2.0.dev0+564.gb2d9ec17c5
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.17.2
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None