
BUG: pandas.Index.union does not behave properly across summer hour change #45863

Closed
@chribag

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

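import datetime as dt

import pandas as pd

# Three daily midnights before the BST -> GMT change (2021-10-31 02:00 local time)
index1 = pd.date_range(start=dt.datetime(2021, 10, 28), periods=3, freq='1D', tz='Europe/London')
# Four daily midnights that straddle the change; overlaps index1 on 2021-10-30
index2 = pd.date_range(start=dt.datetime(2021, 10, 30), periods=4, freq='1D', tz='Europe/London')

index1.union(index2)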

Issue Description

result:

DatetimeIndex(['2021-10-28 00:00:00+01:00', '2021-10-29 00:00:00+01:00',
               '2021-10-30 00:00:00+01:00', '2021-10-31 00:00:00+01:00',
               '2021-10-31 23:00:00+00:00', '2021-11-01 23:00:00+00:00',
               '2021-11-02 23:00:00+00:00'],
              dtype='datetime64[ns, Europe/London]', freq='D')

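When computing the union of two daily DatetimeIndex objects, one of which spans the summer/winter (BST to GMT) time change, the result is incorrect.
In the example above the union holds seven timestamps instead of six: the last three (2021-10-31 23:00, 2021-11-01 23:00 and 2021-11-02 23:00, all +00:00) appear in neither input, while 2021-11-01 00:00 and 2021-11-02 00:00 are missing. These values look like the result of stepping by a fixed 24 hours across the transition, although that day lasts 25 hours in Europe/London.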
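The spacing of `index2` itself shows why a fixed-length step goes wrong here. This minimal sketch uses only `pd.Series.diff` to measure the real elapsed time between consecutive midnights; because Europe/London falls back from BST to GMT on 2021-10-31, the gap from that midnight to the next one is 25 hours even though the index reports `freq='D'`. (In pandas 1.x the `Day` offset is, as far as I know, a fixed 24-hour `Tick`, which would explain the mismatch.)

```python
import datetime as dt

import pandas as pd

index2 = pd.date_range(start=dt.datetime(2021, 10, 30), periods=4, freq="1D", tz="Europe/London")

# Subtracting tz-aware timestamps measures real elapsed time, not wall-clock time.
gaps = pd.Series(index2).diff().dropna()

# Expected gaps: 24 hours, 25 hours (the fall-back day), 24 hours
print(gaps.tolist())
```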

Expected Behavior

expected result:

DatetimeIndex(['2021-10-28 00:00:00+01:00', '2021-10-29 00:00:00+01:00',
               '2021-10-30 00:00:00+01:00', '2021-10-31 00:00:00+01:00',
               '2021-11-01 00:00:00+00:00', '2021-11-02 00:00:00+00:00'],
              dtype='datetime64[ns, Europe/London]', freq='D')
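To check the expected output without relying on `freq` at all, the union can be recomputed from the individual timestamps. `reference_union` is a hypothetical helper (not part of pandas), sketched under the assumption that both inputs share one time zone. `DatetimeIndex.equals` compares the stored values and ignores `freq`, so the final line reports whether `Index.union` agrees with the set-based result on the installed pandas version.

```python
import datetime as dt

import pandas as pd


def reference_union(a: pd.DatetimeIndex, b: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: set-based union that never looks at ``freq``."""
    # Timestamps are hashable, so a set removes the shared entries;
    # sorting restores chronological order.
    return pd.DatetimeIndex(sorted(set(a) | set(b)))


index1 = pd.date_range(start=dt.datetime(2021, 10, 28), periods=3, freq="1D", tz="Europe/London")
index2 = pd.date_range(start=dt.datetime(2021, 10, 30), periods=4, freq="1D", tz="Europe/London")

expected = reference_union(index1, index2)
actual = index1.union(index2)

print(expected)
print("union matches reference:", actual.equals(expected))
```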

The correct result can be obtained by first removing the freq from one of the indexes:

index1.freq = None
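Setting `index1.freq = None` changes `index1` in place, which may affect later code that relies on its frequency. A sketch of an alternative that leaves both inputs untouched (my own suggestion, not the fix adopted upstream) is to concatenate the stored values and deduplicate them, so no timestamps are ever generated from `freq`:

```python
import datetime as dt

import pandas as pd

index1 = pd.date_range(start=dt.datetime(2021, 10, 28), periods=3, freq="1D", tz="Europe/London")
index2 = pd.date_range(start=dt.datetime(2021, 10, 30), periods=4, freq="1D", tz="Europe/London")

# append() concatenates the existing timestamps as-is;
# unique() drops the shared 2021-10-30 entry and sort_values() orders the result.
combined = index1.append(index2).unique().sort_values()

print(combined)
```

Both `index1` and `index2` keep their original `freq` afterwards.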

Installed Versions

INSTALLED VERSIONS

commit : bb1f651
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.25-linuxkit
Version : #1 SMP Tue Mar 23 09:27:39 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.0
numpy : 1.22.1
pytz : 2021.3
dateutil : 2.8.2
pip : 21.0.1
setuptools : 49.6.0.post20210108
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.7.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.20.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : 2022.01.0
gcsfs : None
matplotlib : 3.4.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2022.01.0
scipy : 1.7.3
sqlalchemy : 1.3.23
tables : None
tabulate : None
xarray : 0.18.2
xlrd : None
xlwt : None
zstandard : None

Metadata


Labels

Bug, Timezones (Timezone data dtype), setops (union, intersection, difference, symmetric_difference)
