Description
Below are a handful of reproducible observations in pandas 1.0.5 regarding DatetimeIndex.get_slice_bound(...), where I'm genuinely not sure what the correct behavior should be in every case. Some of these appear related to #34077, in that slice_locs(...) uses get_slice_bound(...) under the covers.
Observations are inlined, with expected behaviors discussed afterwards:
########################################
OBSERVATIONS 1 using a UTC DatetimeIndex
########################################
import datetime
import pandas as pd
import pandas.util.testing as put
# Generate a UTC-localized DatetimeIndex
df = put.makeTimeDataFrame()
df = df.tz_localize("utc")
index = df.index
# Show the generated DatetimeIndex, which should look like:
# DatetimeIndex(['2000-01-03 00:00:00+00:00', '2000-01-04 00:00:00+00:00',
# ...
# '2000-02-10 00:00:00+00:00', '2000-02-11 00:00:00+00:00'],
# dtype='datetime64[ns, UTC]', freq='B')
print(index)
# (A) When the date is inside the DatetimeIndex, this call completes.
index.get_slice_bound(datetime.date(2000, 1, 7), kind="ix", side="left")
# (B) Notice date before start of index
# TypeError: searchsorted requires compatible dtype or scalar, not date
index.get_slice_bound(datetime.date(2000, 1, 1), kind="ix", side="left")
# (C) Notice date after end of index
# TypeError: searchsorted requires compatible dtype or scalar, not date
index.get_slice_bound(datetime.date(2020, 1, 1), kind="ix", side="left")
# (D) When the Timestamp is inside the DatetimeIndex, this call completes
index.get_slice_bound(pd.Timestamp("2000-01-07"), kind="ix", side="left")
# (E) Notice Timestamp before start of index
# TypeError: Cannot compare tz-naive and tz-aware datetime-like objects
index.get_slice_bound(pd.Timestamp("2000-01-01"), kind="ix", side="left")
# (F) Notice Timestamp after end of index
# TypeError: Cannot compare tz-naive and tz-aware datetime-like objects
index.get_slice_bound(pd.Timestamp("2020-01-01"), kind="ix", side="left")
Discussion:
- Above, I do not know if (A)-(C) should behave as (E)-(F) or not.
- Above, I suspect (D) should raise as (E)-(F) do.
- I can see an argument that datetime.date objects, which lack tzinfo, could be inferred to take the DatetimeIndex's tzinfo. Then (A)-(C) would not raise.
- I can see an argument that Timestamps lacking tzinfo could likewise be inferred to take the DatetimeIndex's tzinfo. Then (D)-(F) would not raise.
- I don't expect any data-dependence, meaning that the specific YYYY-MM-DD value should not affect whether a call raises.
- I have not tested datetime.datetime under these circumstances.
- I believe some of these behaviors may have changed since the 0.2-series.
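To make the "infer the index's tzinfo" reading above concrete, here is a minimal sketch of coercing a key before the lookup. The helper name `to_index_key` is hypothetical, and treating a naive key as being in the index's timezone is an assumption, not established pandas behavior. The sketch uses `searchsorted` instead of `get_slice_bound` because the `kind` argument shown in this report may differ between pandas versions; for a sorted, unique index the two should agree on the left bound. The index is built with `pd.bdate_range` as a stand-in for `makeTimeDataFrame().index`.

```python
import datetime

import pandas as pd


def to_index_key(value, index):
    """Hypothetical helper: coerce `value` to a Timestamp that is
    comparable with `index` regardless of where it falls in the data."""
    ts = pd.Timestamp(value)  # accepts datetime.date, datetime.datetime, str
    if index.tz is not None and ts.tzinfo is None:
        # Assumption: a naive key is meant in the index's own timezone.
        ts = ts.tz_localize(index.tz)
    return ts


# Stand-in for the UTC index above: 30 business days starting 2000-01-03.
index = pd.bdate_range("2000-01-03", periods=30, tz="UTC")

positions = {
    label: int(index.searchsorted(to_index_key(day, index), side="left"))
    for label, day in [
        ("before", datetime.date(2000, 1, 1)),
        ("inside", datetime.date(2000, 1, 7)),
        ("after", datetime.date(2020, 1, 1)),
    ]
}
print(positions)  # {'before': 0, 'inside': 4, 'after': 30}
```

With the key coerced up front, the result no longer depends on whether the date falls inside the index. The sketch does not handle a tz-aware key against a tz-naive index.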
Again, observations are inlined, with expected behaviors discussed afterwards:
########################################################
OBSERVATIONS 2 using a DatetimeIndex with tzinfo is None
########################################################
import datetime
import pandas as pd
import pandas.util.testing as put
# Generate a non-localized DatetimeIndex
df = put.makeTimeDataFrame()
index = df.index
assert index.tzinfo is None
# Show the generated DatetimeIndex, which should look like:
# DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
# ...
# '2000-02-10', '2000-02-11'],
# dtype='datetime64[ns]', freq='B')
print(index)
# (G) When the date is inside the DatetimeIndex, this call completes.
index.get_slice_bound(datetime.date(2000, 1, 7), kind="ix", side="left")
# (H) Notice date before start of index
# TypeError: searchsorted requires compatible dtype or scalar, not date
index.get_slice_bound(datetime.date(2000, 1, 1), kind="ix", side="left")
# (I) Notice date after end of index
# TypeError: searchsorted requires compatible dtype or scalar, not date
index.get_slice_bound(datetime.date(2020, 1, 1), kind="ix", side="left")
# (J) When the Timestamp is inside the DatetimeIndex, this call completes
index.get_slice_bound(pd.Timestamp("2000-01-07"), kind="ix", side="left")
# (K) Call completes for Timestamp before start of index
index.get_slice_bound(pd.Timestamp("2000-01-01"), kind="ix", side="left")
# (L) Call completes for Timestamp after end of index
index.get_slice_bound(pd.Timestamp("2020-01-01"), kind="ix", side="left")
Discussion:
- Above, I expect (H)-(I) to behave as (G).
- Above, (J)-(L) appear correct to me.
- I don't expect any data-dependence, meaning that the specific YYYY-MM-DD value should not affect whether a call raises.
- I have not tested datetime.datetime under these circumstances.
- I believe some of these behaviors may have changed since the 0.2-series.
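The "no data-dependence" expectation can be checked mechanically. The following is a rough sketch, not part of the original report: `probe_slice_start` is a hypothetical helper that records, for each key, either the start position found or the name of the exception raised. It goes through `slice_locs(start=...)`, which per the description above relies on `get_slice_bound(...)`, and it builds the tz-naive index with `pd.bdate_range` as a stand-in for `makeTimeDataFrame().index`.

```python
import pandas as pd


def probe_slice_start(index, keys):
    """Hypothetical helper: map each key to the start position returned by
    slice_locs, or to the name of the exception the lookup raised."""
    outcomes = {}
    for key in keys:
        try:
            start, _ = index.slice_locs(start=key)
            outcomes[key] = int(start)
        except Exception as exc:  # record the failure instead of propagating
            outcomes[key] = type(exc).__name__
    return outcomes


# Stand-in for the tz-naive index above: 30 business days from 2000-01-03.
index = pd.bdate_range("2000-01-03", periods=30)

keys = [
    pd.Timestamp("2000-01-01"),  # before the start of the index
    pd.Timestamp("2000-01-07"),  # inside the index
    pd.Timestamp("2020-01-01"),  # after the end of the index
]
outcomes = probe_slice_start(index, keys)
raised = [key for key, result in outcomes.items() if isinstance(result, str)]
print(outcomes)
print("keys that raised:", raised)
```

Swapping the `pd.Timestamp` keys for `datetime.date` values gives a direct comparison across (G)-(I). Those outcomes depend on the pandas version, so no expected output is shown for them here.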
INSTALLED VERSIONS
------------------
commit : None
python : 3.6.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.67-ts1
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.utf8
LOCALE : en_US.UTF-8
pandas : 1.0.5
numpy : 1.16.6
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : 0.29.20
pytest : 5.3.1
hypothesis : 3.57.0
sphinx : 1.8.5
blosc : 1.5.1
feather : None
xlsxwriter : 1.0.2
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.2.1
fastparquet : 0.3.3
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.0.3
numexpr : 2.6.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pytest : 5.3.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.2.1
tables : 3.5.2
tabulate : 0.8.3
xarray : 0.10.0
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : 1.0.2
numba : 0.50.1