REGR: NotImplementedError: Prefix not defined when slicing offset with loc #47547

Merged 14 commits on Aug 23, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.4.rst
@@ -22,6 +22,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.loc` not updating the cache correctly after values were set (:issue:`47867`)
 - Fixed regression in :meth:`DataFrame.loc` not aligning index in some cases when setting a :class:`DataFrame` (:issue:`47578`)
 - Fixed regression in :meth:`DataFrame.loc` setting a length-1 array like value to a single value in the DataFrame (:issue:`46268`)
+- Fixed regression when slicing with :meth:`DataFrame.loc` with :class:`DateOffset`-index (:issue:`46671`)
 - Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`)
 - Fixed regression using custom Index subclasses (for example, used in xarray) with :meth:`~DataFrame.reset_index` or :meth:`Index.insert` (:issue:`47071`)
 - Fixed regression in :meth:`DatetimeIndex.intersection` when the :class:`DatetimeIndex` has dates crossing daylight savings time (:issue:`46702`)
7 changes: 6 additions & 1 deletion pandas/core/indexes/datetimelike.py
@@ -223,7 +223,12 @@ def _parsed_string_to_bounds(self, reso: Resolution, parsed):

     def _parse_with_reso(self, label: str):
         # overridden by TimedeltaIndex
-        parsed, reso_str = parsing.parse_time_string(label, self.freq)
+        try:
+            if self.freq is None or hasattr(self.freq, "rule_code"):
+                freq = self.freq
+        except NotImplementedError:
+            freq = getattr(self, "freqstr", getattr(self, "inferred_freq", None))
Member

In practice, I don't think we actually run into cases that need this. For example, freqstr for the DateOffset case gives:

In [3]: d.index.freq
Out[3]: <DateOffset: days=1>

In [4]: d.index.freqstr
Out[4]: '<DateOffset: days=1>'

This "<DateOffset: days=1>" string is never going to be useful. But in practice, parsing.parse_time_string also seems to use freq in very few cases. From a quick look, it only uses it if it is exactly equal to "M", or when parsing quarters. Here is a snippet where passing the freq actually changes the result:

In [6]: parsing.parse_time_string("4q2022", "A-DEC")
Out[6]: (datetime.datetime(2022, 10, 1, 0, 0), 'quarter')

In [7]: parsing.parse_time_string("4q2022", "A-NOV")
Out[7]: (datetime.datetime(2022, 9, 1, 0, 0), 'quarter')

But this only matters for quarterly offsets, and those do define a rule_code, so you probably won't run into this issue with that kind of freq.
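
The difference between the two results above comes from where the fiscal year ends: with "A-DEC" the fourth quarter starts in October, with "A-NOV" it starts in September. A minimal sketch of that arithmetic follows. The helper quarter_start is hypothetical and written only for illustration; it is not a pandas function and is not claimed to match how parse_time_string is implemented.

```python
from datetime import datetime


def quarter_start(year: int, quarter: int, fiscal_year_end_month: int = 12) -> datetime:
    """Return the first day of a fiscal quarter.

    `year` is the fiscal year label, i.e. the calendar year in which the
    fiscal year ends. `fiscal_year_end_month` is 1-12 (12 for "A-DEC",
    11 for "A-NOV"). Hypothetical helper, not part of pandas.
    """
    if not 1 <= quarter <= 4:
        raise ValueError("quarter must be in 1..4")
    # First month of the requested quarter, counting from the month
    # that follows the fiscal year end.
    month = (fiscal_year_end_month + (quarter - 1) * 3) % 12 + 1
    # Months later in the calendar than the fiscal year end month
    # belong to the previous calendar year.
    if month > fiscal_year_end_month:
        year -= 1
    return datetime(year, month, 1)


print(quarter_start(2022, 4, 12))  # 2022-10-01 00:00:00, as for "A-DEC"
print(quarter_start(2022, 4, 11))  # 2022-09-01 00:00:00, as for "A-NOV"
```

With a November year end, the first quarter of fiscal 2022 starts in December 2021, which is why the year has to be adjusted as well as the month.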

This line freq = getattr(self, "freqstr", getattr(self, "inferred_freq", None)) is restoring what we did before #42149, so that's good for a regression fix in the bug-fix release. But we should maybe also see if we can just remove it later on (and just pass None here if rule_code isn't defined?)

Member Author

I tried passing None when rule_code isn't defined, and a couple of tests fail in that case. I didn't look into it further, though. But I think it should be possible to improve the situation around rule_code.

+        parsed, reso_str = parsing.parse_time_string(label, freq)
         reso = Resolution.from_attrname(reso_str)
         return parsed, reso
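
The try/except in this hunk depends on a Python detail: hasattr() only suppresses AttributeError, so a property that raises NotImplementedError (the error in the issue title) lets that exception escape the hasattr() call. A minimal sketch of that behavior, using hypothetical stand-in classes rather than real pandas offsets, and a hypothetical helper freq_for_parsing that is not part of this PR:

```python
class OffsetWithoutPrefix:
    """Hypothetical stand-in for an offset that has no rule code."""

    @property
    def rule_code(self) -> str:
        # Same message as the error reported in the issue title.
        raise NotImplementedError("Prefix not defined")


class OffsetWithPrefix:
    """Hypothetical stand-in for an offset that defines a rule code."""

    @property
    def rule_code(self) -> str:
        return "D"


def freq_for_parsing(freq, fallback=None):
    """Return `freq` if its rule_code is usable, otherwise `fallback`."""
    try:
        # hasattr() swallows only AttributeError; the NotImplementedError
        # raised by the property propagates out of this call.
        if freq is None or hasattr(freq, "rule_code"):
            return freq
    except NotImplementedError:
        pass
    return fallback


with_prefix = OffsetWithPrefix()
print(freq_for_parsing(with_prefix) is with_prefix)               # True
print(freq_for_parsing(OffsetWithoutPrefix(), fallback="other"))  # other
```

Unlike the hunk above, this sketch also returns the fallback when the attribute is missing entirely; that is a choice made for the illustration, not a statement about the merged code.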

43 changes: 43 additions & 0 deletions pandas/tests/frame/indexing/test_getitem.py
@@ -8,12 +8,14 @@
     CategoricalDtype,
     CategoricalIndex,
     DataFrame,
+    DateOffset,
     DatetimeIndex,
     Index,
     MultiIndex,
     Series,
     Timestamp,
     concat,
+    date_range,
     get_dummies,
     period_range,
 )
@@ -172,6 +174,47 @@ def test_getitem_iloc_two_dimensional_generator(self):
         expected = Series([5, 6], name="b", index=[1, 2])
         tm.assert_series_equal(result, expected)

+    def test_getitem_iloc_dateoffset_days(self):
+        # GH 46671
+        df = DataFrame(
+            list(range(10)),
+            index=date_range("01-01-2022", periods=10, freq=DateOffset(days=1)),
+        )
+        result = df.loc["2022-01-01":"2022-01-03"]
+        expected = DataFrame(
+            [0, 1, 2],
+            index=DatetimeIndex(
+                ["2022-01-01", "2022-01-02", "2022-01-03"],
+                dtype="datetime64[ns]",
+                freq=DateOffset(days=1),
+            ),
+        )
+        tm.assert_frame_equal(result, expected)
+
+        df = DataFrame(
+            list(range(10)),
+            index=date_range(
+                "01-01-2022", periods=10, freq=DateOffset(days=1, hours=2)
+            ),
+        )
+        result = df.loc["2022-01-01":"2022-01-03"]
+        expected = DataFrame(
+            [0, 1, 2],
+            index=DatetimeIndex(
+                ["2022-01-01 00:00:00", "2022-01-02 02:00:00", "2022-01-03 04:00:00"],
+                dtype="datetime64[ns]",
+                freq=DateOffset(days=1, hours=2),
+            ),
+        )
+        tm.assert_frame_equal(result, expected)
+
+        df = DataFrame(
+            list(range(10)),
+            index=date_range("01-01-2022", periods=10, freq=DateOffset(minutes=3)),
+        )
+        result = df.loc["2022-01-01":"2022-01-03"]
+        tm.assert_frame_equal(result, df)
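
The expected frames in this test follow from how a day-level string slice selects rows: the end label "2022-01-03" covers that whole day, so a row stamped 2022-01-03 04:00:00 is still included. A rough stdlib-only sketch of that selection rule follows. The helper label_slice is hypothetical, and plain timedelta steps are assumed here as a stand-in for the DateOffset frequencies used in the test.

```python
from datetime import datetime, timedelta


def label_slice(timestamps, start_day: str, end_day: str):
    """Keep timestamps from the start of `start_day` through the end of `end_day`.

    Hypothetical helper: both labels are "YYYY-MM-DD" strings, and the end
    label is treated as covering its whole day.
    """
    lower = datetime.strptime(start_day, "%Y-%m-%d")
    upper = datetime.strptime(end_day, "%Y-%m-%d") + timedelta(days=1)
    return [ts for ts in timestamps if lower <= ts < upper]


start = datetime(2022, 1, 1)
# Ten points spaced 1 day + 2 hours apart (stand-in for DateOffset(days=1, hours=2)).
stepped = [start + i * timedelta(days=1, hours=2) for i in range(10)]
# Ten points spaced 3 minutes apart (stand-in for DateOffset(minutes=3)).
minutes = [start + i * timedelta(minutes=3) for i in range(10)]

print(len(label_slice(stepped, "2022-01-01", "2022-01-03")))  # 3
print(len(label_slice(minutes, "2022-01-01", "2022-01-03")))  # 10
```

The second case shows why the test compares against the full frame: all ten 3-minute rows fall within the first half hour of 2022-01-01.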


 class TestGetitemCallable:
     def test_getitem_callable(self, float_frame):