Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame(
    dict(
        Date=pd.period_range("2000-01-01", periods=10, name="Date"),
        Value=range(10),
    ),
)
result1 = df[df.Date == "2000-01-06"] # Correct
result2 = df.query("Date == '2000-01-06'") # Empty!
# To show that the period type is the problem
df.Date = df.Date.dt.to_timestamp()
result3 = df[df.Date == "2000-01-06"] # Correct
result4 = df.query("Date == '2000-01-06'") # Correct
Issue Description
I would expect query("Date == '2000-01-06'") to return the same result regardless of whether the Date column is datetime or period, given that df.Date == "2000-01-06" returns the same in both cases.
Also, >=, <, etc. work as expected for the period column; it's just == that is wrong.
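As a stopgap, the plain boolean mask can be used instead of query. Below is a minimal sketch of that; the explicit pd.Period target with freq="D" is my own choice for illustration (the string form from the example above should behave the same way).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.period_range("2000-01-01", periods=10),
        "Value": range(10),
    }
)

# Compare against a typed Period rather than a bare string, so the
# comparison does not depend on how the string gets parsed.
target = pd.Period("2000-01-06", freq="D")
result = df[df["Date"] == target]
print(result)
```

This sidesteps query entirely, so it says nothing about the underlying isin behaviour.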
I dug a little deeper and it seems that query in this case is using .isin to evaluate ==, so the issue also shows in this code without any eval going on.
pd.period_range("2000-01-01", periods=3).isin(['2000-01-02']) # False, False, False
pd.date_range("2000-01-01", periods=3).isin(['2000-01-02']) # False, True, False
pd.period_range("2000-01-01", periods=3) == '2000-01-02' # False, True, False
From what I can tell, isin will attempt to turn the passed list into a PeriodArray here: pandas/pandas/core/arrays/datetimelike.py, line 766 in 9be48ef.
But that fails because it can't work out the freq, so it falls back to element-wise equality, which compares periods against strings, and so everything is False.
Maybe add a check here to see whether this is a PeriodArray and, if so, pass self.freq? Or is there something like period_array_like that will copy the freq/dtype from one period array to a new one?
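To illustrate the idea from user code, here is a rough sketch that coerces the passed values to the freq of the existing periods before calling isin. The helper name isin_period_like is hypothetical (it is not a pandas function), and this is not a proposal for the actual internal change, just a demonstration that supplying the freq makes the membership test line up.

```python
import pandas as pd


def isin_period_like(periods, values):
    """Hypothetical helper: borrow the freq of `periods` to parse `values`."""
    # Build a PeriodIndex from the raw values using the existing freq,
    # so both sides of the membership test share the same period dtype.
    coerced = pd.PeriodIndex(values, freq=periods.freq)
    return periods.isin(coerced)


periods = pd.period_range("2000-01-01", periods=3)  # daily freq by default
mask = isin_period_like(periods, ["2000-01-02"])
print(mask)
```

With the values already typed as periods of the same freq, isin no longer has to infer anything from the strings.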
Is it worth questioning why .query() uses isin to handle == in the first place? I have no idea; I'm sure there's a good reason.
Expected Behavior
Clear from the above?
Installed Versions
pd.show_versions() is still buggy and doesn't work on my machine.
I'm on version 2.0.3.