Description
- [ y] I have checked that this issue has not already been reported. (Not here; it has been reported in Debian.)
- [ y] I have confirmed this bug exists on the latest version of pandas.
- [ y] (optional) I have confirmed this bug exists on the master branch of pandas. (Though with a mix of Debian and pip dependencies, as pip doesn't have them all for 3.9 yet.)
Code Sample, a copy-pastable example
The tests in tests/io/pytables/test_store.py, or
import pandas as pd
from pandas.io.pytables import HDFStore

s1 = HDFStore("tmp1.h5", "w")
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
s1.append("d1", df, data_columns=["B"])
# The subscript df.index[0] inside the where string is what triggers the failure
df2 = s1.select("d1", "index>df.index[0]")
print(type(df2.index[0]))
Problem description
In Python 3.9, HDFStore.select fails if the where expression contains a subscript (such as df.index[0]), with this traceback:
self = <DatetimeArray>
['2000-01-03 00:00:00', '2000-01-04 00:00:00', '2000-01-05 00:00:00',
'2000-01-06 00:00:00', '2000-01...2-08 00:00:00',
'2000-02-09 00:00:00', '2000-02-10 00:00:00', '2000-02-11 00:00:00']
Length: 30, dtype: datetime64[ns]
key = 4
    def __getitem__(self, key):
        if lib.is_integer(key):
            # fast-path
            result = self._ndarray[key]
            if self.ndim == 1:
                return self._box_func(result)
            return self._from_backing_data(result)
        key = extract_array(key, extract_numpy=True)
        key = check_array_indexer(self, key)
>       result = self._ndarray[key]
E       IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
pandas/core/arrays/_mixins.py:200: IndexError
The debugger says key is a pandas.core.computation.pytables.Constant, while in Python 3.8 (where this works) it is a plain int. The underlying cause may be that Python 3.9 no longer wraps a simple subscript index in ast.Index and instead stores the bare expression node in ast.Subscript.slice.
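To make the suspected AST change concrete, here is a minimal sketch that uses only the standard library. It parses the same subscript that appears in the where string and reports which node type Python stores in ast.Subscript.slice. The helper names subscript_slice_node and unwrap_index are made up for this illustration and are not part of pandas.

```python
import ast
import sys


def subscript_slice_node(source: str) -> ast.AST:
    """Parse an expression such as ``df.index[0]`` and return its Subscript.slice node."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Subscript):
        raise ValueError(f"not a subscript expression: {source!r}")
    return node.slice


def unwrap_index(node: ast.AST) -> ast.AST:
    """Strip the ast.Index wrapper that Python versions before 3.9 put around simple indices."""
    # The version check short-circuits, so ast.Index is never touched on 3.9+.
    if sys.version_info < (3, 9) and isinstance(node, ast.Index):
        return node.value
    return node


slice_node = subscript_slice_node("df.index[0]")
# Reports "Index" on Python 3.8 and the bare expression node type on 3.9+.
print(sys.version_info[:2], type(slice_node).__name__)

inner = unwrap_index(slice_node)
# After unwrapping, both versions expose the same constant node.
print(type(inner).__name__, ast.literal_eval(inner))
```

If this is indeed the cause, a visitor whose visit_Index used to hand back a plain value would never be called for the index on 3.9, and the generic handler for the bare node would run instead.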
The CI may have missed this because it skips optional dependencies on 3.9 (to avoid having to build them).
Possible fix
Warning: not fully tested.
--- a/pandas/core/computation/pytables.py
+++ b/pandas/core/computation/pytables.py
@@ -429,6 +429,10 @@ class PyTablesExprVisitor(BaseExprVisito
             value = value.value
         except AttributeError:
             pass
+        try:
+            slobj = slobj.value
+        except AttributeError:
+            pass
         try:
             return self.const_type(value[slobj], self.env)
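The idea behind the patch can be tried in isolation. The sketch below is not pandas code: Wrapped is a hypothetical stand-in for a visitor result such as pandas.core.computation.pytables.Constant, and a plain list stands in for the indexed array. It only shows that unwrapping .value on both operands lets a wrapped index and a plain int take the same path.

```python
class Wrapped:
    """Hypothetical stand-in for a visitor result that carries its payload in .value."""

    def __init__(self, value):
        self.value = value


def unwrap(obj):
    """Return obj.value when it exists, otherwise obj itself (as the try/except in the patch does)."""
    try:
        return obj.value
    except AttributeError:
        return obj


def subscript(value, slobj):
    """Index value by slobj after unwrapping both operands."""
    return unwrap(value)[unwrap(slobj)]


data = [10, 20, 30]

# A plain int has no .value attribute, so it passes through unchanged.
assert subscript(data, 1) == 20
# A wrapped container and a wrapped index are both unwrapped before use.
assert subscript(Wrapped(data), Wrapped(1)) == 20

# Without unwrapping, the container rejects the wrapper object.
try:
    data[Wrapped(1)]
except TypeError as err:
    print("rejected:", err)
```

One caveat of the attribute-based approach is that any object exposing a .value attribute gets unwrapped, so an explicit isinstance check against the wrapper type may be the safer variant.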
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.9.0.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-11-amd64
Version : #1 SMP Debian 4.19.146-1 (2020-09-17)
machine : x86_64
processor :
byteorder : little
LC_ALL : C
LANG : C
LOCALE : None.None
pandas : 0+unknown
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 50.3.0
Cython : 0.29.21
pytest : 6.1.1
hypothesis : 5.32.1
sphinx : 3.2.1
blosc : 1.9.2
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.2.1
fsspec : 0.8.4
fastparquet : None
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : 0.5.1
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
numba : None