BUG: pytables index expressions fail in Python 3.9 #37217

Closed
@rebecca-palmer

Description

  • [ y] I have checked that this issue has not already been reported. (Here - it has in Debian)

  • [ y] I have confirmed this bug exists on the latest version of pandas.

  • [ y] (optional) I have confirmed this bug exists on the master branch of pandas. (Though with a mix of Debian and pip dependencies, as pip doesn't have them all for 3.9 yet.)


Code Sample, a copy-pastable example

The tests in tests/io/pytables/test_store.py, or

import pandas as pd
from pandas.io.pytables import HDFStore

s1 = HDFStore("tmp1.h5", "w")
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
s1.append("d1", df, data_columns=["B"])
df2 = s1.select("d1", "index>df.index[0]")
print(type(df2.index[0]))

Problem description

In Python 3.9, HDFStore.select fails if the query involves an index expression, with this traceback:

self = <DatetimeArray>
['2000-01-03 00:00:00', '2000-01-04 00:00:00', '2000-01-05 00:00:00',
 '2000-01-06 00:00:00', '2000-01...2-08 00:00:00',
 '2000-02-09 00:00:00', '2000-02-10 00:00:00', '2000-02-11 00:00:00']
Length: 30, dtype: datetime64[ns]
key = 4

    def __getitem__(self, key):
        if lib.is_integer(key):
            # fast-path
            result = self._ndarray[key]
            if self.ndim == 1:
                return self._box_func(result)
            return self._from_backing_data(result)
    
        key = extract_array(key, extract_numpy=True)
        key = check_array_indexer(self, key)
>       result = self._ndarray[key]
E       IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

pandas/core/arrays/_mixins.py:200: IndexError

The debugger says key is a pandas.core.computation.pytables.Constant, while in Python 3.8 (where this works) it is a plain int. The underlying cause may be that Python 3.9 deprecated ast.Index, so the slice of an ast.Subscript node is now the bare value node instead of an ast.Index wrapper.
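The AST change suspected above can be checked with the standard library alone. This is a minimal sketch, not taken from pandas: it parses the same subscript expression used in the query string and reports which node type the interpreter puts in the slice position. The variable names are illustrative only.

```python
import ast
import sys

# Parse the subscript used in the failing query: df.index[0]
tree = ast.parse("df.index[0]", mode="eval")
subscript = tree.body  # an ast.Subscript node

# On Python 3.8 the slice is wrapped in ast.Index;
# on Python 3.9+ it is the bare value node (here an ast.Constant).
slice_type = type(subscript.slice).__name__

print(sys.version_info[:2], slice_type)
```

Code that visits the slice and expects an ast.Index wrapper would therefore receive a different node type on 3.9, which is consistent with the Constant seen in the debugger.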

The CI may have missed this because it skips optional dependencies on 3.9 (to avoid having to build them).

Possible fix

Warning: not fully tested.

--- a/pandas/core/computation/pytables.py
+++ b/pandas/core/computation/pytables.py

@@ -429,6 +429,10 @@ class PyTablesExprVisitor(BaseExprVisito
             value = value.value
         except AttributeError:
             pass
+        try:
+            slobj = slobj.value
+        except AttributeError:
+            pass
 
         try:
             return self.const_type(value[slobj], self.env)
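The idea behind the patch can be illustrated in isolation. The sketch below is an assumption-laden toy, not pandas code: `FakeConstant` is a hypothetical stand-in for pandas.core.computation.pytables.Constant, and `unwrap` is a hypothetical helper that mirrors the try/except added in the diff.

```python
class FakeConstant:
    """Hypothetical stand-in for a wrapper object that carries a .value."""

    def __init__(self, value):
        self.value = value


def unwrap(obj):
    """Return obj.value if obj is a wrapper, otherwise obj unchanged."""
    try:
        return obj.value
    except AttributeError:
        # Plain ints (and other bare values) have no .value attribute.
        return obj


values = [10, 20, 30]

# A wrapped index and a plain int index select the same element.
from_wrapped = values[unwrap(FakeConstant(1))]
from_plain = values[unwrap(1)]
print(from_wrapped, from_plain)
```

Unwrapping both the value and the slice object before indexing means the subscript works whether the visitor hands back a wrapper or a bare value.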

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.9.0.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-11-amd64
Version : #1 SMP Debian 4.19.146-1 (2020-09-17)
machine : x86_64
processor :
byteorder : little
LC_ALL : C
LANG : C
LOCALE : None.None

pandas : 0+unknown
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 50.3.0
Cython : 0.29.21
pytest : 6.1.1
hypothesis : 5.32.1
sphinx : 3.2.1
blosc : 1.9.2
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.2.1
fsspec : 0.8.4
fastparquet : None
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : 0.5.1
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
numba : None

Labels: Bug; Closing Candidate (may be closeable, needs more eyeballs); IO HDF5 (read_hdf, HDFStore)