Description
Code Sample, a copy-pastable example if possible
import pandas as pd

myDf = pd.DataFrame({'a': pd.Series([1443525810, 1443540836, 1443609470]),
                     'b': pd.Series(['ab', 'cd', 'ab'])})
myDf.to_hdf('test.h5', 'test')
with pd.HDFStore('test.h5') as myFile:
    df = myFile.select('/test', start=0, stop=2)  # omit "start=0, stop=2" to prevent error
    display(df)
Problem description
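Passing start and stop to select raises the error below; omitting them reads the frame without error.
ValueError: Shape of passed values is (2, 3), indices imply (2, 2)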
Expected Output
            a   b
0  1443525810  ab
1  1443540836  cd
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Other remarks:
- Please be gentle, this is my first GitHub interaction :)
- A notebook is attached that contains the problem and solution output.
- My guess is that read_array in pytables.py takes the one-dimensional behavior of VLArray into account too late, only after slicing with "data = node[start:stop]", so the slice returns the whole column. The following implementation of the method seems to fix it:
def read_array(self, key, start=None, stop=None):
    """ read an array for the specified node (off of group """
    import tables
    node = getattr(self.group, key)
    attrs = node._v_attrs

    transposed = getattr(attrs, 'transposed', False)

    if isinstance(node, tables.VLArray):
        ret = node[0][start:stop]
    else:
        dtype = getattr(attrs, 'value_type', None)
        shape = getattr(attrs, 'shape', None)

        if shape is not None:
            # length 0 axis
            ret = np.empty(shape, dtype=dtype)
        else:
            ret = node[start:stop]

        if dtype == u('datetime64'):
            # reconstruct a timezone if indicated
            ret = _set_tz(ret, getattr(attrs, 'tz', None), coerce=True)
        elif dtype == u('timedelta64'):
            ret = np.asarray(ret, dtype='m8[ns]')

    if transposed:
        return ret.T
    else:
        return ret
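
To make the guess about slicing order easier to follow, here is a minimal pure-Python sketch. It does not use PyTables: a nested list stands in for a VLArray node, on the assumption that the object column is stored as a single row. The helper names read_slice_then_index and read_index_then_slice are hypothetical and exist only for this illustration.

```python
# Stand-in for a VLArray node: one row that holds the entire object column.
# Assumption: this mirrors how the object column is stored; it is not PyTables.
node = [['ab', 'cd', 'ab']]


def read_slice_then_index(node, start=None, stop=None):
    """Slice the rows first, then take row 0 (the order suspected of causing the bug)."""
    data = node[start:stop]  # slices the single row, not the values inside it
    return data[0]


def read_index_then_slice(node, start=None, stop=None):
    """Take row 0 first, then slice the values inside it (the proposed order)."""
    return node[0][start:stop]


whole_column = read_slice_then_index(node, start=0, stop=2)
sliced_column = read_index_then_slice(node, start=0, stop=2)

print(whole_column)   # all three values come back
print(sliced_column)  # only the first two values come back
```

In this model, slicing first leaves the column at its full length of three while the index is cut to two rows, which would be consistent with the shape mismatch in the ValueError above.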