Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample
import pandas as pd
df = pd.read_hdf('metr-la.h5')
print(df)
The file metr-la.h5 can be downloaded here:
https://drive.google.com/open?id=10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX
Its index has a datetime type and its data is float64.
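Because print(df) fails on this file, the index and column types can be checked without going through DataFrame.__repr__. The helper below is a hypothetical sketch (describe_frame is not a pandas function). It only reads attributes such as index.dtype and index.freq, so it avoids the repr entirely. On the file from this report, freq_type should show whether the stored frequency came back as a pandas offset object or as something else.

```python
import pandas as pd


def describe_frame(df):
    """Summarize a DataFrame's shape, index and dtypes without building its repr.

    Hypothetical helper for diagnosing frames whose repr raises.
    """
    idx = df.index
    return {
        "shape": df.shape,
        # e.g. "DatetimeIndex" for a datetime-indexed frame
        "index_type": type(idx).__name__,
        "index_dtype": str(idx.dtype),
        # Non-datetime indexes have no .freq attribute, so fall back to None
        "freq_type": type(getattr(idx, "freq", None)).__name__,
        # Distinct column dtypes, sorted for a stable result
        "column_dtypes": sorted({str(t) for t in df.dtypes}),
    }
```

For a healthy 5-minute DatetimeIndex, freq_type is the offset class name (for example "Minute"); any other type there points at the stored frequency rather than at the data itself.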
Output
Traceback (most recent call last):
File "C:\Test.py", line 5, in <module>
print(pandas_df)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\frame.py", line 751, in __repr__
show_dimensions=show_dimensions,
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\frame.py", line 881, in to_string
line_width=line_width,
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\io\formats\format.py", line 630, in __init__
self._chk_truncate()
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\io\formats\format.py", line 716, in _chk_truncate
frame = concat((frame.iloc[:row_num, :], frame.iloc[-row_num:, :]))
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 284, in concat
sort=sort,
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 454, in __init__
self.new_axes = self._get_new_axes()
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 521, in _get_new_axes
for i in range(ndim)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 521, in <listcomp>
for i in range(ndim)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 574, in _get_concat_axis
concat_axis = _concat_indexes(indexes)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 592, in _concat_indexes
return indexes[0].append(indexes[1:])
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\indexes\base.py", line 4153, in append
return self._concat(to_concat, name)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\indexes\base.py", line 4161, in _concat
result = _concat.concat_compat(to_concat)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\dtypes\concat.py", line 162, in concat_compat
return concat_datetime(to_concat, axis=axis, typs=typs)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\dtypes\concat.py", line 392, in concat_datetime
result = type(to_concat[0])._concat_same_type(to_concat, axis=axis)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimelike.py", line 695, in _concat_same_type
if all(pair[0][-1] + obj.freq == pair[1][0] for pair in pairs):
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimelike.py", line 695, in <genexpr>
if all(pair[0][-1] + obj.freq == pair[1][0] for pair in pairs):
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimelike.py", line 540, in __getitem__
return self._box_func(result)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimes.py", line 476, in <lambda>
return lambda x: Timestamp(x, freq=self.freq, tz=self.tz)
File "pandas\_libs\tslibs\timestamps.pyx", line 1083, in pandas._libs.tslibs.timestamps.Timestamp.__new__
File "pandas\_libs\tslibs\offsets.pyx", line 3580, in pandas._libs.tslibs.offsets.to_offset
ValueError: Invalid frequency: b"ccopy_reg\n_reconstructor\np1\n(cpandas.tseries.offsets\nMinute\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'normalize'\np6\nI00\nsS'_offset'\np7\ncdatetime\ntimedelta\np8\n(I1\nI0\nI0\ntRp9\nsS'_use_relativedelta'\np10\nI00\nsS'kwds'\np11\n(dp12\nsS'n'\nI5\nsb."
Problem description
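pandas seems to handle HDF5 files created by an old version of pandas incorrectly.
The file works well with pandas 0.19.2, but with pandas 1.1.1 many methods (for example, printing the DataFrame) raise ValueError: Invalid frequency.
I am not sure what causes this error, but I found that the DataFrame's index is of type DatetimeArray instead of DatetimeIndex, which may be the reason. The error message also shows that the index frequency is a raw pickle byte string (apparently of a Minute offset with n = 5) rather than a frequency object.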
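As a possible workaround (a sketch, not an official pandas fix), the index can be rebuilt from its raw datetime64 values, which discards the stored frequency that fails to parse. The helper name rebuild_datetime_index is hypothetical, and the sketch assumes a tz-naive index. The pickled payload in the error message appears to describe a Minute offset with n = 5, so freq="5min" may be a reasonable value to pass. freq="infer" asks pandas to infer it instead, and the default None leaves it unset.

```python
import pandas as pd


def rebuild_datetime_index(df, freq=None):
    """Replace df.index with a fresh DatetimeIndex built from its raw values.

    Hypothetical workaround helper; assumes a tz-naive DatetimeIndex.
    Mutates df in place and returns it for convenience.
    """
    # .values returns a plain numpy datetime64[ns] array, so the stored
    # (unparseable) freq is never boxed into a Timestamp here
    raw = df.index.values
    # The constructor validates `freq` against the timestamps; "infer" asks
    # pandas to infer it, None leaves the new index without a freq
    df.index = pd.DatetimeIndex(raw, freq=freq, name=df.index.name)
    return df
```

Usage on the file from this report would look like df = rebuild_datetime_index(pd.read_hdf('metr-la.h5'), freq="5min"). Assigning a new index in place, rather than copying the frame first, avoids doing any other work with the old index.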
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f2ca0a2
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.1
numpy : 1.19.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.2
setuptools : 47.1.0
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : None
fsspec : 0.7.2
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.0
pytables : None
pyxlsb : None
s3fs : 0.2.2
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
numba : 0.48.0