BUG: ValueError: Invalid frequency with date #35917

Open
@lygujiaxin

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.


Code Sample

import pandas as pd

df = pd.read_hdf('metr-la.h5')
print(df)

metr-la.h5
https://drive.google.com/open?id=10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX
Its index is a datetime type; the data is float64.

Output

Traceback (most recent call last):
  File "C:\Test.py", line 5, in <module>
    print(pandas_df)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\frame.py", line 751, in __repr__
    show_dimensions=show_dimensions,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\frame.py", line 881, in to_string
    line_width=line_width,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\io\formats\format.py", line 630, in __init__
    self._chk_truncate()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\io\formats\format.py", line 716, in _chk_truncate
    frame = concat((frame.iloc[:row_num, :], frame.iloc[-row_num:, :]))
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 284, in concat
    sort=sort,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 454, in __init__
    self.new_axes = self._get_new_axes()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 521, in _get_new_axes
    for i in range(ndim)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 521, in <listcomp>
    for i in range(ndim)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 574, in _get_concat_axis
    concat_axis = _concat_indexes(indexes)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\reshape\concat.py", line 592, in _concat_indexes
    return indexes[0].append(indexes[1:])
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\indexes\base.py", line 4153, in append
    return self._concat(to_concat, name)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\indexes\base.py", line 4161, in _concat
    result = _concat.concat_compat(to_concat)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\dtypes\concat.py", line 162, in concat_compat
    return concat_datetime(to_concat, axis=axis, typs=typs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\dtypes\concat.py", line 392, in concat_datetime
    result = type(to_concat[0])._concat_same_type(to_concat, axis=axis)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimelike.py", line 695, in _concat_same_type
    if all(pair[0][-1] + obj.freq == pair[1][0] for pair in pairs):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimelike.py", line 695, in <genexpr>
    if all(pair[0][-1] + obj.freq == pair[1][0] for pair in pairs):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimelike.py", line 540, in __getitem__
    return self._box_func(result)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\arrays\datetimes.py", line 476, in <lambda>
    return lambda x: Timestamp(x, freq=self.freq, tz=self.tz)
  File "pandas\_libs\tslibs\timestamps.pyx", line 1083, in pandas._libs.tslibs.timestamps.Timestamp.__new__
  File "pandas\_libs\tslibs\offsets.pyx", line 3580, in pandas._libs.tslibs.offsets.to_offset
ValueError: Invalid frequency: b"ccopy_reg\n_reconstructor\np1\n(cpandas.tseries.offsets\nMinute\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'normalize'\np6\nI00\nsS'_offset'\np7\ncdatetime\ntimedelta\np8\n(I1\nI0\nI0\ntRp9\nsS'_use_relativedelta'\np10\nI00\nsS'kwds'\np11\n(dp12\nsS'n'\nI5\nsb."

Problem description

pandas appears to mishandle HDF5 files created by older versions of pandas.
The file works fine with pandas 0.19.2. Now, with pandas 1.1.1, many methods raise ValueError: Invalid frequency.

I am not sure what causes this error, but I found that the DataFrame's index is a DatetimeArray instead of a DatetimeIndex, which may be the reason.
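
For anyone triaging this, the byte string in the error message looks like a protocol-0 pickle written under Python 2 (`copy_reg` and `__builtin__` are Python 2 module names). This suggests the stored frequency reached `to_offset` without being unpickled first, though that is a guess and not a confirmed diagnosis. Below is a minimal sketch that inspects the payload with the standard library's `pickletools`, which only parses the opcodes and does not import or execute anything. The variable names are mine, and the key-tracking loop is a simplification that only picks up string keys followed directly by an integer value.

```python
import pickletools

# Payload copied verbatim from the ValueError message above.
payload = (
    b"ccopy_reg\n_reconstructor\np1\n(cpandas.tseries.offsets\nMinute\np2\n"
    b"c__builtin__\nobject\np3\nNtRp4\n(dp5\nS'normalize'\np6\nI00\n"
    b"sS'_offset'\np7\ncdatetime\ntimedelta\np8\n(I1\nI0\nI0\ntRp9\n"
    b"sS'_use_relativedelta'\np10\nI00\nsS'kwds'\np11\n(dp12\nsS'n'\nI5\nsb."
)

globals_seen = []   # "module name" pairs referenced by the pickle
fields = {}         # string keys that are immediately followed by an int value
pending_key = None

for opcode, arg, _pos in pickletools.genops(payload):
    name = opcode.name
    if name == "PUT":
        # Memo bookkeeping; it sits between a key and its value, so skip it.
        continue
    if name == "GLOBAL":
        globals_seen.append(arg)
    if name == "INT" and pending_key is not None:
        fields[pending_key] = arg
    # Remember a string only until the very next non-PUT opcode.
    pending_key = arg if name == "STRING" else None

print(globals_seen)
print(fields)
```

On this payload the parsed globals include `pandas.tseries.offsets Minute` and the `n` field is 5, so the stored frequency appears to be a 5-minute offset pickled by a Python 2 installation of pandas.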

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.1
numpy : 1.19.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.2
setuptools : 47.1.0
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : None
fsspec : 0.7.2
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.0
pytables : None
pyxlsb : None
s3fs : 0.2.2
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
numba : 0.48.0
