reading of old pandas dataframe (created in python 2) failed with 0.23.4 #24925

Closed
@rbenes


Hi,

First, I have to apologize that my description will be very vague.

I have a problem with one of my DataFrames, which was created earlier with Python 2 and an older version of pandas (unfortunately, I do not know which version). I can no longer open it with Python 3 and pandas 0.23.4, although loading it with Python 3 and pandas 0.22.0 works fine.

For reading, I am using:

hdf = pd.HDFStore(src_filename, mode="r")
data_frame = hdf.select(src_tablename)

My stack trace in pandas 0.23.4 is:

Traceback (most recent call last):
    data_frame = hdf.select(src_tablename)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 743, in select
    return it.get_result()
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 1485, in get_result
    results = self.func(self.start, self.stop, where)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 734, in func
    columns=columns)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 4182, in read
    if not self.read_axes(where=where, **kwargs):
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 3385, in read_axes
    errors=self.errors)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 2195, in convert
    self.data, nan_rep=nan_rep, encoding=encoding, errors=errors)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 4658, in _unconvert_string_array
    data = libwriters.string_array_replace_from_nan_rep(data, nan_rep)
  File "pandas/_libs/writers.pyx", line 158, in pandas._libs.writers.string_array_replace_from_nan_rep
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'

This stack trace led me to this pull request: #24510

If I list the file with a tool such as h5ls, it looks fine (it loads and the content looks correct).

Unfortunately, I cannot share the DataFrame because it is private, and I can no longer reproduce how it was created with the older versions :-(. So I am not able to provide the unreadable DataFrame.

I debugged pandas and found that this patch helped me:

diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py
index 4e103482f..2ab6ddb5b 100644
--- a/pandas/io/pytables.py
+++ b/pandas/io/pytables.py
@@ -3288,7 +3288,7 @@ class Table(Fixed):
         self.nan_rep = getattr(self.attrs, 'nan_rep', None)
         self.encoding = _ensure_encoding(
             getattr(self.attrs, 'encoding', None))
-        self.errors = getattr(self.attrs, 'errors', 'strict')
+        self.errors = _ensure_decoded(getattr(self.attrs, 'errors', 'strict'))
         self.levels = getattr(
             self.attrs, 'levels', None) or []
         self.index_axes = [
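
To illustrate why decoding the attribute might matter, here is a minimal sketch in plain Python 3. It assumes (I have not confirmed this for my file) that the `errors` attribute was written by Python 2 and therefore comes back as bytes rather than str. The helper `ensure_decoded` and the value `stored_errors` are hypothetical stand-ins, not the actual pandas code or the actual content of my file.

```python
def ensure_decoded(value):
    """Return a str for bytes input; pass any other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


# Hypothetical stand-in for the attribute as a Python 2 writer might have stored it.
stored_errors = b"strict"

# In Python 3, bytes and str never compare equal, so a check against
# the str "strict" would not match the raw attribute.
print(stored_errors == "strict")

# The codec machinery also rejects a bytes value for the errors argument.
try:
    b"abc".decode("utf-8", errors=stored_errors)
except TypeError as exc:
    print("raw attribute rejected:", exc)

# After decoding, the same call succeeds.
print(b"abc".decode("utf-8", errors=ensure_decoded(stored_errors)))
```

If that assumption holds, normalizing the attribute once when the table metadata is read, as the patch does, would keep every later use of `self.errors` working with a str.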

Can anyone advise me whether such a fix is fine and, if so, whether I can send it as a pull request without a reproducer?

Thank you.

Labels: IO HDF5 (read_hdf, HDFStore)