Reading an old pandas DataFrame (created in Python 2) fails with pandas 0.23.4

Hi,

First, I have to apologize: my description will be very vague.

I have a problem with one of my DataFrames that was created earlier with Python 2 and an older version of pandas (unfortunately, I do not know which version). Now I cannot open it with Python 3 and pandas 0.23.4 (loading it with Python 3 and pandas 0.22.0 works fine).

For reading, I am using:

```python
import pandas as pd

hdf = pd.HDFStore(src_filename, mode="r")
data_frame = hdf.select(src_tablename)
```
My stack trace in pandas 0.23.4 is:

```
Traceback (most recent call last):
    data_frame = hdf.select(src_tablename)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 743, in select
    return it.get_result()
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 1485, in get_result
    results = self.func(self.start, self.stop, where)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 734, in func
    columns=columns)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 4182, in read
    if not self.read_axes(where=where, **kwargs):
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 3385, in read_axes
    errors=self.errors)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 2195, in convert
    self.data, nan_rep=nan_rep, encoding=encoding, errors=errors)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 4658, in _unconvert_string_array
    data = libwriters.string_array_replace_from_nan_rep(data, nan_rep)
  File "pandas/_libs/writers.pyx", line 158, in pandas._libs.writers.string_array_replace_from_nan_rep
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'
```
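A plausible explanation of this traceback, assuming the table's `errors` attribute was written by Python 2 as a byte string (`b'strict'`) instead of `str`: in Python 3, `bytes.decode()` rejects a `bytes` value for its `errors` argument with a `TypeError`. If the string-decoding step then maps every failing element to NaN instead of raising (an assumption about the `.str` accessor's behaviour, not checked against the 0.23.4 source), the column ends up as floats, which would match `expected 'Python object' but got 'double'`. The standalone simulation below illustrates this; `tolerant_decode` is a hypothetical stand-in, not a pandas function.

```python
import math


def tolerant_decode(values, encoding, errors):
    """Decode each bytes value, substituting NaN for any element whose
    decode raises TypeError (hypothetical stand-in, not a pandas function)."""
    out = []
    for v in values:
        try:
            out.append(v.decode(encoding, errors))
        except TypeError:
            # e.g. errors=b"strict": bytes.decode() requires str for 'errors'
            out.append(math.nan)
    return out


column = [b"foo", b"bar"]

# 'errors' read back as str: decoding works
print(tolerant_decode(column, "UTF-8", "strict"))  # ['foo', 'bar']

# 'errors' read back as bytes: every element becomes a float NaN
decoded = tolerant_decode(column, "UTF-8", b"strict")
print(decoded)  # [nan, nan]
print(all(isinstance(x, float) for x in decoded))  # True
```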

This stack trace led me to this pull request: https://github.com/pandas-dev/pandas/pull/24510

If I inspect the file with h5ls, for example, it looks fine: it loads, and the content looks correct.
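Since h5ls only shows that the data itself is intact, it may also help to check how the table's metadata attributes were stored. A minimal sketch, assuming the attributes of interest are `nan_rep`, `encoding` and `errors` (the ones read in the patched `Table` code below); `find_bytes_attrs` is a hypothetical helper that accepts any object exposing those names as attributes, and the `legacy` values are illustrative, not taken from the affected file.

```python
from types import SimpleNamespace

# Attribute names read from the table metadata in the patched code
ATTR_NAMES = ("nan_rep", "encoding", "errors")


def find_bytes_attrs(attrs, names=ATTR_NAMES):
    """Return {name: value} for every listed attribute stored as bytes.

    Hypothetical helper: `attrs` may be any object exposing the names as
    attributes; missing attributes are skipped.
    """
    found = {}
    for name in names:
        value = getattr(attrs, name, None)
        if isinstance(value, bytes):
            found[name] = value
    return found


# Illustrative attributes of a table written under Python 2
legacy = SimpleNamespace(nan_rep="nan", encoding="UTF-8", errors=b"strict")
print(find_bytes_attrs(legacy))  # {'errors': b'strict'}
```

Against the real file, something like `find_bytes_attrs(hdf.get_storer(src_tablename).attrs)` could be tried while the store is open; `get_storer` is a public `HDFStore` method, while relying on the storer's `attrs` property is an assumption about pandas internals.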

Unfortunately, I cannot share the DataFrame because it is private, and I can no longer reproduce the process that created it with the older versions :-(. So I am not able to provide the unreadable DataFrame.

I debugged pandas and found that this patch fixes the problem for me:

```diff
diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py
index 4e103482f..2ab6ddb5b 100644
--- a/pandas/io/pytables.py
+++ b/pandas/io/pytables.py
@@ -3288,7 +3288,7 @@ class Table(Fixed):
         self.nan_rep = getattr(self.attrs, 'nan_rep', None)
         self.encoding = _ensure_encoding(
             getattr(self.attrs, 'encoding', None))
-        self.errors = getattr(self.attrs, 'errors', 'strict')
+        self.errors = _ensure_decoded(getattr(self.attrs, 'errors', 'strict'))
         self.levels = getattr(
             self.attrs, 'levels', None) or []
         self.index_axes = [
```
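The idea behind the patch is to normalize the `errors` attribute to `str` before it reaches the decoding step, just as `encoding` is already passed through `_ensure_encoding`. A minimal standalone sketch of that normalization, assuming UTF-8 attribute values; `ensure_str` and `LegacyAttrs` are hypothetical and only approximate what pandas' internal `_ensure_decoded` is used for here (its actual implementation is not shown in this issue).

```python
def ensure_str(value, encoding="UTF-8"):
    """Decode bytes to str; return any other value unchanged.

    Hypothetical helper illustrating the idea of the patch.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


class LegacyAttrs:
    # Illustrative stand-in for table attributes written under Python 2
    errors = b"strict"


errors = ensure_str(getattr(LegacyAttrs, "errors", "strict"))
print(repr(errors))  # 'strict'

# With a str 'errors' value, decoding stored bytes succeeds again
print(b"foo".decode("UTF-8", errors))  # foo

# A missing attribute still falls back to the default
print(ensure_str(getattr(object(), "errors", "strict")))  # strict
```

In this sketch, values that are already `str`, and the `'strict'` default used for a missing attribute, pass through unchanged, so files written under Python 3 would not be affected.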
Can anyone advise me whether such a fix is acceptable, and if so, whether I can submit it as a pull request without a reproducer?

Thank you.

reading of old pandas dataframe (created in python 2) failed with 0.23.4 #24925

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

reading of old pandas dataframe (created in python 2) failed with 0.23.4 #24925

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions