Description
Hi,
First, I apologize that my description is rather vague.
I have a problem with one of my DataFrames, which was created earlier with Python 2 and an older version of pandas (unfortunately I do not know which version). I cannot open it in Python 3 with pandas 0.23.4, although loading it in Python 3 with pandas 0.22.0 works fine.
For reading, I am using:
import pandas as pd

hdf = pd.HDFStore(src_filename, mode="r")
data_frame = hdf.select(src_tablename)
My stack trace in pandas 0.23.4 is:
Traceback (most recent call last):
    data_frame = hdf.select(src_tablename)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 743, in select
    return it.get_result()
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 1485, in get_result
    results = self.func(self.start, self.stop, where)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 734, in func
    columns=columns)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 4182, in read
    if not self.read_axes(where=where, **kwargs):
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 3385, in read_axes
    errors=self.errors)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 2195, in convert
    self.data, nan_rep=nan_rep, encoding=encoding, errors=errors)
  File "/home/rbenes/virtual_envs/iface_venv36_new_pkgs/lib/python3.6/site-packages/pandas/io/pytables.py", line 4658, in _unconvert_string_array
    data = libwriters.string_array_replace_from_nan_rep(data, nan_rep)
  File "pandas/_libs/writers.pyx", line 158, in pandas._libs.writers.string_array_replace_from_nan_rep
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'
This stack trace led me to this pull request: #24510
If I inspect the file with a tool such as h5ls, it looks fine (the file loads and the content looks correct).
Unfortunately, I cannot share the DataFrame because it is private, and I can no longer reproduce how it was created with the older versions :-(. So I am not able to provide the unreadable DataFrame.
I debugged pandas and found that the following patch fixes the problem for me.
diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py
index 4e103482f..2ab6ddb5b 100644
--- a/pandas/io/pytables.py
+++ b/pandas/io/pytables.py
@@ -3288,7 +3288,7 @@ class Table(Fixed):
         self.nan_rep = getattr(self.attrs, 'nan_rep', None)
         self.encoding = _ensure_encoding(
             getattr(self.attrs, 'encoding', None))
-        self.errors = getattr(self.attrs, 'errors', 'strict')
+        self.errors = _ensure_decoded(getattr(self.attrs, 'errors', 'strict'))
         self.levels = getattr(
             self.attrs, 'levels', None) or []
         self.index_axes = [
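To illustrate why decoding the attribute might matter, here is a minimal sketch in plain Python 3. It assumes (I have not confirmed this on the real file) that the `errors` attribute written under Python 2 comes back as bytes. `ensure_decoded` below is a hypothetical stand-in for the pandas-internal helper, not its actual implementation.

```python
def ensure_decoded(value, encoding="UTF-8"):
    """Return value as str if it is bytes; otherwise return it unchanged.

    Hypothetical stand-in for the pandas-internal _ensure_decoded helper.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"caf\xc3\xa9"          # UTF-8 encoded bytes, standing in for stored string data
errors_from_file = b"strict"  # assumption: the stored attribute is read back as bytes

try:
    raw.decode("utf-8", errors_from_file)
    decode_failed = False
except TypeError:
    # Python 3 requires the errors argument of bytes.decode to be str
    decode_failed = True

# After decoding the attribute first, the same call succeeds
decoded = raw.decode("utf-8", ensure_decoded(errors_from_file))
print(decode_failed, decoded)
```

If the decode step fails like this somewhere inside pandas and the failure is swallowed, that could plausibly leave non-string values in the array and lead to the dtype mismatch above, but that part is only my guess.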
Can anyone advise me whether such a fix is acceptable and, if so, whether I can submit it as a pull request without a reproducer?
Thank you.