Incorrect behavior of hdfstore.select for categorical columns (pandas, hdf5) #714

Open
@moskalev

The following example replicates the issue:

import pandas as pd

df = pd.DataFrame(["a","b","c","a"], columns=['col1'])

cats = ["c","b","a"]
df['col1'] = pd.Categorical(df['col1'], categories=cats)

df

col1
0	a
1	b
2	c
3	a

store = pd.HDFStore('bug.h5', complevel=9, complib='blosc:blosclz')
store.append('df', df, data_columns=df.columns, expectedrows=4)
store.select('df', "col1='a'")

col1
2	c

whereas I expected to get

col1
0	a
3	a

As a workaround for now, sorting the categories list before passing it to pd.Categorical makes select return the expected rows.
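A minimal sketch of that workaround, using pandas only (no HDF5 file involved). It compares the integer codes that pd.Categorical assigns with the original category order and with the sorted order. The variable names are illustrative. Note that row 2 ("c"), the row wrongly returned above, has code 0 in the unsorted case, which is the code "a" gets once the categories are sorted. That is consistent with the lookup assuming sorted categories, but this is only a guess and the actual cause is not confirmed here.

```python
import pandas as pd

values = ["a", "b", "c", "a"]
cats = ["c", "b", "a"]  # category order from the report (not sorted)

# Categories in the original order: "c" -> 0, "b" -> 1, "a" -> 2
unsorted_col = pd.Categorical(values, categories=cats)

# Workaround: sort the categories first: "a" -> 0, "b" -> 1, "c" -> 2
sorted_col = pd.Categorical(values, categories=sorted(cats))

unsorted_codes = unsorted_col.codes.tolist()
sorted_codes = sorted_col.codes.tolist()

print(unsorted_codes)  # [2, 1, 0, 2]
print(sorted_codes)    # [0, 1, 2, 0]
```

The values themselves are identical in both cases; only the underlying codes differ, so the sorted version can be assigned to df['col1'] before calling store.append.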

Package versions:

pytables                  3.4.3            py36h02b9ad4_0    anaconda
pandas                    0.23.4           py36hf8a1672_0    conda-forge
hdf5                      1.10.1               h9caa474_1  
