read_hdf / store.select modifies the passed columns parameters when multi-indexed #7212

Closed
@eldad-a

code to reproduce:

import pandas as pd
import numpy as np

## generate data
df = pd.DataFrame(np.random.rand(4,5), index=list('abcd'), columns=list('ABCDE'))
df.index.name = 'letters'
df = df.set_index(keys='E' , append=True)

## save to hdf5
h5name = 'tst.h5'
key = 'tst_key'
df.to_hdf(h5name, key,
          mode='a', append=True,
          data_columns = df.index.names+df.columns.tolist(),
          index=False, 
          complevel=5, complib='blosc', 
          #expectedrows = expectedrows ,
          )

## load part of df
cols2load = list('BCD')
print('before loading: \n\t cols2load = {}'.format(cols2load))
df_ = pd.read_hdf(h5name, key, columns=cols2load)
print('after loading: \n\t cols2load = {}'.format(cols2load))

The printed output:

before loading:
cols2load = ['B', 'C', 'D']
after loading:
cols2load = ['E', 'letters', 'B', 'C', 'D']

pd.__version__ = '0.13.1'
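
Until the list is no longer modified in place, a caller-side workaround is to pass a copy of it. The sketch below does not call pandas: `read_with_side_effect` is a hypothetical stand-in that only mimics the output reported above (index level names inserted at the front of the caller's list), not pandas' actual implementation.

```python
def read_with_side_effect(columns, index_names=("letters", "E")):
    # Hypothetical stand-in for the reader: it reproduces the side effect
    # seen in the report by inserting each index level name at the front
    # of the caller's list, in place.
    for name in index_names:
        columns.insert(0, name)
    return list(columns)


# Passing the list directly: the caller's list is changed.
unprotected = list("BCD")
read_with_side_effect(unprotected)

# Passing a shallow copy: only the copy is changed.
cols2load = list("BCD")
read_with_side_effect(list(cols2load))

print("unprotected = {}".format(unprotected))
print("cols2load   = {}".format(cols2load))
```

Applied to the reproduction above, the same idea would be `pd.read_hdf(h5name, key, columns=list(cols2load))`, assuming the copy is the only reference the reader sees.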
