Description
I have a MultiIndexed DataFrame whose data partly changes when it is pickled and unpickled.
Specifically, some of its columns hold tuples of integers, and those integers come out different every time the DataFrame is unpickled. Here's a DataFrame that shows the problem:
http://bec.physics.monash.edu/docs/dataframe.pickle
It was pickled with pandas 0.9.0.dev-667220a (current git-master) and Python 2.7.3 64 bit.
Try the following to reproduce:
import pickle
# open in binary mode: pickles are binary data, and text mode can mangle them on Windows
df = pickle.load(open('dataframe.pickle', 'rb'))
print(df['top', 'roi0'])
Repeating this gives me different data each time.
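A minimal, self-contained round-trip check can be sketched without the attached file. It builds a small stand-in DataFrame with made-up values (the `('top', 'roi0')` label mirrors the repro above, but the `('top', 'count')` column is hypothetical), pickles it once, then unpickles the same bytes several times and compares each copy with `DataFrame.equals`. The helper name `roundtrip_is_stable` is my own.

```python
import pickle

import pandas as pd


def roundtrip_is_stable(frame, repeats=5):
    """Pickle `frame` once, unpickle the same bytes `repeats` times,
    and report whether every restored copy equals the original."""
    payload = pickle.dumps(frame, protocol=pickle.HIGHEST_PROTOCOL)
    for _ in range(repeats):
        restored = pickle.loads(payload)
        if not restored.equals(frame):
            return False
    return True


# Made-up stand-in for the attached file: the tuple keys become
# MultiIndex columns, and ('top', 'roi0') holds 4-tuples of ints
# (object dtype). The ('top', 'count') column is hypothetical.
df = pd.DataFrame({
    ('top', 'roi0'): [(10, 20, 300, 400), (0, 5, 2048, 1024)],
    ('top', 'count'): [1, 2],
})

print(df['top', 'roi0'])
print(roundtrip_is_stable(df))
```

This only exercises a fresh pickle written by the running pandas version, so it cannot tell whether the attached file itself was written incorrectly; a `False` here would reproduce the problem with synthetic data.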
The integers I get seem somewhat platform-dependent. Each tuple contains four integers. On Linux, most rows (but not all) end up with three zeros and one nonzero value, and the nonzero values are all of similar magnitude, in the tens of millions. On Windows the values look like less regular random integers, and most of them are not zero. Each pickled tuple should hold four different values between 0 and about 2048 (they define region-of-interest rectangles on a CCD camera).
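Since the expected values are known, corrupted rows can be flagged mechanically. The sketch below is a hypothetical helper, `invalid_roi_rows`, that lists the index labels whose value is not a 4-tuple of integers in `[0, upper]`; the default `upper=2048` is an assumption taken from the "about 2048" estimate, and the example data is made up.

```python
import numbers

import pandas as pd


def invalid_roi_rows(rois, upper=2048):
    """Return the index labels whose value is not a 4-tuple of integers
    in the range [0, upper].

    `upper` is an assumed bound based on "between 0 and 2048 or so";
    set it to the real sensor size.
    """
    def looks_valid(value):
        return (
            isinstance(value, tuple)
            and len(value) == 4
            and all(isinstance(v, numbers.Integral) and 0 <= v <= upper
                    for v in value)
        )

    return [label for label, value in rois.items() if not looks_valid(value)]


# Made-up example: row 1 mimics the Linux symptom (three zeros plus
# one value in the tens of millions).
rois = pd.Series([(10, 20, 300, 400), (0, 0, 0, 31457280)])
print(invalid_roi_rows(rois))  # → [1]
```

`numbers.Integral` is used instead of `int` so that NumPy integer scalars inside the tuples are also accepted.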
I am unable to reproduce the problem with DataFrames that have a non-hierarchical index.
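Because the problem does not appear with non-hierarchical indexes, one possible stopgap is to swap the MultiIndex columns for a plain integer index before pickling and send the column tuples alongside the data. This is an untested sketch: `dumps_flat` and `loads_flat` are hypothetical helper names, whether flattening actually avoids the corruption is an assumption, and only the columns are handled, so a MultiIndex on the rows would need the same treatment.

```python
import pickle

import pandas as pd


def dumps_flat(frame):
    """Pickle `frame` with its MultiIndex columns replaced by 0..n-1;
    the column tuples and level names are pickled alongside."""
    flat = frame.copy()
    flat.columns = range(flat.shape[1])
    return pickle.dumps((flat, list(frame.columns), list(frame.columns.names)))


def loads_flat(payload):
    """Inverse of dumps_flat: unpickle and rebuild the MultiIndex columns."""
    flat, column_tuples, names = pickle.loads(payload)
    flat.columns = pd.MultiIndex.from_tuples(column_tuples, names=names)
    return flat


# Made-up example frame with MultiIndex columns.
df = pd.DataFrame({
    ('top', 'roi0'): [(10, 20, 300, 400), (0, 5, 2048, 1024)],
    ('top', 'count'): [1, 2],
})
restored = loads_flat(dumps_flat(df))
print(restored.equals(df))
```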
Any clue as to why this might be, or any suggested workarounds? We use different serialisation formats for on-disk persistence, but we've been using pickle for slinging dataframes across networks.
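Another possible workaround, again a sketch not tried against the original data, is to avoid pickling object-dtype columns of tuples at all: expand each ROI tuple into four int64 columns before sending, and rebuild the tuples on the receiving side. The helpers `explode_tuples` / `implode_tuples` and the column names `x0`, `y0`, `x1`, `y1` are hypothetical; the actual meaning of the four values may differ.

```python
import pickle

import pandas as pd


def explode_tuples(rois, names=("x0", "y0", "x1", "y1")):
    """Expand a Series of 4-tuples into an int64 DataFrame, one column per element."""
    return pd.DataFrame(rois.tolist(), index=rois.index,
                        columns=list(names)).astype("int64")


def implode_tuples(frame):
    """Rebuild the Series of plain tuples from the integer columns."""
    return pd.Series(list(frame.itertuples(index=False, name=None)),
                     index=frame.index)


rois = pd.Series([(10, 20, 300, 400), (0, 5, 2048, 1024)])
payload = pickle.dumps(explode_tuples(rois))  # only numeric columns are pickled
restored = implode_tuples(pickle.loads(payload))
print(restored.equals(rois))
```

Keeping the ROI data in native integer columns also makes it easier to send through formats other than pickle, since nothing depends on how Python tuples inside an object array are serialised.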