- I load a DataFrame from MySQL:
df_bugs_activity_4w = psql.read_frame('select * from bugs_activity limit 0, 40000', conn)
- and the structure of df_bugs_activity_4w:
In[19]: df_bugs_activity_4w
Out[19]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 40000 entries, 0 to 39999
Data columns:
bug_id 40000 non-null values
attach_id 13 non-null values
who 40000 non-null values
bug_when 40000 non-null values
fieldid 40000 non-null values
added 40000 non-null values
removed 40000 non-null values
id 40000 non-null values
dtypes: float64(1), int64(4), object(3)
- Then I convert the object columns:
In [60]: df_bugs_activity_4w = df_bugs_activity_4w.convert_objects()
In [61]: df_bugs_activity_4w
Out[61]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 40000 entries, 0 to 39999
Data columns:
bug_id 40000 non-null values
attach_id 13 non-null values
who 40000 non-null values
bug_when 40000 non-null values
fieldid 40000 non-null values
added 40000 non-null values
removed 40000 non-null values
id 40000 non-null values
dtypes: datetime64[ns](1), float64(1), int64(4), object(2)
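For anyone reproducing this on a newer pandas, where `convert_objects()` is no longer available, the same datetime conversion can probably be done explicitly. This is only a sketch: the two-row frame below is a hypothetical stand-in for `bugs_activity`, and it assumes `bug_when` arrives from the database as Python `datetime` objects in an object column.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for the frame loaded from MySQL: bug_when is an
# object column holding Python datetime objects.
df = pd.DataFrame({
    "bug_id": [301879, 235837],
    "bug_when": pd.Series(
        [datetime(1999, 4, 6, 11, 22, 11), datetime(2001, 9, 26, 20, 28, 11)],
        dtype=object,
    ),
})

# Explicit conversion of the object column to a datetime64 column.
df["bug_when"] = pd.to_datetime(df["bug_when"])

print(df.dtypes)
```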
- I put it into an HDFStore and then read it back, and found that the number of rows changed from 40,000 to 13! That's weird. It seems that the number of non-null values in the 'attach_id' column (13) limits the number of rows that are written when the DataFrame is put into an HDFStore as a table.
In [63]: %prun store.put('df_bugs_activity_4w1', df_bugs_activity_4w, table=True)
In [64]: %time store.get('df_bugs_activity_4w1')
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.01 s
Out[64]:
bug_id attach_id who bug_when fieldid added removed id
2012 301879 0 35 1999-04-06 11:22:11 16 dev bug 2013
2014 301879 0 35 1999-04-06 11:22:12 5 para us 2015
2835 301879 0 56 1999-05-14 15:56:12 10 clo op 2836
31244 301879 0 207 2001-07-18 14:11:38 10 op clo 31245
31252 301879 0 207 2001-07-18 15:40:52 10 ana op 31253
31283 301879 0 35 2001-07-18 21:21:33 16 lui dev 31284
31285 301879 0 35 2001-07-18 21:21:34 15 296 10 31286
31287 301879 0 35 2001-07-18 21:21:35 5 unk para 31288
31393 301879 0 159 2001-07-19 12:41:07 16 prat lui 31394
31472 301879 0 207 2001-07-19 17:27:31 10 ope ana 31473
32675 301879 0 207 2001-08-02 10:09:08 10 clos op 32676
38609 235837 0 201 2001-09-26 20:28:11 15 310- 300-3 38610
38610 235838 0 201 2001-09-26 20:28:11 15 310- 300 38611
In [66]: store
Out[66]:
<class 'pandas.io.pytables.HDFStore'>
File path: sample_no_fill.h5
/df_bugs_4w frame_table (typ->appendable,nrows->40000,ncols->52,indexers->[index])
/df_bugs_4w1 frame_table (typ->legacy,nrows->None,ncols->0,indexers->[])
/df_bugs_activity_4w frame_table (typ->appendable,nrows->13,ncols->8,indexers->[index])
/df_bugs_activity_4w1 frame_table (typ->appendable,nrows->13,ncols->8,indexers->[index])
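One way to check the suspicion above without touching the store is to compare the per-column non-null counts with the rows that would remain if every row with a NaN in 'attach_id' were dropped. This is only a diagnostic sketch: the small frame and the helper `rows_without_nan` are hypothetical, and it does not show what HDFStore does internally.

```python
import numpy as np
import pandas as pd


def rows_without_nan(df, column):
    """Return the rows of df whose value in `column` is not NaN."""
    return df[df[column].notna()]


# Hypothetical miniature of bugs_activity: attach_id is mostly NaN.
df = pd.DataFrame({
    "bug_id": [301879, 301879, 235837, 235838],
    "attach_id": [0.0, np.nan, np.nan, 0.0],
    "who": [35, 56, 201, 201],
})

# Non-null count per column, comparable to the "non-null values" summary.
non_null_counts = df.count()

# Rows that would survive if rows with NaN in attach_id were dropped.
survivors = rows_without_nan(df, "attach_id")

print(non_null_counts["attach_id"], len(survivors))
print(list(survivors.index))
```

If the index labels of `survivors` on the real frame match the 13 index labels returned by `store.get`, that would support the idea that rows with a NaN in 'attach_id' are the ones being lost.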
- PS: I have also posted this question on Stack Overflow.