Can not put a dataframe into hdfstore *completely* #3012

Closed
@simomo

Description

  • I load a dataframe from MySQL:
df_bugs_activity_4w = psql.read_frame('select * from bugs_activity limit 0, 40000', conn)
  • and the structure of df_bugs_activity_4w:
In[19]: df_bugs_activity_4w
Out[19]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 40000 entries, 0 to 39999
Data columns:
bug_id       40000  non-null values
attach_id    13  non-null values
who          40000  non-null values
bug_when     40000  non-null values
fieldid      40000  non-null values
added        40000  non-null values
removed      40000  non-null values
id           40000  non-null values
dtypes: float64(1), int64(4), object(3)
  • Then I convert the object columns:
In [60]: df_bugs_activity_4w = df_bugs_activity_4w.convert_objects()
In [61]: df_bugs_activity_4w
Out[61]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 40000 entries, 0 to 39999
Data columns:
bug_id       40000  non-null values
attach_id    13  non-null values
who          40000  non-null values
bug_when     40000  non-null values
fieldid      40000  non-null values
added        40000  non-null values
removed      40000  non-null values
id           40000  non-null values
dtypes: datetime64[ns](1), float64(1), int64(4), object(2)
  • I put it into an HDFStore and read it back, and found that the number of rows changed from 40,000 to 13! That's weird. It seems that the number of non-null values in the 'attach_id' column limits the number of rows stored when the dataframe is put into an HDFStore.
In [63]: %prun store.put('df_bugs_activity_4w1', df_bugs_activity_4w, table=True)

In [64]: %time store.get('df_bugs_activity_4w1')
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.01 s
Out[64]:
       bug_id  attach_id  who             bug_when  fieldid  added  removed     id
 2012  301879          0   35  1999-04-06 11:22:11       16    dev      bug   2013
 2014  301879          0   35  1999-04-06 11:22:12        5   para       us   2015
 2835  301879          0   56  1999-05-14 15:56:12       10    clo       op   2836
31244  301879          0  207  2001-07-18 14:11:38       10     op      clo  31245
31252  301879          0  207  2001-07-18 15:40:52       10    ana       op  31253
31283  301879          0   35  2001-07-18 21:21:33       16    lui      dev  31284
31285  301879          0   35  2001-07-18 21:21:34       15    296       10  31286
31287  301879          0   35  2001-07-18 21:21:35        5    unk     para  31288
31393  301879          0  159  2001-07-19 12:41:07       16   prat      lui  31394
31472  301879          0  207  2001-07-19 17:27:31       10    ope      ana  31473
32675  301879          0  207  2001-08-02 10:09:08       10   clos       op  32676
38609  235837          0  201  2001-09-26 20:28:11       15   310-    300-3  38610
38610  235838          0  201  2001-09-26 20:28:11       15   310-      300  38611

2013-03-11 21:43:21

In [66]: store
Out[66]:
<class 'pandas.io.pytables.HDFStore'>
File path: sample_no_fill.h5
/df_bugs_4w                      frame_table  (typ->appendable,nrows->40000,ncols->52,indexers->[index])
/df_bugs_4w1                     frame_table  (typ->legacy,nrows->None,ncols->0,indexers->[])           
/df_bugs_activity_4w             frame_table  (typ->appendable,nrows->13,ncols->8,indexers->[index])    
/df_bugs_activity_4w1            frame_table  (typ->appendable,nrows->13,ncols->8,indexers->[index]) 
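
The 13 rows that come back all have a non-null attach_id, which matches the non-null count reported for that column. A minimal sketch of how one might check this kind of row loss is below. It does not touch HDFStore at all: the small frame, the helper names `sparse_columns` and `rows_lost`, and the simulated `restored` frame are all hypothetical stand-ins, used only to illustrate the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_bugs_activity_4w:
# 10 rows, but only 3 of them have a non-null attach_id.
df = pd.DataFrame({
    "bug_id": np.arange(10),
    "attach_id": [np.nan] * 10,
    "who": np.arange(10) + 100,
})
df.loc[[2, 5, 7], "attach_id"] = 0.0


def sparse_columns(frame):
    """Return non-null counts for columns with fewer non-null values than rows."""
    counts = frame.count()
    return counts[counts < len(frame)]


def rows_lost(original, restored):
    """Return the index labels present in `original` but missing from `restored`."""
    return original.index.difference(restored.index)


# Simulate the symptom from the report: only the rows with a non-null
# attach_id survive. In a real check, `restored` would be store.get(key).
restored = df[df["attach_id"].notnull()]

print(sparse_columns(df))          # columns that contain NaN, with their counts
print(len(df), len(restored))      # row counts before and after
print(list(rows_lost(df, restored)))
```

If the number of surviving rows equals the non-null count of one sparse column, as it does here and in the report (13), that column is the likely trigger for the dropped rows.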
