truncation issue with pd.read_csv

Hi, thank you very much for the awesome pandas tool. I would like to report an issue I noticed and seek guidance on remedial steps. When I store a long integer into a file using pd.to_csv, it stores the data fine, but when I read it back using pd.read_csv, the last 3 digits get changed. When I save it back again using to_csv (without any edits), the numbers in the resulting CSV file are different from the original CSV file. I've illustrated the problem below (notice how 4321113141090630389 becomes 4321113141090630400 and 4321583677327450765 becomes 4321583677327450880):
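A minimal self-contained reproduction may make this easier to triage. It is a sketch that assumes the blank cell in the first row is what pushes the column to float64; the column names `id` and `s_act` and the two-row data are made up for illustration and are not taken from `orig.piece`.

```python
import io

import pandas as pd

# Hypothetical data: the blank s_act in the first row mirrors row 0 of orig.piece.
original_csv = "id,s_act\n0,\n1,4321583677327450765\n"

df = pd.read_csv(io.StringIO(original_csv))
print(df["s_act"].dtype)  # float64, because the blank cell becomes NaN

# Writing the frame back out does not reproduce the original digits.
roundtrip_csv = df.to_csv(index=False)
print(roundtrip_csv)
```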

Original CSV file created by pd.to_csv:

```
$ grep -e 321583677327450 -e 321113141090630 orig.piece
orig.piece:1,1;0;0;0;1;1;3844;3844;3844;1;1;1;1;1;1;0;0;1;1;0;0,,,4321583677327450765
orig.piece:5,1;0;0;0;1;1;843;843;843;1;1;1;1;1;1;0;0;1;1;0;0,64.0,;,4321113141090630389
```

```python
>>> import pandas as pd
>>> import numpy as np
>>> orig = pd.read_csv('orig.piece')
>>> orig.dtypes
Unnamed: 0      int64
aa             object
act           float64
...
...
s_act         float64
dtype: object
>>> orig['s_act'].head(6)
0             NaN
1    4.321584e+18
2    4.321974e+18
3    4.321494e+18
4    4.321283e+18
5    4.321113e+18
Name: s_act, dtype: float64
>>> orig['s_act'].fillna(0).astype(int).head(6)
0                      0
1    4321583677327450880
2    4321973950881710336
3    4321493786516159488
4    4321282586859217408
5    4321113141090630400
>>> orig.to_csv('convert.piece')
```

```
$ grep -e 321583677327450 -e 321113141090630 orig.piece convert.piece
orig.piece:1,1;0;0;0;1;1;3844;3844;3844;1;1;1;1;1;1;0;0;1;1;0;0,,,4321583677327450765
orig.piece:5,1;0;0;0;1;1;843;843;843;1;1;1;1;1;1;0;0;1;1;0;0,64.0,;,4321113141090630389
convert.piece:1,1;0;0;0;1;1;3844;3844;3844;1;1;1;1;1;1;0;0;1;1;0;0,,,4.321583677327451e+18
convert.piece:5,1;0;0;0;1;1;843;843;843;1;1;1;1;1;1;0;0;1;1;0;0,64.0,;,4.3211131410906304e+18
```
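The changed digits look consistent with float64 precision limits: `orig.dtypes` reports `s_act` as float64, presumably because row 0 is blank (NaN) and int64 cannot hold NaN. A float64 has a 53-bit significand, so above 2**53 not every integer is representable, and around 4.3e18 neighbouring float64 values are 512 apart. The check below is a sketch using only Python's `float()` and `numpy.spacing`; the value pandas' CSV parser produces may not match `float()` bit for bit (that depends on its `float_precision` setting), but it cannot keep all 19 digits either.

```python
import numpy as np

original = 4321583677327450765  # value from orig.piece

# float64 stores a 53-bit significand, so only integers up to 2**53 are all exact.
print(original > 2**53)  # True

as_float = float(original)  # what a float64 column can hold
print(int(as_float) == original)  # False: the trailing digits are lost

# Distance from as_float to the next representable float64.
print(np.spacing(as_float))  # 512.0
```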

Could you please help me understand why read_csv jumbles the last three digits? It's not even a rounding issue; the digits are totally different (for example, 4321583677327450765 becomes 4321583677327450880 above). Is it because the scientific notation gets in the way? How can we disable it and let pandas treat this data as just object/string or plain integer/float?
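One way to keep the digits, assuming `s_act` does not need float arithmetic while it sits in the DataFrame, is to use `read_csv`'s `dtype` argument so the column is read as strings instead of float64. This sketch uses a made-up three-row CSV in place of `orig.piece`; with the real file the call would presumably be `pd.read_csv('orig.piece', dtype={'s_act': str})`.

```python
import io

import pandas as pd

# Hypothetical stand-in for orig.piece, including a blank first value.
csv_text = "id,s_act\n0,\n1,4321583677327450765\n5,4321113141090630389\n"

# Read s_act as strings so the parser never converts it to float64.
df = pd.read_csv(io.StringIO(csv_text), dtype={"s_act": str})

# Round trip: the digits are written back unchanged.
out = df.to_csv(index=False)
print(out)

# If integer values are needed, convert the non-missing strings explicitly.
exact = df["s_act"].dropna().map(int)
print(exact)
```

Blank cells still come back as NaN with `dtype=str`, which is why `dropna()` runs before `map(int)`.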

truncation issue with pd.read_csv #7072