read_sas does not handle numeric variables stored with fewer than 8 bytes in SAS datasets #21616

This problem has been reported twice on Stack Overflow, but I cannot find an issue raised here that addresses it.

https://stackoverflow.com/questions/49059421/pandas-fails-with-correct-data-type-while-reading-a-sas-file
https://stackoverflow.com/questions/51005244/how-can-i-preserve-the-data-type-of-a-column-when-using-pandas-read-sas/51005427#51005427

read_sas() does not correctly handle SAS numeric values that are stored using fewer than the full 8 bytes required for a double-precision floating point number. Instead of padding the truncated values with binary zeros, read_sas appears to fill out the remaining bytes with other, perhaps arbitrary, bytes.

In the first Stack Overflow example, SAS has stored the number 8. In IEEE floating point this is the 8 bytes given by the hex string '40 20 00 00 00 00 00 00'. In the sas7bdat file, SAS stored only the 3 most significant bytes. Instead of padding them with 5 zero bytes before converting, read_sas appears to have padded them with the bytes '06 07 80 FD C1', so the value was read as 8.00046 instead of 8.
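
To illustrate the expected behaviour, here is a minimal sketch of zero-padding a truncated value before decoding it. The helper name `expand_truncated_double` and its `byteorder` parameter are hypothetical and are not part of pandas; the sketch only assumes that the stored bytes are the most significant bytes of an IEEE 754 double, as described above.

```python
import struct


def expand_truncated_double(raw: bytes, byteorder: str = "big") -> float:
    """Zero-pad a truncated IEEE 754 double to 8 bytes and decode it.

    Hypothetical helper, not pandas code. `raw` is assumed to hold only the
    most significant bytes of the value, in the file's byte order.
    """
    if not 1 <= len(raw) <= 8:
        raise ValueError("expected between 1 and 8 bytes")
    padding = b"\x00" * (8 - len(raw))
    if byteorder == "big":
        # Most significant bytes come first, so the zeros go at the end.
        return struct.unpack(">d", raw + padding)[0]
    if byteorder == "little":
        # Most significant bytes come last, so the zeros go in front.
        return struct.unpack("<d", padding + raw)[0]
    raise ValueError("byteorder must be 'big' or 'little'")


# The 3 stored bytes from the example, padded with zeros: exactly 8.0.
zero_padded = expand_truncated_double(bytes.fromhex("40 20 00"))

# The same 3 bytes followed by the stray bytes quoted above: slightly more than 8.
stray_padded = struct.unpack(">d", bytes.fromhex("40 20 00 06 07 80 FD C1"))[0]
```

The hex strings here are written most significant byte first, as in the description above. In a file written on a little-endian platform the stored bytes would appear in the opposite order, which is what the `"little"` branch is meant to cover.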