to_stata always stores strings as str244 #8969

Copied from #7858:

I am still getting this bug (pandas 0.15.1, numpy 1.9.1, Stata 13.1 on Windows 7). The written DTA file still stores the strings as str244 even though the strings themselves have length 1. Because the values round-trip unchanged, the DataFrame looks identical when the file is read back into pandas, so the problem is invisible from that side. I think the only way to detect it from within pandas is to look at the size of the DTA file itself.

``` python
import pandas as pd

df = pd.DataFrame(['a', 'b', 'c'], columns=['alpha'])
df.to_stata('test.dta')
df2 = pd.read_stata('test.dta')
# The values round-trip unchanged, so this passes despite the str244 storage.
assert (df['alpha'] == df2['alpha']).all()
```

But when loading in Stata:

``` stata
. use test
. describe

Contains data from D:\data\Pollution\test.dta
  obs:             3                          
 vars:             2                          02 Dec 2014 10:30
 size:           744                          
--------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------
index           long    %12.0g                
alpha           str244  %1s                   
--------------------------------------------------------------------------------------------
Sorted by:  

. compress
  index was long now byte
  alpha was str244 now str1
  (738 bytes saved)

. describe

Contains data from D:\data\Pollution\test.dta
  obs:             3                          
 vars:             2                          02 Dec 2014 10:30
 size:             6                          
--------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------
index           byte    %12.0g                
alpha           str1    %9s                   
--------------------------------------------------------------------------------------------
Sorted by:  

```
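The `size:` figures in the two `describe` outputs can be checked by hand. This is a rough sketch, assuming Stata's documented widths for fixed-size storage types (`byte` 1, `int` 2, `long` 4, `float` 4, `double` 8 bytes, and `strN` taking N bytes per observation). The helper names `storage_bytes` and `data_size` are made up for illustration and are not part of pandas or Stata.

``` python
# Bytes per observation for Stata's fixed-width numeric storage types.
NUMERIC_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}


def storage_bytes(stata_type):
    """Width in bytes of one value of a storage type such as 'long' or 'str244'."""
    if stata_type.startswith("str"):
        # strN occupies N bytes per observation (strL is not handled here).
        return int(stata_type[3:])
    return NUMERIC_BYTES[stata_type]


def data_size(n_obs, stata_types):
    """Data size in bytes: observations times the summed column widths."""
    return n_obs * sum(storage_bytes(t) for t in stata_types)


before = data_size(3, ["long", "str244"])  # as written by to_stata
after = data_size(3, ["byte", "str1"])     # after `compress`
print(before, after, before - after)       # 744 6 738
```

So 3 observations at 248 bytes each give the 744 bytes reported before `compress`, and 3 at 2 bytes each give the 6 bytes reported afterwards, which matches the 738 bytes saved.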

I don't know how critical this is, since there are workarounds. You can either run `compress` followed by `save, replace` in Stata after every export from pandas or, if the str244 padding makes the DTA file exceed your memory limit (which causes quite the system error light show, as I just experienced), pass the data through a CSV file instead.

It's just a matter of convenience.
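For reference, the CSV workaround mentioned above might look like the following on the pandas side. This is only a sketch: the file name `test.csv` is arbitrary, and dropping the index with `index=False` is my own choice, not something the export requires.

``` python
import pandas as pd

df = pd.DataFrame(['a', 'b', 'c'], columns=['alpha'])

# A CSV file carries no storage types, so no string width is written out;
# Stata chooses the storage type itself when it imports the file.
df.to_csv('test.csv', index=False)

# Round-trip check from the pandas side.
df2 = pd.read_csv('test.csv')
assert (df['alpha'] == df2['alpha']).all()
```

In Stata 13 the file can then be loaded with `import delimited test.csv`.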

