ENH: decode for Categoricals

There seems to be no direct way to return to the original dtype, and the [documentation](http://pandas-docs.github.io/pandas-docs-travis/categorical.html#object-creation) recommends: _"To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical)"_

That is slow, and a `decode` or `decat` method would be trivial to add:

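``` python
import numpy as np
import pandas as pd

# 1e6 rows x 4 columns of random single-letter strings (sizes must be ints)
df = pd.DataFrame(np.random.choice(list(u'abcde'), 4 * 10**6).reshape(10**6, 4),
                  columns=list(u'ABCD'))
for col in df.columns:
    df[col] = df[col].astype('category')

# Documented route: convert back with astype
%timeit for col in df.columns: df[col].astype('unicode')
1 loops, best of 3: 1.06 s per loop

# Proposed route: index the categories with the integer codes
%timeit for col in df.columns: cats=df[col].cat.categories; cats[df[col].cat.codes]
10 loops, best of 3: 33.2 ms per loop
```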
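To make the request concrete, here is a minimal sketch of what such a method could do, written as a standalone function. The name `decode_categorical` is hypothetical and not part of pandas; the sketch assumes a categorical Series and uses `pandas.api.extensions.take` so that the code `-1` (a missing value) becomes NaN instead of wrapping around to the last category.

```python
import pandas as pd
from pandas.api.extensions import take


def decode_categorical(series):
    """Hypothetical helper: rebuild the plain values of a categorical Series."""
    categories = series.cat.categories.to_numpy()
    codes = series.cat.codes.to_numpy()
    # allow_fill=True treats code -1 as "missing" rather than "last element"
    values = take(categories, codes, allow_fill=True)
    return pd.Series(values, index=series.index, name=series.name)


s = pd.Series(["a", "b", None, "a"], dtype="category")
decoded = decode_categorical(s)
```

The lookup touches each distinct category only once, which is where the speedup over `astype` in the timings above would come from.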

I was working with ~10 categories (some of them longer strings) on a dataset of 20 million rows, where the difference was even bigger (unfortunately I can't reproduce it with dummy data). There, `astype` took minutes, which felt more like a bug than merely a performance issue.

Given the current limitations on exporting categorical data, a fast `decode` method would be very convenient. Since categories are most often strings, an optional parameter for direct character set encoding would also be good to have on such a method.

``` python
# Encode every row after converting back with astype
%timeit for col in df.columns: df[col].astype('unicode').str.encode('latin1')
1 loops, best of 3: 3.95 s per loop

# Encode only the categories, then index them with the codes
%timeit for col in df.columns: cats=pd.Series(df[col].cat.categories).str.encode('latin1'); cats[df[col].cat.codes]
10 loops, best of 3: 74.5 ms per loop
```
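The optional encoding parameter could look roughly like the sketch below. The function name `decode_categorical_encoded` and its `encoding` argument are assumptions for illustration, not an existing pandas API. The idea is the same as in the second timing: encode each category once, then expand through the codes.

```python
import pandas as pd
from pandas.api.extensions import take


def decode_categorical_encoded(series, encoding="latin1"):
    """Hypothetical helper: decode a string categorical straight to bytes."""
    # Encode the (few) categories once instead of every row
    encoded = series.cat.categories.str.encode(encoding).to_numpy()
    codes = series.cat.codes.to_numpy()
    # Code -1 (missing value) is filled with NaN
    values = take(encoded, codes, allow_fill=True)
    return pd.Series(values, index=series.index, name=series.name)


s = pd.Series(["é", "a", "é"], dtype="category")
as_bytes = decode_categorical_encoded(s, encoding="latin1")
```

With this approach the encoding cost scales with the number of categories rather than the number of rows.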


ENH: decode for Categoricals #8628
