ENH: categorical data export - graceful degradation  #8633

Closed
@fkaufer

Description

It would be great to generally apply graceful degradation for export of categorical data instead of raising exceptions.

Currently only to_sql and to_csv degrade gracefully: they export the category values instead of the categorical itself. to_pickle is the only option that persists categorical data as such.

For Stata and HDF it is:

  • to_hdf: NotImplementedError: cannot store a category dtype
  • to_stata: ValueError: Data type category not currently understood. Please report an error to the developers.

As long as a backend does not support categoricals, or the conversion is not yet implemented, why not export the category values as a general fallback? With the separately discussed decode method (#8628) this would be easy. If the same rigor (the backend supports the data type natively, or the export fails) were applied to CSV I/O, we could only export string dtypes to CSV.
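
Until something like the decode method exists, the fallback can be approximated by hand. This is only a rough sketch: `decode_categoricals` is a hypothetical helper name, not a pandas function, and it assumes a pandas version that exposes `pd.CategoricalDtype` at the top level.

```python
import pandas as pd


def decode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with category columns as plain values."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            # Casting to object replaces each code with its category value;
            # missing entries stay missing.
            out[col] = out[col].astype(object)
    return out


df = pd.DataFrame({"grade": pd.Categorical(["a", "b", "a"]), "n": [1, 2, 3]})
plain = decode_categoricals(df)
print(plain.dtypes)
```

The resulting frame no longer contains a category dtype, so it could be handed to an exporter that rejects categoricals. The category ordering and any unused categories are lost in the process.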

Thinking one step further, the to_... functions could have an optional parameter named something like convert_cat with options:

  • None: either try to export as a categorical (pickle, potentially HDF, Stata) or raise exception
  • 'category': only export categories (decode method)
  • 'code': export s.cat.codes
  • 'mapping' or 'emulate': export the codes together with the code:category mapping, either in one/two columns or in a separate table/frame/...

The last option would probably need additional parameters to control the technical implementation (e.g. table name for mapping or suffixes as for join/merge, ...)
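
To make the options concrete, here is a minimal sketch of how they might behave as a preprocessing step. `convert_categoricals` and its return shape are assumptions for illustration only; no such function or `convert_cat` parameter exists in pandas, and the 'mapping' variant here simply returns one separate lookup frame per column.

```python
import pandas as pd


def convert_categoricals(df, convert_cat=None):
    """Hypothetical helper mimicking the proposed convert_cat options.

    Returns (converted_frame, mappings); mappings is only filled for
    convert_cat='mapping' and maps column name -> code/category frame.
    """
    if convert_cat not in (None, "category", "code", "mapping"):
        raise ValueError(f"unknown convert_cat option: {convert_cat!r}")
    out = df.copy()
    mappings = {}
    if convert_cat is None:
        # Leave categoricals untouched; the exporter supports them or raises.
        return out, mappings
    for col in df.columns:
        s = df[col]
        if not isinstance(s.dtype, pd.CategoricalDtype):
            continue
        if convert_cat == "category":
            # Export only the category values.
            out[col] = s.astype(object)
        else:
            # 'code' and 'mapping' both export the integer codes (-1 = missing).
            out[col] = s.cat.codes
            if convert_cat == "mapping":
                cats = list(s.cat.categories)
                mappings[col] = pd.DataFrame(
                    {"code": range(len(cats)), "category": cats}
                )
    return out, mappings


df = pd.DataFrame(
    {"grade": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"])}
)
codes, lookup = convert_categoricals(df, convert_cat="mapping")
print(codes)
print(lookup["grade"])
```

Returning the lookup frames separately leaves the caller free to decide where they go (a second table, a second file, extra columns), which is where the additional parameters mentioned above would come in.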
