
ENH: Improve error message for repeated Stata categories #13949


Closed


@bashtage (Contributor) commented Aug 9, 2016

Improve the error message to be more explicit when attempting to read Stata
files containing repeated categories.

closes #13923
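
To illustrate the situation this PR targets, here is a minimal pure-Python sketch of how repeated value labels lead to non-unique categories. The `value_labels` mapping and `codes` list are hypothetical sample data, not taken from the issue, and the snippet only mimics the label lookup rather than calling pandas.

```python
from collections import Counter

# Hypothetical value labels from a Stata file: codes 1 and 3 share one label.
value_labels = {1: "low", 2: "high", 3: "low"}

# Codes observed in the column; 4 has no label (partially labeled data).
codes = [1, 2, 3, 4]

# Map each code to its label, falling back to the raw code when unlabeled.
categories = [value_labels.get(code, code) for code in codes]

# Duplicate detection via length comparison.
has_duplicates = len(categories) != len(set(categories))

# Collect the offending labels so an error message can name them.
repeated = [label for label, count in Counter(categories).items() if count > 1]

print(categories)      # ['low', 'high', 'low', 4]
print(has_duplicates)  # True
print(repeated)        # ['low']
```

Naming the repeated labels is one way an error message could be made more explicit; the wording pandas actually uses is not shown in this conversation.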

@@ -397,6 +397,7 @@ Other enhancements
- ``Series.append`` now supports the ``ignore_index`` option (:issue:`13677`)
- ``.to_stata()`` and ``StataWriter`` can now write variable labels to Stata dta files using a dictionary to make column names to labels (:issue:`13535`, :issue:`13536`)
- ``.to_stata()`` and ``StataWriter`` will automatically convert ``datetime64[ns]`` columns to Stata format ``%tc``, rather than raising a ``ValueError`` (:issue:`12259`)
- ``read_stata()`` and ``StataReader`` return more explicit error message is given when reading Stata data files containing value labels are repeated when ``convert_categoricals=True`` (:issue:`13923`)
Contributor commented on this line:

"raise with a more explicit error message when reading Stata files with repeated value labels when convert_categoricals=True".

@bashtage force-pushed the categorical-error-message branch from aee7b63 to 72cc94c, August 9, 2016 15:38
@codecov-io commented Aug 9, 2016

Current coverage is 85.29% (diff: 100%)

Merging #13949 into master will decrease coverage by <.01%

@@             master     #13949   diff @@
==========================================
  Files           139        139          
  Lines         50159      50169    +10   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42786      42792     +6   
- Misses         7373       7377     +4   
  Partials          0          0          

Powered by Codecov. Last update 4df08a9...0880d02

@jreback added the Error Reporting (Incorrect or improved errors from pandas) and IO Stata (read_stata, to_stata) labels Aug 9, 2016
@@ -1649,6 +1649,13 @@ def _do_convert_categoricals(self, data, value_label_dict, lbllist,
        categories.append(value_label_dict[label][category])
    else:
        categories.append(category)  # Partially labeled
if len(categories) != len(set(categories)):
Contributor commented on this line:

I think it's a bit more Pythonic to do this:

try:
    cat_data.categories = categories
except ValueError:
    # show your new ValueError

since setting non-unique categories already raises a ValueError.
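
The suggestion above can be sketched as a self-contained example. Everything here is an assumption made for illustration: `assign_categories` is a hypothetical stand-in for the categories assignment (it is not a pandas function), `convert_labels` is a hypothetical wrapper, and the error text is invented rather than the message merged in this PR.

```python
from collections import Counter


def assign_categories(categories):
    """Hypothetical stand-in for assigning categories; rejects duplicates."""
    if len(categories) != len(set(categories)):
        raise ValueError("Categorical categories must be unique")
    return list(categories)


def convert_labels(categories):
    """Attempt the assignment and re-raise with a more explicit message."""
    try:
        return assign_categories(categories)
    except ValueError as err:
        # Name the repeated labels so the user can find them in the file.
        repeats = [str(c) for c, n in Counter(categories).items() if n > 1]
        # Illustrative wording only, not the message used by pandas.
        msg = "Value labels are not unique. Repeated labels: " + ", ".join(repeats)
        raise ValueError(msg) from err


print(convert_labels(["low", "high"]))  # ['low', 'high']

try:
    convert_labels(["low", "high", "low"])
except ValueError as err:
    print(err)  # Value labels are not unique. Repeated labels: low
```

Compared with checking `len(categories) != len(set(categories))` up front, this form relies on the assignment itself to detect duplicates and only adds context when it fails.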

@bashtage force-pushed the categorical-error-message branch from 72cc94c to 533a9a0, August 10, 2016 11:17
@bashtage force-pushed the categorical-error-message branch from 533a9a0 to 0880d02, August 10, 2016 11:17
@jreback added this to the 0.19.0 milestone Aug 10, 2016
@jreback closed this in 257ac88 Aug 10, 2016
@jreback (Contributor) commented Aug 10, 2016

thanks! great as usual!

@bashtage deleted the categorical-error-message branch January 24, 2017 21:30
Development

Successfully merging this pull request may close these issues.

Stata read categoricals gives ValueError: Categorical categories must be unique