Description
I am trying to open some Stata files generated by IPUMS International, but I am getting the error "ValueError: Categorical categories must be unique". I opened the file in Stata and could not find a repeated category for the column I am trying to import. I had similar issues with other datasets from the same source, which seemed to be caused by missing values, but that does not seem to be the case here. Here's the link to the file I am trying to read.
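One way to double-check for a repeated category outside of Stata is to look at the label text itself rather than the numeric codes, since two different codes can carry the same label. The following is only a rough sketch: the helper name `duplicated_labels` and the example label set are made up for illustration and are not taken from the actual file.

```python
from collections import Counter


def duplicated_labels(value_labels):
    """For each label set, list the label texts attached to more than one code.

    `value_labels` is assumed to be a dict of the form
    {label_set_name: {numeric_code: label_text}}.
    """
    result = {}
    for name, mapping in value_labels.items():
        counts = Counter(mapping.values())
        dups = sorted(label for label, n in counts.items() if n > 1)
        if dups:
            result[name] = dups
    return result


# Hypothetical label set (not the real contents of the file):
# codes 3 and 98 share the same label text.
example = {"ethnicsn": {1: "Group A", 2: "Group B", 3: "Other", 98: "Other"}}
print(duplicated_labels(example))
```

For the real file, a dictionary of this shape should be obtainable from pandas.io.stata.StataReader(...).value_labels(), and passing convert_categoricals=False to pd.read_stata should load the raw numeric codes without building the categorical at all. I have not verified either against this particular file.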
Code Sample, a copy-pastable example if possible
import pandas as pd

# Raises ValueError: Categorical categories must be unique
df = pd.read_stata('ipumsi_00014.dta', columns=['ethnicsn'])
Expected Output
df.shape == (1694761, 1)
output of pd.show_versions()
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 18.2
Cython: 0.22.1
numpy: 1.11.1
scipy: 0.15.1
statsmodels: 0.6.1
xarray: None
IPython: 3.2.1
sphinx: None
patsy: 0.2.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 0.8.0
tables: None
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: 2.1.3
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.4
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)
jinja2: 2.7.3
boto: 2.34.0
pandas_datareader: None