
Unhelpful error message when loading a single column with read_csv and usecols #20529

Closed
@mattmotoki

Description


Code

>>> import pandas as pd
>>> df = pd.DataFrame({'x': [0,1], 'x1': [2,3]})
>>> df.to_csv('tmp.csv', index=False)
>>> pd.read_csv('tmp.csv', usecols='x')
   x
0  0
1  1
>>> pd.read_csv('tmp.csv', usecols=['x1'])
   x1
0   2
1   3
>>> pd.read_csv('tmp.csv', usecols='x1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1740, in __init__
    raise ValueError("Usecols do not match names.")
ValueError: Usecols do not match names.

Problem description

When using usecols to load a single column, the call only works if the column name is a single character or if the name is wrapped in an array-like object. In the example above, pd.read_csv('tmp.csv', usecols='x') and pd.read_csv('tmp.csv', usecols=['x1']) work as expected, but pd.read_csv('tmp.csv', usecols='x1') fails. The string appears to be treated as a sequence of characters, so usecols='x1' is interpreted as the columns 'x' and '1', and there is no column named '1'. The resulting error message, ValueError: Usecols do not match names., does not point to this cause.
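The following minimal sketch models the behaviour described above. It assumes (this is not confirmed from the pandas source in this report) that usecols is converted with set() before being compared with the header names. The helper unmatched_usecols is hypothetical and only illustrates why a bare string fails.

```python
# Hypothetical illustration, not pandas source: model usecols matching
# as a set comparison against the CSV header names.

def unmatched_usecols(usecols, names):
    """Return the requested columns that are not present in `names`.

    Assumes usecols is converted with set(), as the reported behaviour
    suggests; a bare string is therefore split into its characters.
    """
    requested = set(usecols)  # set('x1') == {'x', '1'}
    return requested - set(names)


header = ['x', 'x1']  # column names written to tmp.csv in the report

print(unmatched_usecols('x', header))     # set(): 'x' is a real column
print(unmatched_usecols(['x1'], header))  # set(): the list keeps 'x1' whole
print(unmatched_usecols('x1', header))    # {'1'}: no column named '1'
```

A non-empty result corresponds to the case where read_csv raises "Usecols do not match names."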

Expected Output

It would be helpful if usecols were type-checked so that the example above does not fail. At the very least, the error message should be more informative, e.g., ValueError: Usecols should be array-like.
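A sketch of the kind of type check suggested above might look like the following. The function validate_usecols and its message wording are hypothetical and are not part of the pandas API.

```python
# Hypothetical sketch of the suggested check; the name and message are
# illustrative only, not pandas behaviour.

def validate_usecols(usecols):
    """Reject a bare string before it can be split into characters."""
    if isinstance(usecols, str):
        raise ValueError(
            "usecols should be array-like (e.g. ['x1']) or a callable, "
            "got the string {!r}".format(usecols)
        )
    if usecols is None or callable(usecols):
        # None means "all columns"; callables are passed through unchanged
        return usecols
    return list(usecols)


try:
    validate_usecols('x1')
except ValueError as err:
    print(err)

print(validate_usecols(['x1']))  # ['x1']
```

A check like this would also reject usecols='x', which currently works only because that column name happens to be a single character; callers would write usecols=['x'] instead.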

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: None
dateutil: 2.7.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata


Labels

Error Reporting (Incorrect or improved errors from pandas), IO CSV (read_csv, to_csv)
