Pandas 0.19 read_csv with header=[0, 1] on an empty df throws error  #14515

Closed
@kaloramik

Pandas 0.19 raises an error when reading a CSV file that contains only multi-index column headers and no data rows (an empty DataFrame written with to_csv).

import pandas as pd
import tempfile

df = pd.DataFrame.from_records([], columns=['col_1', 'col_2'])
joined_df_in = pd.concat([df, df], keys=['a', 'b'], axis=1)
joined_df_in.reset_index(drop=True, inplace=True)

with tempfile.NamedTemporaryFile(delete=False) as f:
    joined_df_in.to_csv(f.name, index=False)

What the file looks like

a,a,b,b
col_1,col_2,col_1,col_2

Expected Output

# in pandas 0.18.1
pd.read_csv(f.name, header=[0,1])

yields what we expect: an empty DataFrame with MultiIndex columns

(a, col_1)  (a, col_2)  (b, col_1)  (b, col_2)

# in pandas 0.19
pd.read_csv(f.name, header=[0,1])

Throws

---------------------------------------------------------------------------
CParserError                              Traceback (most recent call last)
<ipython-input-10-1051c5f9aa58> in <module>()
----> 1 pd.read_csv(f.name, header=[0,1])

/Users/mik-OD/anaconda/envs/signals/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    643                     skip_blank_lines=skip_blank_lines)
    644 
--> 645         return _read(filepath_or_buffer, kwds)
    646 
    647     parser_f.__name__ = name

/Users/mik-OD/anaconda/envs/signals/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    386 
    387     # Create the parser.
--> 388     parser = TextFileReader(filepath_or_buffer, **kwds)
    389 
    390     if (nrows is not None) and (chunksize is not None):

/Users/mik-OD/anaconda/envs/signals/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    727             self.options['has_index_names'] = kwds['has_index_names']
    728 
--> 729         self._make_engine(self.engine)
    730 
    731     def close(self):

/Users/mik-OD/anaconda/envs/signals/lib/python3.5/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    920     def _make_engine(self, engine='c'):
    921         if engine == 'c':
--> 922             self._engine = CParserWrapper(self.f, **self.options)
    923         else:
    924             if engine == 'python':

/Users/mik-OD/anaconda/envs/signals/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1387         kwds['allow_leading_cols'] = self.index_col is not False
   1388 
-> 1389         self._reader = _parser.TextReader(src, **kwds)
   1390 
   1391         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5811)()

pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:8615)()

CParserError: Passed header=[0,1], len of 2, but only 2 lines in file
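Until this is fixed, one possible workaround is to read the header rows by hand and only hand the file to read_csv when data rows exist. The sketch below is an assumption about how that could look, not part of pandas: `read_multiheader_csv` and its `n_header` parameter are hypothetical names introduced here for illustration.

```python
import csv
import os
import tempfile
from itertools import islice

import pandas as pd


def read_multiheader_csv(path, n_header=2):
    """Hypothetical helper (not a pandas API): read a CSV whose first
    n_header rows are column levels, tolerating a file with no data rows."""
    with open(path, newline="") as fh:
        # Skip blank lines and look at most one row past the header.
        non_empty = (row for row in csv.reader(fh) if row)
        rows = list(islice(non_empty, n_header + 1))

    if len(rows) < n_header:
        raise ValueError("file has fewer rows than requested header levels")

    if len(rows) == n_header:
        # Header only: build the empty frame directly instead of calling
        # read_csv(header=[0, 1]), which is the call that fails in 0.19.0.
        columns = pd.MultiIndex.from_arrays(rows)
        return pd.DataFrame(columns=columns)

    # At least one data row exists, so the normal code path is used.
    return pd.read_csv(path, header=list(range(n_header)))


# Usage: the same two-line file as in the report above.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,a,b,b\ncol_1,col_2,col_1,col_2\n")

df = read_multiheader_csv(tmp.name)
os.remove(tmp.name)
```

The helper only peeks at the first `n_header + 1` non-blank rows, so it does not load the whole file twice when data is present.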

Output of pd.show_versions()

For pandas 0.18.1

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.1
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.7.3
boto: 2.40.0
pandas_datareader: None

For pandas 0.19


INSTALLED VERSIONS
------------------
commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.1
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.7.3
boto: 2.40.0
pandas_datareader: None

Labels: IO CSV (read_csv, to_csv)