BUG: if first row is short, read_csv raises exception instead of filling with NaN #4749

Please see http://nbviewer.ipython.org/6443825 . Below is an unformatted version.

pandas' read_csv is supposed to handle short rows by filling them with NaN. It does so most of the time. However, if the first row is short and you specify header=None, then you get an error. This happens whether or not you specify column names.
In [20]:

import StringIO  # Python 2 module; needed for StringIO.StringIO below
from pandas import DataFrame, read_csv

This works:
In [21]:

mydata='1,2,3\n1,2\n1,2\n'
read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])

Out[21]:
   one  two  three
0    1    2      3
1    1    2    NaN
2    1    2    NaN

But this doesn't:
In [22]:

mydata='1,2\n1,2,3\n4,5,6\n'
read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])

---

CParserError                              Traceback (most recent call last)
<ipython-input-22-4dc7e64ee312> in <module>()
      1 mydata='1,2\n1,2,3\n4,5,6\n'
----> 2 read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
    397                     buffer_lines=buffer_lines)
    398 
--> 399         return _read(filepath_or_buffer, kwds)
    400 
    401     parser_f.__name__ = name

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    206 
    207     # Create the parser.
--> 208     parser = TextFileReader(filepath_or_buffer, **kwds)
    209 
    210     if nrows is not None:

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    505             self.options['has_index_names'] = kwds['has_index_names']
    506 
--> 507         self._make_engine(self.engine)
    508 
    509     def _get_options_with_defaults(self, engine):

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    607     def _make_engine(self, engine='c'):
    608         if engine == 'c':
--> 609             self._engine = CParserWrapper(self.f, **self.options)
    610         else:
    611             if engine == 'python':

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
    888         # #2442
    889         kwds['allow_leading_cols'] = self.index_col is not False
--> 890         self._reader = _parser.TextReader(src, **kwds)
    891 
    892         # XXX

/usr/lib/python2.7/dist-packages/pandas/_parser.so in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3946)()

/usr/lib/python2.7/dist-packages/pandas/_parser.so in pandas._parser.TextReader._get_header (pandas/src/parser.c:5628)()

CParserError: Column names have 3 fields, data has 2 fields
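
To make the expected result concrete, here is a minimal sketch in plain Python 3 of the padding I would expect for the failing input. It does not use pandas and is not how the C parser works internally; `pad_rows` is a hypothetical helper written only for illustration, and it assumes every field is numeric.

```python
import csv
import io
import math


def pad_rows(text, names):
    """Parse CSV text and pad short rows with NaN up to len(names).

    Hypothetical helper for illustration only; assumes numeric fields.
    """
    width = len(names)
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) > width:
            # Rows longer than the supplied names are treated as an error here.
            raise ValueError(
                "row has %d fields, expected at most %d" % (len(fields), width)
            )
        values = [float(field) for field in fields]
        # Fill the missing trailing columns with NaN, whichever row is short.
        values.extend([math.nan] * (width - len(values)))
        rows.append(dict(zip(names, values)))
    return rows


# Same input as the failing example: the first row is the short one.
rows = pad_rows('1,2\n1,2,3\n4,5,6\n', ['one', 'two', 'three'])
for row in rows:
    print(row)
```

The point is only that the position of the short row should not matter: a short first row gets NaN in column 'three', just as the short later rows do in the working example.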

