
BUG: read_csv names argument inconsistent between c and python engine #38453

Closed
@phofl

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

After #38445 there is another inconsistency left to address.

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="c")

pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python")

Problem description

The bug is caused by the differing lengths of the header and the names argument: the header line has four fields, while names has five.

This returns

   A  B  C  D   E
0  1  2  3  4 NaN
1  5  6  7  8 NaN

for the C engine, while the python engine raises:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.3/scratches/scratch_4.py", line 323, in <module>
    print(pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python"))
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2303, in __init__
    ) = self._infer_columns()
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2692, in _infer_columns
    raise ValueError(
ValueError: Number of passed names did not match number of header fields in the file

Process finished with exit code 1
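The error message compares the number of passed names with the number of header fields. A quick check using only Python's standard-library csv module (a sketch independent of pandas, not how pandas parses the file) shows the three counts involved in this sample: the header line has four fields, each data row has five because of the trailing comma, and names has five.

```python
import csv
import io

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
names = ["A", "B", "C", "D", "E"]

rows = list(csv.reader(io.StringIO(s)))
header, data = rows[0], rows[1:]

# Header fields vs. fields per data row vs. passed names
print(len(header), [len(r) for r in data], len(names))
# prints: 4 [5, 5] 5
```

The C engine tolerates the 4-vs-5 mismatch between header and names, whereas the python engine rejects it.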

Expected Output

Both engines should return the same result, and the python engine should not raise.
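For reference, the C engine output shown above can be modeled in plain Python. This is a hypothetical sketch: read_with_names is an illustrative helper, not a pandas API or pandas internals. It assumes header=0 means the file's first line is discarded in favor of names, empty fields become missing values (None standing in for NaN), short rows are padded, and all non-empty fields are integers, as in the sample.

```python
import csv
import io

def read_with_names(text, names):
    # Hypothetical helper: mimics the C engine result from this issue, not pandas code.
    rows = list(csv.reader(io.StringIO(text)))
    result = []
    for row in rows[1:]:  # header=0: skip the file's header line, use `names` instead
        if len(row) > len(names):
            # Rows wider than `names` are outside the scope of this issue
            raise ValueError("more fields than names")
        # Empty fields (e.g. from a trailing comma) become None, standing in for NaN
        values = [None if v == "" else int(v) for v in row]
        # Pad rows shorter than `names` with missing values
        values += [None] * (len(names) - len(values))
        result.append(dict(zip(names, values)))
    return result

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
print(read_with_names(s, ["A", "B", "C", "D", "E"]))
# [{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': None}, {'A': 5, 'B': 6, 'C': 7, 'D': 8, 'E': None}]
```

Under this model the mismatch between the header length and names never matters, because the header line is discarded before names are applied, which is the behavior the C engine already shows.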

Output of pd.show_versions()

master
