pandas.read_fwf doesn't work with skiprows=callable #20603

Open
@ghost

Description

Code Sample, a copy-pastable example if possible

import pandas

table = """\
id8141    360.242940   149.910199   11950.7
id1594    444.953632   166.985655   11788.4
id1849    364.136849   183.628767   11806.2
id1230    413.836124   184.375703   11916.8
id1948    502.953953   173.237159   12468.3"""

fwf_path = "fwf.dat"
with open(fwf_path, "wb") as fh:
    fh.write(table.encode('utf8'))

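def should_skip(row_index):
    # pandas calls a skiprows callable with the row index, not the row contents.
    return True  # placeholder; normally a real condition on row_index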

table = pandas.read_fwf(fwf_path, skiprows=should_skip)

Problem description

Traceback (most recent call last):
  File "fwf_bug.py", line 19, in <module>
    table = pandas.read_fwf(fwf_path, skiprows=should_skip)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 741, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 1059, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 3412, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 2079, in __init__
    self._make_reader(f)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 3416, in _make_reader
    self.comment, self.skiprows)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 3316, in __init__
    self.colspecs = self.detect_colspecs(skiprows=skiprows)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 3373, in detect_colspecs
    rows = self.get_rows(n, skiprows)
  File "some-path/_venv/lib/python3.5/site-packages/pandas/io/parsers.py", line 3361, in get_rows
    if i not in skiprows:
TypeError: argument of type 'function' is not iterable
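
The last frame suggests that the row sampling used for column detection treats skiprows as a collection (`i not in skiprows`), which fails for a function. Below is a minimal sketch of a callable-aware version of that check. It is not the pandas source: `is_skipped` and this standalone `get_rows` are hypothetical names used only to illustrate the idea, and the real method may differ in detail.

```python
def is_skipped(i, skiprows):
    """Hypothetical helper: decide whether row index i should be skipped."""
    if skiprows is None:
        return False
    if callable(skiprows):
        # A callable is evaluated against the row index.
        return bool(skiprows(i))
    # Otherwise assume a collection of row indices (list, set, ...).
    return i in skiprows


def get_rows(lines, n, skiprows=None):
    """Illustrative stand-in: collect up to n rows that are not skipped."""
    rows = []
    for i, line in enumerate(lines):
        if not is_skipped(i, skiprows):
            rows.append(line)
        if len(rows) >= n:
            break
    return rows
```

With this shape, a set, a list, `None`, and a function are all handled by one membership test, so the sampling loop itself does not need to know which form it was given.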

Expected Output

The script runs without raising, and rows for which the callable returns True are skipped.
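
Until callables are supported, one possible workaround is to evaluate the callable up front and pass the resulting row indices as a list, which `read_fwf` does accept for `skiprows`. The sketch below uses only the standard library; `callable_to_skiplist` is a hypothetical helper name, and it assumes the input is a file on disk that is small enough to scan once for a line count.

```python
def callable_to_skiplist(path, should_skip):
    """Hypothetical helper: turn a skiprows callable into a list of row indices."""
    # Count physical lines so we know which indices to test.
    with open(path, "rb") as fh:
        n_lines = sum(1 for _ in fh)
    return [i for i in range(n_lines) if should_skip(i)]


# Intended usage (assumes pandas is installed and fwf_path exists):
#   skip = callable_to_skiplist(fwf_path, should_skip)
#   table = pandas.read_fwf(fwf_path, skiprows=skip)
```

Note that a callable that always returns True, as in the sample above, would skip every row, so a real condition is needed for the result to contain any data.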

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: 6.3.0
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
