BUG: ExcelFile.parse() skiprows arg doesn't play nice

related/dup #4340

Was building on [this example on SO](http://stackoverflow.com/questions/15555005/get-inferred-dataframe-types-iteratively-using-chunksize), and found a bug in `ExcelFile().parse` option `skiprows`, when passed an index.

``` python
In [2]: xls = pd.ExcelFile('example.xlsx')
In [3]: df = xls.parse(xls.sheet_names[0])
In [4]: print df
              Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
0   2011-10-01 00:00:00  1892213     E      8111            29      ALTOU
1   2011-11-01 00:00:00  1892213     E      8111            29      ALTOU
2                   NaN      NaN   NaN       NaN           NaN        NaN
3                   NaN      NaN   NaN       NaN           NaN        NaN
4             Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
5   2011-10-01 00:00:00   553961     G      8111            29        GN3
6   2011-11-01 00:00:00   553961     G      8111            29        GN3
7                   NaN      NaN   NaN       NaN           NaN        NaN
8             Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
9   2011-10-01 00:00:00  6322158     E                      29          A
10  2011-11-01 00:00:00  6322158     E                      29          A
```

Wanting to avoid the repeated headers and empty rows, I parse with `skiprows`, but no dice (no difference with previously parsed sheet):

``` python
In [5]: skip_idx = np.array([2,3,4,7,8])
In [6]: df = xls.parse(xls.sheet_names[0], skiprows=skip_idx+1)
In [8]: df
Out[8]: 
              Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
0   2011-10-01 00:00:00  1892213     E      8111            29      ALTOU
1   2011-11-01 00:00:00  1892213     E      8111            29      ALTOU
2                   NaN      NaN   NaN       NaN           NaN        NaN
3                   NaN      NaN   NaN       NaN           NaN        NaN
4             Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
5   2011-10-01 00:00:00   553961     G      8111            29        GN3
6   2011-11-01 00:00:00   553961     G      8111            29        GN3
7                   NaN      NaN   NaN       NaN           NaN        NaN
8             Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
9   2011-10-01 00:00:00  6322158     E                      29          A
10  2011-11-01 00:00:00  6322158     E                      29          A
```

Meanwhile, our little friend `pd.read_csv()`, doesn't seem to have the same hangups. Creating a csv directly from the original `.xlsx` and performing the same operations:

``` python
In [9]: df = pd.read_csv('example.csv')
In [10]: df
Out[10]: 
    Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
0      Oct-11  1892213     E      8111            29      ALTOU
1      Nov-11  1892213     E      8111            29      ALTOU
2         NaN      NaN   NaN       NaN           NaN        NaN
3         NaN      NaN   NaN       NaN           NaN        NaN
4   Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
5      Oct-11   553961     G      8111            29        GN3
6      Nov-11   553961     G      8111            29        GN3
7         NaN      NaN   NaN       NaN           NaN        NaN
8   Bill Date  Meter #  Type  SIC Code  Billing Days  Rate Code
9      Oct-11  6322158     E                      29          A
10     Nov-11  6322158     E                      29          A
```

Looks fine, and now passing the `skiprows` option, and I get a correctly parsed `df` with all the guilty lines missing:

``` python
In [11]: df = pd.read_csv('example.csv', skiprows=skip_id+1)
In [12]: df
Out[12]: 
  Bill Date  Meter # Type SIC Code  Billing Days Rate Code
0    Oct-11  1892213    E     8111            29     ALTOU
1    Nov-11  1892213    E     8111            29     ALTOU
2    Oct-11   553961    G     8111            29       GN3
3    Nov-11   553961    G     8111            29       GN3
4    Oct-11  6322158    E                     29         A
5    Nov-11  6322158    E                     29         A
```

Started tracing for troubleshooting, but after the 5th handoff of the `skiprows` parameter all over in `.io`, I gave up. :(

Thoughts? 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: ExcelFile.parse() skiprows arg doesn't play nice #4903

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

BUG: ExcelFile.parse() skiprows arg doesn't play nice #4903

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions