I have found that using the filter method to select columns matching a string pattern is ~3x slower than a basic list comprehension over df.columns. I'm not sure how it's implemented under the hood (a rough guess is sketched after the timings below), but for simple 'in' checks across many columns this could slow you down, depending on how often you filter.
import pandas as pd
import numpy as np
# Generate Test DataFrame
NUM_ROWS = 2000
NUM_COLS = 1000
col_names = ['A' + str(num) for num in range(NUM_COLS)]
df = pd.DataFrame(np.random.randint(5, size=(NUM_ROWS,NUM_COLS)), dtype=np.int64, columns=col_names)
# Two columns whose names contain the 'TEST' pattern
df['TEST'] = 0
df['TEST2'] = 0
%timeit df.filter(like='TEST')
1000 loops, best of 3: 1.19 ms per loop
%timeit df[[col for col in df.columns if 'TEST' in col]]
1000 loops, best of 3: 400 µs per loop
%time df.filter(like='TEST')
Wall time: 1 ms
Out[4]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2000 entries, 0 to 1999
Data columns (total 2 columns):
TEST 2000 non-null values
TEST2 2000 non-null values
dtypes: int64(2)
%time df[[col for col in df.columns if 'TEST' in col]]
Wall time: 1 ms
Out[5]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2000 entries, 0 to 1999
Data columns (total 2 columns):
TEST 2000 non-null values
TEST2 2000 non-null values
dtypes: int64(2)
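For what it's worth, my guess (I haven't checked the pandas source, so this is only an assumption) is that filter(like=...) tests every label and then routes the result through reindex, roughly like the sketch below, while the list comprehension goes straight through __getitem__:

import numpy as np

def filter_like_sketch(df, like):
    # Assumed internals, NOT the actual pandas code: build a boolean
    # mask over all column labels, then reindex on the matching subset.
    mask = np.asarray([like in str(label) for label in df.columns])
    return df.reindex(columns=df.columns[mask])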
pd.__version__
Out[7]: '0.12.0'
np.__version__
Out[8]: '1.7.1'