subset keyword argument will include last column if incorrect keys are given #8303

Closed
@arijun


If you give a non-existent key to the subset argument of dropna, it is silently treated as the last column instead of raising an error. For example:

In [3]: d = pd.DataFrame({'a':[1],'b':[2],'c':[np.nan]})
In [4]: len(d.dropna(subset=['x']))
Out[4]: 0

The row is dropped because the last column, c, holds NaN, even though 'x' is not a column. It seems the problem is in the line where dropna takes the subset:

agg_obj = self.take(ax.get_indexer_for(subset),axis=agg_axis)

Here, ax.get_indexer_for(subset) returns -1 as a sentinel value for any key in subset that is not found in ax, and self.take interprets the -1 as a request for the last column.
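
The two steps can be seen in isolation, without going through dropna itself. The sketch below reuses the frame from the example above. The helper checked_indexer is hypothetical (it is not part of pandas) and only illustrates one possible guard, namely rejecting missing keys before the indexer reaches take.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'a': [1], 'b': [2], 'c': [np.nan]})
ax = d.columns

# Labels that are not in the index are reported as -1 rather than raising.
indexer = ax.get_indexer_for(['x'])

# take() treats -1 as a position counted from the end, so this selects 'c'.
taken = d.take(indexer, axis=1)


def checked_indexer(index, keys):
    """Hypothetical helper: like get_indexer_for, but raise on missing keys."""
    missing = [key for key in keys if key not in index]
    if missing:
        raise KeyError(missing)
    return index.get_indexer_for(keys)


print(indexer)              # positions returned for the missing key
print(list(taken.columns))  # the column that take() actually selected
```

With a guard like this, a call such as checked_indexer(ax, ['x']) fails with a KeyError instead of quietly selecting column c.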


Labels: Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
