DEPR: behavior of copy argument in df.reindex is confusing #34663

Open
@fujiaxiang

This was inspired by #33888 and #34584

Problem description

The behavior of the copy argument in df.reindex is confusing. The current docstring does not explain it clearly enough. It also seems to me that copy is unnecessary.

Currently the docstring says

...

A new object is produced unless the new index is equivalent to the current one and ``copy=False``.

...

copy : bool, default True
       Return a new object, even if the passed indexes are the same.

It is hard to tell from this what counts as an "equivalent" index. See below for more details.

Further, I believe users rarely reindex with an "equivalent" index on purpose. It happens only if the user does not yet know the current index or the index to conform to, in which case consistent behavior (e.g. always returning a new object) is probably preferred.
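To make the "always return a new object" option concrete, here is a minimal sketch of what a caller gets today when copy is left at its documented default of True. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(range(3))

# With the default copy=True, reindexing with the frame's own index
# still hands back a distinct object holding the same values.
result = df.reindex(df.index)

print(result is df)       # False: a new object was produced
print(result.equals(df))  # True: the contents are unchanged
```

A caller who does not know in advance whether the target index matches the current one can rely on this without inspecting either index.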

# On current master
>>> pd.__version__
'1.1.0.dev0+1802.g942beba1e'

>>> df = pd.DataFrame(range(3))
>>> df
   0
0  0
1  1
2  2
>>> df.index
RangeIndex(start=0, stop=3, step=1)

# not equivalent
>>> df is df.reindex(range(3), copy=False)
False

# not equivalent
>>> df is df.reindex(list(range(3)), copy=False)
False

# equivalent
>>> df is df.reindex(pd.RangeIndex(start=0, stop=3, step=1), copy=False)
True

>>> df = pd.DataFrame(range(3), index=list(range(3)))
>>> df
   0
0  0
1  1
2  2
>>> df.index
Int64Index([0, 1, 2], dtype='int64')

# not equivalent
>>> df is df.reindex(range(3), copy=False)
False

# even this is considered not equivalent
>>> df is df.reindex(list(range(3)), copy=False)
False

# equivalent
>>> df is df.reindex(pd.Int64Index([0, 1, 2]), copy=False)
True

You can see that the bar for being "equivalent" is actually pretty strict. I feel it does not really make sense to have this copy parameter, because reindex returns a new object in most cases anyway, even when copy=False.
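One plausible reading of the results above is that the short-circuit uses a strict comparison along the lines of Index.identical (same values, same type, same name) rather than the looser Index.equals (same values only). That is an assumption about the internals, not something stated in the docstring. The sketch below only compares the two public Index methods and does not call reindex itself.

```python
import pandas as pd

current = pd.RangeIndex(start=0, stop=3, step=1)

# Candidate indexes that all hold the values 0, 1, 2.
candidates = {
    "RangeIndex": pd.RangeIndex(start=0, stop=3, step=1),
    "Index from list": pd.Index([0, 1, 2]),
    "named RangeIndex": pd.RangeIndex(start=0, stop=3, step=1, name="idx"),
}

for label, other in candidates.items():
    # equals compares values only; identical also compares type and name.
    print(label, current.equals(other), current.identical(other))
```

All three candidates pass equals, but only the unnamed RangeIndex passes identical, which matches the pattern seen in the reindex calls above.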

So the question is, can we deprecate copy?

Labels: Deprecate (functionality to remove in pandas), Indexing (related to indexing on series/frames, not to indexes themselves), Needs Discussion (requires discussion from core team before further action)
