Description
This was inspired by #33888 and #34584
Problem description
The behavior of the copy argument in df.reindex is confusing. The current docstring does not explain it clearly enough. It also seems to me that copy is unnecessary.
Currently the docstring says
...
A new object is produced unless the new index is equivalent to the current one and ``copy=False``.
...
copy : bool, default True
Return a new object, even if the passed indexes are the same.
It is hard to clarify what is considered an "equivalent" index. See below for more details.
Further, I believe users rarely try to reindex with an "equivalent" index on purpose. It happens only if the user does not yet know the current index or the index to conform to, in which case consistent behavior (e.g. always returning a new object) is probably preferred.
# On current master
>>> pd.__version__
'1.1.0.dev0+1802.g942beba1e'
>>> df = pd.DataFrame(range(3))
>>> df
   0
0  0
1  1
2  2
>>> df.index
RangeIndex(start=0, stop=3, step=1)
# not equivalent
>>> df is df.reindex(range(3), copy=False)
False
# not equivalent
>>> df is df.reindex(list(range(3)), copy=False)
False
# equivalent
>>> df is df.reindex(pd.RangeIndex(start=0, stop=3, step=1), copy=False)
True
>>> df = pd.DataFrame(range(3), index=list(range(3)))
>>> df
   0
0  0
1  1
2  2
>>> df.index
Int64Index([0, 1, 2], dtype='int64')
# not equivalent
>>> df is df.reindex(range(3), copy=False)
False
# even this is considered not equivalent
>>> df is df.reindex(list(range(3)), copy=False)
False
>>> df is df.reindex(pd.Int64Index([0, 1, 2]), copy=False)
True
You can see that the bar for being "equivalent" is actually pretty strict. I feel it does not really make sense to have this copy parameter, because reindex will return a new object in most cases anyway, even when copy=False.
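As a rough illustration of why the bar might be this high, the sketch below contrasts value equality (Index.equals) with the stricter Index.identical, which also compares the index type and metadata. That reindex applies a check along the lines of identical is an assumption drawn from the results above, not something the docstring states. The variable names are illustrative only.

```python
import pandas as pd

range_idx = pd.RangeIndex(start=0, stop=3, step=1)
int_idx = pd.Index([0, 1, 2])

# Same labels, so value-based equality holds across index types.
same_values = range_idx.equals(int_idx)

# identical() also requires the same index type and metadata,
# so a RangeIndex and an integer Index with equal labels differ.
same_type_and_values = range_idx.identical(int_idx)

# Two separately built RangeIndex objects with the same range are identical.
rebuilt_is_identical = range_idx.identical(pd.RangeIndex(3))

# A plain list is not an Index at all, so equals() is False.
list_equals = int_idx.equals([0, 1, 2])

print(same_values, same_type_and_values, rebuilt_is_identical, list_equals)
```

If that assumption holds, it would explain why passing range(3) or a list never counts as "equivalent", even though the labels match.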
So the question is: can we deprecate the copy parameter?