Description
PDEP-7 did not spell it out explicitly, but a consequence of Copy-on-Write is that the copy keyword is no longer very useful.
Currently a bunch of methods have this keyword (astype, rename, reindex, ..., full list at #50535), for example:
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df2 = df.rename(columns=str.upper) # has a default of `copy=True`
With the current default behaviour, df2 is a full copy of df. This default of copy=True will change to no longer copy when CoW is enabled (it will act as a "delayed" copy instead). Users can currently pass copy=False to avoid the full copy, but this will no longer be possible with CoW (the previous concept of a "shallow copy" no longer exists, xref #36195 (comment)). So passing copy=False is something we will have to deprecate anyhow.
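To make the "delayed" copy concrete, here is a minimal sketch. It assumes pandas 2.x, where CoW is opt-in through the `mode.copy_on_write` option; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: pandas 2.x, where Copy-on-Write is opt-in via this option.
pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = df.rename(columns=str.upper)  # no `copy` keyword passed

# Right after the rename, both frames still point at the same data.
shares_before = np.shares_memory(df["a"].to_numpy(), df2["A"].to_numpy())

# Writing to df2 triggers the actual copy; df is not modified.
df2.iloc[0, 0] = 100
shares_after = np.shares_memory(df["a"].to_numpy(), df2["A"].to_numpy())
parent_value = int(df.loc[0, "a"])
```

So the result behaves like a copy from the user's point of view, but the data is only copied once one of the two objects is written to.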
In theory we could keep copy=True as a non-default option, which would result in an actual hard copy instead of the CoW-tracked view. However, in #50535 we essentially already decided not to do this, and in CoW mode copy=True is currently simply ignored.
The idea is that if a user really wants a hard copy, they can add a .copy() in the chain (e.g. df2 = df.rename(...).copy() instead of df2 = df.rename(..., copy=True)). So in #50535 we felt that it was not worth keeping a whole keyword for such a minor use case, which has a clear and easy alternative.
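A small sketch of that alternative, using the same toy frame as above. The explicit `.copy()` gives an eager hard copy whether or not CoW is enabled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Explicit hard copy in the chain instead of relying on a `copy` keyword.
df2 = df.rename(columns=str.upper).copy()

# The result owns its data immediately, independent of the CoW setting.
shares = np.shares_memory(df["a"].to_numpy(), df2["A"].to_numpy())
```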
So the consequence of the current behaviour with CoW enabled is that we can deprecate the copy keyword altogether. The idea is that we can already start doing this slowly with a DeprecationWarning in pandas 2.2, whose message can at the same time point people to enabling CoW as an alternative.
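As a rough illustration of what such a deprecation could look like, here is a sketch built around a hypothetical wrapper function. This is not pandas' actual implementation: the helper `rename_upper`, the sentinel `_NO_DEFAULT`, and the warning text are all made up for the example. The point is only that a sentinel default lets us warn when the keyword is passed explicitly and stay silent otherwise.

```python
import warnings

import pandas as pd

# Hypothetical sentinel to tell "not passed" apart from copy=True/False.
_NO_DEFAULT = object()


def rename_upper(df, copy=_NO_DEFAULT):
    """Hypothetical wrapper, only to illustrate the deprecation pattern."""
    if copy is not _NO_DEFAULT:
        warnings.warn(
            "The 'copy' keyword is deprecated and will be removed in a "
            "future version. Enable Copy-on-Write, or call .copy() on the "
            "result if a hard copy is needed.",
            DeprecationWarning,
            stacklevel=2,
        )
    # The keyword is deliberately not forwarded.
    return df.rename(columns=str.upper)


def _copy_warnings(records):
    # Keep only the warnings raised by the wrapper above.
    return [w for w in records if "'copy' keyword" in str(w.message)]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rename_upper(df)  # keyword not passed: no warning
    n_default = len(_copy_warnings(caught))
    rename_upper(df, copy=False)  # keyword passed: one warning
    n_explicit = len(_copy_warnings(caught))
```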
While it's a consequence of the CoW behaviour changes, it's still deprecating a keyword in 15+ methods, so opening this issue for visibility. cc @pandas-dev/pandas-core