
Copy-on-Write (PDEP-7) follow-up overview issue #48998

Open
@jorisvandenbossche


PDEP-7: https://pandas.pydata.org/pdeps/0007-copy-on-write.html

An initial implementation was merged in #46958 (the proposal is described in more detail in https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit and was discussed in #36195).

In #36195 (comment) I mentioned some next steps that are still needed; moving this to a new issue.

Implementation

Complete the API surface:

Improve the performance

  • Optimize setitem operations to prevent copies of whole blocks (e.g. splitting the block could keep a view for all other columns, so that we only take a copy of the columns that are modified)
  • Check overall performance impact (eg run asv with / without CoW enabled by default and see the difference)
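
To make the block-splitting idea above concrete, here is a minimal pure-Python sketch. It is a toy model, not pandas internals: `ToyFrame`, `view`, `set_value`, `column` and `shares_column_with` are hypothetical names invented for illustration, and plain lists stand in for the arrays a real block would hold. The point is only that a write copies the one column being modified while every other column stays shared.

```python
class ToyFrame:
    """Toy model of per-column copy-on-write (hypothetical, not pandas code)."""

    def __init__(self, columns):
        # columns: dict mapping column name -> list of values
        self._columns = columns
        # names of columns whose list may still be shared with another frame
        self._shared = set()

    def view(self):
        # A new dict that points at the same list objects: no data is copied.
        child = ToyFrame(dict(self._columns))
        child._shared = set(child._columns)
        self._shared = set(self._columns)
        return child

    def set_value(self, name, row, value):
        if name in self._shared:
            # Copy only the column being written, not the whole "block".
            # (A real implementation would track references more precisely;
            # this toy may copy once more than strictly necessary.)
            self._columns[name] = list(self._columns[name])
            self._shared.discard(name)
        self._columns[name][row] = value

    def column(self, name):
        return self._columns[name]

    def shares_column_with(self, other, name):
        return self._columns[name] is other._columns[name]


parent = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
child = parent.view()
child.set_value("a", 0, 100)

print(parent.column("a"))                     # [1, 2, 3]
print(child.column("a"))                      # [100, 2, 3]
print(child.shares_column_with(parent, "b"))  # True
```

Only column `a` is duplicated by the write; column `b` is still the same object in both frames, which is the saving that splitting a block is meant to provide.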

Provide upgrade path:

  • Add a warning mode that gives deprecation warnings for all cases where the current behaviour would change (initially also behind an option); see "CoW warning mode for cases that will change behaviour" (#56019)
    • We can also update the message of the existing SettingWithCopyWarnings to point users towards enabling CoW as a way to get rid of the warnings
    • Add a general FutureWarning "on first use that would change" that is only raised a single time

Documentation / feedback

Aside from finalizing the implementation, we also need to start documenting this. It will also be very useful to have people try it out and run their code or test suites with it, so we can iron out bugs and missing warnings, and discover unexpected consequences that need to be addressed or discussed.

  • Document this new feature (how it works, how you can test it)
  • We can still add a note to the 1.5 whatsnew linking to those docs
  • Write a set of blogposts on the topic
  • Gather feedback from users / downstream packages
  • Update existing documentation:
  • Write an upgrade guide
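
As a rough illustration of what trying this out could look like, the sketch below enables CoW for a small block of code and checks that modifying a derived object leaves the parent untouched. It assumes a pandas version where CoW is opt-in through the `mode.copy_on_write` option (1.5 through 2.x); on versions where CoW is the default, the option may be deprecated or unnecessary.

```python
import pandas as pd

# Assumption: pandas 1.5-2.x, where Copy-on-Write is opt-in via this option.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Selecting a column does not copy the data up front.
    subset = df["a"]

    # Writing to the derived object triggers a copy of its data,
    # so the parent DataFrame is not modified.
    subset.iloc[0] = 100

    print(int(subset.iloc[0]))    # 100
    print(int(df.loc[0, "a"]))    # 1
```

Running an existing test suite once with the option enabled and once without is one way to spot code that relies on the old view semantics.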

Some remaining aspects of the API to figure out:
