Expose the blocks API and disable automatic consolidation #10556

Closed
@shoyer

Description

In my discussions with Jonathan and others at the SciPy sprints, we agreed that it would be really nice to expose some minimal tools for manipulating and viewing the internal pandas blocks system. For example, it should be possible to:

  1. manually consolidate blocks
  2. view a representation of the internal blocking of a dataframe (via matplotlib?)

It's not so much that we want to create and use blocks directly, but that we want to make it easier to understand the internal data model and make performance more predictable.
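To make the two capabilities listed above concrete, here is a minimal toy sketch in plain Python. It is not pandas' internal representation or any proposed API: the names `make_blocks`, `consolidate`, and `describe` are hypothetical, and a "block" is modeled as a simple tuple of dtype label, column names, and column data.

```python
from collections import defaultdict

# Toy model only: each "block" is (dtype_label, [column names], [column data]).
# This is an illustration of the idea, not how pandas stores blocks.


def infer_dtype(values):
    """Return a coarse dtype label for a list of Python values."""
    kinds = {type(v).__name__ for v in values}
    return kinds.pop() if len(kinds) == 1 else "object"


def make_blocks(columns):
    """Build one block per column, with no consolidation."""
    return [(infer_dtype(vals), [name], [vals]) for name, vals in columns.items()]


def consolidate(blocks):
    """Merge blocks that share a dtype label into one multi-column block."""
    merged = defaultdict(lambda: ([], []))
    for dtype, names, data in blocks:
        merged[dtype][0].extend(names)
        merged[dtype][1].extend(data)
    return [(dtype, names, data) for dtype, (names, data) in merged.items()]


def describe(blocks):
    """Return a text view of the block layout, one entry per block."""
    return [f"{dtype}: {', '.join(names)}" for dtype, names, _ in blocks]


columns = {"a": [1, 2], "b": [1.5, 2.5], "c": [3, 4], "d": ["x", "y"]}
blocks = make_blocks(columns)

print(describe(blocks))               # four single-column blocks
print(describe(consolidate(blocks)))  # the two int columns now share a block
```

In this sketch consolidation happens only when `consolidate` is called, which is the kind of explicit, predictable trigger the proposal asks for; `describe` stands in for a textual version of the suggested layout view.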

At the same time, we would like to disable automatic consolidation of blocks in the DataFrame constructor and when inserting new columns. Consolidation is certainly a useful feature, but it is currently not always possible to even predict when it will happen.

Most users never notice or care about consolidation. Power users (concerned about memory or performance) are at least as likely to find it frustrating as helpful, so we should make this something that they can trigger explicitly (as part of the blocks API). This would make it possible to create dataframes while guaranteeing that none of the data is copied (#9216).
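The copy concern can be illustrated with a small sketch that uses only the standard-library `array` module as a stand-in for typed column buffers. This is an assumption-laden toy, not pandas or NumPy behavior: `wrap_without_copy` and `consolidate_with_copy` are hypothetical names, and the "consolidated" layout is modeled as one contiguous buffer.

```python
from array import array


def wrap_without_copy(columns):
    """Keep one 'block' per column, holding a reference to the caller's buffer."""
    return dict(columns)  # copies the mapping only, not the underlying buffers


def consolidate_with_copy(columns):
    """Pack same-typed columns into one contiguous buffer, which requires a copy."""
    packed = array("d")
    for buf in columns.values():
        packed.extend(buf)  # the data is copied into the new buffer
    return packed


a = array("d", [1.0, 2.0])
c = array("d", [3.0, 4.0])
columns = {"a": a, "c": c}

blocks = wrap_without_copy(columns)
packed = consolidate_with_copy(columns)

# Mutate the caller's buffer after construction.
a[0] = 99.0

print(blocks["a"] is a)  # True: the unconsolidated block shares memory
print(blocks["a"][0])    # 99.0: the change is visible through the block
print(packed[0])         # 1.0: the consolidated buffer holds an earlier copy
```

The point of the sketch is only that packing columns into a single buffer cannot avoid a copy, whereas keeping per-column blocks can, which is why a no-copy guarantee depends on consolidation being opt-in.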

cc @jonathanrocher @sinhrks @jreback @cpcloud @TomAugspurger @ARF1 @quicknir

Metadata

    Labels

    API Design; Closing Candidate (may be closeable, needs more eyeballs); Enhancement; Internals (related to non-user accessible pandas implementation); Needs Discussion (requires discussion from core team before further action)
