Description
In my discussions with Jonathan and others, and at the SciPy sprints, we agreed that it would be really nice to expose some minimal tools for manipulating and viewing the internal pandas blocks system. For example, it should be possible to:
- manually consolidate blocks
- view a representation of the internal blocking of a dataframe (via matplotlib?)
It's not so much that we want to create and use blocks directly, but that we want to make it easier to understand the internal data model and make performance more predictable.
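To make the two bullet points above concrete, here is a minimal toy sketch in pure Python. It is not the pandas internals and not a proposed API: `make_blocks`, `consolidate`, and `describe` are hypothetical names invented for illustration, and a "block" is modeled as a plain tuple of (dtype name, column names, column data). It only shows the idea that consolidation groups same-dtype columns into one block (copying the data along the way) and that a block layout can be rendered in a readable form.

```python
def make_blocks(columns):
    """Build one block per column (the unconsolidated layout).

    Each block is a tuple: (dtype name, [column names], [column data]).
    The dtype is guessed from the first value, which is enough for a toy model.
    """
    return [
        (type(values[0]).__name__, [name], [list(values)])
        for name, values in columns.items()
    ]


def consolidate(blocks):
    """Merge all blocks that share a dtype into a single block.

    The column data is copied into the merged block, mirroring the fact
    that consolidation is not free.
    """
    merged = {}
    for dtype, names, data in blocks:
        entry = merged.setdefault(dtype, (dtype, [], []))
        entry[1].extend(names)
        entry[2].extend(list(col) for col in data)  # explicit copy
    return list(merged.values())


def describe(blocks):
    """Return a simple text view of the block layout, one line per block."""
    return [f"{dtype}: {', '.join(names)}" for dtype, names, _ in blocks]


blocks = make_blocks({"a": [1, 2], "x": [0.5, 1.5], "b": [3, 4]})
print(describe(blocks))               # three single-column blocks
print(describe(consolidate(blocks)))  # ['int: a, b', 'float: x']
```

In this toy model, consolidation only happens when `consolidate` is called, which is the kind of explicit, predictable trigger the proposal is asking for.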
At the same time, we would like to disable automatic consolidation of blocks in the DataFrame constructor and when inserting new columns. Consolidation is certainly a useful feature, but it is currently not always possible to even predict when it will happen.
Most users never notice or care about consolidation. Power users (concerned about memory or performance) are at least as likely to find it frustrating as helpful, so we should make this something that they can trigger explicitly (as part of the blocks API). This would make it possible to create dataframes while guaranteeing that none of the data is copied (#9216).
cc @jonathanrocher @sinhrks @jreback @cpcloud @TomAugspurger @ARF1 @quicknir