PERF: Remove _item_cache

Discussion copied over from #49450

Context, from the OP of #49450 (which discusses turning on the `_item_cache` for CoW):

> Currently, we use an item cache for DataFrame columns -> Series. Whenever we access a certain column, we cache the resulting Series in `df._item_cache`, and the next time we access a column, we first check if that column already exists in the cache and if so return that directly. I suppose this was done for making repeated column access faster (although the Series construction overhead for this fast path case also has improved I think). But it also has some behavioral consequences, i.e. Series objects from column access can be _identical_ objects, depending on the context:
> 
> ```python
> >>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
> >>> s1 = df["a"]
> >>> s2 = df["a"]
> >>> df['b'] = 10 # set existing column -> clears the item cache
> >>> s3 = df["a"]
> >>> s1 is s2
> True
> >>> s1 is s3
> False
> ```
> 

This caching can also have other side effects, though. While investigating #29411, I found that methods such as ``memory_usage`` (and, from a quick glance at frame.py, possibly ``round`` and ``duplicated``) that iterate through all the columns by calling ``.items()`` cause every column to be cached in ``_item_cache``, which blows up memory usage.
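
To make the mechanism concrete, here is a minimal toy model of the behavior described above. It is only a sketch: `ToyFrame` and `ToySeries` are hypothetical stand-ins written for this illustration, not pandas classes, and the real implementation in frame.py differs in its details. The point is just that any method that walks the columns through `.items()` ends up filling the cache as a side effect.

```python
# Toy model only: ToyFrame and ToySeries are hypothetical stand-ins,
# not pandas classes, and this is not how pandas implements the cache.
class ToySeries:
    def __init__(self, name, values):
        self.name = name
        self.values = values


class ToyFrame:
    def __init__(self, data):
        self._data = {name: list(values) for name, values in data.items()}
        self._item_cache = {}  # column name -> cached ToySeries

    def __getitem__(self, name):
        # Build the column object on first access, then reuse the cached one.
        if name not in self._item_cache:
            self._item_cache[name] = ToySeries(name, self._data[name])
        return self._item_cache[name]

    def __setitem__(self, name, values):
        self._data[name] = list(values)
        self._item_cache.clear()  # setting a column clears the item cache

    def items(self):
        # Iteration goes through __getitem__, so every column gets cached.
        for name in self._data:
            yield name, self[name]

    def memory_usage(self):
        # Stand-in for any method that walks all columns via items().
        return {name: len(series.values) for name, series in self.items()}


df = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
s1 = df["a"]
s2 = df["a"]
df["b"] = [10, 10, 10]  # set existing column -> clears the item cache
s3 = df["a"]
print(s1 is s2)  # True
print(s1 is s3)  # False

cached_before = len(df._item_cache)  # only "a" has been accessed since the reset
df.memory_usage()                    # walks every column via items()
cached_after = len(df._item_cache)   # now every column is cached
print(cached_before, cached_after)   # 1 2
```

In the toy model the cached objects stay alive for as long as the frame does, which is the kind of retained memory that shows up when the frame has many columns.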

Removing the cache might be tricky, though: as Joris noted, it would be a behavior change.
We should discuss here how we want to go about it (does it need a deprecation?).

PERF: Remove _item_cache #50547

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

PERF: Remove _item_cache #50547

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions