Description
The caching algorithm for re._compile()
is one of the hottest code paths in the stdlib. It has been rewritten many times (this is not a complete list of changes):
- Save OrderedDict import in re #76519
- Don't completely dump the regex cache when full #72480
- Faster bypass re cache when DEBUG is passed #66700
- re._compiled_typed's lru_cache causes significant degradation of the mako_v2 bench #60593
- Option to make the lru_cache type specific #57436
- d9e8cc6
- Standardise (and publish?) cache handling in standard library #53642
- 5a63183
The patch for using lru_cache()
was applied and reverted 3 times! Eventually it turned out to be slower than a simple dict lookup. lru_cache() does not fit this case very well, because some results should not be cached, and additional checks have to be made before calling the cached function, which adds significant overhead.
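A rough illustration of that overhead (hypothetical names, not the actual re module source; re.compile stands in for the uncached compilation step): every call has to pass pre-checks and an extra call layer before reaching the cache, even on hits.

```python
from functools import lru_cache
import re

@lru_cache(maxsize=512)
def _cached_compile(pattern, flags):
    return re.compile(pattern, flags)        # stand-in for the real compile step

def compile_with_checks(pattern, flags=0):
    if isinstance(pattern, re.Pattern):      # already compiled: must not be cached
        return pattern
    if flags & re.DEBUG:                     # DEBUG results must bypass the cache
        return re.compile(pattern, flags)
    return _cached_compile(pattern, flags)   # extra function call even on cache hits
```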
But the LRU caching algorithm can have an advantage over the current simple algorithm if not all compiled patterns fit in the cache: it removes entries from the cache more intelligently.
I tested with random keys drawn from an exponential distribution. With a cache size of 512, the largest difference is for lambda between 1/70 and 1/80: the LRU caching algorithm has about 3 times fewer misses, 0.16% to 0.33% versus 0.5% to 1.1% for the current algorithm. For significantly larger lambda (almost no misses) or significantly smaller lambda (tens of percent of misses), both algorithms give almost the same results.
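A rough sketch of the kind of simulation described above. The details are assumptions, not the methodology behind the quoted numbers: integer keys drawn with random.expovariate(), one million lookups, and the current algorithm modelled as a plain dict that drops its oldest inserted entry when full and never reorders on hits.

```python
import random
from collections import OrderedDict

CACHE_SIZE = 512

def miss_rate_lru(keys):
    cache, misses = OrderedDict(), 0
    for k in keys:
        if k in cache:
            cache.move_to_end(k)             # LRU: refresh recency on every hit
        else:
            misses += 1
            if len(cache) >= CACHE_SIZE:
                cache.popitem(last=False)    # evict the least recently used entry
            cache[k] = True
    return misses / len(keys)

def miss_rate_simple(keys):
    cache, misses = {}, 0
    for k in keys:
        if k not in cache:
            misses += 1
            if len(cache) >= CACHE_SIZE:
                del cache[next(iter(cache))] # drop the oldest inserted entry
            cache[k] = True
    return misses / len(keys)

random.seed(0)
lambd = 1 / 75                               # the region where LRU wins the most
keys = [int(random.expovariate(lambd)) for _ in range(1_000_000)]
print(f"LRU misses:    {miss_rate_lru(keys):.2%}")
print(f"simple misses: {miss_rate_simple(keys):.2%}")
```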
A direct implementation of the LRU caching algorithm using OrderedDict or even a plain dict would be slower, because every hit requires moving the found entry to the end. But we can use double caching: a primary, smaller and faster cache that does not reorder entries, and a secondary LRU cache that is consulted only when the key is not found in the primary cache. This algorithm has the same number of misses as the LRU caching algorithm, but its hits are faster.
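A minimal sketch of the double-caching idea. The names, sizes, and the eviction policy for the primary cache are illustrative assumptions, not the actual re module code; re.compile stands in for the real compilation step.

```python
import re

_MAXSECONDARY = 512   # secondary cache: full LRU
_MAXPRIMARY = 256     # primary cache: small, never reordered on hits

_primary = {}
_secondary = {}       # insertion order doubles as recency order (oldest first)

def cached_compile(pattern, flags=0):
    key = (pattern, flags)
    # Fast path: a single plain dict lookup, no reordering of any kind.
    try:
        return _primary[key]
    except KeyError:
        pass
    # Slow path: consult the LRU-ordered secondary cache.
    value = _secondary.pop(key, None)
    if value is None:
        value = re.compile(pattern, flags)          # real miss: compile
        if len(_secondary) >= _MAXSECONDARY:
            del _secondary[next(iter(_secondary))]  # evict least recently used
    _secondary[key] = value                         # (re)insert as most recent
    # Promote into the primary cache; drop its oldest entry when full
    # (assumption: the report does not specify the primary eviction policy).
    if len(_primary) >= _MAXPRIMARY:
        del _primary[next(iter(_primary))]
    _primary[key] = value
    return value

print(cached_compile(r"\d+").findall("a1 b22 c333"))
```

Because the primary cache is never reordered, a hit there costs only one dict lookup; the pop-and-reinsert work that makes LRU bookkeeping expensive happens only on the (rarer) primary misses.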