
Increase _PY_NSMALLPOSINTS size #133059

@dg-pb

Description


Feature or enhancement

Proposal:

See faster-cpython/ideas#725 for the initial analyses and backstory if interested, but I will try to summarise it all here:


So the intent is to make int creation faster:

from collections import deque
consume = deque(maxlen=0).extend
INT_RANGE = list(range(100_000))
%timeit consume(range(100_000))    # 2.3 ms
%timeit consume(iter(INT_RANGE))   # 0.6 ms

And pre-storing int objects is a straightforward path to achieve this:

from itertools import chain
chain = chain.from_iterable
# Every value yielded by the repeated range(256) falls inside the cached range,
# so no new int objects need to be allocated.
%timeit consume(range(100_000))                         # 2.1 ms
%timeit consume(chain([range(256)] * (100_000 // 256))) # 0.5 ms

Increasing the value of _PY_NSMALLPOSINTS does exactly that.
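
For context, a minimal illustration of what the existing cache already does (CPython pre-allocates the integers -5 through 256 as singletons). The round-trip through str is only there to stop the compiler from deduplicating constants, so the values are really constructed at runtime:

def fresh(n):
    # Build the value at runtime so the compiler cannot fold/deduplicate it.
    return int(str(n))

fresh(100) is fresh(100)        # True: 100 is served from the small-int cache
fresh(10_000) is fresh(10_000)  # False: 10_000 is allocated anew on every call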


The benefits can be observed in pyperformance benchmarks.
With each incremental increase of this number, new statistically significant improvements surface:

  1. 1024 results in 10% faster regex_v8
  2. 2048 adds 4% faster regex_dna and 8% faster regex_effbot
  3. 8192 adds improvements in the range of 4-11% for genshi_*, regex_compile, scimark_*, spectral_norm, xml_* and a few more
  4. 100K adds 9% faster scimark_monte_carlo, 6% faster scimark_sparse_mat_mult and 12% faster spectral_norm

As can be seen, each range of integers benefits different applications:

  1. <2048 covers most standard everyday use cases. Additionally, it would benefit some iterative algorithms such as string parsing
  2. 8192 benefits a portion of scientific linear-algebra applications with dimensions up to this number, as well as integer operations whose results largely fall into this range
  3. 100K+ has an observable impact on high-dimensionality applications (scientific and otherwise), such as sparse matrices, large graphs and similar

This number can be increased to 1M, 10M, etc., and further observable benefits can be found in specific applications.


That said, more involved scientific applications should aim to benefit from optimized libraries such as numpy, which most likely puts (3) well out of scope for this change.

To try to find the most suitable number, the following two analyses can give some insight:

  1. Determining what sort of integers are used during startup and while importing all standard library packages.

Python is launched with:

import sys
from importlib import import_module

# Import every available stdlib module, skipping platform-specific ones.
names = set(sys.stdlib_module_names)
names -= {'_msi', 'msilib', '_overlapped', '_winapi', '_wmi'}
names -= {'spwd', 'ossaudiodev', 'msvcrt', 'nt', 'winreg', 'winsound'}
names -= {'antigravity', 'this'}
mods = [import_module(name) for name in names]
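
The per-request counts behind the graph below come from instrumenting the interpreter itself; as a rough, purely Python-level proxy, one can count the integer constants embedded in the imported modules' functions (this misses short-lived runtime integers, but shows a similar shape):

from collections import Counter
import types

def int_consts(code):
    # Recursively yield integer constants embedded in a code object.
    for const in code.co_consts:
        if type(const) is int:
            yield const
        elif isinstance(const, types.CodeType):
            yield from int_consts(const)

counts = Counter()
for mod in mods:  # `mods` from the snippet above
    for obj in vars(mod).values():
        if isinstance(obj, types.FunctionType):
            counts.update(int_consts(obj.__code__))

in_cache = sum(c for value, c in counts.items() if -5 <= value <= 256)
print(f"{in_cache / sum(counts.values()):.1%} of integer constants are in the current cache range")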

Cumulative density graph of PyLongObject requests:

[graph: cumulative density of PyLongObject requests during startup and imports]

256 captures ~83% of all integers used during startup and imports.
Beyond 256, there are two visible increments, at 512 and 4096.
4096 would increase coverage to 93%.
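
For illustration, the coverage per candidate cache size can be read straight off such a histogram; the numbers below are made up, the real ones come from the instrumented run behind the graph:

from collections import Counter

# Hypothetical value -> request-count histogram standing in for the real data.
requests = Counter({0: 50_000, 1: 30_000, 100: 8_000, 400: 5_000, 3_000: 4_000, 50_000: 3_000})

total = sum(requests.values())
for cache_size in (256, 512, 1024, 2048, 4096, 8192):
    covered = sum(c for value, c in requests.items() if 0 <= value < cache_size)
    print(f"cache size {cache_size:>4}: {covered / total:.1%} of requests hit the cache")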


  2. GitHub use-case analysis

The above only captures imports, not user code.
For that, a GitHub search for \brange(\d+) can provide some more insight.

[graph: distribution of range() arguments found on GitHub]
Total of 4.3M use cases.

It can be seen that 91% of ranges initialised with a single argument have it set below 260, so the current cache size covers them well.
However, the raw number of use cases is not indicative of the performance benefit.
What is more appropriate here is to see how many integer objects would be re-used at each cache size.
Below is a graph of the extra integers re-used with each incremental increase of the cache size.

[graph: additional integers re-used per cache-size increment]

So the current value of 256 allows re-use of 99M (11 + 54 + 34) integers generated by these range calls.
Increasing it to 1000 would add an extra 33M.
Increasing it to 10000 would add an extra 240M (33 + 2 + 5 + 200).
And so on.
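
The counting behind these figures is straightforward: every occurrence of range(N) re-uses min(N, cache_size) cached integers. A minimal sketch with a made-up histogram of range arguments (the real counts come from the GitHub search above):

# Hypothetical histogram: range argument N -> number of occurrences found.
range_uses = {10: 2_000_000, 100: 500_000, 300: 120_000, 5_000: 20_000, 100_000: 3_000}

def ints_reused(cache_size):
    # Each range(N) touches the integers 0..N-1; only those below
    # cache_size are served from the small-int cache.
    return sum(min(n, cache_size) * count for n, count in range_uses.items())

baseline = ints_reused(256)
for size in (1_000, 10_000, 100_000):
    extra = ints_reused(size) - baseline
    print(f"cache size {size:>7}: {extra / 1e6:.0f}M extra integers re-used vs. 256")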

So this is in line with findings from benchmark results above:

  1. There is a benefit in increasing the cache size to a medium-sized number <= 10K.
  2. Increasing the number further has performance benefits up to very high values. Although these can be significant, they are rare cases, and the average user should not be paying for them.

So my best guess is that the upper bound for this number is ~10K, which:

  1. Increases startup time by 1% (with every extra 10K).
  2. Takes 280KB (2-3% of the initial Python process) of memory; see the sketch below.
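
As a sanity check on the memory figure (assuming ~28 bytes per small int object, which is what sys.getsizeof reports for a one-digit int on a typical 64-bit build):

import sys

CACHE_SIZE = 10_000
per_int = sys.getsizeof(12_345)   # ~28 bytes on a 64-bit build
print(CACHE_SIZE * per_int)       # ~280_000 bytes, i.e. roughly 280 KB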

There should be a very good reason to sacrifice anything more solely for integer storage.
(And I suspect even this is higher than what others would agree on.)

From all the information above, together with some analyses from @eendebakpt in faster-cpython/ideas#725, it seems that both 1024 and 2048 could be good candidates.


So this is my best inference.
What do others think?

  1. Is such change possible?
  2. If yes, is upper bound above reasonable?
  3. What is the optimal number to set?

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

faster-cpython/ideas#725

Linked PRs

Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), type-feature (A feature request or enhancement)
