Implement "small substs optimization" for substs of length 1

One of the core types in the compiler is `ty::Subst`, which represents a list of "substitutions" - arguments to generic parameters. For example, `Option<u32>` has a `Substs` of `[Uint(u32)]` and `T: Sized` has a `Substs` of `[Param(T)]`.

A `ty::Substs` is an interned `&'tcx List<Kind<'tcx>>`. It is represented as a thin pointer to a struct of this form:

```Rust
pub struct List<T> {
    len: usize,
    data: [T; 0],
    opaque: OpaqueListContents,
}
```

Where a `List` of length 0 is represented by this bit of evil code:
```Rust
impl<T> List<T> {
    #[inline(always)]
    pub fn empty<'a>() -> &'a List<T> {
        #[repr(align(64), C)]
        struct EmptySlice([u8; 64]);
        static EMPTY_SLICE: EmptySlice = EmptySlice([0; 64]);
        assert!(mem::align_of::<T>() <= 64);
        unsafe {
            &*(&EMPTY_SLICE as *const _ as *const List<T>)
        }
    }
}
```

And `List`s of non-zero length are represented by a `List` that references an interner. This is inefficient, because I'm quite sure that many `Subst`s are of length 1 (e.g., consider all the trait bounds of traits with no type parameter).

When we look into the interner, there are 2 problems:
1. This requires a memory lookup, which increases CPU cache usage and can cause CPU cache misses.
2. When we are creating a new type, we need to intern the substs, which causes hashmap lookups and even more CPU cache misses.

## Potential for measurements

It might be nice to count the amount of `Substs` of each length while compiling popular crates. This should be rather easy if you modify the interner.

## Implementation Strategy

### Stage 1 - getting a newtype

There are a few representations for `Substs` that are possible, but in any case I think the best way to go forward would be to first hide the representation.

Currently a `Substs` is this typedef:
```Rust
pub type Substs<'tcx> = List<Kind<'tcx>>;
```

And it is normally used as `&'tcx Substs<'tcx>`.

We would need to replace it with something that has a hidden representation, i.e. initially a newtype of the form:
```Rust
pub struct SubstsRef<'tcx> {
    inner: &'tcx Substs<'tcx>
}
```

I think a reasonable way of going around this would be to first have a
```rust
pub type SubstsRef<'tcx> = &'tcx Substs<'tcx>;
```

Then retire the use of the old `Substs` typedef in most places, then switch `SubstsRef` to be a newtype.

When using a newtype, it's probably a good idea to have a `Deref` impl of this form:
```Rust
impl<'tcx> Deref for SubstsRef<'tcx> {
    type Target = [Kind<'tcx>];

    #[inline]
    fn deref(&self) -> &Self::Target { self.inner }
}
```

This will avoid needing to implement specific methods in most cases.

Also remember to implement the "derive" trait impls (`Hash`, `PartialEq`, etc.) as needed.

## Stage 2 - improving the representation

My preferred representation is as follows:

Use the first 2 bits as the tag, in the style of [`TAG_MASK`], e.g.
```rust
const TYPE_TAG: usize = 0b00;
const REGION_TAG: usize = 0b01;
const LIST_TAG: usize = 0b10;
```

Then represent things as follows:

A substs of length 0: `List::empty() | LIST_TAG`
A substs of a single type: `ty | TYPE_TAG`
A substs of a single region: `ty | REGION_TAG`
A substs of length >1: `substs | LIST_TAG`

You'll want to define `Substs` as follows:
```rust
#[repr(C)]
pub struct SubstsRef<'tcx> {
    ptr: NonZeroUsize,
    marker: PhantomData<Kind<'tcx>>
}
```

Then you can implement `Deref` in this style:
```Rust
impl<'tcx> Deref for SubstsRef<'tcx> {
    type Target = [Kind<'tcx>];

    #[inline]
    fn deref(&self) -> &Self::Target {
        let ptr = self.ptr.get();
        match ptr & TAG_MASK {
            REGION_TAG | TYPE_TAG => {
                // We still match layout with `Kind`.
                let this: &[Kind; 1] = mem::transmute(self);
                this
            }
            LIST_TAG => {
                let inner: &List<Kind> = &*((ptr & !TAG_MASK) as *const _);
                inner
            }
            _ => intrinsics::unreachable()
        }
    }
}
```

Then you'll basically only need to change the interning functions not to go through the interner for substs of length 1. As long as you always consistently map substs of length 1 to not use an interner, you can do `PartialEq`, `Hash` etc. "by value".

Then you can "cut the ribbon" and do a `@bors try` to see how much perf you gain.

## Stage 2a - Cleanup

Remove occurrences of `InternalSubsts` in comments.

## Items for later investigation

It might be worth investigating whether it's worth extending the small substs optimization to substs of length 2 to get rid of even more interner calls - that would increase the size of `TyS` and things, so it might be a space-time tradeoff.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement "small substs optimization" for substs of length 1 #58310

Potential for measurements

Implementation Strategy

Stage 1 - getting a newtype

Stage 2 - improving the representation

Stage 2a - Cleanup

Items for later investigation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implement "small substs optimization" for substs of length 1 #58310

Description

Potential for measurements

Implementation Strategy

Stage 1 - getting a newtype

Stage 2 - improving the representation

Stage 2a - Cleanup

Items for later investigation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions