Description
The way `AllocId`s work right now is super counter-intuitive: they are entirely per-crate identifiers, and when loading the metadata of another crate, we generate a fresh "local `AllocId`" for each ID we encounter in the other crate and re-map everything we load. (At least I think that's what happens, @oli-obk please correct me if I am wrong.)
Unfortunately this means that a `ConstValue` that holds a pointer isn't actually a "value" in the usual sense of the word: if the value is computed in one crate and then used in another crate, its `AllocId` gets re-mapped. During code generation, when we encounter such an `AllocId`, we just always generate a local copy of that allocation and point to that. This means the "same" `ConstValue`, codegen'd in different crates, can result in observably different values! That's extremely confusing for users and compiler devs alike (#84581, #123670). In many cases this will get de-duplicated later, but we can't always rely on that.
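To make the symptom concrete, here is a hypothetical reduction in the spirit of the linked issues (crate names made up):

```rust
// crate_a/src/lib.rs
pub const C: &i32 = &42;

pub fn addr_in_a() -> *const i32 {
    C as *const i32
}
```

```rust
// crate_b/src/main.rs (depends on crate_a)
fn main() {
    // Both sides use the "same" `ConstValue`, but each crate may
    // codegen its own copy of the allocation behind `C`, so the
    // two addresses are not guaranteed to be equal.
    let in_b = crate_a::C as *const i32;
    let in_a = crate_a::addr_in_a();
    assert_eq!(in_b, in_a); // may fail!
}
```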
So... I'd like to consider switching how `AllocId`s work, with the goal of making `ConstValue` actually be a value. This will make #121644 unnecessary: we can just evaluate the static once, store its final value, and use that in all crates without running into issues like this. This requires not re-mapping `AllocId`s: when crate B receives a `ConstValue` from crate A, it should be able to point to the allocation already generated by crate A. Unfortunately I am largely unfamiliar with how we manage "cross-crate identity of objects", so I don't know what the possible options here look like.
Some first rough ideas that popped into my head:
1. We could pick `AllocId`s uniformly at random and fail when loading two crates that happened to get the same ID. That's fundamentally non-reproducible, so either we have to make sure these `AllocId`s don't matter for anything except the question of whether they are equal or not (that seems hard to enforce), or we have to pick some deterministic scheme instead. Also, during codegen, how would we know whether the allocation has already been generated previously or whether it is our job to generate it? We'd have to keep track of which `AllocId`s are "local", or so.
2. Use the first 32 bits of `AllocId` to store the `CrateNum` of the crate that generated the allocation, and the rest to store some sort of per-crate allocation ID (see the first sketch after this list). I guess this still has to be remapped on load, but then during codegen, when we encounter another crate's allocation, we'd import it instead of generating a copy.
3. When interning an allocation, we always generate something akin to a `DefId`. `AllocId` outside of an interpreter session basically becomes `DefId` (or a new kind of ID with the same properties). We don't even need an `alloc_map` in `tcx` any more; we just have a new kind of "definition" that represents "global allocations", and a query taking a `DefId` and returning a `GlobalAlloc`. (That query would mostly, if not exclusively, be computed by feeding, maybe except for `static`s that it could evaluate directly. I guess if it is exclusively feeding, it doesn't make much sense to make this a query rather than a normal hash map.)

   Inside the interpreter, we certainly don't want to generate a `DefId` for each allocation. I can imagine two schemes here:

   i. Reserve a `CrateNum` value to indicate "local interpreter instance" so that we can just make up `DefIndex`es locally while the interpreter runs and still know which allocations need to be looked up where (see the second sketch after this list). During interning, we generate proper `DefId`s inside `LOCAL_CRATE` and remap everything we encounter.

   ii. Still use the same `AllocId` type that we do now, but make it valid only inside an interpreter instance, and track a per-interpreter-instance mapping between global `DefId`s and local `AllocId`s. Unfortunately this means extra work whenever we "import" a global allocation into an interpreter instance, as we need to apply that mapping (and then map back during interning).
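For option 2, here's a minimal sketch of what the bit layout could look like (all names are made up, and the real type would also need to reserve some bits for the flags discussed further down):

```rust
use std::num::NonZeroU64;

/// Hypothetical ID layout for option 2: the upper 32 bits hold the
/// `CrateNum` of the crate that interned the allocation, the lower
/// 32 bits a per-crate allocation index.
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct GlobalAllocId(NonZeroU64);

impl GlobalAllocId {
    fn new(crate_num: u32, index: u32) -> GlobalAllocId {
        // Reserve index 0 so the packed value is never zero; this
        // gives the type a niche (cf. the discussion below).
        assert!(index != 0);
        let bits = ((crate_num as u64) << 32) | index as u64;
        GlobalAllocId(NonZeroU64::new(bits).unwrap())
    }

    fn crate_num(self) -> u32 {
        (self.0.get() >> 32) as u32
    }

    fn index(self) -> u32 {
        self.0.get() as u32
    }
}
```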
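And for scheme 3i, a rough illustration of reserving a `CrateNum` sentinel (again, names and values are hypothetical):

```rust
/// Hypothetical sentinel for scheme 3i: a reserved `CrateNum` marks
/// IDs that are only meaningful inside the current interpreter run.
const INTERPRETER_LOCAL_CRATE: u32 = u32::MAX;

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct DefId {
    krate: u32,
    index: u32,
}

impl DefId {
    /// Make up a fresh interpreter-local ID while the interpreter runs.
    fn new_interpreter_local(index: u32) -> DefId {
        DefId { krate: INTERPRETER_LOCAL_CRATE, index }
    }

    /// Tells us whether this allocation has to be looked up in the
    /// current interpreter instance rather than in crate metadata.
    fn is_interpreter_local(self) -> bool {
        self.krate == INTERPRETER_LOCAL_CRATE
    }
}
```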
The last two schemes (2 and 3) seem fairly similar, given that `DefId` is just `CrateNum` + per-crate `DefIndex`. The only difference is whether there's a single shared "index" namespace for everything or a dedicated namespace for allocations. My main concern with the single shared namespace is that we'd quite like to use some bits for other purposes inside `AllocId`: we want it to have a niche. We also probably need to distinguish allocations inside the current interpreter instance from "global allocations" (and do a remapping during interning), and at least inside an interpreter instance we are using some bits to track whether the pointer is derived from a shared reference and whether that shared reference had interior mutability. Option 2 could possibly entirely avoid doing any kind of mapping during interning, if we think that 2^30 total allocations are enough for every crate -- though I assume interning is already quite expensive, so maybe it's not worth optimizing for that. It does seem worth optimizing for "no remapping when accessing previously interned global allocations", which excludes 3ii (which might otherwise be my favorite, as it keeps everything fairly clear).
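To make the remapping cost of 3ii concrete, here's roughly what the per-instance state would have to look like (all names hypothetical, not actual rustc types):

```rust
use std::collections::HashMap;

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct DefId {
    krate: u32,
    index: u32,
}

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct AllocId(u64);

/// Per-interpreter-instance state for scheme 3ii: global allocations
/// are identified by `DefId`, but the interpreter keeps working with
/// its own local `AllocId`s and translates at the boundary.
struct InterpreterAllocMap {
    next_local: u64,
    global_to_local: HashMap<DefId, AllocId>,
    local_to_global: HashMap<AllocId, DefId>,
}

impl InterpreterAllocMap {
    /// "Importing" a global allocation: every access to a previously
    /// interned allocation pays this lookup (and possibly an insert).
    fn import(&mut self, def_id: DefId) -> AllocId {
        if let Some(&local) = self.global_to_local.get(&def_id) {
            return local;
        }
        let local = AllocId(self.next_local);
        self.next_local += 1;
        self.global_to_local.insert(def_id, local);
        self.local_to_global.insert(local, def_id);
        local
    }

    /// Mapping back during interning.
    fn export(&self, id: AllocId) -> Option<DefId> {
        self.local_to_global.get(&id).copied()
    }
}
```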
@oli-obk @rust-lang/wg-const-eval any thoughts?
@compiler-errors @wesleywiser I know you're not const-eval experts but maybe you know the query system sufficiently well to provide some helpful input. :)