Description
The way `AllocId`s work right now is super counter-intuitive: they are entirely per-crate identifiers, and when loading the metadata of another crate, we generate a fresh "local `AllocId`" for each ID we encounter in the other crate and re-map everything we load. (At least I think that's what happens, @oli-obk please correct me if I am wrong.)
Unfortunately this means that a `ConstValue` that holds a pointer isn't actually a "value" in the usual sense of the word: if the value is computed in one crate and then used in another crate, its `AllocId` gets re-mapped. During code generation, when we encounter such an `AllocId`, we just always generate a local copy of that allocation and point to that. This means the "same" `ConstValue`, codegen'd in different crates, can result in observably different values! That's extremely confusing for users and compiler devs alike (#84581, #123670). In many cases this will get de-duplicated later, but we can't always rely on that.
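To make the symptom concrete, here is a hypothetical reduction in the spirit of the linked issues (crate names made up):

```rust
// crate_a/src/lib.rs
pub const C: &i32 = &42;

pub fn addr_in_a() -> *const i32 {
    C as *const i32
}
```

```rust
// crate_b/src/main.rs (depends on crate_a)
fn main() {
    // Both sides use the "same" `ConstValue`, but each crate may
    // codegen its own copy of the allocation behind `C`, so the
    // two addresses are not guaranteed to be equal.
    let in_b = crate_a::C as *const i32;
    let in_a = crate_a::addr_in_a();
    assert_eq!(in_b, in_a); // may fail!
}
```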
So... I'd like to consider switching how `AllocId`s work, with the goal of making `ConstValue` actually be a value. This will make #121644 unnecessary: we can just evaluate the static once, store its final value, and use that in all crates without running into issues like this. This requires not re-mapping `AllocId`s: when crate B receives a `ConstValue` from crate A, it should be able to point to the allocation already generated by crate A. Unfortunately I am largely unfamiliar with how we manage "cross-crate identity of objects", so I don't know what the possible options here look like.
Some first rough ideas that popped into my head:
1. We could pick `AllocId`s uniformly at random and fail when loading two crates that happened to get the same ID. That's fundamentally non-reproducible, so either we have to make sure these `AllocId`s don't matter for anything except the question of whether they are equal or not (that seems hard to enforce), or we have to pick some deterministic scheme instead. Also, during codegen, how would we know whether the allocation has already been generated previously or whether it is our job to generate it? We'd have to keep track of which `AllocId`s are "local", or so.
2. Use the first 32 bits of `AllocId` to store the `CrateNum` of the crate that generated the allocation, and the rest to store some sort of per-crate allocation ID (see the first sketch after this list). I guess this still has to be remapped on load, but then during codegen, when we encounter another crate's allocation, we'd import it instead of generating a copy.
3. When interning an allocation, we always generate something akin to a `DefId`. `AllocId` outside of an interpreter session basically becomes `DefId` (or a new kind of ID with the same properties). We don't even need an `alloc_map` in `tcx` any more; we just have a new kind of "definition" that represents "global allocations", and a query taking a `DefId` and returning a `GlobalAlloc`. (That query would mostly, if not exclusively, be computed by feeding, maybe except for `static`s that it could evaluate directly. I guess if it is exclusively feeding, it doesn't make much sense to make this a query rather than a normal hash map.)

   Inside the interpreter, we certainly don't want to generate a `DefId` for each allocation. I can imagine two schemes here:

   i. Reserve a `CrateNum` value to indicate "local interpreter instance" so that we can just make up `DefIndex`es locally while the interpreter runs and still know which allocations need to be looked up where (see the second sketch after this list). During interning, we generate proper `DefId`s inside `LOCAL_CRATE` and remap everything we encounter.

   ii. Still use the same `AllocId` type that we do now, but make it valid only inside an interpreter instance, and track a per-interpreter-instance mapping between global `DefId`s and local `AllocId`s. Unfortunately this means extra work whenever we "import" a global allocation into an interpreter instance, as we need to apply that mapping (and then map back during interning).
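For option 2, here's a minimal sketch of what the bit layout could look like (all names are made up, and the real type would also need to reserve some bits for the flags discussed further down):

```rust
use std::num::NonZeroU64;

/// Hypothetical ID layout for option 2: the upper 32 bits hold the
/// `CrateNum` of the crate that interned the allocation, the lower
/// 32 bits a per-crate allocation index.
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct GlobalAllocId(NonZeroU64);

impl GlobalAllocId {
    fn new(crate_num: u32, index: u32) -> GlobalAllocId {
        // Reserve index 0 so the packed value is never zero; this
        // gives the type a niche (cf. the discussion below).
        assert!(index != 0);
        let bits = ((crate_num as u64) << 32) | index as u64;
        GlobalAllocId(NonZeroU64::new(bits).unwrap())
    }

    fn crate_num(self) -> u32 {
        (self.0.get() >> 32) as u32
    }

    fn index(self) -> u32 {
        self.0.get() as u32
    }
}
```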
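And for scheme 3i, a rough illustration of reserving a `CrateNum` sentinel (again, names and values are hypothetical):

```rust
/// Hypothetical sentinel for scheme 3i: a reserved `CrateNum` marks
/// IDs that are only meaningful inside the current interpreter run.
const INTERPRETER_LOCAL_CRATE: u32 = u32::MAX;

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct DefId {
    krate: u32,
    index: u32,
}

impl DefId {
    /// Make up a fresh interpreter-local ID while the interpreter runs.
    fn new_interpreter_local(index: u32) -> DefId {
        DefId { krate: INTERPRETER_LOCAL_CRATE, index }
    }

    /// Tells us whether this allocation has to be looked up in the
    /// current interpreter instance rather than in crate metadata.
    fn is_interpreter_local(self) -> bool {
        self.krate == INTERPRETER_LOCAL_CRATE
    }
}
```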
The last two schemes (2 and 3) seem fairly similar, given that `DefId` is just `CrateNum` + per-crate `DefIndex`. The only difference is whether there's a single shared "index" namespace for everything or a dedicated namespace for allocations. My main concern with the single shared namespace is that we'd quite like to use some bits for other purposes inside `AllocId`: we want it to have a niche. We also probably need to distinguish allocations inside the current interpreter instance from "global allocations" (and do a remapping during interning), and at least inside an interpreter instance we are using some bits to track whether the pointer is derived from a shared reference and whether that shared reference had interior mutability. Option 2 could possibly entirely avoid doing any kind of mapping during interning, if we think that 2^30 total allocations are enough for every crate -- though I assume interning is already quite expensive, so maybe it's not worth optimizing for that. It does seem worth optimizing for "no remapping when accessing previously interned global allocations", which excludes 3ii (which might otherwise be my favorite, as it keeps everything fairly clear).
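To make the remapping cost of 3ii concrete, here's roughly what the per-instance state would have to look like (all names hypothetical, not actual rustc types):

```rust
use std::collections::HashMap;

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct DefId {
    krate: u32,
    index: u32,
}

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct AllocId(u64);

/// Per-interpreter-instance state for scheme 3ii: global allocations
/// are identified by `DefId`, but the interpreter keeps working with
/// its own local `AllocId`s and translates at the boundary.
struct InterpreterAllocMap {
    next_local: u64,
    global_to_local: HashMap<DefId, AllocId>,
    local_to_global: HashMap<AllocId, DefId>,
}

impl InterpreterAllocMap {
    /// "Importing" a global allocation: every access to a previously
    /// interned allocation pays this lookup (and possibly an insert).
    fn import(&mut self, def_id: DefId) -> AllocId {
        if let Some(&local) = self.global_to_local.get(&def_id) {
            return local;
        }
        let local = AllocId(self.next_local);
        self.next_local += 1;
        self.global_to_local.insert(def_id, local);
        self.local_to_global.insert(local, def_id);
        local
    }

    /// Mapping back during interning.
    fn export(&self, id: AllocId) -> Option<DefId> {
        self.local_to_global.get(&id).copied()
    }
}
```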
@oli-obk @rust-lang/wg-const-eval any thoughts?
@compiler-errors @wesleywiser I know you're not const-eval experts but maybe you know the query system sufficiently well to provide some helpful input. :)