Consider adding second initialization phase after `start` #146

This issue captures the motivation, summary and sketch of an idea for improving how snapshots work in the component model.

## Motivation

There are a number of scenarios where we'd like to reduce component initialization time by capturing a "snapshot" of component state after some deterministic interval of execution so that starting from the snapshot is semantically equivalent to starting from the beginning.  For example, a snapshot can capture the result of:
* initializing a language runtime
* parsing and executing the "top-level" code like scripts or global constructors
* processing imported routing configuration rules

One way to do this is with [wizer](https://github.com/bytecodealliance/wizer), which is an impressive tool that is widely used for this purpose already.  However:
* because wizer needs to emit valid wasm, it has some expressive limitations and thus currently only handles a subset of core wasm, with anticipated problems expressing complex linked component DAGs;
* everything goes into linear memory (via an active data segment), which may increase a component's memory usage more than otherwise needed;
* the original file structure and granularity are lost, so editing a file requires re-wizening and updates the entire active data segment (which impedes our ability to de-dupe individual assets via, e.g., OCI layer machinery).

An alternative and complementary approach is to do snapshotting at "deployment time" as part of the process of AOT-compiling a component (the same step that is already used for fusing canonical adapters into core wasm and generating machine code).  Because of the [component invariant](https://github.com/WebAssembly/component-model/blob/main/design/mvp/Explainer.md#component-invariants) that functions executed during the `start` phase cannot call imports, when wasm is executed in [deterministic mode](https://github.com/WebAssembly/profiles/pull/5), a component's state at the end of the `start` phase is fully determined by its `value` imports.  Thus, as a *pure optimization*, a component AOT compiler could locally instantiate the root component being deployed with its expected value imports and include a snapshot of the post-`start` execution state in the final compiled representation of the component.
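As a rough illustration of why this can be a pure optimization, here is a minimal Python sketch. It is a toy model, not how any real AOT compiler works: `run_start` and `SnapshotCache` are hypothetical names, the "snapshot" is just serialized JSON, and the only property being modeled is that post-`start` state is a function of the value imports alone.

```python
import json


def run_start(value_imports):
    """Hypothetical stand-in for a component's `start` phase.

    It cannot call imports, so its result depends only on the value imports.
    """
    routes = sorted(value_imports.get("routes", []))
    return {
        "route_table": {route: index for index, route in enumerate(routes)},
        "greeting": "hello " + value_imports.get("name", ""),
    }


class SnapshotCache:
    """Toy AOT-compiler cache: one post-`start` snapshot per set of value imports."""

    def __init__(self):
        self._snapshots = {}
        self.start_runs = 0  # how many times `start` actually executed

    def instantiate(self, value_imports):
        # Canonical JSON makes the cache key independent of dict key order.
        key = json.dumps(value_imports, sort_keys=True)
        if key not in self._snapshots:
            self.start_runs += 1
            self._snapshots[key] = json.dumps(run_start(value_imports))
        # Every instance gets its own copy of the snapshotted state.
        return json.loads(self._snapshots[key])
```

Instantiating twice with the same value imports runs `run_start` once; changing a value import produces a new snapshot, which mirrors regenerating the snapshot offline when configuration changes.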

This snapshot-as-deployment-time-optimization approach has a number of advantages:
* The optimization doesn't depend on each individual producer toolchain (of which there will be many) figuring out how to integrate `wizer` (which is a bit tricky).
* The representation of the snapshot can be whatever internal format the host runtime wants, making it easier to implement complete coverage of the component model.
* The component binary uploaded for deployment can be smaller.
* The component binary stores the original static assets, allowing a component's static assets to be meaningfully edited in-place without having to re-compile the component and these assets can be de-duped on a per-asset basis.
* The execution platform can regenerate the snapshot offline whenever the value imports change (e.g., if value imports reflect configuration values), allowing the snapshot to be fully specialized to the configuration.

However, there's a significant limitation with this approach: not being able to call imports during the `start` phase means that `start` functions won't be able to do much other than purely component-internal initialization.  This limitation shows up when we try to execute guest initialization code (like top-level script execution or C++ global constructors) that may or may not call imports before the snapshot.  If we run this code during the `start` phase, we'll trap if an import is called.  If we can't run the code during `start`, our only other option is to run it lazily when the first export is called (which is definitely not included in the snapshot).  Thus, our only two options are either overly-restrictive or overly-unoptimized.

One motivating observation is that calls to imports during `start` may actually be deterministic in practice if:
* the imports are implemented by another component that doesn't itself call host imports
* the host knows the imports are invariant
* the host creates a new snapshot whenever the imported function would produce a different result for a different parameter

Simply relaxing the trapping rules to allow these cases would be anti-composable and anti-virtualizable, since now the same component may or may not trap depending on subtle host details and how it is linked, none of which is reflected in the component's signature.  So instead...

## Feature summary

The basic idea (which is an [old idea originating in core wasm](https://github.com/WebAssembly/design/issues/1160)) is to have a second phase of initialization that *is* allowed to call imports that runs *after* the `start` phase and *before* the first export is called.

As for what to call this second phase: based on discussion in [this](https://github.com/WebAssembly/spec/issues/1073) issue, calling the second phase "init" sounds like it will confuse at least some people (b/c "init" sounds like it goes *before* "start").  So to avoid that, as a strawperson, I'll just call this second phase of initialization `start2`.

Just like `start` sections in the component model, there can be multiple `start2` sections/functions in a component and they are run in order.  The component model would ensure that all `start` functions have finished before the first `start2` function runs and that all `start2` functions complete before the first export is allowed to be called.  Thus, there is a `start` phase followed by a `start2` phase that precedes general calls to exports.  Because `start2` functions can call imports, `start2` will be the default place for language toolchains to execute arbitrary up-front/run-once/top-level/global-constructor user code that takes no arguments and produces no results.
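The phase ordering can be modeled with a small Python sketch. Everything here is illustrative: `Instance`, `Trap`, `call_import`, `call_export` and the two example functions are hypothetical names, and a Python exception stands in for a wasm trap.

```python
class Trap(Exception):
    """Models a wasm trap."""


class Instance:
    def __init__(self, start_fns, start2_fns, imports):
        self.imports = imports
        self.log = []
        self.phase = "start"
        for fn in start_fns:      # `start` phase: imports are off limits
            fn(self)
        self.phase = "start2"
        for fn in start2_fns:     # `start2` phase: imports are allowed
            fn(self)
        self.phase = "ready"      # only now may exports be called

    def call_import(self, name, *args):
        if self.phase == "start":
            raise Trap(f"import {name!r} called during `start`")
        return self.imports[name](*args)

    def call_export(self, fn, *args):
        if self.phase != "ready":
            raise Trap("export called before initialization finished")
        return fn(self, *args)


def init_runtime(inst):
    # Purely internal initialization, safe for `start`.
    inst.log.append("start: runtime")


def run_top_level(inst):
    # Top-level user code that happens to call an import.
    inst.log.append("start2: " + inst.call_import("get_env", "MODE"))
```

Running `run_top_level` as a `start2` function succeeds, while listing the same function as a `start` function raises `Trap`, which is the restriction that motivates the second phase.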

Parent components get to choose when to execute their child instances' `start2` phases.  If the parent knows that a child component will not or cannot call the parent's own imports (which the parent is in a position to know, as the parent completely determines the child's imports), the parent may execute the child component's `start2` phase during the parent's `start` phase, thus including the child's post-`start2` execution state in the root component's post-`start` snapshot.  However, the parent can always execute a child's `start2` phase later, e.g., during the parent's own `start2` phase.  Because component instances form a tree, each parent going up the tree to the root has the option to run an entire child subtree's `start2` phase during the parent's own `start` phase, thereby including it in the final root snapshot.
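A minimal Python sketch of the parent's choice, under the assumption that the parent tracks, per child, whether it wired any of its own imports into that child (the `uses_parent_imports` flag and the `Parent`/`Child` classes are hypothetical, not part of any proposal):

```python
class Child:
    def __init__(self, name, uses_parent_imports):
        self.name = name
        # True if the parent wired any of its own imports into this child.
        self.uses_parent_imports = uses_parent_imports
        self.start2_done = False

    def run_start2(self, log):
        self.start2_done = True
        log.append(f"{self.name}.start2")


class Parent:
    def __init__(self, children):
        self.children = children
        self.log = []

    def start(self):
        """Parent `start` phase; returns the children covered by the snapshot."""
        self.log.append("parent.start")
        for child in self.children:
            # Safe to run eagerly: this child cannot reach the parent's imports.
            if not child.uses_parent_imports:
                child.run_start2(self.log)
        return [c.name for c in self.children if c.start2_done]

    def start2(self):
        """Parent `start2` phase; finishes any children deferred by `start`."""
        self.log.append("parent.start2")
        for child in self.children:
            if not child.start2_done:
                child.run_start2(self.log)
```

With one import-free child and one child that uses the parent's imports, only the first child's `start2` lands inside the post-`start` snapshot; the second runs during the parent's own `start2`.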

An AOT compiler can also be more aggressive and execute the root component's `start2` phase *speculatively* and capture a snapshot if `start2` returns without calling an import (silently discarding the `start2` execution on trap, which will by design not be externally observable).  If the AOT compiler additionally has knowledge of the host's implementation of imports, the AOT compiler can be even more aggressive and allow-list host imports under various conditions.  In the limit, an AOT compiler could capture a snapshot at the first point of non-determinism.  Ultimately, this is all in the realm of pure runtime optimization and can be configured and improved over time.
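Speculative execution of `start2` could look roughly like the following Python sketch. The names (`speculate_start2`, `ImportCalled`, `allow_list`) are hypothetical, component state is modeled as a plain dict, and an exception stands in for the trap taken when a non-allow-listed import is called.

```python
import copy


class ImportCalled(Exception):
    """Raised when speculative `start2` reaches a non-allow-listed import."""


def speculate_start2(post_start_state, start2, allow_list=None):
    """Try to fold `start2` into the snapshot.

    Returns (snapshot, included), where `included` says whether the
    snapshot already contains the effects of `start2`.
    """
    allowed = allow_list or {}

    def call_import(name, *args):
        if name in allowed:
            return allowed[name](*args)
        raise ImportCalled(name)

    # Run on a copy so a failed speculation leaves no observable effects.
    trial = copy.deepcopy(post_start_state)
    try:
        start2(trial, call_import)
    except ImportCalled:
        return post_start_state, False  # discard the trial state
    return trial, True


def pure_start2(state, call_import):
    state["globals_ready"] = True


def config_start2(state, call_import):
    state["partial"] = True
    state["mode"] = call_import("get_config", "mode")
```

`pure_start2` is captured in the snapshot unconditionally. `config_start2` is captured only when the compiler allow-lists `get_config`; otherwise the partially mutated trial state is thrown away and `start2` simply runs at instantiation time as usual.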

## Sketch

Here's a sketch that seems like it could maybe work:
* Add a new `start2` section that can call a component-level function (just like the `start` section).  The `start2` section can call lifted core functions that execute the component's top-level core code.
* Add a new `(canon child.start2 <instanceidx> (func $f))` canon built-in for creating a component-level function that, when called, executes the `start2` phase of the given instance.  This function `$f` can be called eagerly via a `(start2 $f)` section *or* lazily by `canon lower`ing it and calling it arbitrarily later from core wasm.
* Use dynamic traps to ensure that a component instance's `start2` phase is executed exactly once before its first export is called.  This allows lazy initialization right before the first use.  In the eager-`start2` case, the dynamic traps could be trivially eliminated.
* Because initialization can call imports which can fail, `start2` functions would need to be able to return a `result` (with empty success and error payloads).  `start2` sections simply propagate failure.  Lowered calls to `start2` from core wasm can potentially handle and recover from failures.
* When async is added, initialization may require calling async imports, so `start2` functions could additionally return a `future<result>`.  This async-ness would need to somehow be reflected in the component's type so that its clients know that `child.start2` returns a `future<result>`.
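The dynamic-trap and `result` bullets above can be sketched together in Python. This is a model under stated assumptions, not proposed semantics: `ChildInstance` and `child_start2` are hypothetical names, an `OSError` stands in for a failing import, and two behaviors are guesses the sketch makes for concreteness (a second call to `child_start2` traps, and exports stay unavailable after a failed `start2`).

```python
class Trap(Exception):
    """Models a wasm trap."""


class ChildInstance:
    def __init__(self, start2_body):
        self._start2_body = start2_body
        # "pending" -> "running" -> "done" | "failed"
        self._state = "pending"

    def child_start2(self):
        """Models the function created by the sketched `canon child.start2`.

        Returns "ok" or "error", mirroring a `result` with empty payloads.
        """
        if self._state != "pending":
            # Assumption: running the phase a second time is a trap.
            raise Trap("start2 may run only once")
        self._state = "running"
        try:
            self._start2_body()
        except OSError:
            # A failing import surfaces as an error result, not a trap.
            self._state = "failed"
            return "error"
        self._state = "done"
        return "ok"

    def call_export(self, fn, *args):
        # Dynamic check that an eager `(start2 $f)` would make redundant.
        if self._state != "done":
            raise Trap("export called before start2 completed")
        return fn(*args)
```

A lazy caller would invoke `child_start2()` right before its first `call_export` and could inspect the `"error"` result to recover, whereas a `start2` section would simply propagate the failure.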
