Add a new `--orchestrator-id` flag to rustc

# Proposal (Updated on 2024-09-06)

## Context

The JSON documentation for a crate often refers to items (e.g. functions, traits, types, etc.) defined in one or more of its dependencies, either direct or transitive.  
This can happen in a variety of scenarios:

- A local function uses a foreign type as one of its input parameters or as its return type
- A foreign trait is implemented for a local type or used in a bound
- Re-exports
- Etc.

## Problem

The JSON documentation for a crate doesn't provide many details when it comes to foreign items. If you want to dig deeper, you must generate the JSON documentation for the crate where they are defined and then combine the information from those two JSON documents to get a complete picture[^intentional].  
In other words, you need to **walk the dependency graph** to perform analyses where foreign items are in scope.

To walk the graph, you need to go from "I have this foreign item" to "here's the JSON documentation for crate X, where that item was defined".  
This is the difficult part: the JSON documentation for a crate contains very little information about third party crates. In particular, it only reports [their names and the root URLs to their docs](https://docs.rs/rustdoc-types/0.21.0/rustdoc_types/struct.ExternalCrate.html). 
In the simplest scenario, that's not a problem: the crate name is enough to generate the JSON documentation if there is only one instance of that crate in the dependency graph.  
It becomes an issue when **multiple versions of the same crate** appear together in the same JSON document. There will be multiple entries in [`external_crates`](https://docs.rs/rustdoc-types/latest/rustdoc_types/struct.Crate.html#structfield.external_crates) with the same `name`, one for each version of that duplicated dependency.  
There is no bullet-proof mechanism to disambiguate between those entries and map them back to specific package entries in the dependency tree of the "root" crate[^zulip]. That's the problem this MCP tries to solve. 
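
To make the ambiguity concrete, here is a minimal sketch. `ExternalCrate` below is a simplified stand-in for the `rustdoc_types` struct, limited to the two fields mentioned above, and `ambiguous_names` is a hypothetical helper, not part of any existing tool.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `rustdoc_types::ExternalCrate`:
/// only the name and the docs root URL are available today.
#[derive(Debug, Clone, PartialEq)]
pub struct ExternalCrate {
    pub name: String,
    pub html_root_url: Option<String>,
}

/// Hypothetical helper: returns, in sorted order, every crate name that
/// appears more than once in `external_crates`. For those names, the name
/// alone cannot tell us which package in the dependency graph is meant.
pub fn ambiguous_names(external_crates: &HashMap<u32, ExternalCrate>) -> Vec<String> {
    // Count how many entries share each name.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for krate in external_crates.values() {
        *counts.entry(krate.name.as_str()).or_insert(0) += 1;
    }
    // Keep only the names that occur at least twice.
    let mut names: Vec<String> = counts
        .into_iter()
        .filter(|(_, count)| *count > 1)
        .map(|(name, _)| name.to_string())
        .collect();
    names.sort();
    names
}
```

If two entries are both named `syn`, this helper reports `syn` as ambiguous, and nothing else in either entry says which version each one refers to.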

## The proposed solution

### For `rustdoc`

`rustdoc` will provide an additional field in [`ExternalCrate`](https://docs.rs/rustdoc-types/0.29.1/rustdoc_types/struct.ExternalCrate.html), named `orchestrator_id`.  
`orchestrator_id` will be of type `Option<String>`.
`orchestrator_id` will be (optionally) provided by the tool orchestrating the overall build. This will be `cargo` in the most common case, but it may as well be an alternative orchestrator (`bazel`, `buck2`, etc.). If an orchestrator decides to provide an id for each build unit, it must guarantee that those ids uniquely identify that unit within the dependency graph.
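
As a rough sketch of the shape this could take (the struct is a simplified stand-in for `rustdoc_types::ExternalCrate`, and `ids_are_unique` is a hypothetical helper that only illustrates the uniqueness guarantee expected from the orchestrator):

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for `rustdoc_types::ExternalCrate`,
/// extended with the proposed field.
#[derive(Debug, Clone, PartialEq)]
pub struct ExternalCrate {
    pub name: String,
    pub html_root_url: Option<String>,
    /// Proposed: an opaque id supplied by the build orchestrator.
    /// `None` when the orchestrator did not provide one.
    pub orchestrator_id: Option<String>,
}

/// Hypothetical check of the guarantee described above: among the entries
/// that carry an id, no two may share the same id.
pub fn ids_are_unique(external_crates: &HashMap<u32, ExternalCrate>) -> bool {
    let mut seen = HashSet::new();
    external_crates
        .values()
        .filter_map(|krate| krate.orchestrator_id.as_deref())
        // `insert` returns false if the id was already present.
        .all(|id| seen.insert(id))
}
```

Entries without an id are ignored by the check, which matches the field being optional.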

### For `cargo`

`cargo` will populate the `orchestrator_id` field.  
The id will be a valid string for the `--package` flag in `cargo` (e.g. `cargo rustdoc -p <orchestrator-id>` should always work). This will allow tool builders to walk the graph without ambiguity when working with JSON documentation.
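
A tool could then turn an entry's id into the next `cargo` invocation. The sketch below assumes nothing about the format of the id beyond it being accepted by `-p`; `next_hop_args` is a hypothetical helper name.

```rust
/// Hypothetical helper: given the (optional) orchestrator id of an external
/// crate, builds the arguments for `cargo rustdoc -p <orchestrator-id>`.
/// Returns `None` when the orchestrator did not supply an id, in which case
/// the tool has to fall back to name-based guessing.
pub fn next_hop_args(orchestrator_id: Option<&str>) -> Option<Vec<String>> {
    orchestrator_id.map(|id| {
        vec!["rustdoc".to_string(), "-p".to_string(), id.to_string()]
    })
}
```

The returned vector can be passed to `std::process::Command::new("cargo").args(...)`; any extra flags needed to request JSON output are left out here.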

### Implementation strategy

How would `rustdoc` receive this `orchestrator_id`, in practice?  
`rustdoc` looks at the [`rmeta` file](https://rustc-dev-guide.rust-lang.org/backend/libs-and-metadata.html#rmeta) to gather information about third party crates. We want to include `orchestrator_id` in that file to get it from `cargo` to `rustdoc`. This requires `rustc` to cooperate. 

Adding a new (unstable) `--orchestrator-id` option to `rustc` seems to be the simplest way forward. 
This could then be stabilised as is or, if preferred, moved to a different "channel" (e.g. as part of `-Cmetadata`, its own `-C` flag, etc.).


# Mentors or Reviewers

A mentor would definitely be appreciated: I have never touched any of the code involved here.

# Process

The main points of the [Major Change Process][MCP] are as follows:

* [x] File an issue describing the proposal.
* [ ] A compiler team member or contributor who is knowledgeable in the area can **second** by writing `@rustbot second`.
    * Finding a "second" suffices for internal changes. If, however, you are proposing a new public-facing feature, such as a `-C flag`, then full team check-off is required.
    * Compiler team members can initiate a check-off via `@rfcbot fcp merge` on either the MCP or the PR.
* [ ] Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered **approved**.

# Comments

**This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.**

[^intentional]: This is an intentional design choice, that's been discussed at length in the `rustdoc` team and that's been endorsed by all JSON-based tool builders that interface with the team.
[^zulip]: See [this Zulip thread](https://rust-lang.zulipchat.com/#narrow/stream/266220-rustdoc/topic/Identifying.20external.20crates.20in.20Rustdoc.20JSON/near/352551996) for more details.

## Notes

This proposal supersedes https://github.com/rust-lang/compiler-team/issues/622, incorporating the feedback and ideas that were surfaced in the associated Zulip stream.

<details>
 <summary>Previous proposal, before first round of feedback</summary>

## The problem

`rustdoc`'s JSON output for a crate must often refer to items (e.g. functions, traits, types, etc.) defined in one or more of its dependencies.

Very little information is captured about those dependencies: [their name and the root URL to their docs](https://docs.rs/rustdoc-types/0.21.0/rustdoc_types/struct.ExternalCrate.html).

This causes issues when multiple versions of the same crate appear as direct dependencies (e.g. via renames) of the crate we are documenting—e.g. [it becomes impossible to find out where items are coming from](https://rust-lang.zulipchat.com/#narrow/stream/266220-rustdoc/topic/Identifying.20external.20crates.20in.20Rustdoc.20JSON/near/352551996).

## The proposed solution

Add a `--build-id` flag to `rustc`.
The value would be provided by `cargo` and then captured as part of the .rlib metadata, which would in turn make it available to `rustdoc`, as well as other parts of the compiler that might benefit from it (e.g. diagnostics, such as [version mismatches](https://github.com/rust-lang/rust/issues/110055) which currently limit themselves to `perhaps two different version of CRATE_NAME are being used?`).

The provided `build-id` must:

- Uniquely identify the crate it points to within the current workspace;
- Be a valid string for the `--package` flag in `cargo` (e.g. `cargo rustdoc -p <build-id>` should always work). This allows toolmakers to fetch more information on-demand when working with `rustdoc`'s JSON output;
- Be as human-friendly as possible (e.g. use just the crate name if there is no ambiguity) in order to be used in diagnostics.
</details>

