Tracking issue: Attribute refactor

While working on #125418 with @m-ou-se, I've interacted quite a bit with attributes in the compiler. I've got some thoughts about the way they currently work. I'm posting this as a mix between an explanation of the status quo and why I think that's an issue, in addition to also serving as a kind of tracking issue for these changes if I've convinced you that this is a problem.

# Quick Overview

From the ground up: there are several syntaxes for macros. One of them is attributes, which can take [several forms]. Attributes can be expanded, either as a user-defined attribute macro or as an "active" built-in attribute like `#[test]`. However, some attributes are kept around for the entire compilation lifecycle.

These [built-in attributes] are never expanded. Instead, they are kept around and serve as markers or metadata to guide the compilation process at various stages. There are currently around `100` of these.

[several forms]: https://doc.rust-lang.org/nightly/reference/attributes.html#meta-item-attribute-syntax
[built-in attributes]: https://rustc-dev-guide.rust-lang.org/attributes.html#builtininert-attributes

# The problem

<details>
<summary>While most of what is parsed is later lowered during [`rustc_ast_lowering`], attributes mostly are not.</summary>

Many crates under `compiler/` depend on `rustc_ast` *just* to use `ast::Attribute`. Let's see what that means:

[`rustc_ast_lowering`]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_ast_lowering


## Partial lowering and impossible states

One part of attributes actually *is* lowered: attributes of the form `#[key = "value"]`, a.k.a. `MetaNameValueStr`. To make that possible, the AST contains an enum `AttrArgsEq` that already has a variant for the eventual lowered form:

https://github.com/rust-lang/rust/blob/11ee3a830b8537976d54805331cc626604afbb63/compiler/rustc_ast/src/ast.rs#L1697-L1700

For one part of the compilation process, the `Ast` variant is always active and `Hir` is completely unused, while later in the compiler the reverse is true. In some places people didn't realize this and provided implementations for both cases even though only one could occur, while in other places the impossible case is marked as unreachable, like here:

https://github.com/rust-lang/rust/blob/11ee3a830b8537976d54805331cc626604afbb63/compiler/rustc_ast/src/visit.rs#L1241
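To make the "impossible state" concrete, here is a minimal, self-contained sketch. The `Expr` and `MetaItemLit` structs are simplified stand-ins and `AstAttrArgsEq`, `HirAttrArgsEq` and the functions are hypothetical names, not the real `rustc_ast` definitions. It only illustrates the shape of the problem, and how one type per phase would make the unreachable arm disappear.

```rust
// Simplified stand-ins for the compiler types (assumptions for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Expr {
    source: String,
}

#[derive(Debug, Clone, PartialEq)]
struct MetaItemLit {
    value: String,
}

// Shape of the status quo: one enum shared by both phases.
#[derive(Debug, Clone, PartialEq)]
enum AttrArgsEq {
    Ast(Expr),
    Hir(MetaItemLit),
}

// Code that runs before lowering still has to say something about `Hir`,
// even though that variant can never occur at this point.
fn visit_before_lowering(args: &AttrArgsEq) -> &str {
    match args {
        AttrArgsEq::Ast(expr) => expr.source.as_str(),
        AttrArgsEq::Hir(lit) => unreachable!("lowered literal before lowering: {:?}", lit),
    }
}

// Hypothetical alternative: one type per phase, so the impossible case
// cannot even be written down.
struct AstAttrArgsEq {
    expr: Expr,
}

struct HirAttrArgsEq {
    lit: MetaItemLit,
}

// Lowering is the only place that converts between the two representations.
fn lower(args: AstAttrArgsEq) -> HirAttrArgsEq {
    HirAttrArgsEq {
        lit: MetaItemLit { value: args.expr.source },
    }
}

// Code after lowering needs no match and no assertion.
fn visit_after_lowering(args: &HirAttrArgsEq) -> &str {
    args.lit.value.as_str()
}
```

With separate types, a visitor that only ever sees one phase has nothing to mark as unreachable, and nobody can accidentally implement a case that never happens.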

Another case of partial lowering is the tokens field:

https://github.com/rust-lang/rust/blob/11ee3a830b8537976d54805331cc626604afbb63/compiler/rustc_ast_lowering/src/lib.rs#L948

This is later defended against extensively, to make sure it really happened:

https://github.com/rust-lang/rust/blob/11ee3a830b8537976d54805331cc626604afbb63/compiler/rustc_query_system/src/ich/impls_syntax.rs#L41-L54

### Parse, don't validate.

I'm a big fan of the blog post [Parse, don't validate]. Generally, Rust's type system makes this pattern the most obvious thing to do, and it's what I teach my university students every year. However, that is exactly what we aren't doing with attributes. In [`rustc_passes/check_attr.rs`] we first validate extensively and emit various diagnostics. However, every single attribute is later parsed again where it is needed. I started making a small overview, but `100` attributes is a lot.

![Image](https://github.com/user-attachments/assets/394414b5-cd7c-484a-9dc9-e96fd3573c26)

But basically, of the first 19 attributes I looked at, 5 are `Word` attributes and trivial, and a few are parsed together, but in total I've found 11 completely distinct, custom pieces of parsing logic that share no code, spread over as many files and compiler crates.

I lied a little there: the parsing does reuse some things. For example, the attributes are turned into `MetaItem`s using common logic. However, that doesn't change the fact that attributes are effectively re-validated in places scattered around the compiler, and many of these places emit more diagnostics of their own that could have been emitted during the earlier validation. It also means that at a very late stage in the compiler, we are still dealing with parsing `TokenStream`s, something you'd think we would have abstracted away after parsing.

An example of such custom parsing logic: 

https://github.com/rust-lang/rust/blob/11ee3a830b8537976d54805331cc626604afbb63/compiler/rustc_middle/src/ty/context.rs#L1447-L1469

[Parse, don't validate]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
[`rustc_passes/check_attr.rs`]: https://github.com/rust-lang/rust/blob/11ee3a830b8537976d54805331cc626604afbb63/compiler/rustc_passes/src/check_attr.rs

### Flexibility

Finally, though I have fewer concrete examples of this, sticking to `ast::Attribute` throughout the compiler costs quite a bit of flexibility. Everything has to fit into an `ast::Attribute`. If it doesn't, you have to create more variants like `AttrArgsEq::Hir` to support something in the AST that shouldn't even be part of the AST, which forces you to add a myriad of exceptions in parts of the compiler where such an extra variant isn't relevant yet. Specifically, for #125418 we noticed this because we wanted to do some limited form of name resolution for a path stored in an attribute, which proved next to impossible.

</details>

# Ideas

<details>
<summary> Lower attributes during `rustc_ast_lowering`. </summary>

I've got 90% of a commit ready to do this, and it's what sparked the idea for this issue. It leads to some code duplication. I'm a little unhappy about it, because it forces a lot of changes across the entire compiler, exactly because attribute parsing now happens in so many places. However, it already means that a lot of assertions can be removed, because from some point in the compiler onward, the fact that an `Attribute` can't have certain fields and values anymore becomes encoded in the type system. I'll open a PR for this soon, and we can discuss whether we think this is a good first step.

What also doesn't help is that `rustc_attr` currently has logic to validate attributes, but these functions are called in wildly different parts of the compiler. Some functions here validate actual `ast::Attribute`s from before lowering, while other functions validate new `hir::Attribute`s. Bugs seem easy to make here: even though these are currently the same type, they don't always contain the same fields.

</details>

<details>
<summary>The "real solution": parse, don't validate</summary>

As I see it, what would make attributes so much nicer to work with, is if there was a place in the compiler (something like the `rustc_attr` crate, but actually good) where all attributes are turned from their ast tokeny representation into some specific attribute representation. Something like the following, based on the examples I've looked at in the table I showed a little higher up:

```rust
enum InlineKind {
    Always,
    Never,
    Normal
}

enum Attribute {
    Diagnostic {
        message: Symbol,
        name: Symbol,
        notes: Vec<Symbol>
    },
    Inline(InlineKind),
    Coverage(bool),
    // ...
}
```

This structure contains only the information necessary to use each attribute, and all the diagnostics happen while parsing into this structure. That has the added benefit that the data structure itself serves as great documentation of which values an attribute allows. It's super clear here that a `#[diagnostic]` attribute contains a message, a name and some notes. Currently, you'd have to make sure the written documentation for this attribute is sufficiently up to date.
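As a rough sketch of what "parse once, then only use the parsed form" could look like, consider the following. Everything here is a simplified assumption: `RawAttr` stands in for the token-based AST attribute, `Diagnostic` is a plain string wrapper, and `parse_attr` and `should_force_inline` are hypothetical names. The accepted argument forms are only illustrative and do not describe how rustc actually parses these attributes.

```rust
// Hypothetical stand-in for a token-based attribute: a name plus word arguments.
#[derive(Debug, Clone, PartialEq)]
struct RawAttr {
    name: String,
    args: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum InlineKind {
    Always,
    Never,
    Normal,
}

// Parsed representation: only the information later stages need.
#[derive(Debug, Clone, PartialEq)]
enum Attribute {
    Inline(InlineKind),
    Coverage(bool),
}

// Stand-in for a compiler diagnostic.
#[derive(Debug, Clone, PartialEq)]
struct Diagnostic(String);

// The single place where raw attributes are turned into typed ones.
// All malformed-attribute diagnostics originate here.
fn parse_attr(raw: &RawAttr) -> Result<Attribute, Diagnostic> {
    let args: Vec<&str> = raw.args.iter().map(String::as_str).collect();
    match (raw.name.as_str(), args.as_slice()) {
        ("inline", []) => Ok(Attribute::Inline(InlineKind::Normal)),
        ("inline", ["always"]) => Ok(Attribute::Inline(InlineKind::Always)),
        ("inline", ["never"]) => Ok(Attribute::Inline(InlineKind::Never)),
        ("coverage", ["on"]) => Ok(Attribute::Coverage(true)),
        ("coverage", ["off"]) => Ok(Attribute::Coverage(false)),
        (name @ ("inline" | "coverage"), _) => {
            Err(Diagnostic(format!("malformed `{name}` attribute")))
        }
        (name, _) => Err(Diagnostic(format!("unknown attribute `{name}`"))),
    }
}

// A late consumer: no tokens, no re-validation, just a match on parsed data.
fn should_force_inline(attrs: &[Attribute]) -> bool {
    attrs
        .iter()
        .any(|attr| matches!(attr, Attribute::Inline(InlineKind::Always)))
}
```

The point of the sketch is that a consumer such as `should_force_inline` cannot observe a malformed attribute at all, since anything that reaches it has already gone through `parse_attr`.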

The translation from `ast::Attribute` to this new parsed attribute should, I think, happen during AST to HIR lowering.

I think the advantages of this should be pretty obvious, based on the examples I've given of the problems with the current approach. However, I could think of some potential blockers people might care about:

* Some errors might now be emitted at a different (earlier) stage of the compiler. I'd say this can actually be an advantage, but I've not talked to enough people to know whether this is a problem anywhere.
* A file with code to parse these attributes will contain code for many different features. I personally like that, but it also means that those features themselves become slightly less self-contained.
* Validity of attributes still needs to be checked on-site (like `track_caller` not being valid on closures, given certain feature flags).
* It affects large parts of the compiler, including unstable parts, where people are actively opening merge requests and might run into conflicts.

Part two I have not worked on personally. I might, if I find enough time, but if someone feels very inspired to pick this up or lead this (or tell me why this is a dumb idea), feel free to.

</details>

---

Everything above was my original issue, arguing that changes to attributes were needed. Everything below tracks the progress of these changes.

---

# Steps

## Already completed

- [x] remove attribute IDs from hir statistics https://github.com/rust-lang/rust/pull/132576
- [x] make clippy's attribute lints work on the ast instead of hir
    - [x] https://github.com/rust-lang/rust/pull/132598
    - [x] https://github.com/rust-lang/rust-clippy/pull/13657
    - [x] https://github.com/rust-lang/rust-clippy/pull/13658
- [x] introduce hir attributes https://github.com/rust-lang/rust/pull/131808
- [x] split up `builtins.rs` into files for individual attributes. Also move types to `rustc_attr_data_structures` and rename `rustc_attr` to `rustc_attr_parsing`: https://github.com/rust-lang/rust/pull/134381
- [ ] Introduce new parser logic: https://github.com/rust-lang/rust/pull/135726
    - Fixes https://github.com/rust-lang/rust/issues/132391
 

## Future

### introduce `rustc_attr_validation`

At this point, not much has changed as to validation. Next to `rustc_attr_parsing` and `rustc_attr_data_structures`, I intend to create `rustc_attr_validation`. This will represent all the logic *after* ast lowering, for when a `tcx` is available and we can run queries. Some of this currently happens in `rustc_passes/check_attr.rs`. However, even the fact that we will be able to exhaustively match on an enum of attributes will make mistakes harder. I intend to make more changes, such as forcing new attributes to list what kinds of targets they're valid on.
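As a rough illustration of what "forcing new attributes to list what kinds of targets they're valid on" could look like, here is a minimal sketch. The `Target` and `AttributeKind` enums, the `allowed_targets` method and the `check_target` function are hypothetical, and the target sets are made up for the example. They are not the actual rules rustc enforces.

```rust
// Hypothetical set of places an attribute can be attached to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Fn,
    Closure,
    Struct,
    Mod,
}

// Hypothetical, heavily reduced enum of parsed attribute kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AttributeKind {
    Inline,
    Repr,
    TrackCaller,
}

impl AttributeKind {
    // The match is exhaustive, so adding a new variant without deciding
    // where it is valid is a compile error rather than a forgotten check.
    // The sets below are illustrative only.
    fn allowed_targets(self) -> &'static [Target] {
        match self {
            AttributeKind::Inline => &[Target::Fn, Target::Closure],
            AttributeKind::Repr => &[Target::Struct],
            AttributeKind::TrackCaller => &[Target::Fn],
        }
    }
}

// One shared check, instead of per-attribute target logic.
fn check_target(kind: AttributeKind, target: Target) -> Result<(), String> {
    if kind.allowed_targets().contains(&target) {
        Ok(())
    } else {
        Err(format!("attribute {kind:?} is not valid on {target:?}"))
    }
}
```

Checks that need a `tcx` or feature flags would still live in a later validation step. This sketch only covers the static part that an exhaustive match can enforce.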

### Document these changes

Of course, I'll already have documentation on all the previous changes in code. However, I intend to write a post on the dev guide as well, to make sure that in the future people know how to use the infrastructure for attributes.

### Port all attributes to this system

At this point, with all infrastructure in place, I expect a few PRs porting all attributes to be parsed in `rustc_attr_parsing`. I might ask others to help here, which becomes possible once things are documented in the dev guide.

### Also introduce some parsed attributes in the AST

This is an idea of @oli-obk. It might be good to also make a smaller enum of parsed attributes in `ast::Attribute`, especially for attributes that can be discarded when lowering, or which we need, or need to validate, earlier on. When we validate them while parsing, we can make fewer mistakes. These can then also contain fields that aren't just tokens, for example to support resolving names, as with the `defines` attribute.

### Smaller TODOs

* deprecate and remove [`tcx.get_attrs`](https://github.com/rust-lang/rust/blob/2f92f050e83bf3312ce4ba73c31fe843ad3cbc60/compiler/rustc_middle/src/ty/mod.rs#L1758)
* partition `AttributeKind`, for example by making codegen attributes a separate enum for exhaustivity reasons
* filter proc-macro-derive helper attributes

# Related issues

I intend to solve these systematically, as in, by rewriting how attributes are handled these should not be issues anymore.

- https://github.com/rust-lang/rust/issues/133791
- https://github.com/rust-lang/rust/issues/132464
- https://github.com/rust-lang/rust/issues/131787
