make enum size not depend on the order of variants #131684

adwinwhite · 2024-10-14T13:33:09Z

I fixed the off-by-one error by allocating discriminant starting from the variant after the untagged variant and wrapping around to the first variant when it reaches the last variant.

enum {
    A,           // 1
    B,           // 2
    C(bool),     // untagged_variant, no discriminant
    D,           // has a discriminant of 0
}

Check the detailed algorithm in the definition of TagEncoding::Niche.
This is only applied when the untagged_variant is contained in the niche_variants wasting one value from the niche range.

Undoubtedly this complicates the codegen, but it should not have a large performance penalty.
It adds one arithmetic instruction in the general case and no additional branch instruction.

I'm not totally sure if the complexity is justified, considering the use case is rare.

r? @the8472

rustbot · 2024-10-14T13:33:18Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

RalfJung · 2024-10-15T06:14:27Z

compiler/rustc_const_eval/src/interpret/discriminant.rs

                        } else {
-                            untagged_variant
+                            if variant_index_relative < u128::from(discr_len) {


It looks like there is some duplication of logic here? The if variant_index_relative < u128::from(discr_len) { ... } else { untagged_variant } occurs twice. Can that be cleaned up?

They have different conditions. I deleted a subtraction expression by mistake.
This reinforces the need for more thorough test coverage.

RalfJung · 2024-10-15T06:15:42Z

compiler/rustc_const_eval/src/interpret/discriminant.rs

                // We need to use machine arithmetic when taking into account `niche_start`:
-                // tag_val = variant_index_relative + niche_start_val


Why did you remove this comment? The code below still does the same math as before, doesn't it?

I probably didn't notice that this is a comment rather than outdated code to be deleted.

RalfJung · 2024-10-15T06:17:43Z

tests/ui/layout/size-not-depend-on-variants-order.rs

Is there any test that actually runs the new get/set discriminant logic? Seems like this test only checks that the size is what it should be.

You added some new if in a critical codepath in codegen and the interpreter. Both of them need thorough test coverage to ensure all branches are tested.

You're right. I definitely should add more tests.

tests/ui/enum-discriminant/get_discr.rs should be the right place, it already covers a bunch of dense enum encodings.

rustbot · 2024-10-16T10:08:41Z

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

compiler/rustc_codegen_cranelift/src/discriminant.rs

bjorn3 · 2024-10-16T10:15:39Z

For reference: It took me a long time to implement the discriminant handling code in cg_clif with all corner cases like discriminants bigger than 64bits correctly handled.

adwinwhite · 2024-10-16T11:11:03Z

For reference: It took me a long time to implement the discriminant handling code in cg_clif with all corner cases like discriminants bigger than 64bits correctly handled.

I'll take a closer look. Is there any specific test to check this? Maybe I can add more test coverage.

bjorn3 · 2024-10-16T11:30:15Z

It's been a while, so I don't remember all the problems I've had. It was meant more of a cautionary tale for making the discriminant handling code even more complex.

adwinwhite · 2024-10-17T01:49:24Z

I am also kind of hesitant to make it more complex. It's totally fine to close this PR.

Would you mind sharing your thoughts on this? @the8472 @RalfJung

the8472 · 2024-10-24T21:56:49Z

compiler/rustc_abi/src/lib.rs

+    /// enum SomeEnum {
+    ///     A,       // 1
+    ///     B,       // 2
+    ///     C(bool), // untagged_variant, no discriminant
+    ///     D,       // has a discriminant of 0
+    /// }


Can you add the tag/payload values here too, not just the discriminants?

the8472 · 2024-10-24T22:06:36Z

compiler/rustc_codegen_cranelift/src/discriminant.rs

+                        }
+                        ty => fx.bcx.ins().iconst(ty, discr_len as i64),
+                    };
+                    let tagged_discr = fx.bcx.ins().urem(tagged_discr, discr_len);


discr_len is not a power of two, right? This doesn't look good for performance.

In the motivating issue (#117238) the full discriminant range is essentially packed 2 values for the bool, 254 values for the remaining variants. In that case we can just use wrapping arithmetic of the right type size without any remainder.

So for smaller cases like the A-D case above, can we rearrange the discriminants so that they also fit the natural tag wrap-around?

I didn't realize modulo operation has such bad performance. Now the foundation of this PR is mostly gone.

Also from this comment it sounds like it should be possible to fix this issue without changing codegen at all. IIUC, we don't need to change anything about TagEncoding::Niche, we just need to be more clever in how we construct enums with that encoding. Or did I misunderstand?

I basically implemented what that comment suggested, like allocating discriminant starting right after the untagged variant and wrapping around. This changes how we convert variant_index <--> tag, thus codegen.

Codegen needs to compute the tag from the discriminant. The mapping between variant index and tag only exists during compilation and can be adjusted without codegen impact. Currently, enums with Niche encoding make the tag equal to the variant index, but it doesn't have to be that way.

Anyway maybe @the8472 could clarify what they meant in that comment.

the8472 · 2024-10-24T22:11:11Z

compiler/rustc_abi/src/lib.rs

+    /// ```rust,ignore (pseudo-code)
+    /// adjusted_len = niche_variants.end - niche_variants.start + 1
+    /// d = tag_value.wrapping_sub(niche_start)
+    /// variant_index = if niche_variants.contains(untagged_variant) {


This code mixes compile-time and runtime behavior. I think that's unnecessarily confusing. It could be split into separate sections (one for interior untagged variants, one without) which each have their own code samples.

the8472 · 2024-10-25T06:36:33Z

compiler/rustc_abi/src/lib.rs

+    /// enum SomeEnum {
+    ///     A,       // 1
+    ///     B,       // 2
+    ///     C(bool), // untagged_variant, no discriminant
+    ///     D,       // has a discriminant of 0
+    /// }


Also, the given enum already has a size of 1. I think it's the edge-case in #117238 where we need the wraparound. If there's enough room the layout logic already shifts up the entire tag range, including a gap where the untagged variant would otherwise be. It's only when the tag range is full that that becomes a problem.

Well, maybe there also are cases with range-restricted payloads other than bool. Something using rustc_layout_scalar_valid_range_start and rustc_layout_scalar_valid_range_end might also benefit from wraparound.

But generally those should be exceptions, not the rule. So we shouldn't take a more expensive code-path for the normal case.

Yeah, we can still wrap around when the tag range is full to avoid the off-by-1 bug. But taking a step back, a clippy warning might be a better approach for this rare case than complicating codegen?

bors · 2024-10-28T17:10:44Z

☔ The latest upstream changes (presumably #132262) made this pull request unmergeable. Please resolve the merge conflicts.

alex-semenyuk · 2024-12-06T20:54:15Z

@adwinwhite
From wg-triage. Could you please take it seems not all comments were addressed? Beside that can you please rebase due to merge conflicts

adwinwhite · 2024-12-10T04:54:41Z

I think my wrapping around approach is not viable currently.

Let's say we only consider wrapping around when the tag range is full and untagged_variant is contained in niche_variants.
The divisor used by modulo operation is,

discr_len = niche_variants.end - niche_variants.start + 1

The problem is that discr_len is not a power of 2 when the tag range is full. In the specific case of #117238, the discr_len is 255.
Thus we can't simplify the modulo operation into wrapping arithmetic.
And modulo operation has bad performance when the divisor is not a power of 2.

rustbot assigned the8472 Oct 14, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 14, 2024