compiler-builtins: Int trait functions are not inlined on wasm #73135

Closed
@CryZe

Description

While running some benchmarks, I noticed that __multi3 (the 128-bit integer multiplication routine) takes a really long time on wasm. It turns out that none of the Int and LargeInt trait helper functions it uses are inlined, so every multiplication makes a lot of unnecessary calls and misses many potential optimizations:

(func $compiler_builtins::int::mul::__multi3::h134a3ac9dcea74e0 (type $t39) (param $p0 i32) (param $p1 i64) (param $p2 i64) (param $p3 i64) (param $p4 i64)
    (local $l5 i32) (local $l6 i64) (local $l7 i64) (local $l8 i64)
    global.get $g0
    i32.const 16
    i32.sub
    local.tee $l5
    global.set $g0
    local.get $p1
    local.get $p2
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 4294967295
    i64.and
    local.get $p3
    local.get $p4
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 4294967295
    i64.and
    call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
    local.set $l6
    local.get $p1
    local.get $p2
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 32
    i64.shr_u
    local.get $p3
    local.get $p4
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 4294967295
    i64.and
    call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
    local.get $l6
    i64.const 32
    i64.shr_u
    i64.add
    local.tee $l7
    i64.const 32
    i64.shr_u
    call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
    local.set $l8
    local.get $l5
    local.get $p3
    local.get $p4
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 32
    i64.shr_u
    local.get $p1
    local.get $p2
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 4294967295
    i64.and
    call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
    local.get $l7
    i64.const 4294967295
    i64.and
    i64.add
    local.tee $l7
    i64.const 32
    i64.shl
    local.get $l6
    i64.const 4294967295
    i64.and
    i64.or
    local.get $l8
    local.get $l7
    i64.const 32
    i64.shr_u
    call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
    i64.add
    local.get $p1
    local.get $p2
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 32
    i64.shr_u
    local.get $p3
    local.get $p4
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    i64.const 32
    i64.shr_u
    call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
    call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
    i64.add
    local.get $p1
    local.get $p2
    call $<i128_as_compiler_builtins::int::LargeInt>::high::hc703bd80007672ce
    local.get $p3
    local.get $p4
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
    call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
    call $<i64_as_compiler_builtins::int::Int>::wrapping_add::h9472fc521f362e13
    local.get $p1
    local.get $p2
    call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
    call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
    local.get $p3
    local.get $p4
    call $<i128_as_compiler_builtins::int::LargeInt>::high::hc703bd80007672ce
    call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
    call $<i64_as_compiler_builtins::int::Int>::wrapping_add::h9472fc521f362e13
    call $<i128_as_compiler_builtins::int::LargeInt>::from_parts::h8ea13ebf67bbf227
    local.get $l5
    i64.load
    local.set $p3
    local.get $p0
    local.get $l5
    i32.const 8
    i32.add
    i64.load
    i64.store offset=8
    local.get $p0
    local.get $p3
    i64.store
    local.get $l5
    i32.const 16
    i32.add
    global.set $g0)

The same is probably true for the Float trait.
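
For reference, the arithmetic in the dump above can be sketched in plain Rust. This is a minimal illustration, not the compiler-builtins source: the `LargeInt` trait here is a hypothetical stand-in whose method names mirror the symbols in the dump, it works on `u128` rather than `i128`, and `widening_mul` and `mul128` are made-up names. It also marks the helpers `#[inline]`, on the assumption that an inlining hint is one way to let the calls collapse into plain `i64` instructions; `#[inline]` is only a hint, and whether that is the right fix for this issue is not confirmed here.

```rust
// Hypothetical stand-in for the helper trait seen in the dump; the real
// compiler-builtins trait may differ in signatures and types.
trait LargeInt: Sized {
    fn low(self) -> u64;
    fn high(self) -> u64;
    fn from_parts(low: u64, high: u64) -> Self;
}

impl LargeInt for u128 {
    // Each helper is a one-liner, so a call costs far more than the body.
    #[inline]
    fn low(self) -> u64 {
        self as u64
    }

    #[inline]
    fn high(self) -> u64 {
        (self >> 64) as u64
    }

    #[inline]
    fn from_parts(low: u64, high: u64) -> Self {
        (low as u128) | ((high as u128) << 64)
    }
}

/// Full 64x64 -> 128 bit product built from 32-bit halves, following the
/// same sequence of masks, shifts and adds as the first part of the dump.
fn widening_mul(a: u64, b: u64) -> u128 {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // None of these sums can overflow a u64: each product is at most
    // (2^32 - 1)^2 and each added carry is below 2^32.
    let ll = a_lo * b_lo;
    let hl = a_hi * b_lo + (ll >> 32);
    let lh = a_lo * b_hi + (hl & MASK);

    let low = (lh << 32) | (ll & MASK);
    let high = a_hi * b_hi + (hl >> 32) + (lh >> 32);
    u128::from_parts(low, high)
}

/// Wrapping 128-bit multiply: the low halves contribute a full 128-bit
/// product, the two cross terms only affect the upper 64 bits.
fn mul128(a: u128, b: u128) -> u128 {
    let r = widening_mul(a.low(), b.low());
    let cross = a
        .high()
        .wrapping_mul(b.low())
        .wrapping_add(a.low().wrapping_mul(b.high()));
    u128::from_parts(r.low(), r.high().wrapping_add(cross))
}
```

Comparing `mul128(x, y)` against `x.wrapping_mul(y)` for a few inputs is a quick sanity check that the decomposition matches the built-in 128-bit multiply.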

Metadata

    Labels

    A-codegen (Area: Code generation)
    C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
    I-slow (Issue: Problems and improvements with respect to performance of generated code.)
    O-wasm (Target: WASM (WebAssembly), http://webassembly.org/)
    P-medium (Medium priority)
    regression-from-stable-to-stable (Performance or correctness regression from one stable version to another.)
