Closed
Description
I'm currently running some benchmarks and noticed that __multi3 takes a really long time on wasm. It turns out that none of the helper functions it uses are inlined at all, so it makes a ton of unnecessary calls and misses a lot of potential optimizations:
(func $compiler_builtins::int::mul::__multi3::h134a3ac9dcea74e0 (type $t39) (param $p0 i32) (param $p1 i64) (param $p2 i64) (param $p3 i64) (param $p4 i64)
(local $l5 i32) (local $l6 i64) (local $l7 i64) (local $l8 i64)
global.get $g0
i32.const 16
i32.sub
local.tee $l5
global.set $g0
local.get $p1
local.get $p2
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 4294967295
i64.and
local.get $p3
local.get $p4
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 4294967295
i64.and
call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
local.set $l6
local.get $p1
local.get $p2
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 32
i64.shr_u
local.get $p3
local.get $p4
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 4294967295
i64.and
call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
local.get $l6
i64.const 32
i64.shr_u
i64.add
local.tee $l7
i64.const 32
i64.shr_u
call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
local.set $l8
local.get $l5
local.get $p3
local.get $p4
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 32
i64.shr_u
local.get $p1
local.get $p2
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 4294967295
i64.and
call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
local.get $l7
i64.const 4294967295
i64.and
i64.add
local.tee $l7
i64.const 32
i64.shl
local.get $l6
i64.const 4294967295
i64.and
i64.or
local.get $l8
local.get $l7
i64.const 32
i64.shr_u
call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
i64.add
local.get $p1
local.get $p2
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 32
i64.shr_u
local.get $p3
local.get $p4
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
i64.const 32
i64.shr_u
call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
i64.add
local.get $p1
local.get $p2
call $<i128_as_compiler_builtins::int::LargeInt>::high::hc703bd80007672ce
local.get $p3
local.get $p4
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
call $<i64_as_compiler_builtins::int::Int>::wrapping_add::h9472fc521f362e13
local.get $p1
local.get $p2
call $<i128_as_compiler_builtins::int::LargeInt>::low::h654cf3febd0eecf8
call $<i64_as_compiler_builtins::int::Int>::from_unsigned::h7efafe9ef60e0310
local.get $p3
local.get $p4
call $<i128_as_compiler_builtins::int::LargeInt>::high::hc703bd80007672ce
call $<i64_as_compiler_builtins::int::Int>::wrapping_mul::h558368f41dd93d09
call $<i64_as_compiler_builtins::int::Int>::wrapping_add::h9472fc521f362e13
call $<i128_as_compiler_builtins::int::LargeInt>::from_parts::h8ea13ebf67bbf227
local.get $l5
i64.load
local.set $p3
local.get $p0
local.get $l5
i32.const 8
i32.add
i64.load
i64.store offset=8
local.get $p0
local.get $p3
i64.store
local.get $l5
i32.const 16
i32.add
global.set $g0)
The same is probably true for the Float trait.
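For illustration, here is a minimal Rust sketch of the same schoolbook multiplication with the small helpers marked `#[inline]`. This is not the actual compiler_builtins source: the `LargeInt` trait below is a simplified stand-in modelled on the symbol names in the dump, `mul_wide` and `multi3` are hypothetical names, and it uses unsigned types (the wrapping product has the same bit pattern for signed inputs). `#[inline]` is only a hint, and whether it would be enough here depends on how compiler_builtins is built, which I have not verified.

```rust
/// Simplified stand-in for the helper trait seen in the symbol names above
/// (hypothetical; not the real compiler_builtins definition).
trait LargeInt: Sized {
    fn low(self) -> u64;
    fn high(self) -> u64;
    fn from_parts(low: u64, high: u64) -> Self;
}

impl LargeInt for u128 {
    // Each helper is a cast or a shift, so a call is pure overhead
    // unless the optimizer inlines it.
    #[inline]
    fn low(self) -> u64 {
        self as u64
    }

    #[inline]
    fn high(self) -> u64 {
        (self >> 64) as u64
    }

    #[inline]
    fn from_parts(low: u64, high: u64) -> Self {
        (low as u128) | ((high as u128) << 64)
    }
}

/// Full 64x64 -> 128 bit product built from 32-bit halves,
/// returned as (low, high). None of the intermediate sums can overflow.
#[inline]
fn mul_wide(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    let t1 = a_lo * b_lo;
    let t2 = a_hi * b_lo + (t1 >> 32);
    let t3 = a_lo * b_hi + (t2 & MASK);

    let low = (t3 << 32) | (t1 & MASK);
    let high = a_hi * b_hi + (t2 >> 32) + (t3 >> 32);
    (low, high)
}

/// Wrapping 128-bit multiply: the low halves get a full widening product,
/// the cross terms only contribute to the high 64 bits.
fn multi3(a: u128, b: u128) -> u128 {
    let (low, carry) = mul_wide(a.low(), b.low());
    let high = carry
        .wrapping_add(a.high().wrapping_mul(b.low()))
        .wrapping_add(a.low().wrapping_mul(b.high()));
    u128::from_parts(low, high)
}
```

With the helpers inlined, the body should reduce to plain i64 multiplies, shifts, and adds, with none of the `call` instructions from the dump above. A quick sanity check is to compare `multi3(x, y)` against `x.wrapping_mul(y)` for a few values.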
Metadata
Labels
Area: Code generation
Category: An issue proposing an enhancement or a PR with one.
Issue: Problems and improvements with respect to performance of generated code.
Target: WASM (WebAssembly), http://webassembly.org/
Medium priority
Performance or correctness regression from one stable version to another.