Use stack_store instead of stack_addr+store when building structs #1265


Merged · 1 commit merged into rust-lang:master on Aug 17, 2022

Conversation

afonso360 (Contributor)

👋 Hey,

When building structs for calling functions we currently emit a stack_addr and a store. This is a bit verbose, and it bothered me when reading the generated clif code.

This changes the ABI building code to use a stack_store instead.

The previous code looks something like this:

    v10 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(13) }, None): bool <- ByVal(v10): bool
    v30 = stack_addr.i64 ss0+13
    store notrap aligned v10, v30

And now we generate this:

    v10 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(13) }, None): bool <- ByVal(v10): bool
    stack_store v10, ss0+13

Which is slightly nicer to read.

It doesn't make much of a difference, since after optimizations it gets turned back into the stack_addr + store form anyway.
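
For reference, the builder-side change amounts to swapping one cranelift-frontend call for another, roughly like this (a sketch, not the actual diff; write_field_old/write_field_new, ss, value, and offset are made-up names for illustration):

    use cranelift_codegen::ir::{types, MemFlags, StackSlot, Value};
    use cranelift_frontend::FunctionBuilder;

    // Before: materialize the slot address with stack_addr, then store through it.
    // MemFlags::trusted() corresponds to the notrap aligned flags in the clif above,
    // and i64 is the target pointer type here.
    fn write_field_old(bcx: &mut FunctionBuilder<'_>, ss: StackSlot, value: Value, offset: i32) {
        let addr = bcx.ins().stack_addr(types::I64, ss, offset);
        bcx.ins().store(MemFlags::trusted(), value, addr, 0);
    }

    // After: a single stack_store writes directly into the stack slot.
    fn write_field_new(bcx: &mut FunctionBuilder<'_>, ss: StackSlot, value: Value, offset: i32) {
        bcx.ins().stack_store(value, ss, offset);
    }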

bjorn3 (Member) commented Aug 17, 2022

If I recall correctly, I did this because stack_store/stack_load would legalize into multiple stack_addr instructions, one for each stack access, which at least back then was slower.

afonso360 (Contributor, Author) commented Aug 17, 2022

I think that's what's happening anyway. Here's what I pulled out of a testcase (with the current master branch):

bool_11_perturbed_big

function u0:8() windows_fastcall {
; symbol _ZN21bool_11_perturbed_big4main17h1d0542c81fd5f88dE
; instance Instance { def: Item(WithOptConstParam { did: DefId(0:23 ~ bool_11_perturbed_big[c589]::main), const_param_did: None }), substs: [] }
; abi FnAbi { args: [], ret: ArgAbi { layout: TyAndLayout { ty: (), layout: Layout { size: Size(0 bytes), align: AbiAndPrefAlign { abi: Align(1 bytes), pref: Align(8 bytes) }, abi: Aggregate { sized: true }, fields: Arbitrary { offsets: [], memory_index: [] }, largest_niche: None, variants: Single { index: 0 } } }, pad: None, mode: Ignore }, c_variadic: false, fixed_count: 0, conv: Rust, can_unwind: false }

; kind  loc.idx   param    pass mode                            ty
; zst   _0    ()                                0b 1, 8              align=8,offset=
; ret   _0      -          Ignore                               ()

; kind  local ty                              size align (abi,pref)
; stack _1    bool_11_perturbed_big            20b 4, 8              storage=ss0
; zst   _2    ()                                0b 1, 8              align=8,offset=
; stack _3    bool_11_perturbed_big            20b 4, 8              storage=ss1

  ss0 = explicit_slot 32
  ss1 = explicit_slot 32
  sig0 = (i64, i64, i64) windows_fastcall
  sig1 = (i64) windows_fastcall
  fn0 = %Memcpy sig0
  fn1 = u0:10 sig1 ; Instance { def: Item(WithOptConstParam { did: DefId(0:21 ~ bool_11_perturbed_big[c589]::{extern#0}::bool_struct_in_11_perturbed_big), const_param_did: None }), substs: [] }

block0:
                                  jump block1

block1:
  v0 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(0) }, None): bool <- ByVal(v0): bool
  v20 = stack_addr.i64 ss0
  store notrap aligned v0, v20
  v1 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(1) }, None): bool <- ByVal(v1): bool
  v21 = stack_addr.i64 ss0+1
  store notrap aligned v1, v21
  v2 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(2) }, None): bool <- ByVal(v2): bool
  v22 = stack_addr.i64 ss0+2
  store notrap aligned v2, v22
  v3 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(3) }, None): bool <- ByVal(v3): bool
  v23 = stack_addr.i64 ss0+3
  store notrap aligned v3, v23
  v4 = f32const 0x1.848280p7
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(4) }, None): f32 <- ByVal(v4): f32
  v24 = stack_addr.i64 ss0+4
  store notrap aligned v4, v24
  v5 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(8) }, None): bool <- ByVal(v5): bool
  v25 = stack_addr.i64 ss0+8
  store notrap aligned v5, v25
  v6 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(9) }, None): bool <- ByVal(v6): bool
  v26 = stack_addr.i64 ss0+9
  store notrap aligned v6, v26
  v7 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(10) }, None): bool <- ByVal(v7): bool
  v27 = stack_addr.i64 ss0+10
  store notrap aligned v7, v27
  v8 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(11) }, None): bool <- ByVal(v8): bool
  v28 = stack_addr.i64 ss0+11
  store notrap aligned v8, v28
  v9 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(12) }, None): bool <- ByVal(v9): bool
  v29 = stack_addr.i64 ss0+12
  store notrap aligned v9, v29
  v10 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(13) }, None): bool <- ByVal(v10): bool
  v30 = stack_addr.i64 ss0+13
  store notrap aligned v10, v30
  v11 = iconst.i8 176
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(14) }, None): u8 <- ByVal(v11): u8
  v31 = stack_addr.i64 ss0+14
  store notrap aligned v11, v31
  v12 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(15) }, None): bool <- ByVal(v12): bool
  v32 = stack_addr.i64 ss0+15
  store notrap aligned v12, v32
  v13 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(16) }, None): bool <- ByVal(v13): bool
  v33 = stack_addr.i64 ss0+16
  store notrap aligned v13, v33
  v14 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(17) }, None): bool <- ByVal(v14): bool
  v34 = stack_addr.i64 ss0+17
  store notrap aligned v14, v34
  v15 = iconst.i8 1
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(18) }, None): bool <- ByVal(v15): bool
  v35 = stack_addr.i64 ss0+18
  store notrap aligned v15, v35
; write_cvalue: Addr(Pointer { base: Stack(ss1), offset: Offset32(0) }, None): bool_11_perturbed_big <- ByRef(Pointer { base: Stack(ss0), offset: Offset32(0) }, None): bool_11_perturbed_big
  v16 = stack_addr.i64 ss0
  v17 = stack_addr.i64 ss1
  v18 = iconst.i64 20
  call fn0(v17, v16, v18)
;
; _2 = bool_struct_in_11_perturbed_big(move _3)
  v19 = stack_addr.i64 ss1
  call fn1(v19)
  jump block2

block2:
  return
}

Is there a way for me to benchmark this, so that we can check?

Edit: But the comments do point at an Addr(Pointer { base: Stack(ss0), offset: Offset32(15) }, None), which already seems to use a Stack as a base 🤔. I think I may have mixed something up along the way.

afonso360 (Contributor, Author)

Ok, I think I originally looked at the opt version of the clif output, then fixed it here, and after fixing I looked at the unopt version of the files.

If I go back to master, we already use stack_store pretty much everywhere (except here, but I wasn't even triggering this part of the code).

Up to you if you want to merge this or close it; it looks like my problem was looking at the opt files by accident 😅

bjorn3 (Member) commented Aug 17, 2022

This seems to be a slight improvement for simple-raytracer:

Benchmark 1: ./cg_clif_master
  Time (mean ± σ):      4.968 s ±  0.396 s    [User: 4.960 s, System: 0.006 s]
  Range (min … max):    4.785 s …  6.620 s    100 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./cg_clif_stack_store
  Time (mean ± σ):      4.827 s ±  0.028 s    [User: 4.822 s, System: 0.005 s]
  Range (min … max):    4.774 s …  4.990 s    100 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  './cg_clif_stack_store' ran
    1.03 ± 0.08 times faster than './cg_clif_master'
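
(These numbers are hyperfine output; a command along the lines of hyperfine --runs 100 ./cg_clif_master ./cg_clif_stack_store, reconstructed from the output above rather than the exact invocation used, would produce results in this format.)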

bjorn3 merged commit 4dac65f into rust-lang:master on Aug 17, 2022
afonso360 (Contributor, Author)

Nice!
