Missed Optimization - Replacement of rint/lrint with x87/SSE specific instructions #55202

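x87 and SSE have simple round-to-integer and integer-conversion instructions (including x87's converting store `fistp`) that are essentially equivalent to `l{0,2}rint[fl]?`, i.e. `rint`, `lrint` and `llrint` in their `float`, `double` and `long double` variants.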

Clang/LLVM does not seem to replace calls to `rint` with these instructions, and it does not always vectorise them when they are used to round/convert vectors.
(Truncation, by contrast, is replaced correctly.)

Some examples follow.

GCC is listed as well.
The main differences are that GCC schedules its `fldcw` for truncation earlier, replaces `rintl` with `frndint`, and uses some bit magic for `rintf`.

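Note: `f32x4` denotes `float __vector(4)` and `i32x4` denotes `int __vector(4)`.
Note: `cvtss2si` (rounds using the current rounding mode) != `cvttss2si` (always truncates).
Note: Overflow and similar edge cases are assumed to be UB, so whatever the hardware does there is acceptable.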
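
The difference between the two SSE conversions, shown through the `<xmmintrin.h>` intrinsics `_mm_cvtss_si32` (`cvtss2si`) and `_mm_cvttss_si32` (`cvttss2si`). A sketch assuming x86-64 and the default rounding mode; the wrapper names are hypothetical. The last check shows the hardware's overflow result that the note above accepts.

```c
#include <assert.h>
#include <limits.h>
#include <xmmintrin.h>

/* Hypothetical wrapper: cvtss2si rounds using MXCSR.RC (nearest-even by default). */
int cvt_round(float x) { return _mm_cvtss_si32(_mm_set_ss(x)); }

/* Hypothetical wrapper: cvttss2si always truncates toward zero, like (int)x. */
int cvt_trunc(float x) { return _mm_cvttss_si32(_mm_set_ss(x)); }

void check_round_vs_trunc(void) {
    assert(cvt_round(2.7f)  ==  3 && cvt_trunc(2.7f)  ==  2);
    assert(cvt_round(-2.7f) == -3 && cvt_trunc(-2.7f) == -2);
    assert(cvt_round(2.5f)  ==  2 && cvt_trunc(2.5f)  ==  2); /* tie to even */
    assert(cvt_round(3.5f)  ==  4 && cvt_trunc(3.5f)  ==  3);
    /* Out of range: both return the "integer indefinite" 0x80000000. */
    assert(cvt_round(3e9f) == INT_MIN && cvt_trunc(3e9f) == INT_MIN);
}
```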

Scenario                      | LLVM                            | GCC                             | Effective instruction(s)
------------------------------|---------------------------------|---------------------------------|-------------------------
`rintl`                       | `call rintl@PLT`                | `frndint`                       | `frndint`
`(int)rintl`                  | `call rintl@PLT` + truncation   | `call rintl@PLT` + truncation   | `fistp m16/m32/m64`
`lrintl`                      | `call lrintl`                   | `call lrintl`                   | `fistp m16/m32/m64`
|||
`lrint`                       | `call lrintl`                   | `call lrintl`                   | `cvtsd2si r32/r64, xmmX`
`(int)rintf`                  | `call rintf@PLT` + `cvttss2si`  | Bit magic + `cvttss2si`         | `cvtss2si r32, xmmX`
`(int)rintf` (SSE4.1+)        | `roundss` + `cvttss2si`         | `roundss` + `cvttss2si`         | `cvtss2si r32, xmmX`
`4x lrintf` (f32x4 -> i32x4)  | 4x (shuffle + `call lrintl`)    | 4x (shuffle + `call lrintl`)    | `cvtps2dq xmmY, xmmX`
 
Tested on Godbolt (Compiler Explorer) with `x86_64 Clang 14.0.0` and `x86_64 GCC 11.2` at `-O2` and `-O3`.

----

Update: It seems most cases are now caught; the only remaining gap is coalescing the scalar `cvtss2si`s into a single `cvtps2dq`.

