Description
Recently I have been working with floating-point value classification, and I found the function `f32::classify` useful. Regardless of instruction set architecture, this function is currently implemented like this in the standard library:
```rust
#[stable(feature = "rust1", since = "1.0.0")]
pub fn classify(self) -> FpCategory {
    const EXP_MASK: u32 = 0x7f800000;
    const MAN_MASK: u32 = 0x007fffff;
    let bits = self.to_bits();
    match (bits & MAN_MASK, bits & EXP_MASK) {
        (0, 0) => FpCategory::Zero,
        (_, 0) => FpCategory::Subnormal,
        (0, EXP_MASK) => FpCategory::Infinite,
        (_, EXP_MASK) => FpCategory::Nan,
        _ => FpCategory::Normal,
    }
}
```
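For reference, the match above can be exercised directly. The following standalone copy of the same mask-and-match logic (same constants as the standard library) agrees with `f32::classify` on representative inputs:

```rust
use std::num::FpCategory;

// Standalone copy of the standard library's mask-and-match logic,
// for demonstration on a free function taking f32.
fn classify_by_masks(x: f32) -> FpCategory {
    const EXP_MASK: u32 = 0x7f800000;
    const MAN_MASK: u32 = 0x007fffff;
    let bits = x.to_bits();
    match (bits & MAN_MASK, bits & EXP_MASK) {
        (0, 0) => FpCategory::Zero,
        (_, 0) => FpCategory::Subnormal,
        (0, EXP_MASK) => FpCategory::Infinite,
        (_, EXP_MASK) => FpCategory::Nan,
        _ => FpCategory::Normal,
    }
}

fn main() {
    // One value per category, plus the negative-zero edge case.
    for &x in &[0.0f32, -0.0, 1.0e-40, 1.5, f32::INFINITY, f32::NAN] {
        assert_eq!(classify_by_masks(x), x.classify());
    }
    println!("all categories agree");
}
```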
However, this standard-library function compiles to a long sequence of instructions. On RISC-V RV64GC, it compiles into the following (very long) assembly code:
```asm
example::classify_std:
        fmv.w.x ft0, a0
        fsw     ft0, -20(s0)
        lwu     a0, -20(s0)
        sd      a0, -48(s0)
        j       .LBB0_1
.LBB0_1:
        lui     a0, 2048
        addiw   a0, a0, -1
        ld      a1, -48(s0)
        and     a0, a0, a1
        lui     a2, 522240
        and     a2, a2, a1
        sw      a0, -32(s0)
        sw      a2, -28(s0)
        mv      a2, zero
        bne     a0, a2, .LBB0_3
        j       .LBB0_2
.LBB0_2:
        lw      a0, -28(s0)
        mv      a1, zero
        beq     a0, a1, .LBB0_7
        j       .LBB0_3
.LBB0_3:
        lwu     a0, -28(s0)
        mv      a1, zero
        sd      a0, -56(s0)
        beq     a0, a1, .LBB0_8
        j       .LBB0_4
.LBB0_4:
        lui     a0, 522240
        ld      a1, -56(s0)
        bne     a1, a0, .LBB0_6
        j       .LBB0_5
.LBB0_5:
        lw      a0, -32(s0)
        mv      a1, zero
        beq     a0, a1, .LBB0_9
        j       .LBB0_10
.LBB0_6:
        addi    a0, zero, 4
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_7:
        addi    a0, zero, 2
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_8:
        addi    a0, zero, 3
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_9:
        addi    a0, zero, 1
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_10:
        mv      a0, zero
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_11:
        lb      a0, -33(s0)
        ret
```
To solve this problem, the RISC-V ISA provides the `fclass.{s|d|q}` instructions. According to Section 11.9 of the RISC-V specification, the instruction `fclass.s rd, rs1` examines `rs1` as a 32-bit floating-point number and writes a mask describing its class into `rd`. From `rd` we can then decide which enum variant of the Rust standard library's `FpCategory` to return. I'd like to express this procedure in Rust code. The new way looks like this:
```rust
pub fn classify_riscv_rvf(input: f32) -> FpCategory {
    let ans: usize;
    // step 1: map this f32 value into the RISC-V defined class mask;
    // this procedure could be built into the compiler as an intrinsic
    unsafe {
        llvm_asm!(
            "fclass.s a0, fa0"
            : "={a0}"(ans)
            : "{fa0}"(input)
        )
    };
    // step 2: convert from the returned flags to an FpCategory enum value
    if ans & 0b10000001 != 0 {
        return FpCategory::Infinite;
    }
    if ans & 0b01000010 != 0 {
        return FpCategory::Normal;
    }
    if ans & 0b00100100 != 0 {
        return FpCategory::Subnormal;
    }
    if ans & 0b00011000 != 0 {
        return FpCategory::Zero;
    }
    FpCategory::Nan
}
```
It compiles into the following assembly code, which is shorter and can execute faster:
```asm
example::classify_riscv_rvf:
        fclass.s a0, fa0
        andi    a2, a0, 129
        addi    a1, zero, 1
        beqz    a2, .LBB0_2
.LBB0_1:
        add     a0, zero, a1
        ret
.LBB0_2:
        andi    a2, a0, 66
        addi    a1, zero, 4
        bnez    a2, .LBB0_1
        andi    a2, a0, 36
        addi    a1, zero, 3
        bnez    a2, .LBB0_1
        andi    a0, a0, 24
        snez    a0, a0
        slli    a1, a0, 1
        add     a0, zero, a1
        ret
```
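The masks tested in step 2 follow the `fclass.s` result layout from the RISC-V specification: bit 0 is negative infinity, bits 1/2/3 are negative normal/subnormal/zero, bits 4/5/6/7 mirror them for positive values, and bits 8/9 are signaling/quiet NaN. A portable emulation in pure Rust (no assembly; an illustration, not the actual instruction) can sanity-check that each mask covers both sign variants of its class:

```rust
use std::num::FpCategory;

// Software emulation of the fclass.s one-hot result mask, using the
// bit layout from the RISC-V F-extension specification:
// bit 0: -inf, 1: -normal, 2: -subnormal, 3: -0,
// bit 4: +0, 5: +subnormal, 6: +normal, 7: +inf,
// bit 8: signaling NaN, 9: quiet NaN.
fn fclass_emulated(x: f32) -> usize {
    let neg = x.is_sign_negative();
    let bit = match x.classify() {
        FpCategory::Infinite => if neg { 0 } else { 7 },
        FpCategory::Normal => if neg { 1 } else { 6 },
        FpCategory::Subnormal => if neg { 2 } else { 5 },
        FpCategory::Zero => if neg { 3 } else { 4 },
        FpCategory::Nan => 9, // assumes a quiet NaN; an sNaN would set bit 8
    };
    1 << bit
}

fn main() {
    // Each mask from step 2 catches both the negative and positive case.
    assert_ne!(fclass_emulated(f32::INFINITY) & 0b10000001, 0);
    assert_ne!(fclass_emulated(f32::NEG_INFINITY) & 0b10000001, 0);
    assert_ne!(fclass_emulated(-1.5) & 0b01000010, 0);
    assert_ne!(fclass_emulated(1.0e-40) & 0b00100100, 0);
    assert_ne!(fclass_emulated(-0.0) & 0b00011000, 0);
    // NaN sets none of the first eight bits, so it falls through to Nan.
    assert_eq!(fclass_emulated(f32::NAN) & 0b11111111, 0);
    println!("mask coverage checks pass");
}
```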
For `f64` values, we can use the `fclass.d` instruction instead of `fclass.s`. If in the future we have a chance to introduce an `f128` primitive type, there is also a `fclass.q` instruction. Using these instructions improves the speed of this function on RISC-V platforms, an enhancement that is especially significant for embedded devices. I suggest changing the implementation of this function in the standard library. We may implement it in either of the following ways:
- Implement `fclassf32` and `fclassf64` intrinsic functions in `core::intrinsics` and call them in `f32::classify` and `f64::classify`. These intrinsics can be implemented with the special instruction, with a fallback on other platforms; or
- Use inline assembly directly in the standard library, adding a `#[cfg(..)]` so that it compiles only on RISC-V targets with floating-point extension `F` or `D` respectively, with a fallback on other platforms.
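The second option could be sketched roughly as follows. This is an illustration only, using the current stable `asm!` syntax rather than the `llvm_asm!` shown above; the exact `cfg` gates (here `target_arch = "riscv64"` plus `target_feature = "f"`) and the free-function shape are assumptions, not the actual standard-library patch:

```rust
use std::num::FpCategory;

/// RISC-V path: one fclass.s instruction plus mask tests.
#[cfg(all(target_arch = "riscv64", target_feature = "f"))]
fn classify_f32(x: f32) -> FpCategory {
    let flags: usize;
    // fclass.s writes a one-hot class mask into the destination register.
    unsafe {
        core::arch::asm!("fclass.s {0}, {1}", out(reg) flags, in(freg) x);
    }
    if flags & 0b10000001 != 0 {
        FpCategory::Infinite
    } else if flags & 0b01000010 != 0 {
        FpCategory::Normal
    } else if flags & 0b00100100 != 0 {
        FpCategory::Subnormal
    } else if flags & 0b00011000 != 0 {
        FpCategory::Zero
    } else {
        FpCategory::Nan
    }
}

/// Portable fallback: defer to the existing mask-and-match implementation.
#[cfg(not(all(target_arch = "riscv64", target_feature = "f")))]
fn classify_f32(x: f32) -> FpCategory {
    x.classify()
}

fn main() {
    assert_eq!(classify_f32(0.0), FpCategory::Zero);
    assert_eq!(classify_f32(1.0e-40), FpCategory::Subnormal);
    assert_eq!(classify_f32(f32::INFINITY), FpCategory::Infinite);
    assert_eq!(classify_f32(f32::NAN), FpCategory::Nan);
    println!("classify_f32 agrees on all tested categories");
}
```

On non-RISC-V hosts the `cfg` attribute removes the assembly entirely, so the sketch compiles everywhere while exercising the fast path only where `fclass.s` exists.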