|
10 | 10 | //
|
11 | 11 | //===----------------------------------------------------------------------===//
|
12 | 12 |
|
| 13 | +//===----------------------------------------------------------------------===// |
| 14 | +// Guidelines on NVPTX Intrinsic design |
| 15 | +//===----------------------------------------------------------------------===// |
| 16 | +// |
| 17 | +// The NVPTX intrinsics are used to model instructions in the PTX ISA. |
| 18 | +// While simpler intrinsics can represent certain features effectively, |
| 19 | +// more complex instructions like TMA and MMA are not as straightforward |
| 20 | +// to model. A single variant of these complex instructions can expand |
| 21 | +// into hundreds of intrinsics. Additionally, any expansion in the |
| 22 | +// corresponding ISA can exponentially increase these numbers, making it |
| 23 | +// difficult to manage them in the IR and backend passes. Therefore, |
| 24 | +// a careful design of intrinsic interfaces can ease maintenance and |
| 25 | +// contribute to a sustainable, long-term solution. |
| 26 | +// |
| 27 | +// Below are a set of guidelines that may help in choosing |
| 28 | +// an appropriate design for the complex intrinsics: |
| 29 | +// |
| 30 | +// 1. If there are only a few intrinsics, prefer a flat design |
| 31 | +// where the intrinsic name encodes all relevant details, |
| 32 | +// and includes only the arguments used by the actual instruction. |
| 33 | +// 2. As the number of intrinsics grows, it is desirable to consolidate |
| 34 | +// them. NVPTX uses a 'flags'-based design where each flag argument |
| 35 | +// represents one set of instruction modifiers. These flags are |
| 36 | +// compile-time integer constants. |
| 37 | +// |
| 38 | +// 3. When an intrinsic uses flags, document it with details of the |
| 39 | +// flag usage in the ``NVPTXUsage.rst`` file. |
| 40 | +// 4. Use i1 for boolean flags and i8 for other flag types. |
| 41 | +// 5. Annotate all flag arguments with ImmArg<ArgIdx<>>. |
| 42 | +// 6. Place the flag arguments at the end of the (actual)argument list. |
| 43 | +// |
| 44 | +// 7. Identify the key features of an intrinsic and distinguish between |
| 45 | +// first-order and supplementary information. Typically, encoding the |
| 46 | +// first-order information in the intrinsic name while using flags |
| 47 | +// for supplementary details improves readability. |
| 48 | +// For example: |
| 49 | +// |
| 50 | +// i. For MMA intrinsics, 'dense' vs. 'sparse' is a fundamental feature, |
| 51 | +// whereas an optional scaling applied to matrices is relatively secondary. |
| 52 | +// |
| 53 | +// ii. For TMAs, the mode of copy (e.g., 'Tile' or 'Im2col') is a first-order |
| 54 | +// information, while features like an optional cache hint tend to be |
| 55 | +// secondary. |
| 56 | +// |
| 57 | +// 8. If there are invalid combinations within a set of modifiers, avoid |
| 58 | +// encoding them as flags, as much as possible. This helps reduce the |
| 59 | +// need for error handling of unsupported cases in the backend. |
| 60 | +// For example, some 'cvt' intrinsics support only a subset of the |
| 61 | +// possible rounding modes; so it is preferable not to encode the |
| 62 | +// rounding modes as flags. |
| 63 | +// 9. Similarly, when there are invalid combinations across a set of |
| 64 | +// modifiers, avoid encoding them as flags to prevent additional |
| 65 | +// complexity in error handling. |
| 66 | +// |
| 67 | +// 10. Maintain a consistent design within an intrinsic family, including |
| 68 | +// argument ordering as well as the usage and ordering of flags. |
| 69 | +// 11. When designing an intrinsic corresponding to an instruction or its variant, |
| 70 | +// consider the entire instruction family. This may reveal common features |
| 71 | +// that can be modelled consistently across the family. |
| 72 | +// |
| 73 | +// In summary, strive to balance the aspects mentioned above, to achieve |
| 74 | +// a scalable design with maximum readability. |
| 75 | +//===----------------------------------------------------------------------===// |
| 76 | + |
13 | 77 | // The following intrinsics were once defined here, but are now auto-upgraded
|
14 | 78 | // to target-generic LLVM intrinsics.
|
15 | 79 | //
|
|
0 commit comments