Skip to content

Commit 27c4e4a

Browse files
committed
[NVPTX][Docs] [NFC] Update docs on intrinsics
Recently, we have added a set of complex intrinsics on TMA, tcgen05 and Cvt family of instructions. This patch captures the key learnings from our experience so far and documents them as guidelines for future design. Signed-off-by: Durgadoss R <[email protected]>
1 parent 9269aae commit 27c4e4a

File tree

1 file changed

+64
-0
lines changed

1 file changed

+64
-0
lines changed

llvm/include/llvm/IR/IntrinsicsNVVM.td

Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,70 @@
1010
//
1111
//===----------------------------------------------------------------------===//
1212

13+
//===----------------------------------------------------------------------===//
14+
// Guidelines on NVPTX Intrinsic design
15+
//===----------------------------------------------------------------------===//
16+
//
17+
// The NVPTX intrinsics are used to model instructions in the PTX ISA.
18+
// While simpler intrinsics can represent certain features effectively,
19+
// more complex instructions like TMA and MMA are not as straightforward
20+
// to model. A single variant of these complex instructions can expand
21+
// into hundreds of intrinsics. Additionally, any expansion in the
22+
// corresponding ISA can exponentially increase these numbers, making it
23+
// difficult to manage them in the IR and backend passes. Therefore,
24+
// a careful design of intrinsic interfaces can ease maintenance and
25+
// contribute to a sustainable, long-term solution.
26+
//
27+
// Below are a set of guidelines that may help in choosing
28+
// an appropriate design for the complex intrinsics:
29+
//
30+
// 1. If there are only a few intrinsics, prefer a flat design
31+
// where the intrinsic name encodes all relevant details,
32+
// and includes only the arguments used by the actual instruction.
33+
// 2. As the number of intrinsics grows, it is desirable to consolidate
34+
// them. NVPTX uses a 'flags'-based design where each flag argument
35+
// represents one set of instruction modifiers. These flags are
36+
// compile-time integer constants.
37+
//
38+
// 3. When an intrinsic uses flags, document it with details of the
39+
// flag usage in the ``NVPTXUsage.rst`` file.
40+
// 4. Use i1 for boolean flags and i8 for other flag types.
41+
// 5. Annotate all flag arguments with ImmArg<ArgIdx<>>.
42+
// 6. Place the flag arguments at the end of the (actual)argument list.
43+
//
44+
// 7. Identify the key features of an intrinsic and distinguish between
45+
// first-order and supplementary information. Typically, encoding the
46+
// first-order information in the intrinsic name while using flags
47+
// for supplementary details improves readability.
48+
// For example:
49+
//
50+
// i. For MMA intrinsics, 'dense' vs. 'sparse' is a fundamental feature,
51+
// whereas an optional scaling applied to matrices is relatively secondary.
52+
//
53+
// ii. For TMAs, the mode of copy (e.g., 'Tile' or 'Im2col') is a first-order
54+
// information, while features like an optional cache hint tend to be
55+
// secondary.
56+
//
57+
// 8. If there are invalid combinations within a set of modifiers, avoid
58+
// encoding them as flags, as much as possible. This helps reduce the
59+
// need for error handling of unsupported cases in the backend.
60+
// For example, some 'cvt' intrinsics support only a subset of the
61+
// possible rounding modes; so it is preferable not to encode the
62+
// rounding modes as flags.
63+
// 9. Similarly, when there are invalid combinations across a set of
64+
// modifiers, avoid encoding them as flags to prevent additional
65+
// complexity in error handling.
66+
//
67+
// 10. Maintain a consistent design within an intrinsic family, including
68+
// argument ordering as well as the usage and ordering of flags.
69+
// 11. When designing an intrinsic corresponding to an instruction or its variant,
70+
// consider the entire instruction family. This may reveal common features
71+
// that can be modelled consistently across the family.
72+
//
73+
// In summary, strive to balance the aspects mentioned above, to achieve
74+
// a scalable design with maximum readability.
75+
//===----------------------------------------------------------------------===//
76+
1377
// The following intrinsics were once defined here, but are now auto-upgraded
1478
// to target-generic LLVM intrinsics.
1579
//

0 commit comments

Comments
 (0)