# InstCombine contributor guide

This guide lays out a series of rules that contributions to InstCombine should
follow. **Following these rules will result in much faster PR approvals.**

## Tests

### Precommit tests

Tests for new optimizations or miscompilation fixes should be pre-committed.
This means that you first commit the test with CHECK lines prior to your fix.
Your actual change will then only contain CHECK line diffs relative to that
baseline.

This means that pull requests should generally contain two commits: first, a
commit adding new tests with baseline CHECK lines, and second, a commit with
the functional changes and the resulting test diffs.

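In practice, that usually looks something like the following (the file name
and commit messages are purely illustrative):

```sh
# Commit 1: add the new tests with baseline CHECK lines (no functional change yet).
git add llvm/test/Transforms/InstCombine/the_test.ll
git commit -m "[InstCombine] Add tests for new fold"

# Commit 2: implement the fold, regenerate the CHECK lines, and commit both together.
git add -u
git commit -m "[InstCombine] Implement the new fold"
```
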
If the second commit in your PR does not contain test diffs, you did something
wrong. Either you made a mistake when generating CHECK lines, or your tests are
not actually affected by your patch.

Exceptions: When fixing assertion failures or infinite loops, do not pre-commit
tests.

### Use `update_test_checks.py`

CHECK lines should be generated using the `update_test_checks.py` script. Do
**not** manually edit the CHECK lines after using it.

Be sure to use the correct opt binary when using the script. For example, if
your build directory is `build`, then you'll want to run:

```sh
llvm/utils/update_test_checks.py --opt-binary build/bin/opt \
    llvm/test/Transforms/InstCombine/the_test.ll
```

Exceptions: Hand-written CHECK lines are allowed for debuginfo tests.

### General testing considerations

Place all tests relating to a transform into a single file. If you are adding
a regression test for a crash/miscompile in an existing transform, find the
file where the existing tests are located. A good way to do that is to comment
out the transform and see which tests fail.

Make tests minimal. Only test exactly the pattern being transformed. If your
original motivating case is a larger pattern that your fold allows to be
optimized in some non-trivial way, you may add it as well -- however, the bulk
of the test coverage should be minimal.

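For example, a minimal test for a hypothetical fold of the `(x + 1) * 3`
pattern (the same pattern used in later examples in this guide) needs nothing
beyond the instructions being matched:

```llvm
define i8 @add_mul_const(i8 %x) {
  %add = add i8 %x, 1
  %mul = mul i8 %add, 3
  ret i8 %mul
}
```
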
Give tests short but meaningful names. Don't call them `@test1`, `@test2` etc.
For example, a test checking multi-use behavior of a fold involving the
addition of two selects might be called `@add_of_selects_multi_use`.

Add representative tests for each test category (discussed below), but don't
test all combinations of everything. If you have multi-use tests and commuted
tests, you shouldn't also add commuted multi-use tests.

Prefer to keep bit-widths for tests low to improve the performance of proof
checking using alive2. Using `i8` is better than `i128` where possible.

### Add negative tests

Make sure to add tests for which your transform does **not** apply. Start with
one of the test cases that succeeds and then create a sequence of negative
tests, such that **exactly one** different pre-condition of your transform is
not satisfied in each test.

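As an illustration, take the smax/umax fold used as an example in the Proofs
section below, which requires the two constants to be non-negative. A matching
negative test changes exactly one constant so that this pre-condition no longer
holds (names and constants are illustrative):

```llvm
; Fold applies: both constants are non-negative.
define i8 @smax_umax_nonneg_consts(i8 %x) {
  %smax = call i8 @llvm.smax.i8(i8 %x, i8 5)
  %umax = call i8 @llvm.umax.i8(i8 %smax, i8 7)
  ret i8 %umax
}

; Negative test: the umax constant is negative, so the pre-condition is violated.
define i8 @smax_umax_negative_const(i8 %x) {
  %smax = call i8 @llvm.smax.i8(i8 %x, i8 5)
  %umax = call i8 @llvm.umax.i8(i8 %smax, i8 -7)
  ret i8 %umax
}
```
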
### Add multi-use tests

Add multi-use tests that ensure your transform does not increase the
instruction count if some instructions have additional uses. The standard
pattern is to introduce extra uses with function calls:

```llvm
declare void @use(i8)

define i8 @add_mul_const_multi_use(i8 %x) {
  %add = add i8 %x, 1
  call void @use(i8 %add)
  %mul = mul i8 %add, 3
  ret i8 %mul
}
```

Exceptions: For transforms that only produce one instruction, multi-use tests
may be omitted.

### Add commuted tests

If the transform involves commutative operations, add tests with commuted
(swapped) operands.

Make sure that the operand order stays intact in the CHECK lines of your
pre-committed tests. You should not see something like this:

```llvm
; CHECK-NEXT: [[OR:%.*]] = or i8 [[X]], [[Y]]
; ...
  %or = or i8 %y, %x
```

If this happens, you may need to change one of the operands to have higher
complexity (include the "thwart" comment in that case):

```llvm
  %y2 = mul i8 %y, %y ; thwart complexity-based canonicalization
  %or = or i8 %y, %x
```

### Add vector tests

When possible, it is recommended to add at least one test that uses vectors
instead of scalars.

For patterns that include constants, we distinguish three kinds of tests.
The first are "splat" vectors, where all the vector elements are the same.
These tests *should* usually fold without additional effort.

```llvm
define <2 x i8> @add_mul_const_vec_splat(<2 x i8> %x) {
  %add = add <2 x i8> %x, <i8 1, i8 1>
  %mul = mul <2 x i8> %add, <i8 3, i8 3>
  ret <2 x i8> %mul
}
```

A minor variant is to replace some of the splat elements with poison. These
will often also fold without additional effort.

```llvm
define <2 x i8> @add_mul_const_vec_splat_poison(<2 x i8> %x) {
  %add = add <2 x i8> %x, <i8 1, i8 poison>
  %mul = mul <2 x i8> %add, <i8 3, i8 poison>
  ret <2 x i8> %mul
}
```

Finally, you can have non-splat vectors, where the vector elements are not
the same:

```llvm
define <2 x i8> @add_mul_const_vec_non_splat(<2 x i8> %x) {
  %add = add <2 x i8> %x, <i8 1, i8 5>
  %mul = mul <2 x i8> %add, <i8 3, i8 6>
  ret <2 x i8> %mul
}
```

Non-splat vectors will often not fold by default. You should **not** try to
make them fold, unless doing so does not add **any** additional complexity.
You should still add the test though, even if it does not fold.

### Poison flag tests

If your transform involves instructions that can have poison-generating flags,
such as `nuw` and `nsw` on `add`, you should test how these interact with the
transform.

If your transform *requires* a certain flag for correctness, make sure to add
negative tests missing the required flag.

If your transform doesn't require flags for correctness, you should have tests
for preservation behavior: if the input instructions have certain flags, are
they preserved in the output instructions when it is valid to preserve them?
(This depends on the transform. Check with alive2.)

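A preservation test for the earlier `add`/`mul` pattern might look like this;
whether the flags can legitimately be kept on the folded output is exactly
what you should verify with alive2:

```llvm
define i8 @add_mul_const_nuw_nsw(i8 %x) {
  %add = add nuw nsw i8 %x, 1
  %mul = mul nuw nsw i8 %add, 3
  ret i8 %mul
}
```
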
### Other tests

The test categories mentioned above are non-exhaustive. There may be more tests
to be added, depending on the instructions involved in the transform. Some
examples:

* For folds involving memory accesses like load/store, check that scalable
  vectors and non-byte-size types (like i3) are handled correctly. Also check
  that volatile/atomic accesses are handled.
* For folds that interact with the bitwidth in some non-trivial way, check an
  illegal type like i13 (see the sketch after this list). Also confirm that the
  transform is correct for i1.
* For folds that involve phis, you may want to check that the case of multiple
  incoming values from one block is handled correctly.

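For instance, an illegal-type variant of the running `add`/`mul` example could
be written as follows (purely illustrative; arbitrary integer widths are valid
in LLVM IR):

```llvm
define i13 @add_mul_const_illegal_type(i13 %x) {
  %add = add i13 %x, 1
  %mul = mul i13 %add, 3
  ret i13 %mul
}
```
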
## Proofs

Your pull request description should contain one or more
[alive2 proofs](https://alive2.llvm.org/ce/) for the correctness of the
proposed transform.

### Basics

Proofs are written using LLVM IR, by specifying a `@src` and `@tgt` function.
It is possible to include multiple proofs in a single file by giving the src
and tgt functions matching suffixes.

For example, here is a pair of proofs that both `(x-y)+y` and `(x+y)-y` can
be simplified to `x` ([online](https://alive2.llvm.org/ce/z/MsPPGz)):

```llvm
define i8 @src_add_sub(i8 %x, i8 %y) {
  %add = add i8 %x, %y
  %sub = sub i8 %add, %y
  ret i8 %sub
}

define i8 @tgt_add_sub(i8 %x, i8 %y) {
  ret i8 %x
}


define i8 @src_sub_add(i8 %x, i8 %y) {
  %sub = sub i8 %x, %y
  %add = add i8 %sub, %y
  ret i8 %add
}

define i8 @tgt_sub_add(i8 %x, i8 %y) {
  ret i8 %x
}
```

### Use generic values in proofs

Proofs should operate on generic values, rather than specific constants, to
the degree that this is possible.

For example, this is a **bad** proof:

```llvm
; Don't do this!
define i8 @src(i8 %x) {
  %smax = call i8 @llvm.smax.i8(i8 %x, i8 5)
  %umax = call i8 @llvm.umax.i8(i8 %smax, i8 7)
  ret i8 %umax
}

define i8 @tgt(i8 %x) {
  %smax = call i8 @llvm.smax.i8(i8 %x, i8 7)
  ret i8 %smax
}
```

The correct way to write this proof is as follows
([online](https://alive2.llvm.org/ce/z/Sgfq37)):

```llvm
define i8 @src(i8 %x, i8 %c1, i8 %c2) {
  %cond1 = icmp sgt i8 %c1, -1
  call void @llvm.assume(i1 %cond1)
  %cond2 = icmp sgt i8 %c2, -1
  call void @llvm.assume(i1 %cond2)

  %smax = call i8 @llvm.smax.i8(i8 %x, i8 %c1)
  %umax = call i8 @llvm.umax.i8(i8 %smax, i8 %c2)
  ret i8 %umax
}

define i8 @tgt(i8 %x, i8 %c1, i8 %c2) {
  %umax = call i8 @llvm.umax.i8(i8 %c1, i8 %c2)
  %smax = call i8 @llvm.smax.i8(i8 %x, i8 %umax)
  ret i8 %smax
}
```

Note that the `@llvm.assume` intrinsic is used to specify pre-conditions for
the transform. In this case, it specifies that the constants `%c1` and `%c2`
must be non-negative.

It should be emphasized that there is, in general, no expectation that the
IR in the proofs will be transformed by the implemented fold. In the above
example, the transform would only apply if `%c1` and `%c2` were actually
constants, but we need to use non-constants in the proof.

### Common pre-conditions

Here are some examples of common pre-conditions.

```llvm
; %x is non-negative:
%nonneg = icmp sgt i8 %x, -1
call void @llvm.assume(i1 %nonneg)

; %x is a power of two:
%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
%pow2 = icmp eq i8 %ctpop, 1
call void @llvm.assume(i1 %pow2)

; %x is a power of two or zero:
%ctpop = call i8 @llvm.ctpop.i8(i8 %x)
%pow2orzero = icmp ult i8 %ctpop, 2
call void @llvm.assume(i1 %pow2orzero)

; Adding %x and %y does not overflow in a signed sense:
%wo = call { i8, i1 } @llvm.sadd.with.overflow.i8(i8 %x, i8 %y)
%ov = extractvalue { i8, i1 } %wo, 1
%ov.not = xor i1 %ov, true
call void @llvm.assume(i1 %ov.not)
```

### Timeouts

Alive2 proofs will sometimes produce a timeout with the following message:

```
Alive2 timed out while processing your query.
There are a few things you can try:

- remove extraneous instructions, if any

- reduce variable widths, for example to i16, i8, or i4

- add the --disable-undef-input command line flag, which
  allows Alive2 to assume that arguments to your IR are not
  undef. This is, in general, unsound: it can cause Alive2
  to miss bugs.
```

This is good advice, follow it!

Reducing the bitwidth usually helps. For floating point numbers, you can use
the `half` type for bitwidth reduction purposes. For pointers, you can reduce
the bitwidth by specifying a custom data layout:

```
; For 16-bit pointers
target datalayout = "p:16:16"
```

If reducing the bitwidth does not help, try `-disable-undef-input`. If this
still does not work, submit your proof despite the timeout.

## Implementation

### Real-world usefulness

There is a very large number of transforms that *could* be implemented, but
only a tiny fraction of them are useful for real-world code.

Transforms that do not have real-world usefulness provide *negative* value to
the LLVM project, by taking up valuable reviewer time, increasing code
complexity and increasing compile-time overhead.

We do not require explicit proof of real-world usefulness for every transform
-- in many cases the usefulness is fairly "obvious". However, the question may
come up for complex or unusual folds. Keep this in mind when choosing what you
work on.

In particular, fixes for fuzzer-generated missed-optimization reports will
likely be rejected if there is no evidence of broader usefulness.

### Pick the correct optimization pass

There are a number of passes and utilities in the InstCombine family, and it
is important to pick the right place when implementing a fold.

* `ConstantFolding`: For folding instructions with constant arguments to a
  constant. (Mainly relevant for intrinsics.)
* `ValueTracking`: For analyzing instructions, e.g. for known bits, non-zero,
  etc. Tests should usually use `-passes=instsimplify` (see the example RUN
  lines after this list).
* `InstructionSimplify`: For folds that do not create new instructions (they
  either fold to an existing value or to a constant).
* `InstCombine`: For folds that create or modify instructions.
* `AggressiveInstCombine`: For folds that are expensive, or that violate
  InstCombine requirements.
* `VectorCombine`: For folds of vector operations that require
  target-dependent cost-modelling.

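In terms of test RUN lines, this typically translates to the following (these
are the standard invocations used by tests under `llvm/test/Transforms`):

```llvm
; For ValueTracking and InstructionSimplify changes:
; RUN: opt -S -passes=instsimplify < %s | FileCheck %s

; For InstCombine folds:
; RUN: opt -S -passes=instcombine < %s | FileCheck %s
```
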
Sometimes, folds that logically belong in InstSimplify are placed in
InstCombine instead, for example because they are too expensive, or because
they are structurally simpler to implement in InstCombine.

For example, if a fold produces new instructions in some cases but returns an
existing value in others, it may be preferable to keep all cases in
InstCombine, rather than trying to split them among InstCombine and
InstSimplify.

### Target-dependence

InstCombine is a target-independent canonicalization pass. Transforms that
depend on target-specific cost-modelling **will be rejected**. The only
permitted target-dependence is on DataLayout and TargetLibraryInfo.

The use of TargetTransformInfo is only allowed for hooks for target-specific
intrinsics, such as `TargetTransformInfo::instCombineIntrinsic()`. These are
already inherently target-dependent anyway.

Approved alternatives to the use of TTI in InstCombine are:

* For vector-specific transforms in particular, use VectorCombine instead.
* Perform a target-independent canonicalization in InstCombine, and then undo
  it in the backend for targets where it is not profitable.
* If there are no other possible solutions, a fold using target-dependent cost
  modelling may be accepted into AggressiveInstCombine.

### Multi-use handling

### Generalization
