
Commit a897dcf

PeepholeOpt: Fix looking for def of current copy to coalesce
This fixes the handling of subregister extract copies. It will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10-year-old workaround for this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed.

The copy coalescing processing here is overly abstracted from what is actually happening. Previously, when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means the first thing the newly constructed ValueTracker would do is call getVRegDef to find the instruction we are currently processing. This added an unnecessary step, placed a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem because, for a subregister extract, shouldRewriteCopySrc would always report the rewrite as useful, and the use-def chain walk would abort, returning the original operand. Move the processing to start from the source operand in the first place.

This does not fix the confused handling in the uncoalescable copy case, which is proving more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this were implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now.

There are some improvements and some regressions. The regressions appear to be mostly downstream issues. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs.

We need improved subregister handling locally in PeepholeOptimizer, and in other passes like MachineCSE, to fix some of the other regressions. We should handle subregister composition and fold more indexes into insert_subreg and reg_sequence.
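To make the "unnecessary step" above concrete, here is a small self-contained toy model (plain C++, with invented value numbers and no LLVM types, not code from this commit): starting the walk at the def of the copy currently being visited revisits that copy as the first hop, while starting at its source does not.

#include <cstdio>
#include <map>
#include <vector>

int main() {
  // Toy def-use chain: v2 = COPY v1; v1 = COPY v0; v0 has no visible def.
  std::map<int, int> SourceOfDef = {{2, 1}, {1, 0}};

  // Follow the chain of single sources starting from a given value.
  auto Walk = [&](int Start) {
    std::vector<int> Visited;
    for (int V = Start;;) {
      Visited.push_back(V);
      auto It = SourceOfDef.find(V);
      if (It == SourceOfDef.end())
        break;
      V = It->second;
    }
    return Visited;
  };

  // Old behavior: start at the def of the instruction being processed (v2),
  // so the first entry recorded is the copy we are already looking at.
  std::vector<int> Old = Walk(2);
  // New behavior: start directly at the copy's source operand (v1).
  std::vector<int> New = Walk(1);

  std::printf("old walk records %zu values, new walk records %zu\n",
              Old.size(), New.size());
  return 0;
}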

105 files changed: +2727, -2725 lines changed

llvm/lib/CodeGen/PeepholeOptimizer.cpp

Lines changed: 28 additions & 12 deletions
@@ -465,7 +465,8 @@ class PeepholeOptimizer : private MachineFunction::Delegate {
   bool optimizeUncoalescableCopy(MachineInstr &MI,
                                  SmallPtrSetImpl<MachineInstr *> &LocalMIs);
   bool optimizeRecurrence(MachineInstr &PHI);
-  bool findNextSource(RegSubRegPair RegSubReg, RewriteMapTy &RewriteMap);
+  bool findNextSource(const TargetRegisterClass *DefRC, unsigned DefSubReg,
+                      RegSubRegPair RegSubReg, RewriteMapTy &RewriteMap);
   bool isMoveImmediate(MachineInstr &MI, SmallSet<Register, 4> &ImmDefRegs,
                        DenseMap<Register, MachineInstr *> &ImmDefMIs);
   bool foldImmediate(MachineInstr &MI, SmallSet<Register, 4> &ImmDefRegs,
@@ -1002,17 +1003,15 @@ bool PeepholeOptimizer::optimizeCondBranch(MachineInstr &MI) {
 /// share the same register file as \p Reg and \p SubReg. The client should
 /// then be capable to rewrite all intermediate PHIs to get the next source.
 /// \return False if no alternative sources are available. True otherwise.
-bool PeepholeOptimizer::findNextSource(RegSubRegPair RegSubReg,
+bool PeepholeOptimizer::findNextSource(const TargetRegisterClass *DefRC,
+                                       unsigned DefSubReg,
+                                       RegSubRegPair RegSubReg,
                                        RewriteMapTy &RewriteMap) {
   // Do not try to find a new source for a physical register.
   // So far we do not have any motivating example for doing that.
   // Thus, instead of maintaining untested code, we will revisit that if
   // that changes at some point.
   Register Reg = RegSubReg.Reg;
-  if (Reg.isPhysical())
-    return false;
-  const TargetRegisterClass *DefRC = MRI->getRegClass(Reg);
-
   SmallVector<RegSubRegPair, 4> SrcToLook;
   RegSubRegPair CurSrcPair = RegSubReg;
   SrcToLook.push_back(CurSrcPair);
@@ -1076,7 +1075,7 @@ bool PeepholeOptimizer::findNextSource(RegSubRegPair RegSubReg,
 
     // Keep following the chain if the value isn't any better yet.
     const TargetRegisterClass *SrcRC = MRI->getRegClass(CurSrcPair.Reg);
-    if (!TRI->shouldRewriteCopySrc(DefRC, RegSubReg.SubReg, SrcRC,
+    if (!TRI->shouldRewriteCopySrc(DefRC, DefSubReg, SrcRC,
                                    CurSrcPair.SubReg))
       continue;
 
@@ -1184,21 +1183,33 @@ bool PeepholeOptimizer::optimizeCoalescableCopyImpl(Rewriter &&CpyRewriter) {
   bool Changed = false;
   // Get the right rewriter for the current copy.
   // Rewrite each rewritable source.
-  RegSubRegPair Src;
+  RegSubRegPair Dst;
   RegSubRegPair TrackPair;
-  while (CpyRewriter.getNextRewritableSource(Src, TrackPair)) {
+  while (CpyRewriter.getNextRewritableSource(TrackPair, Dst)) {
+    if (Dst.Reg.isPhysical()) {
+      // Do not try to find a new source for a physical register.
+      // So far we do not have any motivating example for doing that.
+      // Thus, instead of maintaining untested code, we will revisit that if
+      // that changes at some point.
+      continue;
+    }
+
+    const TargetRegisterClass *DefRC = MRI->getRegClass(Dst.Reg);
+
     // Keep track of PHI nodes and its incoming edges when looking for sources.
     RewriteMapTy RewriteMap;
     // Try to find a more suitable source. If we failed to do so, or get the
     // actual source, move to the next source.
-    if (!findNextSource(TrackPair, RewriteMap))
+    if (!findNextSource(DefRC, Dst.SubReg, TrackPair, RewriteMap))
      continue;
 
     // Get the new source to rewrite. TODO: Only enable handling of multiple
     // sources (PHIs) once we have a motivating example and testcases for it.
     RegSubRegPair NewSrc = getNewSource(MRI, TII, TrackPair, RewriteMap,
                                         /*HandleMultipleSources=*/false);
-    if (Src.Reg == NewSrc.Reg || NewSrc.Reg == 0)
+    assert(TrackPair.Reg != NewSrc.Reg &&
+           "should not rewrite source to original value");
+    if (!NewSrc.Reg)
       continue;
 
     // Rewrite source.
@@ -1325,9 +1336,14 @@ bool PeepholeOptimizer::optimizeUncoalescableCopy(
   if (Def.Reg.isPhysical())
     return false;
 
+  // FIXME: Uncoalescable copies are treated differently by
+  // UncoalescableRewriter, and this probably should not share
+  // API. getNextRewritableSource really finds rewritable defs.
+  const TargetRegisterClass *DefRC = MRI->getRegClass(Def.Reg);
+
   // If we do not know how to rewrite this definition, there is no point
   // in trying to kill this instruction.
-  if (!findNextSource(Def, RewriteMap))
+  if (!findNextSource(DefRC, Def.SubReg, Def, RewriteMap))
     return false;
 
   RewritePairs.push_back(Def);
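For illustration, here is a minimal sketch of the call shape the coalescable-copy path moves to with the change above. It is not code from the commit: the helper name rewriteCopySource, the plain operand-0/operand-1 layout of a simple COPY, and the FindNext callback standing in for findNextSource are all assumptions made for the example; only the LLVM MachineRegisterInfo/MachineOperand calls are real API.

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

namespace {
// FindNextFn stands in for PeepholeOptimizer::findNextSource after this
// change: it receives the destination's register class and subregister
// index, plus the source operand to start walking from.
template <typename FindNextFn>
bool rewriteCopySource(const llvm::MachineInstr &Copy,
                       const llvm::MachineRegisterInfo &MRI,
                       FindNextFn FindNext) {
  const llvm::MachineOperand &DstOp = Copy.getOperand(0); // assumed def
  const llvm::MachineOperand &SrcOp = Copy.getOperand(1); // assumed source

  // Physical defs are now skipped up front at the call site, rather than
  // inside the use-def walk.
  if (DstOp.getReg().isPhysical())
    return false;

  // The constraints to satisfy come from the destination of the copy...
  const llvm::TargetRegisterClass *DefRC = MRI.getRegClass(DstOp.getReg());
  unsigned DefSubReg = DstOp.getSubReg();

  // ...while the walk starts directly at the source operand, so the tracker
  // no longer begins by re-deriving the instruction being visited.
  return FindNext(DefRC, DefSubReg, SrcOp.getReg(), SrcOp.getSubReg());
}
} // namespace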

llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lse2.ll

Lines changed: 30 additions & 30 deletions
@@ -12,8 +12,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_monotonic(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_aligned_monotonic:
-; -O1: ldxrb w8, [x0]
-; -O1: stxrb w9, w1, [x0]
+; -O1: ldxrb w0, [x8]
+; -O1: stxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value monotonic, align 1
     ret i8 %r
 }
@@ -27,8 +27,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_acquire(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_aligned_acquire:
-; -O1: ldaxrb w8, [x0]
-; -O1: stxrb w9, w1, [x0]
+; -O1: ldaxrb w0, [x8]
+; -O1: stxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value acquire, align 1
     ret i8 %r
 }
@@ -42,8 +42,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_release(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_aligned_release:
-; -O1: ldxrb w8, [x0]
-; -O1: stlxrb w9, w1, [x0]
+; -O1: ldxrb w0, [x8]
+; -O1: stlxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value release, align 1
     ret i8 %r
 }
@@ -57,8 +57,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_acq_rel(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_aligned_acq_rel:
-; -O1: ldaxrb w8, [x0]
-; -O1: stlxrb w9, w1, [x0]
+; -O1: ldaxrb w0, [x8]
+; -O1: stlxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value acq_rel, align 1
     ret i8 %r
 }
@@ -72,8 +72,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_seq_cst(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_aligned_seq_cst:
-; -O1: ldaxrb w8, [x0]
-; -O1: stlxrb w9, w1, [x0]
+; -O1: ldaxrb w0, [x8]
+; -O1: stlxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value seq_cst, align 1
     ret i8 %r
 }
@@ -86,8 +86,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_monotonic(ptr %ptr, i16 %value)
 ; -O0: subs w8, w8, w9, uxth
 ;
 ; -O1-LABEL: atomicrmw_xchg_i16_aligned_monotonic:
-; -O1: ldxrh w8, [x0]
-; -O1: stxrh w9, w1, [x0]
+; -O1: ldxrh w0, [x8]
+; -O1: stxrh w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i16 %value monotonic, align 2
     ret i16 %r
 }
@@ -100,8 +100,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_acquire(ptr %ptr, i16 %value) {
 ; -O0: subs w8, w8, w9, uxth
 ;
 ; -O1-LABEL: atomicrmw_xchg_i16_aligned_acquire:
-; -O1: ldaxrh w8, [x0]
-; -O1: stxrh w9, w1, [x0]
+; -O1: ldaxrh w0, [x8]
+; -O1: stxrh w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i16 %value acquire, align 2
     ret i16 %r
 }
@@ -114,8 +114,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_release(ptr %ptr, i16 %value) {
 ; -O0: subs w8, w8, w9, uxth
 ;
 ; -O1-LABEL: atomicrmw_xchg_i16_aligned_release:
-; -O1: ldxrh w8, [x0]
-; -O1: stlxrh w9, w1, [x0]
+; -O1: ldxrh w0, [x8]
+; -O1: stlxrh w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i16 %value release, align 2
     ret i16 %r
 }
@@ -128,8 +128,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_acq_rel(ptr %ptr, i16 %value) {
 ; -O0: subs w8, w8, w9, uxth
 ;
 ; -O1-LABEL: atomicrmw_xchg_i16_aligned_acq_rel:
-; -O1: ldaxrh w8, [x0]
-; -O1: stlxrh w9, w1, [x0]
+; -O1: ldaxrh w0, [x8]
+; -O1: stlxrh w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i16 %value acq_rel, align 2
     ret i16 %r
 }
@@ -142,8 +142,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_seq_cst(ptr %ptr, i16 %value) {
 ; -O0: subs w8, w8, w9, uxth
 ;
 ; -O1-LABEL: atomicrmw_xchg_i16_aligned_seq_cst:
-; -O1: ldaxrh w8, [x0]
-; -O1: stlxrh w9, w1, [x0]
+; -O1: ldaxrh w0, [x8]
+; -O1: stlxrh w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i16 %value seq_cst, align 2
     ret i16 %r
 }
@@ -392,8 +392,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_monotonic(ptr %ptr, i8 %value)
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_unaligned_monotonic:
-; -O1: ldxrb w8, [x0]
-; -O1: stxrb w9, w1, [x0]
+; -O1: ldxrb w0, [x8]
+; -O1: stxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value monotonic, align 1
     ret i8 %r
 }
@@ -407,8 +407,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_acquire(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_unaligned_acquire:
-; -O1: ldaxrb w8, [x0]
-; -O1: stxrb w9, w1, [x0]
+; -O1: ldaxrb w0, [x8]
+; -O1: stxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value acquire, align 1
     ret i8 %r
 }
@@ -422,8 +422,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_release(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_unaligned_release:
-; -O1: ldxrb w8, [x0]
-; -O1: stlxrb w9, w1, [x0]
+; -O1: ldxrb w0, [x8]
+; -O1: stlxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value release, align 1
     ret i8 %r
 }
@@ -437,8 +437,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_acq_rel(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_unaligned_acq_rel:
-; -O1: ldaxrb w8, [x0]
-; -O1: stlxrb w9, w1, [x0]
+; -O1: ldaxrb w0, [x8]
+; -O1: stlxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value acq_rel, align 1
     ret i8 %r
 }
@@ -452,8 +452,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_seq_cst(ptr %ptr, i8 %value) {
 ; -O0: subs w8, w8, w10, uxtb
 ;
 ; -O1-LABEL: atomicrmw_xchg_i8_unaligned_seq_cst:
-; -O1: ldaxrb w8, [x0]
-; -O1: stlxrb w9, w1, [x0]
+; -O1: ldaxrb w0, [x8]
+; -O1: stlxrb w9, w1, [x8]
     %r = atomicrmw xchg ptr %ptr, i8 %value seq_cst, align 1
     ret i8 %r
 }
