
Commit e6c145e

[DAGCombiner] widen zext of popcount based on target support
zext (ctpop X) --> ctpop (zext X)

This is a prerequisite step for canonicalizing in the other direction
(narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688

I'm not sure if any other targets are affected, but I found a missing
fold for PPC, so added tests based on that. The reason we widen all
the way to 64-bit in these tests is because the initial DAG looks
something like this:

  t5: i8 = ctpop t4
  t6: i32 = zero_extend t5 <-- created based on IR, but unused node?
  t7: i64 = zero_extend t5

Differential Revision: https://reviews.llvm.org/D69127
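The fold zext (ctpop X) --> ctpop (zext X) preserves the value because zero-extension only adds zero bits, which contribute nothing to the count. A minimal standalone sketch of that argument, checked exhaustively for i8 widened to i64, might look like the following. The helper names (`ctpop8`, `ctpop64`, `foldHoldsForAllI8`) are hypothetical and are not part of LLVM.

```cpp
#include <bitset>
#include <cstdint>

// Population count computed in the narrow (8-bit) type.
inline unsigned ctpop8(std::uint8_t x) {
  return static_cast<unsigned>(std::bitset<8>(x).count());
}

// Population count computed in the wide (64-bit) type.
inline unsigned ctpop64(std::uint64_t x) {
  return static_cast<unsigned>(std::bitset<64>(x).count());
}

// Returns true if zext(ctpop(x)) == ctpop(zext(x)) for every 8-bit value.
inline bool foldHoldsForAllI8() {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    const auto x = static_cast<std::uint8_t>(v);
    const std::uint64_t countThenExtend = ctpop8(x);      // zext (ctpop X)
    const std::uint64_t extendThenCount =
        ctpop64(static_cast<std::uint64_t>(x));           // ctpop (zext X)
    if (countThenExtend != extendThenCount)
      return false;
  }
  return true;
}
```

Calling `foldHoldsForAllI8()` covers all 256 inputs. The same reasoning applies to any narrow-to-wide pair, since only the number of leading zero bits changes.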
1 parent c35b358 commit e6c145e

2 files changed: 18 additions, 9 deletions

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Lines changed: 12 additions & 0 deletions
@@ -9921,6 +9921,18 @@ SDValue DAGCombiner::visitZERO_EXTEND(SDNode *N) {
   if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
     return NewVSel;
 
+  // If the target does not support a pop-count in the narrow source type but
+  // does support it in the destination type, widen the pop-count to this type:
+  // zext (ctpop X) --> ctpop (zext X)
+  // TODO: Generalize this to handle starting from anyext.
+  if (N0.getOpcode() == ISD::CTPOP && N0.hasOneUse() &&
+      !TLI.isOperationLegalOrCustom(ISD::CTPOP, N0.getValueType()) &&
+      TLI.isOperationLegalOrCustom(ISD::CTPOP, VT)) {
+    SDLoc DL(N);
+    SDValue NewZext = DAG.getZExtOrTrunc(N0.getOperand(0), DL, VT);
+    return DAG.getNode(ISD::CTPOP, DL, VT, NewZext);
+  }
+
   return SDValue();
 }
 
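The guard in this hunk can be read as three conditions: the popcount has a single use, the target lacks popcount in the narrow type, and the target has it in the destination type. The sketch below models only that decision outside of LLVM. `ToyTarget` and `shouldWidenCtpop` are hypothetical stand-ins for the `TLI.isOperationLegalOrCustom` queries, and the set of supported widths is an assumption made for illustration.

```cpp
#include <set>

// Hypothetical stand-in for the target lowering query; not an LLVM type.
struct ToyTarget {
  // Bit widths for which popcount is assumed to be legal or custom-lowered.
  std::set<unsigned> ctpopWidths;

  bool isCtpopLegalOrCustom(unsigned bits) const {
    return ctpopWidths.count(bits) != 0;
  }
};

// Mirrors the shape of the guard in visitZERO_EXTEND: widen only when the
// popcount has one use, is unsupported at the source width, and is
// supported at the destination width.
inline bool shouldWidenCtpop(const ToyTarget &target, unsigned srcBits,
                             unsigned dstBits, bool ctpopHasOneUse) {
  return ctpopHasOneUse && !target.isCtpopLegalOrCustom(srcBits) &&
         target.isCtpopLegalOrCustom(dstBits);
}
```

With a toy target that supports popcount at 32 and 64 bits, `shouldWidenCtpop(t, 8, 64, true)` is true, while a 32-bit source or a multi-use popcount leaves the node alone. The one-use check keeps the fold from adding a second popcount next to one that must stay.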

llvm/test/CodeGen/PowerPC/popcnt-zext.ll

Lines changed: 6 additions & 9 deletions
@@ -41,9 +41,8 @@ define i16 @zpop_i8_i16(i8 %x) {
 define i16 @popz_i8_i16(i8 %x) {
 ; FAST-LABEL: popz_i8_i16:
 ; FAST: # %bb.0:
-; FAST-NEXT: rlwinm 3, 3, 0, 24, 31
-; FAST-NEXT: popcntw 3, 3
-; FAST-NEXT: clrldi 3, 3, 32
+; FAST-NEXT: clrldi 3, 3, 56
+; FAST-NEXT: popcntd 3, 3
 ; FAST-NEXT: blr
 ;
 ; SLOW-LABEL: popz_i8_i16:
@@ -114,9 +113,8 @@ define i32 @zpop_i8_i32(i8 %x) {
 define i32 @popz_i8_32(i8 %x) {
 ; FAST-LABEL: popz_i8_32:
 ; FAST: # %bb.0:
-; FAST-NEXT: rlwinm 3, 3, 0, 24, 31
-; FAST-NEXT: popcntw 3, 3
-; FAST-NEXT: clrldi 3, 3, 32
+; FAST-NEXT: clrldi 3, 3, 56
+; FAST-NEXT: popcntd 3, 3
 ; FAST-NEXT: blr
 ;
 ; SLOW-LABEL: popz_i8_32:
@@ -187,9 +185,8 @@ define i32 @zpop_i16_i32(i16 %x) {
 define i32 @popz_i16_32(i16 %x) {
 ; FAST-LABEL: popz_i16_32:
 ; FAST: # %bb.0:
-; FAST-NEXT: rlwinm 3, 3, 0, 16, 31
-; FAST-NEXT: popcntw 3, 3
-; FAST-NEXT: clrldi 3, 3, 32
+; FAST-NEXT: clrldi 3, 3, 48
+; FAST-NEXT: popcntd 3, 3
 ; FAST-NEXT: blr
 ;
 ; SLOW-LABEL: popz_i16_32:
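The FAST check lines above replace a three-instruction sequence (mask, 32-bit popcount, clear the high word) with two instructions (clear the high bits, 64-bit popcount). The sketch below is a simplified software model of those PowerPC instructions, written to illustrate why the two sequences should agree. It is an approximation under stated assumptions: `rlwinm` is modeled only for a shift of 0 with MB <= ME, and all function names are hypothetical.

```cpp
#include <bitset>
#include <cstdint>

// clrldi rD, rS, n: clear the high n bits of a 64-bit register (0 <= n <= 63).
inline std::uint64_t clrldi(std::uint64_t r, unsigned n) {
  return r & (~std::uint64_t{0} >> n);
}

// rlwinm rD, rS, 0, mb, me (shift 0, mb <= me): keep bits mb..me of the low
// word, using IBM numbering where bit 0 is the most significant bit of the
// 32-bit word. Everything else, including the high word, becomes zero.
inline std::uint64_t rlwinmNoRotate(std::uint64_t r, unsigned mb, unsigned me) {
  const std::uint32_t mask =
      (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
  return static_cast<std::uint32_t>(r) & mask;
}

// popcntw: count set bits in each 32-bit word separately.
inline std::uint64_t popcntw(std::uint64_t r) {
  const std::uint64_t lo = std::bitset<32>(r & 0xFFFFFFFFull).count();
  const std::uint64_t hi = std::bitset<32>(r >> 32).count();
  return (hi << 32) | lo;
}

// popcntd: count set bits across the whole 64-bit register.
inline std::uint64_t popcntd(std::uint64_t r) {
  return std::bitset<64>(r).count();
}

// Old sequence: rlwinm 3,3,0,mb,31 ; popcntw 3,3 ; clrldi 3,3,32
inline std::uint64_t oldSequence(std::uint64_t r, unsigned mb) {
  return clrldi(popcntw(rlwinmNoRotate(r, mb, 31)), 32);
}

// New sequence: clrldi 3,3,n ; popcntd 3,3
inline std::uint64_t newSequence(std::uint64_t r, unsigned n) {
  return popcntd(clrldi(r, n));
}
```

In this model, `oldSequence(x, 24)` and `newSequence(x, 56)` both count the set bits in the low 8 bits of `x`, and `oldSequence(x, 16)` and `newSequence(x, 48)` both count the low 16 bits, regardless of what the upper register bits hold.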
