[OpenMP 6.0] Codegen for Reduction over private variables with reduction clause #134709

Open: wants to merge 28 commits into base: main

28 commits
a05af19  Codegen for Reduction over private variables with reduction clause (Apr 7, 2025)
4e6eea6  review comment changes incorporated (Apr 8, 2025)
18e1708  review comment , removing redundant code (Apr 9, 2025)
59ab4be  fix for user-defined reduction op (Apr 10, 2025)
e45c30a  Handle user-defined reduction and updated lit test (May 1, 2025)
980bc06  conditional checks (May 1, 2025)
a103dfa  lit update (May 1, 2025)
526314c  Support for UDR for private variables (May 5, 2025)
c77fb0e  Implicit reduction identifier fix (May 5, 2025)
f202eaa  updated with comments, unified logic and docs (May 7, 2025)
9d2370b  Update OpenMPSupport.rst (chandraghale, May 7, 2025)
0ca2f86  Handle UDR init and updated lit (May 7, 2025)
9335af1  multiple reduced value change (May 8, 2025)
e1a1998  UDR init logic leveraged from emitInitWithReductionInitializer fn (May 8, 2025)
efd69bb  runtime tests (May 9, 2025)
c01671e  Update omp_for_private_reduction.cpp (chandraghale, May 9, 2025)
ad0d2f0  Update omp_for_private_reduction.cpp (chandraghale, May 9, 2025)
4df2910  update test (May 9, 2025)
2468be3  test update (May 9, 2025)
9576c87  Resolve mergeconflict rel notes (May 9, 2025)
7e324bd  Resolve mergeconflict rel notes (May 9, 2025)
262a861  Release notes update (May 9, 2025)
a0d29ab  address comments,support all types (May 13, 2025)
0c2978c  complex type test for priv redn (May 13, 2025)
384cd4a  add addtional complex test (May 14, 2025)
76db75a  Merge branch 'main' into codegen_private_variable_reducn (chandraghale, May 14, 2025)
0b59740  format error fix (chandraghale, May 14, 2025)
70b7e90  Merge branch 'main' into codegen_private_variable_reducn (chandraghale, May 15, 2025)

3 changes: 2 additions & 1 deletion clang/docs/OpenMPSupport.rst
@@ -406,7 +406,8 @@ implementation.
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
| Extensions to atomic construct | :none:`unclaimed` | :none:`unclaimed` | |
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
- | Private reductions | :part:`partial` | :none:`unclaimed` | Parse/Sema:https://github.com/llvm/llvm-project/pull/129938 |
+ | Private reductions | :good:`mostly` | :none:`unclaimed` | Parse/Sema:https://github.com/llvm/llvm-project/pull/129938 |
+ | | | | Codegen: https://github.com/llvm/llvm-project/pull/134709 |
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
| Self maps | :part:`partial` | :none:`unclaimed` | parsing/sema done: https://github.com/llvm/llvm-project/pull/129888 |
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
1 change: 1 addition & 0 deletions clang/docs/ReleaseNotes.rst
@@ -950,6 +950,7 @@ OpenMP Support
open parenthesis. (#GH139665)
- An error is now emitted when OpenMP ``collapse`` and ``ordered`` clauses have
an argument larger than what can fit within a 64-bit integer.
- Added support for private variable reduction.

Improvements
^^^^^^^^^^^^
245 changes: 244 additions & 1 deletion clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -4898,6 +4898,234 @@ void CGOpenMPRuntime::emitSingleReductionCombiner(CodeGenFunction &CGF,
}
}

void CGOpenMPRuntime::emitPrivateReduction(
Review comment (Member):

  1. Add comments describing the logic of the function.
  2. How does it differ from the regular reduction codegen? Can you try to unify the logic to reduce the maintenance cost?

Reply (Contributor Author):

  • Updated the code with comments and clarified the logic.
  • Updated OpenMPSupport.rst and the release notes.
  • Unified the logic as much as possible.
  • In regular reduction codegen, threads update a shared variable directly. With private reduction, each thread first works with its own private copy, and then all these partial results are combined into a shared variable. The final result is copied back to each thread’s original variable.

CodeGenFunction &CGF, SourceLocation Loc, const Expr *Privates,
const Expr *LHSExprs, const Expr *RHSExprs, const Expr *ReductionOps) {

// Create a shared global variable (__shared_reduction_var) to accumulate the
// final result.
//
// Call __kmpc_barrier to synchronize threads before initialization.
//
// The master thread (thread_id == 0) initializes __shared_reduction_var
// with the identity value or initializer.
//
// Call __kmpc_barrier to synchronize before combining.
// For each i:
// - Thread enters critical section.
// - Reads its private value from LHSExprs[i].
// - Updates __shared_reduction_var[i] = RedOp_i(__shared_reduction_var[i],
// LHSExprs[i]).
// - Exits critical section.
//
// Call __kmpc_barrier after combining.
//
// Each thread copies __shared_reduction_var[i] back to LHSExprs[i].
//
// Final __kmpc_barrier to synchronize after broadcasting
QualType PrivateType = Privates->getType();
llvm::Type *LLVMType = CGF.ConvertTypeForMem(PrivateType);

const OMPDeclareReductionDecl *UDR = getReductionInit(ReductionOps);
std::string ReductionVarNameStr;
if (const auto *DRE = dyn_cast<DeclRefExpr>(Privates->IgnoreParenCasts()))
ReductionVarNameStr = DRE->getDecl()->getNameAsString();
else
ReductionVarNameStr = "unnamed_priv_var";

// Create an internal shared variable
std::string SharedName =
CGM.getOpenMPRuntime().getName({"internal_pivate_", ReductionVarNameStr});
llvm::GlobalVariable *SharedVar = new llvm::GlobalVariable(
CGM.getModule(), LLVMType, false, llvm::GlobalValue::InternalLinkage,
llvm::Constant::getNullValue(LLVMType), ".omp.reduction." + SharedName,
nullptr, llvm::GlobalVariable::NotThreadLocal);

SharedVar->setAlignment(
llvm::MaybeAlign(CGF.getContext().getTypeAlign(PrivateType) / 8));

Address SharedResult(SharedVar, SharedVar->getValueType(),
CGF.getContext().getTypeAlignInChars(PrivateType));

llvm::Value *ThreadId = getThreadID(CGF, Loc);
llvm::Value *BarrierLoc = emitUpdateLocation(CGF, Loc, OMP_ATOMIC_REDUCE);
llvm::Value *BarrierArgs[] = {BarrierLoc, ThreadId};

llvm::BasicBlock *InitBB = CGF.createBasicBlock("init");
llvm::BasicBlock *InitEndBB = CGF.createBasicBlock("init.end");

llvm::Value *IsWorker = CGF.Builder.CreateICmpEQ(
ThreadId, llvm::ConstantInt::get(ThreadId->getType(), 0));
CGF.Builder.CreateCondBr(IsWorker, InitBB, InitEndBB);

CGF.EmitBlock(InitBB);

auto EmitSharedInit = [&]() {
if (UDR) { // Check if it's a User-Defined Reduction
if (const Expr *UDRInitExpr = UDR->getInitializer()) {
std::pair<llvm::Function *, llvm::Function *> FnPair =
getUserDefinedReduction(UDR);
llvm::Function *InitializerFn = FnPair.second;
if (InitializerFn) {
if (const auto *CE =
dyn_cast<CallExpr>(UDRInitExpr->IgnoreParenImpCasts())) {
const auto *OutDRE = cast<DeclRefExpr>(
cast<UnaryOperator>(CE->getArg(0)->IgnoreParenImpCasts())
->getSubExpr());
const VarDecl *OutVD = cast<VarDecl>(OutDRE->getDecl());

CodeGenFunction::OMPPrivateScope LocalScope(CGF);
LocalScope.addPrivate(OutVD, SharedResult);

(void)LocalScope.Privatize();
if (const auto *OVE = dyn_cast<OpaqueValueExpr>(
CE->getCallee()->IgnoreParenImpCasts())) {
CodeGenFunction::OpaqueValueMapping OpaqueMap(
CGF, OVE, RValue::get(InitializerFn));
CGF.EmitIgnoredExpr(CE);
} else {
CGF.EmitAnyExprToMem(UDRInitExpr, SharedResult,
PrivateType.getQualifiers(), true);
}
} else {
CGF.EmitAnyExprToMem(UDRInitExpr, SharedResult,
PrivateType.getQualifiers(), true);
}
} else {
CGF.EmitAnyExprToMem(UDRInitExpr, SharedResult,
PrivateType.getQualifiers(), true);
}
} else {
// EmitNullInitialization zero-initializes the storage; it does not run C++
// constructors. Zeroing is used here as the default.
CGF.EmitNullInitialization(SharedResult, PrivateType);
}
return; // UDR initialization handled
}
if (const auto *DRE = dyn_cast<DeclRefExpr>(Privates)) {
if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
if (const Expr *InitExpr = VD->getInit()) {
CGF.EmitAnyExprToMem(InitExpr, SharedResult,
PrivateType.getQualifiers(), true);
return;
}
}
}
CGF.EmitNullInitialization(SharedResult, PrivateType);
};
EmitSharedInit();
CGF.Builder.CreateBr(InitEndBB);
CGF.EmitBlock(InitEndBB);

CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_barrier),
BarrierArgs);

const Expr *ReductionOp = ReductionOps;
const OMPDeclareReductionDecl *CurrentUDR = getReductionInit(ReductionOp);
LValue SharedLV = CGF.MakeAddrLValue(SharedResult, PrivateType);
LValue LHSLV = CGF.EmitLValue(LHSExprs);

auto EmitCriticalReduction = [&](auto ReductionGen) {
std::string CriticalName = getName({"reduction_critical"});
emitCriticalRegion(CGF, CriticalName, ReductionGen, Loc);
};

if (CurrentUDR) {
// Handle user-defined reduction.
auto ReductionGen = [&](CodeGenFunction &CGF, PrePostActionTy &Action) {
Action.Enter(CGF);
std::pair<llvm::Function *, llvm::Function *> FnPair =
getUserDefinedReduction(CurrentUDR);
if (FnPair.first) {
if (const auto *CE = dyn_cast<CallExpr>(ReductionOp)) {
const auto *OutDRE = cast<DeclRefExpr>(
cast<UnaryOperator>(CE->getArg(0)->IgnoreParenImpCasts())
->getSubExpr());
const auto *InDRE = cast<DeclRefExpr>(
cast<UnaryOperator>(CE->getArg(1)->IgnoreParenImpCasts())
->getSubExpr());
CodeGenFunction::OMPPrivateScope LocalScope(CGF);
LocalScope.addPrivate(cast<VarDecl>(OutDRE->getDecl()),
SharedLV.getAddress());
LocalScope.addPrivate(cast<VarDecl>(InDRE->getDecl()),
LHSLV.getAddress());
(void)LocalScope.Privatize();
emitReductionCombiner(CGF, ReductionOp);
}
}
};
EmitCriticalReduction(ReductionGen);
} else {
// Handle built-in reduction operations.
#ifndef NDEBUG
const Expr *ReductionClauseExpr = ReductionOp->IgnoreParenCasts();
if (const auto *Cleanup = dyn_cast<ExprWithCleanups>(ReductionClauseExpr))
ReductionClauseExpr = Cleanup->getSubExpr()->IgnoreParenCasts();

const Expr *AssignRHS = nullptr;
if (const auto *BinOp = dyn_cast<BinaryOperator>(ReductionClauseExpr)) {
if (BinOp->getOpcode() == BO_Assign)
AssignRHS = BinOp->getRHS();
} else if (const auto *OpCall =
dyn_cast<CXXOperatorCallExpr>(ReductionClauseExpr)) {
if (OpCall->getOperator() == OO_Equal)
AssignRHS = OpCall->getArg(1);
}

assert(AssignRHS &&
"Private Variable Reduction : Invalid ReductionOp expression");
#endif

auto ReductionGen = [&](CodeGenFunction &CGF, PrePostActionTy &Action) {
Action.Enter(CGF);
const auto *OmpOutDRE =
dyn_cast<DeclRefExpr>(LHSExprs->IgnoreParenImpCasts());
const auto *OmpInDRE =
dyn_cast<DeclRefExpr>(RHSExprs->IgnoreParenImpCasts());
if (!OmpOutDRE || !OmpInDRE)
return;
const VarDecl *OmpOutVD = cast<VarDecl>(OmpOutDRE->getDecl());
const VarDecl *OmpInVD = cast<VarDecl>(OmpInDRE->getDecl());
CodeGenFunction::OMPPrivateScope LocalScope(CGF);
LocalScope.addPrivate(OmpOutVD, SharedLV.getAddress());
LocalScope.addPrivate(OmpInVD, LHSLV.getAddress());
(void)LocalScope.Privatize();
// Emit the actual reduction operation
CGF.EmitIgnoredExpr(ReductionOp);
};
EmitCriticalReduction(ReductionGen);
}

CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_barrier),
BarrierArgs);

// Broadcast final result
bool IsAggregate = PrivateType->isAggregateType();
LValue SharedLV1 = CGF.MakeAddrLValue(SharedResult, PrivateType);
llvm::Value *FinalResultVal = nullptr;
Address FinalResultAddr = Address::invalid();

if (IsAggregate)
FinalResultAddr = SharedResult;
else
FinalResultVal = CGF.EmitLoadOfScalar(SharedLV1, Loc);

LValue TargetLHSLV = CGF.EmitLValue(LHSExprs);
if (IsAggregate) {
CGF.EmitAggregateCopy(TargetLHSLV,
CGF.MakeAddrLValue(FinalResultAddr, PrivateType),
PrivateType, AggValueSlot::DoesNotOverlap, false);
} else {
CGF.EmitStoreOfScalar(FinalResultVal, TargetLHSLV);
}
// Final synchronization barrier
CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_barrier),
BarrierArgs);
}
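
The phase ordering described in the comment at the top of emitPrivateReduction can be modelled without any threading. The sketch below is a sequential stand-in, not the emitted IR: each vector element plays the role of one thread's private copy, the barriers and the critical section appear only as comments, and `modelPrivateReduction` is a hypothetical helper name that does not exist in Clang.

```cpp
#include <vector>

// Sequential model of the phases emitted for a private-variable reduction.
// Each element of `privates` stands for one thread's private copy.
// Hypothetical helper for illustration only; not part of Clang.
template <typename T, typename Combine>
T modelPrivateReduction(std::vector<T> &privates, T init, Combine combine) {
  // Phase 1: thread 0 initializes the shared variable.
  T shared = init;
  // (barrier)

  // Phase 2: each thread, inside a critical section, folds its private
  // value into the shared variable.
  for (const T &priv : privates)
    shared = combine(shared, priv);
  // (barrier)

  // Phase 3: each thread copies the shared result back into its own copy.
  for (T &priv : privates)
    priv = shared;
  // (final barrier)

  return shared;
}
```

Called with private copies {1, 2, 3, 4}, an initial value of 0, and addition as the combiner, the model leaves every copy holding 10.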

void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
ArrayRef<const Expr *> Privates,
ArrayRef<const Expr *> LHSExprs,
@@ -5153,7 +5381,7 @@ void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
} else {
// Emit as a critical region.
auto &&CritRedGen = [E, Loc](CodeGenFunction &CGF, const Expr *,
- const Expr *, const Expr *) {
+ const Expr *, const Expr *) {
CGOpenMPRuntime &RT = CGF.CGM.getOpenMPRuntime();
std::string Name = RT.getName({"atomic_reduction"});
RT.emitCriticalRegion(
@@ -5200,6 +5428,21 @@ void CGOpenMPRuntime::emitReduction(CodeGenFunction &CGF, SourceLocation Loc,

CGF.EmitBranch(DefaultBB);
CGF.EmitBlock(DefaultBB, /*IsFinished=*/true);
assert(!LHSExprs.empty() && "PrivateVarReduction: LHSExprs is empty");
assert(!Privates.empty() && "PrivateVarReduction: Privates is empty");
assert(!ReductionOps.empty() && "PrivateVarReduction: ReductionOps is empty");
assert(LHSExprs.size() == Privates.size() &&
"PrivateVarReduction: Privates size mismatch");
assert(LHSExprs.size() == ReductionOps.size() &&
"PrivateVarReduction: ReductionOps size mismatch");
assert(LHSExprs.size() == Options.IsPrivateVarReduction.size() &&
"PrivateVarReduction: IsPrivateVarReduction size mismatch");
for (unsigned I :
llvm::seq<unsigned>(std::min(ReductionOps.size(), LHSExprs.size()))) {
if (Options.IsPrivateVarReduction[I])
emitPrivateReduction(CGF, Loc, Privates[I], LHSExprs[I], RHSExprs[I],
ReductionOps[I]);
}
}
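
The loop added at the end of emitReduction walks the parallel expression arrays and routes only the flagged entries to emitPrivateReduction. The following is a minimal sketch of that dispatch pattern using standard containers instead of LLVM types. `ReductionItem` and `selectPrivateReductions` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the parallel arrays
// (Privates / LHSExprs / RHSExprs / ReductionOps).
struct ReductionItem {
  std::string name;
};

// Returns the names of the items that would take the private-variable path.
// The flag vector must have exactly one entry per item, mirroring the size
// assertions in the patch.
std::vector<std::string>
selectPrivateReductions(const std::vector<ReductionItem> &items,
                        const std::vector<bool> &isPrivateVarReduction) {
  assert(items.size() == isPrivateVarReduction.size() &&
         "flag list must match item list");
  std::vector<std::string> selected;
  for (std::size_t i = 0; i < items.size(); ++i)
    if (isPrivateVarReduction[i])
      selected.push_back(items[i].name);
  return selected;
}
```

Keeping the flags in a separate vector indexed like the other arrays means the existing array-based interface stays unchanged, at the cost of one more size invariant to check.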

/// Generates unique name for artificial threadprivate variables.
12 changes: 12 additions & 0 deletions clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -1201,8 +1201,20 @@ class CGOpenMPRuntime {
struct ReductionOptionsTy {
bool WithNowait;
bool SimpleReduction;
llvm::SmallVector<bool, 8> IsPrivateVarReduction;
OpenMPDirectiveKind ReductionKind;
};

/// Emits code for a single private variable reduction.
/// \param Privates Private copy of the original reduction argument.
/// \param LHSExprs LHS of the \a ReductionOps reduction operation.
/// \param RHSExprs RHS of the \a ReductionOps reduction operation.
/// \param ReductionOps Reduction operation in the form 'LHS binop RHS'
/// or 'operator binop(LHS, RHS)'.
void emitPrivateReduction(CodeGenFunction &CGF, SourceLocation Loc,
const Expr *Privates, const Expr *LHSExprs,
const Expr *RHSExprs, const Expr *ReductionOps);

/// Emit a code for reduction clause. Next code should be emitted for
/// reduction:
/// \code
11 changes: 8 additions & 3 deletions clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -1471,6 +1471,7 @@ void CodeGenFunction::EmitOMPReductionClauseFinal(
llvm::SmallVector<const Expr *, 8> LHSExprs;
llvm::SmallVector<const Expr *, 8> RHSExprs;
llvm::SmallVector<const Expr *, 8> ReductionOps;
llvm::SmallVector<bool, 8> IsPrivateVarReduction;
bool HasAtLeastOneReduction = false;
bool IsReductionWithTaskMod = false;
for (const auto *C : D.getClausesOfKind<OMPReductionClause>()) {
@@ -1481,6 +1482,8 @@ void CodeGenFunction::EmitOMPReductionClauseFinal(
Privates.append(C->privates().begin(), C->privates().end());
LHSExprs.append(C->lhs_exprs().begin(), C->lhs_exprs().end());
RHSExprs.append(C->rhs_exprs().begin(), C->rhs_exprs().end());
IsPrivateVarReduction.append(C->private_var_reduction_flags().begin(),
C->private_var_reduction_flags().end());
ReductionOps.append(C->reduction_ops().begin(), C->reduction_ops().end());
IsReductionWithTaskMod =
IsReductionWithTaskMod || C->getModifier() == OMPC_REDUCTION_task;
@@ -1502,7 +1505,7 @@ void CodeGenFunction::EmitOMPReductionClauseFinal(
// parallel directive (it always has implicit barrier).
CGM.getOpenMPRuntime().emitReduction(
*this, D.getEndLoc(), Privates, LHSExprs, RHSExprs, ReductionOps,
- {WithNowait, SimpleReduction, ReductionKind});
+ {WithNowait, SimpleReduction, IsPrivateVarReduction, ReductionKind});
}
}
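
ReductionOptionsTy is brace-initialized positionally at each call site in this file, so adding a member before ReductionKind means every call has to supply a value in the third position. The sketch below mirrors that with standard types. `ReductionOptions`, `DirectiveKind`, and `makeScanOptions` are hypothetical stand-ins, not Clang declarations.

```cpp
#include <vector>

// Hypothetical stand-in for OpenMPDirectiveKind.
enum class DirectiveKind { Unknown, Simd, For };

// Hypothetical mirror of ReductionOptionsTy after the patch.
struct ReductionOptions {
  bool WithNowait;
  bool SimpleReduction;
  std::vector<bool> IsPrivateVarReduction; // new member, third position
  DirectiveKind ReductionKind;
};

// Builds options the way the scan-based call sites do: positional braces
// with a single placeholder flag for the new member.
inline ReductionOptions makeScanOptions(DirectiveKind kind) {
  return {/*WithNowait=*/true, /*SimpleReduction=*/true,
          /*IsPrivateVarReduction=*/{false}, kind};
}
```

Note that `{false}` produces a one-element vector, not an empty one. In the sketch this is only a placeholder value.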

@@ -3943,7 +3946,8 @@ static void emitScanBasedDirective(
PrivScope.Privatize();
CGF.CGM.getOpenMPRuntime().emitReduction(
CGF, S.getEndLoc(), Privates, LHSs, RHSs, ReductionOps,
- {/*WithNowait=*/true, /*SimpleReduction=*/true, OMPD_unknown});
+ {/*WithNowait=*/true, /*SimpleReduction=*/true,
+ /*IsPrivateVarReduction*/ {false}, OMPD_unknown});
}
llvm::Value *NextIVal =
CGF.Builder.CreateNUWSub(IVal, llvm::ConstantInt::get(CGF.SizeTy, 1));
@@ -5748,7 +5752,8 @@ void CodeGenFunction::EmitOMPScanDirective(const OMPScanDirective &S) {
}
CGM.getOpenMPRuntime().emitReduction(
*this, ParentDir.getEndLoc(), Privates, LHSs, RHSs, ReductionOps,
- {/*WithNowait=*/true, /*SimpleReduction=*/true, OMPD_simd});
+ {/*WithNowait=*/true, /*SimpleReduction=*/true,
+ /*IsPrivateVarReduction*/ {false}, OMPD_simd});
for (unsigned I = 0, E = CopyArrayElems.size(); I < E; ++I) {
const Expr *PrivateExpr = Privates[I];
LValue DestLVal;