Skip to content

[LV] Autovectorization for the all-in-one histogram intrinsic #91458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 44 additions & 8 deletions llvm/include/llvm/Analysis/LoopAccessAnalysis.h
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,9 @@ class MemoryDepChecker {
// on MinDepDistBytes.
BackwardVectorizable,
// Same, but may prevent store-to-load forwarding.
BackwardVectorizableButPreventsForwarding
BackwardVectorizableButPreventsForwarding,
// Access is to a loop loaded value, but is part of a histogram operation.
Histogram
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, IIUC this will change the meaning LoopAccessInfo::canVectorizeMemory to can vectorize memory, if you guarantee to generate histogram intrinsics for all identified histograms?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Would you prefer a second method to indicate that it's safe to vectorize with histograms? Or an argument or flag to allow it?

I've noticed that decision making has been migrating to using the cost model instead of outright rejecting loops in legality or LAA, so structured it that way. We can scalarize the intrinsic for targets without appropriate instructions (implemented in ScalarizeMaskedMemIntrin.cpp), so it seems safe enough for LoopVectorize.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a separate interface (canVectorizeMemoryWithHistogram), along with a flag to enable vectorization of histograms. Currently off by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm but the current version still has to analyze all stores for histograms even if the option is off?. Just trying to see if there are alternatives to avoid pulling some very specific analysis logic into LAA.

IIUC without the histogram analysis, we would have an unknown dependence, that LV could analyze separately if histograms are supported, like other passes like LoopDistribute or LoopLoadeleimination do?

Would be great if it would be possible to support this without needing to increase the complexity of dependence analysis in LAA

};

/// String version of the types.
Expand Down Expand Up @@ -201,7 +203,8 @@ class MemoryDepChecker {
/// Only checks sets with elements in \p CheckDeps.
bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,
const DenseMap<Value *, SmallVector<const Value *, 16>>
&UnderlyingObjects);
&UnderlyingObjects,
const SmallPtrSetImpl<const Value *> &HistogramPtrs);

/// No memory dependence was encountered that would inhibit
/// vectorization.
Expand Down Expand Up @@ -352,7 +355,8 @@ class MemoryDepChecker {
isDependent(const MemAccessInfo &A, unsigned AIdx, const MemAccessInfo &B,
unsigned BIdx,
const DenseMap<Value *, SmallVector<const Value *, 16>>
&UnderlyingObjects);
&UnderlyingObjects,
const SmallPtrSetImpl<const Value *> &HistogramPtrs);

/// Check whether the data dependence could prevent store-load
/// forwarding.
Expand Down Expand Up @@ -393,7 +397,8 @@ class MemoryDepChecker {
const MemAccessInfo &A, Instruction *AInst, const MemAccessInfo &B,
Instruction *BInst,
const DenseMap<Value *, SmallVector<const Value *, 16>>
&UnderlyingObjects);
&UnderlyingObjects,
const SmallPtrSetImpl<const Value *> &HistogramPtrs);
};

class RuntimePointerChecking;
Expand Down Expand Up @@ -445,6 +450,15 @@ struct PointerDiffInfo {
NeedsFreeze(NeedsFreeze) {}
};

struct HistogramInfo {
LoadInst *Load;
Instruction *Update;
StoreInst *Store;

HistogramInfo(LoadInst *Load, Instruction *Update, StoreInst *Store)
: Load(Load), Update(Update), Store(Store) {}
};

/// Holds information about the memory runtime legality checks to verify
/// that a group of pointers do not overlap.
class RuntimePointerChecking {
Expand Down Expand Up @@ -625,6 +639,13 @@ class RuntimePointerChecking {
/// Checks for both memory dependences and the SCEV predicates contained in the
/// PSE must be emitted in order for the results of this analysis to be valid.
class LoopAccessInfo {
/// Represents whether the memory access dependencies in the loop:
/// * Prohibit vectorization
/// * Allow for vectorization (possibly with runtime checks)
/// * Allow for vectorization (possibly with runtime checks),
/// as long as histogram operations are supported.
enum VecMemPossible { CantVec = 0, NormalVec = 1, HistogramVec = 2 };

public:
LoopAccessInfo(Loop *L, ScalarEvolution *SE, const TargetTransformInfo *TTI,
const TargetLibraryInfo *TLI, AAResults *AA, DominatorTree *DT,
Expand All @@ -636,7 +657,11 @@ class LoopAccessInfo {
/// hasStoreStoreDependenceInvolvingLoopInvariantAddress and
/// hasLoadStoreDependenceInvolvingLoopInvariantAddress also need to be
/// checked.
bool canVectorizeMemory() const { return CanVecMem; }
bool canVectorizeMemory() const { return CanVecMem == NormalVec; }

bool canVectorizeMemoryWithHistogram() const {
return CanVecMem == NormalVec || CanVecMem == HistogramVec;
}

/// Return true if there is a convergent operation in the loop. There may
/// still be reported runtime pointer checks that would be required, but it is
Expand Down Expand Up @@ -664,6 +689,10 @@ class LoopAccessInfo {
unsigned getNumStores() const { return NumStores; }
unsigned getNumLoads() const { return NumLoads;}

const SmallVectorImpl<HistogramInfo> &getHistograms() const {
return Histograms;
}

/// The diagnostics report generated for the analysis. E.g. why we
/// couldn't analyze the loop.
const OptimizationRemarkAnalysis *getReport() const { return Report.get(); }
Expand Down Expand Up @@ -715,8 +744,8 @@ class LoopAccessInfo {
private:
/// Analyze the loop. Returns true if all memory access in the loop can be
/// vectorized.
bool analyzeLoop(AAResults *AA, LoopInfo *LI, const TargetLibraryInfo *TLI,
DominatorTree *DT);
VecMemPossible analyzeLoop(AAResults *AA, LoopInfo *LI,
const TargetLibraryInfo *TLI, DominatorTree *DT);

/// Check if the structure of the loop allows it to be analyzed by this
/// pass.
Expand Down Expand Up @@ -757,7 +786,7 @@ class LoopAccessInfo {
unsigned NumStores = 0;

/// Cache the result of analyzeLoop.
bool CanVecMem = false;
VecMemPossible CanVecMem = CantVec;
bool HasConvergentOp = false;

/// Indicator that there are two non vectorizable stores to the same uniform
Expand All @@ -777,6 +806,13 @@ class LoopAccessInfo {
/// If an access has a symbolic strides, this maps the pointer value to
/// the stride symbol.
DenseMap<Value *, const SCEV *> SymbolicStrides;

/// Holds the load, update, and store instructions for all histogram-style
/// operations found in the loop.
SmallVector<HistogramInfo, 2> Histograms;

/// Storing Histogram Pointers
SmallPtrSet<const Value *, 2> HistogramPtrs;
};

/// Return the SCEV corresponding to a pointer with the symbolic stride
Expand Down
17 changes: 17 additions & 0 deletions llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
Original file line number Diff line number Diff line change
Expand Up @@ -387,6 +387,23 @@ class LoopVectorizationLegality {
unsigned getNumStores() const { return LAI->getNumStores(); }
unsigned getNumLoads() const { return LAI->getNumLoads(); }

std::optional<const HistogramInfo *> getHistogramInfo(Instruction *I) const {
for (const HistogramInfo &HGram : LAI->getHistograms())
if (HGram.Load == I || HGram.Update == I || HGram.Store == I)
return &HGram;

return std::nullopt;
}

std::optional<const HistogramInfo *>
getHistogramForStore(StoreInst *SI) const {
for (const HistogramInfo &HGram : LAI->getHistograms())
if (HGram.Store == SI)
return &HGram;

return std::nullopt;
}

PredicatedScalarEvolution *getPredicatedScalarEvolution() const {
return &PSE;
}
Expand Down
Loading
Loading