-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[StaticDataLayout][PGO] Add profile format for static data layout, and the classes to operate on the profiles. #138170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
6cd7d8d
4727529
80249bc
b69c993
6fe9b48
df08094
6dd04e4
4b25d67
2ecc621
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
//===- DataAccessProf.h - Data access profile format support ---------*- C++ | ||
//-*-===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// This file contains support to construct and use data access profiles. | ||
// | ||
// For the original RFC of this pass please see | ||
// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#ifndef LLVM_PROFILEDATA_DATAACCESSPROF_H_ | ||
#define LLVM_PROFILEDATA_DATAACCESSPROF_H_ | ||
|
||
#include "llvm/ADT/DenseMap.h" | ||
#include "llvm/ADT/DenseMapInfoVariant.h" | ||
#include "llvm/ADT/MapVector.h" | ||
#include "llvm/ADT/STLExtras.h" | ||
#include "llvm/ADT/SetVector.h" | ||
#include "llvm/ADT/SmallVector.h" | ||
#include "llvm/ADT/StringRef.h" | ||
#include "llvm/ProfileData/InstrProf.h" | ||
#include "llvm/Support/Allocator.h" | ||
#include "llvm/Support/Error.h" | ||
#include "llvm/Support/StringSaver.h" | ||
|
||
#include <cstdint> | ||
#include <variant> | ||
|
||
namespace llvm { | ||
|
||
namespace data_access_prof { | ||
// The location of data in the source code. | ||
struct DataLocation { | ||
// The filename where the data is located. | ||
StringRef FileName; | ||
// The line number in the source code. | ||
uint32_t Line; | ||
}; | ||
|
||
// The data access profiles for a symbol. | ||
struct DataAccessProfRecord { | ||
// Represents a data symbol. The semantic comes in two forms: a symbol index | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When would the different forms be used? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The semantic of this field depends on the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understood that, but my question was more about why in practice some would be string literals and some would be hashes. Might be useful to note this in a comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This makes sense. Added comment at L55 to explain why two forms are used. |
||
// for symbol name if `IsStringLiteral` is false, or the hash of a string | ||
// content if `IsStringLiteral` is true. Required. | ||
uint64_t SymbolID; | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a little confusing that SymbolID is a different thing (a type) in the following class. Suggest making these different. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed. I renamed the type |
||
|
||
// The access count of symbol. Required. | ||
uint64_t AccessCount; | ||
|
||
// True iff this is a record for string literal (symbols with name pattern | ||
// `.str.*` in the symbol table). Required. | ||
bool IsStringLiteral; | ||
|
||
// The locations of data in the source code. Optional. | ||
llvm::SmallVector<DataLocation> Locations; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC, we can have many instances of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'd expect that many records have small single-digit number of locations, and definitely within the range of Done. |
||
}; | ||
|
||
/// Encapsulates the data access profile data and the methods to operate on it. | ||
/// This class provides profile look-up, serialization and deserialization. | ||
class DataAccessProfData { | ||
public: | ||
// SymbolID is either a string representing symbol name, or a uint64_t | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar to question for above class - what determines which type is used? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The type is determined by the symbol itself. This variant is |
||
// representing the content hash of a string literal. | ||
using SymbolID = std::variant<StringRef, uint64_t>; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the use of StringRef, is there a risk of use-after-free error (if the caller passes in a Ref to short lived string)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good question. This class keeps owned copy of strings (inside There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably add a comment here and in the below type that the StringRef is owned by this class's saver object. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also fixed a potential use-after-free bug in L69 with |
||
using StringToIndexMap = llvm::MapVector<StringRef, uint64_t>; | ||
|
||
DataAccessProfData() : saver(Allocator) {} | ||
|
||
/// Serialize profile data to the output stream. | ||
/// Storage layout: | ||
/// - Serialized strings. | ||
/// - The encoded hashes. | ||
/// - Records. | ||
Error serialize(ProfOStream &OS) const; | ||
|
||
/// Deserialize this class from the given buffer. | ||
Error deserialize(const unsigned char *&Ptr); | ||
|
||
/// Returns a pointer of profile record for \p SymbolID, or nullptr if there | ||
/// isn't a record. Internally, this function will canonicalize the symbol | ||
/// name before the lookup. | ||
const DataAccessProfRecord *getProfileRecord(const SymbolID SymID) const; | ||
|
||
/// Returns true if \p SymID is seen in profiled binaries and cold. | ||
bool isKnownColdSymbol(const SymbolID SymID) const; | ||
|
||
/// Methods to add symbolized data access profile. Returns error if duplicated | ||
/// symbol names or content hashes are seen. The user of this class should | ||
/// aggregate counters that corresponds to the same symbol name or with the | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// same string literal hash before calling 'add*' methods. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 'add*' and 'set*' methods? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
Error addSymbolizedDataAccessProfile(SymbolID SymbolID, uint64_t AccessCount); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shorten the name --> addDataAccessProfile There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given the semantics of the method, may be change the name to 'setDataAccessProfile' There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
Error addSymbolizedDataAccessProfile( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. document location parameter. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
SymbolID SymbolID, uint64_t AccessCount, | ||
const llvm::SmallVector<DataLocation> &Locations); | ||
Error addKnownSymbolWithoutSamples(SymbolID SymbolID); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is it the same as setting the symbol access count to zero? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, and the idea is similar to SamplePGO's function symbol list. |
||
|
||
/// Returns a iterable StringRef for strings in the order they are added. | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
auto getStrings() const { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: Prefer spelling out the type instead of auto for the return type. Same for the one below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://google.github.io/styleguide/cppguide.html#Type_deduction also favors spelling out the return type [1], so did it for the return type below. For this function, updated it to return an ArrayRef of
[1] Around Return type deduction, it has |
||
ArrayRef<std::pair<StringRef, uint64_t>> RefSymbolNames( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use SymbolID instead of the std::pair for consistency and simplicity There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The updated change used new |
||
StrToIndexMap.begin(), StrToIndexMap.end()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you use your new MapVector::getArrayRef here instead? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes! done. |
||
return llvm::make_first_range(RefSymbolNames); | ||
} | ||
|
||
/// Returns array reference for various internal data structures. | ||
inline ArrayRef< | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
std::pair<std::variant<StringRef, uint64_t>, DataAccessProfRecord>> | ||
getRecords() const { | ||
return Records.getArrayRef(); | ||
} | ||
inline ArrayRef<StringRef> getKnownColdSymbols() const { | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return KnownColdSymbols.getArrayRef(); | ||
} | ||
inline ArrayRef<uint64_t> getKnownColdHashes() const { | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return KnownColdHashes.getArrayRef(); | ||
} | ||
|
||
private: | ||
/// Serialize the symbol strings into the output stream. | ||
Error serializeStrings(ProfOStream &OS) const; | ||
|
||
/// Deserialize the symbol strings from \p Ptr and increment \p Ptr to the | ||
/// start of the next payload. | ||
Error deserializeStrings(const unsigned char *&Ptr, | ||
const uint64_t NumSampledSymbols, | ||
uint64_t NumColdKnownSymbols); | ||
|
||
/// Decode the records and increment \p Ptr to the start of the next payload. | ||
Error deserializeRecords(const unsigned char *&Ptr); | ||
|
||
/// A helper function to compute a storage index for \p SymbolID. | ||
uint64_t getEncodedIndex(const SymbolID SymbolID) const; | ||
|
||
// `Records` stores the records and `SymbolToRecordIndex` maps a symbol ID to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see SymbolToRecordIndex defined anywhere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. whoops, Originally I had
as opposed to Later I realized that it's simpler to use the latter and add |
||
// its record index. | ||
MapVector<SymbolID, DataAccessProfRecord> Records; | ||
|
||
// Use MapVector to keep input order of strings for serialization and | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Move this comment to the typedef? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
// deserialization. | ||
StringToIndexMap StrToIndexMap; | ||
llvm::SetVector<uint64_t> KnownColdHashes; | ||
llvm::SetVector<StringRef> KnownColdSymbols; | ||
// Keeps owned copies of the input strings. | ||
llvm::BumpPtrAllocator Allocator; | ||
llvm::UniqueStringSaver saver; | ||
}; | ||
|
||
} // namespace data_access_prof | ||
} // namespace llvm | ||
|
||
#endif // LLVM_PROFILEDATA_DATAACCESSPROF_H_ |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -357,6 +357,12 @@ void createPGONameMetadata(GlobalObject &GO, StringRef PGOName); | |
/// the duplicated profile variables for Comdat functions. | ||
bool needsComdatForCounter(const GlobalObject &GV, const Module &M); | ||
|
||
/// \c NameStrings is a string composed of one of more possibly encoded | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// sub-strings. The substrings are separated by 0 or more zero bytes. This | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe "null" bytes? the "0" and "zero" are confusing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done by mentioning the substrings are separated by |
||
/// method decodes the string and calls `NameCallback` for each substring. | ||
Error readAndDecodeStrings(StringRef NameStrings, | ||
std::function<Error(StringRef)> NameCallback); | ||
|
||
/// An enum describing the attributes of an instrumented profile. | ||
enum class InstrProfKind { | ||
Unknown = 0x0, | ||
|
@@ -493,6 +499,11 @@ class InstrProfSymtab { | |
public: | ||
using AddrHashMap = std::vector<std::pair<uint64_t, uint64_t>>; | ||
|
||
// Returns the canonial name of the given PGOName. In a canonical name, all | ||
snehasish marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// suffixes that begins with "." except ".__uniq." are stripped. | ||
// FIXME: Unify this with `FunctionSamples::getCanonicalFnName`. | ||
static StringRef getCanonicalName(StringRef PGOName); | ||
|
||
private: | ||
using AddrIntervalMap = | ||
IntervalMap<uint64_t, uint64_t, 4, IntervalMapHalfOpenInfo<uint64_t>>; | ||
|
@@ -528,11 +539,6 @@ class InstrProfSymtab { | |
|
||
static StringRef getExternalSymbol() { return "** External Symbol **"; } | ||
|
||
// Returns the canonial name of the given PGOName. In a canonical name, all | ||
// suffixes that begins with "." except ".__uniq." are stripped. | ||
// FIXME: Unify this with `FunctionSamples::getCanonicalFnName`. | ||
static StringRef getCanonicalName(StringRef PGOName); | ||
|
||
// Add the function into the symbol table, by creating the following | ||
// map entries: | ||
// name-set = {PGOFuncName} union {getCanonicalName(PGOFuncName)} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
add_llvm_component_library(LLVMProfileData | ||
DataAccessProf.cpp | ||
GCOV.cpp | ||
IndexedMemProfData.cpp | ||
InstrProf.cpp | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably put this into a separate prep patch with a test in llvm/unittests/ADT/MapVectorTest.cpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in #138726