-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[StaticDataLayout][PGO] Add profile format for static data layout, and the classes to operate on the profiles. #138170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
6cd7d8d
4727529
80249bc
b69c993
6fe9b48
df08094
6dd04e4
4b25d67
2ecc621
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,167 @@ | ||
//===- DataAccessProf.h - Data access profile format support ---------*- C++ | ||
//-*-===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// This file contains support to construct and use data access profiles. | ||
// | ||
// For the original RFC of this pass please see | ||
// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#ifndef LLVM_PROFILEDATA_DATAACCESSPROF_H_ | ||
#define LLVM_PROFILEDATA_DATAACCESSPROF_H_ | ||
|
||
#include "llvm/ADT/DenseMap.h" | ||
#include "llvm/ADT/DenseMapInfoVariant.h" | ||
#include "llvm/ADT/MapVector.h" | ||
#include "llvm/ADT/STLExtras.h" | ||
#include "llvm/ADT/SetVector.h" | ||
#include "llvm/ADT/SmallVector.h" | ||
#include "llvm/ADT/StringRef.h" | ||
#include "llvm/ProfileData/InstrProf.h" | ||
#include "llvm/Support/Allocator.h" | ||
#include "llvm/Support/Error.h" | ||
#include "llvm/Support/StringSaver.h" | ||
|
||
#include <cstdint> | ||
#include <variant> | ||
|
||
namespace llvm { | ||
|
||
namespace data_access_prof { | ||
// The location of data in the source code. | ||
struct DataLocation { | ||
// The filename where the data is located. | ||
StringRef FileName; | ||
// The line number in the source code. | ||
uint32_t Line; | ||
}; | ||
|
||
// The data access profiles for a symbol. | ||
struct DataAccessProfRecord { | ||
DataAccessProfRecord(uint64_t SymbolID, uint64_t AccessCount, | ||
bool IsStringLiteral) | ||
: SymbolID(SymbolID), AccessCount(AccessCount), | ||
IsStringLiteral(IsStringLiteral) {} | ||
|
||
// Represents a data symbol. The semantic comes in two forms: a symbol index | ||
// for symbol name if `IsStringLiteral` is false, or the hash of a string | ||
// content if `IsStringLiteral` is true. For most of the symbolizable static | ||
// data, the mangled symbol names remain stable relative to the source code | ||
// and therefore used to identify symbols across binary releases. String | ||
// literals have unstable name patterns like `.str.N[.llvm.hash]`, so we use | ||
// the content hash instead. This is a required field. | ||
uint64_t SymbolID; | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a little confusing that SymbolID is a different thing (a type) in the following class. Suggest making these different. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed. I renamed the type |
||
|
||
// The access count of symbol. Required. | ||
uint64_t AccessCount; | ||
|
||
// True iff this is a record for string literal (symbols with name pattern | ||
// `.str.*` in the symbol table). Required. | ||
bool IsStringLiteral; | ||
|
||
// The locations of data in the source code. Optional. | ||
llvm::SmallVector<DataLocation, 0> Locations; | ||
}; | ||
|
||
/// Encapsulates the data access profile data and the methods to operate on it. | ||
/// This class provides profile look-up, serialization and deserialization. | ||
class DataAccessProfData { | ||
public: | ||
// SymbolID is either a string representing symbol name if the symbol has | ||
// stable mangled name relative to source code, or a uint64_t representing the | ||
// content hash of a string literal (with unstable name patterns like | ||
// `.str.N[.llvm.hash]`). The StringRef is owned by the class's saver object. | ||
using SymbolHandle = std::variant<StringRef, uint64_t>; | ||
using StringToIndexMap = llvm::MapVector<StringRef, uint64_t>; | ||
|
||
DataAccessProfData() : Saver(Allocator) {} | ||
|
||
/// Serialize profile data to the output stream. | ||
/// Storage layout: | ||
/// - Serialized strings. | ||
/// - The encoded hashes. | ||
/// - Records. | ||
Error serialize(ProfOStream &OS) const; | ||
|
||
/// Deserialize this class from the given buffer. | ||
Error deserialize(const unsigned char *&Ptr); | ||
|
||
/// Returns a pointer of profile record for \p SymbolID, or nullptr if there | ||
/// isn't a record. Internally, this function will canonicalize the symbol | ||
/// name before the lookup. | ||
const DataAccessProfRecord *getProfileRecord(const SymbolHandle SymID) const; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know the future intended usage but handing out a pointer to a record makes it hard to reason about the lifetime. How can we prevent users from accidentally introducing bugs? In practice I think the user will query the record for a symbol and then take some action based on the record. So usually 1 (or a small number of) DataAccessProfRecord will be live at a time. If so we can inline the contents (the string for the location etc) to avoid any lifetime issues. This will also allow us to move the optimized implementation out of the public api. Wdyt? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated the look-up API to return a record with owned strings (i.e., of type |
||
|
||
/// Returns true if \p SymID is seen in profiled binaries and cold. | ||
bool isKnownColdSymbol(const SymbolHandle SymID) const; | ||
|
||
/// Methods to set symbolized data access profile. Returns error if duplicated | ||
/// symbol names or content hashes are seen. The user of this class should | ||
/// aggregate counters that correspond to the same symbol name or with the | ||
/// same string literal hash before calling 'set*' methods. | ||
Error setDataAccessProfile(SymbolHandle SymbolID, uint64_t AccessCount); | ||
/// Similar to the method above, for records with \p Locations representing | ||
/// the `filename:line` where this symbol shows up. Note because of linker's | ||
/// merge of identical symbols (e.g., unnamed_addr string literals), one | ||
/// symbol is likely to have multiple locations. | ||
Error setDataAccessProfile(SymbolHandle SymbolID, uint64_t AccessCount, | ||
ArrayRef<DataLocation> Locations); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In actual usage (outside of the existing tests) will we need to accept an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The source location of a global variable is recorded in That said, it's more general to name them |
||
Error addKnownSymbolWithoutSamples(SymbolHandle SymbolID); | ||
|
||
/// Returns an iterable StringRef for strings in the order they are added. | ||
/// Each string may be a symbol name or a file name. | ||
auto getStrings() const { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: Prefer spelling out the type instead of auto for the return type. Same for the one below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://google.github.io/styleguide/cppguide.html#Type_deduction also favors spelling out the return type [1], so did it for the return type below. For this function, updated it to return an ArrayRef of
[1] Around Return type deduction, it has |
||
return llvm::make_first_range(StrToIndexMap.getArrayRef()); | ||
} | ||
|
||
/// Returns array reference for various internal data structures. | ||
auto getRecords() const { return Records.getArrayRef(); } | ||
ArrayRef<StringRef> getKnownColdSymbols() const { | ||
return KnownColdSymbols.getArrayRef(); | ||
} | ||
ArrayRef<uint64_t> getKnownColdHashes() const { | ||
return KnownColdHashes.getArrayRef(); | ||
} | ||
|
||
private: | ||
/// Serialize the symbol strings into the output stream. | ||
Error serializeSymbolsAndFilenames(ProfOStream &OS) const; | ||
|
||
/// Deserialize the symbol strings from \p Ptr and increment \p Ptr to the | ||
/// start of the next payload. | ||
Error deserializeSymbolsAndFilenames(const unsigned char *&Ptr, | ||
const uint64_t NumSampledSymbols, | ||
mingmingl-llvm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
const uint64_t NumColdKnownSymbols); | ||
|
||
/// Decode the records and increment \p Ptr to the start of the next payload. | ||
Error deserializeRecords(const unsigned char *&Ptr); | ||
|
||
/// A helper function to compute a storage index for \p SymbolID. | ||
uint64_t getEncodedIndex(const SymbolHandle SymbolID) const; | ||
|
||
// Keeps owned copies of the input strings. | ||
// NOTE: Keep `Saver` initialized before other class members that reference | ||
// its string copies and destructed after they are destructed. | ||
llvm::BumpPtrAllocator Allocator; | ||
llvm::UniqueStringSaver Saver; | ||
|
||
// `Records` stores the records. | ||
MapVector<SymbolHandle, DataAccessProfRecord> Records; | ||
|
||
// Use MapVector to keep input order of strings for serialization and | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Move this comment to the typedef? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
// deserialization. | ||
StringToIndexMap StrToIndexMap; | ||
llvm::SetVector<uint64_t> KnownColdHashes; | ||
llvm::SetVector<StringRef> KnownColdSymbols; | ||
}; | ||
|
||
} // namespace data_access_prof | ||
} // namespace llvm | ||
|
||
#endif // LLVM_PROFILEDATA_DATAACCESSPROF_H_ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
add_llvm_component_library(LLVMProfileData | ||
DataAccessProf.cpp | ||
GCOV.cpp | ||
IndexedMemProfData.cpp | ||
InstrProf.cpp | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would the different forms be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantic of this field depends on the
IsStringLiteral
field below. For a string literal,IsStringLiteral
is true andSymbolID
has the semantic of 'hash'; otherwise,IsStringLiteral
is false andSymbolID
represent an index.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood that, but my question was more about why in practice some would be string literals and some would be hashes. Might be useful to note this in a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. Added comment at L55 to explain why two forms are used.