Skip to content

Commit b09d3f4

Browse files
ChuanqiXu9Groverkss
authored andcommitted
[clangd] [Modules] Support Reusable Modules Builder (llvm#106683)
This is the following patch of llvm#66462 to optimize its performance. # Motivation To avoid data races, we choose "per file owns its dependent modules" model. That said, every TU will own all the required module files so that we don't need to worry about thread safety. And it looks like we succeeded that we focus on the interfaces and structure of modules support in clangd. But after all, this model is not good for performance. Image we have 10000 TUs import std, we will have 10000 std.pcm in the memory. That is terrible both in time and space. Given the current modules support in clangd works pretty well (almost every issue report I received is more or less a clang's issue), I'd like to improve the performance. # High Level Changes After this patch, the built module files will be owned by the module builder and each TU will only have a reference to the built module files. The module builder have a map from module names to built module files. When a new TU ask for a module file, the module builder will check if the module file lives in the map and if the module file are up to date. If yes, the module file will be returned. If no, the module file entry would be erased in the module builder. We use `shared_ptr<>` to track module file here so that the other TU owning the out dated module file won't be affected. The out dated module file will be removed automatically if other TU gets update or closed. (I know the out dated module file may not exist due to the `CanReuse` mechanism. But the design here is natural and can be seen as a redundant design to make it more robust.) When we a build a module, we will use the mutex and the condition variable in the working thread to build it exclusively. All other threads that also want the module file would have to wait for that working thread. It might not sounds great but I think if we want to make it asynchronous, we have to refactor TUScheduler as far as I know. # Code Structure Changes Thanks for the previous hard working reviewing, the interfaces almost don't change in this patch. Almost all the work are isolated in ModulesBuilder.cpp. A outliner is that we convert `ModulesBuilder` to an abstract class since the implementation class needs to own the module files. And the core function to review is `ReusableModulesBuilder::getOrBuildModuleFile`. It implements the core logic to fetch the module file from the cache or build it if the module file is not in the cache or out of date. And other important entities are `BuildingModuleMutexes`, `BuildingModuleCVs`, `BuildingModules` and `ModulesBuildingMutex`. These are mutexes and condition variables to make sure the thread safety. # User experience I've implemented this in our downstream and ask our users to use it. I also sent it https://github.com/ChuanqiXu9/clangd-for-modules here as pre-version. The feedbacks are pretty good. And I didn't receive any bug reports (about the reusable modules builder) yet. # Other potential improvement The are other two potential improvements can be done: 1. Scanning cache and a mechanism to get the required module information more quickly. (Like the module maps in https://github.com/ChuanqiXu9/clangd-for-modules) 2. Persist the module files. So that after we close the vscode and reopen it, we can reuse the built module files since the last invocation.
1 parent 13f9ce8 commit b09d3f4

File tree

3 files changed

+290
-107
lines changed

3 files changed

+290
-107
lines changed

clang-tools-extra/clangd/ModulesBuilder.cpp

Lines changed: 213 additions & 104 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,8 @@
1313
#include "clang/Frontend/FrontendActions.h"
1414
#include "clang/Serialization/ASTReader.h"
1515
#include "clang/Serialization/InMemoryModuleCache.h"
16+
#include "llvm/ADT/ScopeExit.h"
17+
#include <queue>
1618

1719
namespace clang {
1820
namespace clangd {
@@ -124,10 +126,58 @@ struct ModuleFile {
124126
llvm::sys::fs::remove(ModuleFilePath);
125127
}
126128

129+
StringRef getModuleName() const { return ModuleName; }
130+
131+
StringRef getModuleFilePath() const { return ModuleFilePath; }
132+
133+
private:
127134
std::string ModuleName;
128135
std::string ModuleFilePath;
129136
};
130137

138+
// ReusablePrerequisiteModules - stands for PrerequisiteModules for which all
139+
// the required modules are built successfully. All the module files
140+
// are owned by the modules builder.
141+
class ReusablePrerequisiteModules : public PrerequisiteModules {
142+
public:
143+
ReusablePrerequisiteModules() = default;
144+
145+
ReusablePrerequisiteModules(const ReusablePrerequisiteModules &Other) =
146+
default;
147+
ReusablePrerequisiteModules &
148+
operator=(const ReusablePrerequisiteModules &) = default;
149+
ReusablePrerequisiteModules(ReusablePrerequisiteModules &&) = delete;
150+
ReusablePrerequisiteModules
151+
operator=(ReusablePrerequisiteModules &&) = delete;
152+
153+
~ReusablePrerequisiteModules() override = default;
154+
155+
void adjustHeaderSearchOptions(HeaderSearchOptions &Options) const override {
156+
// Appending all built module files.
157+
for (const auto &RequiredModule : RequiredModules)
158+
Options.PrebuiltModuleFiles.insert_or_assign(
159+
RequiredModule->getModuleName().str(),
160+
RequiredModule->getModuleFilePath().str());
161+
}
162+
163+
bool canReuse(const CompilerInvocation &CI,
164+
llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) const override;
165+
166+
bool isModuleUnitBuilt(llvm::StringRef ModuleName) const {
167+
return BuiltModuleNames.contains(ModuleName);
168+
}
169+
170+
void addModuleFile(std::shared_ptr<const ModuleFile> ModuleFile) {
171+
BuiltModuleNames.insert(ModuleFile->getModuleName());
172+
RequiredModules.emplace_back(std::move(ModuleFile));
173+
}
174+
175+
private:
176+
llvm::SmallVector<std::shared_ptr<const ModuleFile>, 8> RequiredModules;
177+
// A helper class to speedup the query if a module is built.
178+
llvm::StringSet<> BuiltModuleNames;
179+
};
180+
131181
bool IsModuleFileUpToDate(PathRef ModuleFilePath,
132182
const PrerequisiteModules &RequisiteModules,
133183
llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS) {
@@ -192,89 +242,20 @@ bool IsModuleFilesUpToDate(
192242
});
193243
}
194244

195-
// StandalonePrerequisiteModules - stands for PrerequisiteModules for which all
196-
// the required modules are built successfully. All the module files
197-
// are owned by the StandalonePrerequisiteModules class.
198-
//
199-
// Any of the built module files won't be shared with other instances of the
200-
// class. So that we can avoid worrying thread safety.
201-
//
202-
// We don't need to worry about duplicated module names here since the standard
203-
// guarantees the module names should be unique to a program.
204-
class StandalonePrerequisiteModules : public PrerequisiteModules {
205-
public:
206-
StandalonePrerequisiteModules() = default;
207-
208-
StandalonePrerequisiteModules(const StandalonePrerequisiteModules &) = delete;
209-
StandalonePrerequisiteModules
210-
operator=(const StandalonePrerequisiteModules &) = delete;
211-
StandalonePrerequisiteModules(StandalonePrerequisiteModules &&) = delete;
212-
StandalonePrerequisiteModules
213-
operator=(StandalonePrerequisiteModules &&) = delete;
214-
215-
~StandalonePrerequisiteModules() override = default;
216-
217-
void adjustHeaderSearchOptions(HeaderSearchOptions &Options) const override {
218-
// Appending all built module files.
219-
for (auto &RequiredModule : RequiredModules)
220-
Options.PrebuiltModuleFiles.insert_or_assign(
221-
RequiredModule.ModuleName, RequiredModule.ModuleFilePath);
222-
}
223-
224-
bool canReuse(const CompilerInvocation &CI,
225-
llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) const override;
226-
227-
bool isModuleUnitBuilt(llvm::StringRef ModuleName) const {
228-
return BuiltModuleNames.contains(ModuleName);
229-
}
230-
231-
void addModuleFile(llvm::StringRef ModuleName,
232-
llvm::StringRef ModuleFilePath) {
233-
RequiredModules.emplace_back(ModuleName, ModuleFilePath);
234-
BuiltModuleNames.insert(ModuleName);
235-
}
236-
237-
private:
238-
llvm::SmallVector<ModuleFile, 8> RequiredModules;
239-
// A helper class to speedup the query if a module is built.
240-
llvm::StringSet<> BuiltModuleNames;
241-
};
242-
243-
// Build a module file for module with `ModuleName`. The information of built
244-
// module file are stored in \param BuiltModuleFiles.
245-
llvm::Error buildModuleFile(llvm::StringRef ModuleName,
246-
const GlobalCompilationDatabase &CDB,
247-
const ThreadsafeFS &TFS, ProjectModules &MDB,
248-
PathRef ModuleFilesPrefix,
249-
StandalonePrerequisiteModules &BuiltModuleFiles) {
250-
if (BuiltModuleFiles.isModuleUnitBuilt(ModuleName))
251-
return llvm::Error::success();
252-
253-
PathRef ModuleUnitFileName = MDB.getSourceForModuleName(ModuleName);
254-
// It is possible that we're meeting third party modules (modules whose
255-
// source are not in the project. e.g, the std module may be a third-party
256-
// module for most projects) or something wrong with the implementation of
257-
// ProjectModules.
258-
// FIXME: How should we treat third party modules here? If we want to ignore
259-
// third party modules, we should return true instead of false here.
260-
// Currently we simply bail out.
261-
if (ModuleUnitFileName.empty())
262-
return llvm::createStringError("Failed to get the primary source");
263-
245+
/// Build a module file for module with `ModuleName`. The information of built
246+
/// module file are stored in \param BuiltModuleFiles.
247+
llvm::Expected<ModuleFile>
248+
buildModuleFile(llvm::StringRef ModuleName, PathRef ModuleUnitFileName,
249+
const GlobalCompilationDatabase &CDB, const ThreadsafeFS &TFS,
250+
const ReusablePrerequisiteModules &BuiltModuleFiles) {
264251
// Try cheap operation earlier to boil-out cheaply if there are problems.
265252
auto Cmd = CDB.getCompileCommand(ModuleUnitFileName);
266253
if (!Cmd)
267254
return llvm::createStringError(
268255
llvm::formatv("No compile command for {0}", ModuleUnitFileName));
269256

270-
for (auto &RequiredModuleName : MDB.getRequiredModules(ModuleUnitFileName)) {
271-
// Return early if there are errors building the module file.
272-
if (llvm::Error Err = buildModuleFile(RequiredModuleName, CDB, TFS, MDB,
273-
ModuleFilesPrefix, BuiltModuleFiles))
274-
return llvm::createStringError(
275-
llvm::formatv("Failed to build dependency {0}: {1}",
276-
RequiredModuleName, llvm::toString(std::move(Err))));
277-
}
257+
llvm::SmallString<256> ModuleFilesPrefix =
258+
getUniqueModuleFilesPath(ModuleUnitFileName);
278259

279260
Cmd->Output = getModuleFilePath(ModuleName, ModuleFilesPrefix);
280261

@@ -316,58 +297,186 @@ llvm::Error buildModuleFile(llvm::StringRef ModuleName,
316297
if (Clang->getDiagnostics().hasErrorOccurred())
317298
return llvm::createStringError("Compilation failed");
318299

319-
BuiltModuleFiles.addModuleFile(ModuleName, Inputs.CompileCommand.Output);
320-
return llvm::Error::success();
300+
return ModuleFile{ModuleName, Inputs.CompileCommand.Output};
321301
}
302+
303+
bool ReusablePrerequisiteModules::canReuse(
304+
const CompilerInvocation &CI,
305+
llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS) const {
306+
if (RequiredModules.empty())
307+
return true;
308+
309+
llvm::SmallVector<llvm::StringRef> BMIPaths;
310+
for (auto &MF : RequiredModules)
311+
BMIPaths.push_back(MF->getModuleFilePath());
312+
return IsModuleFilesUpToDate(BMIPaths, *this, VFS);
313+
}
314+
315+
class ModuleFileCache {
316+
public:
317+
ModuleFileCache(const GlobalCompilationDatabase &CDB) : CDB(CDB) {}
318+
const GlobalCompilationDatabase &getCDB() const { return CDB; }
319+
320+
std::shared_ptr<const ModuleFile> getModule(StringRef ModuleName);
321+
322+
void add(StringRef ModuleName, std::shared_ptr<const ModuleFile> ModuleFile) {
323+
std::lock_guard<std::mutex> Lock(ModuleFilesMutex);
324+
325+
ModuleFiles[ModuleName] = ModuleFile;
326+
}
327+
328+
void remove(StringRef ModuleName);
329+
330+
private:
331+
const GlobalCompilationDatabase &CDB;
332+
333+
llvm::StringMap<std::weak_ptr<const ModuleFile>> ModuleFiles;
334+
// Mutex to guard accesses to ModuleFiles.
335+
std::mutex ModuleFilesMutex;
336+
};
337+
338+
std::shared_ptr<const ModuleFile>
339+
ModuleFileCache::getModule(StringRef ModuleName) {
340+
std::lock_guard<std::mutex> Lock(ModuleFilesMutex);
341+
342+
auto Iter = ModuleFiles.find(ModuleName);
343+
if (Iter == ModuleFiles.end())
344+
return nullptr;
345+
346+
if (auto Res = Iter->second.lock())
347+
return Res;
348+
349+
ModuleFiles.erase(Iter);
350+
return nullptr;
351+
}
352+
353+
void ModuleFileCache::remove(StringRef ModuleName) {
354+
std::lock_guard<std::mutex> Lock(ModuleFilesMutex);
355+
356+
ModuleFiles.erase(ModuleName);
357+
}
358+
359+
/// Collect the directly and indirectly required module names for \param
360+
/// ModuleName in topological order. The \param ModuleName is guaranteed to
361+
/// be the last element in \param ModuleNames.
362+
llvm::SmallVector<StringRef> getAllRequiredModules(ProjectModules &MDB,
363+
StringRef ModuleName) {
364+
llvm::SmallVector<llvm::StringRef> ModuleNames;
365+
llvm::StringSet<> ModuleNamesSet;
366+
367+
auto VisitDeps = [&](StringRef ModuleName, auto Visitor) -> void {
368+
ModuleNamesSet.insert(ModuleName);
369+
370+
for (StringRef RequiredModuleName :
371+
MDB.getRequiredModules(MDB.getSourceForModuleName(ModuleName)))
372+
if (ModuleNamesSet.insert(RequiredModuleName).second)
373+
Visitor(RequiredModuleName, Visitor);
374+
375+
ModuleNames.push_back(ModuleName);
376+
};
377+
VisitDeps(ModuleName, VisitDeps);
378+
379+
return ModuleNames;
380+
}
381+
322382
} // namespace
323383

384+
class ModulesBuilder::ModulesBuilderImpl {
385+
public:
386+
ModulesBuilderImpl(const GlobalCompilationDatabase &CDB) : Cache(CDB) {}
387+
388+
const GlobalCompilationDatabase &getCDB() const { return Cache.getCDB(); }
389+
390+
llvm::Error
391+
getOrBuildModuleFile(StringRef ModuleName, const ThreadsafeFS &TFS,
392+
ProjectModules &MDB,
393+
ReusablePrerequisiteModules &BuiltModuleFiles);
394+
395+
private:
396+
ModuleFileCache Cache;
397+
};
398+
399+
llvm::Error ModulesBuilder::ModulesBuilderImpl::getOrBuildModuleFile(
400+
StringRef ModuleName, const ThreadsafeFS &TFS, ProjectModules &MDB,
401+
ReusablePrerequisiteModules &BuiltModuleFiles) {
402+
if (BuiltModuleFiles.isModuleUnitBuilt(ModuleName))
403+
return llvm::Error::success();
404+
405+
PathRef ModuleUnitFileName = MDB.getSourceForModuleName(ModuleName);
406+
/// It is possible that we're meeting third party modules (modules whose
407+
/// source are not in the project. e.g, the std module may be a third-party
408+
/// module for most project) or something wrong with the implementation of
409+
/// ProjectModules.
410+
/// FIXME: How should we treat third party modules here? If we want to ignore
411+
/// third party modules, we should return true instead of false here.
412+
/// Currently we simply bail out.
413+
if (ModuleUnitFileName.empty())
414+
return llvm::createStringError(
415+
llvm::formatv("Don't get the module unit for module {0}", ModuleName));
416+
417+
// Get Required modules in topological order.
418+
auto ReqModuleNames = getAllRequiredModules(MDB, ModuleName);
419+
for (llvm::StringRef ReqModuleName : ReqModuleNames) {
420+
if (BuiltModuleFiles.isModuleUnitBuilt(ModuleName))
421+
continue;
422+
423+
if (auto Cached = Cache.getModule(ReqModuleName)) {
424+
if (IsModuleFileUpToDate(Cached->getModuleFilePath(), BuiltModuleFiles,
425+
TFS.view(std::nullopt))) {
426+
log("Reusing module {0} from {1}", ModuleName,
427+
Cached->getModuleFilePath());
428+
BuiltModuleFiles.addModuleFile(std::move(Cached));
429+
continue;
430+
}
431+
Cache.remove(ReqModuleName);
432+
}
433+
434+
llvm::Expected<ModuleFile> MF = buildModuleFile(
435+
ModuleName, ModuleUnitFileName, getCDB(), TFS, BuiltModuleFiles);
436+
if (llvm::Error Err = MF.takeError())
437+
return Err;
438+
439+
log("Built module {0} to {1}", ModuleName, MF->getModuleFilePath());
440+
auto BuiltModuleFile = std::make_shared<const ModuleFile>(std::move(*MF));
441+
Cache.add(ModuleName, BuiltModuleFile);
442+
BuiltModuleFiles.addModuleFile(std::move(BuiltModuleFile));
443+
}
444+
445+
return llvm::Error::success();
446+
}
447+
324448
std::unique_ptr<PrerequisiteModules>
325449
ModulesBuilder::buildPrerequisiteModulesFor(PathRef File,
326-
const ThreadsafeFS &TFS) const {
327-
std::unique_ptr<ProjectModules> MDB = CDB.getProjectModules(File);
450+
const ThreadsafeFS &TFS) {
451+
std::unique_ptr<ProjectModules> MDB = Impl->getCDB().getProjectModules(File);
328452
if (!MDB) {
329453
elog("Failed to get Project Modules information for {0}", File);
330454
return std::make_unique<FailedPrerequisiteModules>();
331455
}
332456

333457
std::vector<std::string> RequiredModuleNames = MDB->getRequiredModules(File);
334458
if (RequiredModuleNames.empty())
335-
return std::make_unique<StandalonePrerequisiteModules>();
336-
337-
llvm::SmallString<256> ModuleFilesPrefix = getUniqueModuleFilesPath(File);
338-
339-
log("Trying to build required modules for {0} in {1}", File,
340-
ModuleFilesPrefix);
341-
342-
auto RequiredModules = std::make_unique<StandalonePrerequisiteModules>();
459+
return std::make_unique<ReusablePrerequisiteModules>();
343460

461+
auto RequiredModules = std::make_unique<ReusablePrerequisiteModules>();
344462
for (llvm::StringRef RequiredModuleName : RequiredModuleNames) {
345463
// Return early if there is any error.
346-
if (llvm::Error Err =
347-
buildModuleFile(RequiredModuleName, CDB, TFS, *MDB.get(),
348-
ModuleFilesPrefix, *RequiredModules.get())) {
464+
if (llvm::Error Err = Impl->getOrBuildModuleFile(
465+
RequiredModuleName, TFS, *MDB.get(), *RequiredModules.get())) {
349466
elog("Failed to build module {0}; due to {1}", RequiredModuleName,
350467
toString(std::move(Err)));
351468
return std::make_unique<FailedPrerequisiteModules>();
352469
}
353470
}
354471

355-
log("Built required modules for {0} in {1}", File, ModuleFilesPrefix);
356-
357472
return std::move(RequiredModules);
358473
}
359474

360-
bool StandalonePrerequisiteModules::canReuse(
361-
const CompilerInvocation &CI,
362-
llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS) const {
363-
if (RequiredModules.empty())
364-
return true;
365-
366-
SmallVector<StringRef> BMIPaths;
367-
for (auto &MF : RequiredModules)
368-
BMIPaths.push_back(MF.ModuleFilePath);
369-
return IsModuleFilesUpToDate(BMIPaths, *this, VFS);
475+
ModulesBuilder::ModulesBuilder(const GlobalCompilationDatabase &CDB) {
476+
Impl = std::make_unique<ModulesBuilderImpl>(CDB);
370477
}
371478

479+
ModulesBuilder::~ModulesBuilder() {}
480+
372481
} // namespace clangd
373482
} // namespace clang

0 commit comments

Comments
 (0)