-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[llvm-(min-)tblgen] Avoid redundant source compilation #114494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Build failure is unrelated: |
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-tablegen Author: Michael Kruse (Meinersbur) ChangesAll the sources of While this slightly reduces build time, the main motivation is ccache. Using the hard_link option, building the object files for I am using the hard_link option because it reduces the cost of having multiple build-trees of the same LLVM source tree and reduces the wear to the SSD they are stored on. Full diff: https://github.com/llvm/llvm-project/pull/114494.diff 1 Files Affected:
diff --git a/llvm/utils/TableGen/CMakeLists.txt b/llvm/utils/TableGen/CMakeLists.txt
index ba1e4aa01b48d6..ded2e228dcd6de 100644
--- a/llvm/utils/TableGen/CMakeLists.txt
+++ b/llvm/utils/TableGen/CMakeLists.txt
@@ -32,10 +32,8 @@ set(LLVM_LINK_COMPONENTS
add_tablegen(llvm-tblgen LLVM
DESTINATION "${LLVM_TOOLS_INSTALL_DIR}"
EXPORT LLVM
- ARMTargetDefEmitter.cpp
AsmMatcherEmitter.cpp
AsmWriterEmitter.cpp
- Attributes.cpp
CallingConvEmitter.cpp
CodeEmitterGen.cpp
CodeGenMapTable.cpp
@@ -48,7 +46,6 @@ add_tablegen(llvm-tblgen LLVM
DecoderEmitter.cpp
DFAEmitter.cpp
DFAPacketizerEmitter.cpp
- DirectiveEmitter.cpp
DisassemblerEmitter.cpp
DXILEmitter.cpp
ExegesisEmitter.cpp
@@ -57,18 +54,14 @@ add_tablegen(llvm-tblgen LLVM
GlobalISelEmitter.cpp
InstrDocsEmitter.cpp
InstrInfoEmitter.cpp
- IntrinsicEmitter.cpp
MacroFusionPredicatorEmitter.cpp
OptionParserEmitter.cpp
OptionRSTEmitter.cpp
PseudoLoweringEmitter.cpp
RegisterBankEmitter.cpp
RegisterInfoEmitter.cpp
- RISCVTargetDefEmitter.cpp
SearchableTableEmitter.cpp
SubtargetEmitter.cpp
- TableGen.cpp
- VTEmitter.cpp
WebAssemblyDisassemblerEmitter.cpp
X86InstrMappingEmitter.cpp
X86DisassemblerTables.cpp
@@ -76,6 +69,8 @@ add_tablegen(llvm-tblgen LLVM
X86MnemonicTables.cpp
X86ModRMFilters.cpp
X86RecognizableInstr.cpp
+
+ $<TARGET_OBJECTS:llvm-min-tblgen>
$<TARGET_OBJECTS:obj.LLVMTableGenBasic>
$<TARGET_OBJECTS:obj.LLVMTableGenCommon>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information.
Looks like #80847 rewound my change.
@Pierre-vh Sorry I was not aware of changes in my review.
llvm/utils/TableGen/CMakeLists.txt
Outdated
WebAssemblyDisassemblerEmitter.cpp | ||
X86InstrMappingEmitter.cpp | ||
X86DisassemblerTables.cpp | ||
X86FoldTablesEmitter.cpp | ||
X86MnemonicTables.cpp | ||
X86ModRMFilters.cpp | ||
X86RecognizableInstr.cpp | ||
|
||
$<TARGET_OBJECTS:llvm-min-tblgen> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a35bd89 I created another OBJECT library. Do you think it should be merge into TableGenBasic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the latter, should the "both" files moved into the Basic/ subdir?
You can test this locally with the following command:git-clang-format --diff dd30aa83aa12e5b2b5e58cb72ec85070f725df34 a1849e68b304738573ffb5f828a8e1aa121596e4 --extensions h,cpp -- llvm/utils/TableGen/Basic/TableGen.h llvm/utils/TableGen/llvm-min-tblgen.cpp llvm/utils/TableGen/llvm-tblgen.cpp llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp llvm/utils/TableGen/Basic/Attributes.cpp llvm/utils/TableGen/Basic/DirectiveEmitter.cpp llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp llvm/utils/TableGen/Basic/RISCVTargetDefEmitter.cpp llvm/utils/TableGen/Basic/TableGen.cpp llvm/utils/TableGen/Basic/VTEmitter.cpp View the diff from clang-format here.diff --git a/llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp b/llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp
index 3b02f63e94..d4aa1bcf4e 100644
--- a/llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp
+++ b/llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp
@@ -275,14 +275,15 @@ static void emitARMTargetDef(const RecordKeeper &RK, raw_ostream &OS) {
auto Name = Rec->getValueAsString("Name");
auto Alias = Rec->getValueAsString("Alias");
if (!Processors.contains(Alias))
- PrintFatalError(
- Rec, "Alias '" + Name + "' references a non-existent ProcessorModel '" + Alias + "'");
+ PrintFatalError(Rec, "Alias '" + Name +
+ "' references a non-existent ProcessorModel '" +
+ Alias + "'");
if (Processors.contains(Name))
- PrintFatalError(
- Rec, "Alias '" + Name + "' duplicates an existing ProcessorModel");
+ PrintFatalError(Rec, "Alias '" + Name +
+ "' duplicates an existing ProcessorModel");
if (!Aliases.insert(Name).second)
- PrintFatalError(
- Rec, "Alias '" + Name + "' duplicates an existing ProcessorAlias");
+ PrintFatalError(Rec, "Alias '" + Name +
+ "' duplicates an existing ProcessorAlias");
OS << llvm::formatv(R"( { "{0}", "{1}" },)", Name, Alias) << '\n';
}
diff --git a/llvm/utils/TableGen/Basic/VTEmitter.cpp b/llvm/utils/TableGen/Basic/VTEmitter.cpp
index d02932dd5e..3b0809441a 100644
--- a/llvm/utils/TableGen/Basic/VTEmitter.cpp
+++ b/llvm/utils/TableGen/Basic/VTEmitter.cpp
@@ -131,7 +131,7 @@ void VTEmitter::run(raw_ostream &OS) {
bool IsScalable = VT->getValueAsBit("isScalable");
bool IsRISCVVecTuple = VT->getValueAsBit("isRISCVVecTuple");
int64_t NF = VT->getValueAsInt("NF");
- bool IsNormalValueType = VT->getValueAsBit("isNormalValueType");
+ bool IsNormalValueType = VT->getValueAsBit("isNormalValueType");
int64_t NElem = IsVector ? VT->getValueAsInt("nElem") : 0;
StringRef EltName = IsVector ? VT->getValueAsDef("ElementType")->getName()
: "INVALID_SIMPLE_VALUE_TYPE";
|
@chapuni ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost good. Please make sure merging onto main before committing.
SDNodeProperties.cpp | ||
TableGen.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest TableGen.cpp
may be individual. IIRC, some cmake generators don't like add_executable
w/o any source files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMake has a problem to identify which linker to use (C or C++) when there is no .c/.cpp file in the list. A strategy used elsewhere also in LLVM adds an empty dummy.cpp
into the list.
Keeping TableGen.cpp
(or using the same dummy.cpp
) in both executables would defeat the entire purpose of this patch: The same timestamp issue still occurs with even just a single file which causes all .td files to be processed again in a no-change build (see summary).
I instead applied the usual source structure of an executable: One file per executable which named after the executable name containing the (in this case trivial) main
function, which just calls the tblgen_main
in TableGen.cpp
. This should also clear up any confusion (including mine) of where each executable's main function is.
IntrinsicEmitter.cpp | ||
RISCVTargetDefEmitter.cpp | ||
VTEmitter.cpp | ||
llvm-min-tblgen.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant, one TableGen.cpp
for multiple instances. I don't think we need to create copied files.
PARTIAL_SOURCES_INTENDED
will accept duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please read #114494 (comment) for why not. The duplicate is the exactly the problem this PR intends to solve, even if PARTIAL_SOURCES_INTENDED
accepts it.
@@ -79,6 +70,8 @@ add_tablegen(llvm-tblgen LLVM | |||
$<TARGET_OBJECTS:obj.LLVMTableGenBasic> | |||
$<TARGET_OBJECTS:obj.LLVMTableGenCommon> | |||
|
|||
PARTIAL_SOURCES_INTENDED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be pruned.
Any change in a moved/renamed file shows up in a conflict in GitHub, even if nothing in the moved file was changed, so a new conflict showing up between push and reviews is very likely. When merging main, git recogizes the renamed and automatically applies any upstream changes to the new location. GitHub's auto-merge UI does not. This is also the reason why the code_formatter bot does not pass: it wants to reformat the moved files entirely. If we want them to be formatted, it should be done in a separate "format code" patch. Doing it here would cause actual merge conflicts with the formatted/moved and unformated/original files. |
@chapuni You already LGTM'ed this PR and did not revoke it after the edit, so I am assuming you are still OK with it (there are no copied files, but two new boilerplate ones that in the future may diverge when adding new features to only one of them). So I will push this at the end of this week. It would still be good if you explicitly OK's this change. |
I don't have strong opinions of this. I misunderstood this as reduction of duplications. I think this is the issue in ccache. That said, I don't think I should stop this. |
Independent of whether there is an issue with ccache, compiling any file twice is redundant work that should not happen in a well-designed build system. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/89/builds/13541 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/160/builds/10710 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/130/builds/8194 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/180/builds/10708 Here is the relevant piece of the build log for the reference
|
…)" This reverts commit f6cb569.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/8337 Here is the relevant piece of the build log for the reference
|
…)" This reverts commit f6cb569. Buildbot failures such as https://lab.llvm.org/buildbot/#/builders/89/builds/13541: ``` /usr/bin/ld: utils/TableGen/Basic/CMakeFiles/obj.LLVMTableGenBasic.dir/ARMTargetDefEmitter.cpp.o: undefined reference to symbol '_ZN4llvm23EnableABIBreakingChecksE' /usr/bin/ld: /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/./lib/libLLVMSupport.so.20.0git: error adding symbols: DSO missing from command line ``` Going to investigate.
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`, with identical compilation flags. Reuse the object files of `llvm-min-tblgen` for `llvm-tblgen` by applying the usual source structure of an executable: One file per executable which named after the executable name containing the (in this case trivial) main function, which just calls the tblgen_main in TableGen.cpp. This should also clear up any confusion (including mine) of where each executable's main function is. While this slightly reduces build time, the main motivation is ccache. Using the hard_link option, building the object files for `llvm-tblgen` will result in a hard link to the same object file already used for `llvm-min-tblgen`. To signal the build system that the file is new, ccache will update the file's time stamp. Unfortunately, time stamps are shared between all hard-linked files s.t. this will indirectly also update the time stamps for the object files used for `llvm-tblgen`. At the next run, Ninja will recognize this time stamp discrepancy to the expected stamp recorded in `.ninja_log` and rebuild those object files for `llvm-min-tblgen`, which again will also update the stamp for the `llvm-tblgen`... . This is especially annoying for tablegen because it means Ninja will re-run all tablegenning in every build. I am using the hard_link option because it reduces the cost of having multiple build-trees of the LLVM sources and reduces the wear to the SSD they are stored on.
All the sources of
llvm-min-tblgen
are also used forllvm-tblgen
, with identical compilation flags. Reuse the object files ofllvm-min-tblgen
forllvm-tblgen
by applying the usual source structure of an executable: One file per executable which named after the executable name containing the (in this case trivial) main function, which just calls the tblgen_main in TableGen.cpp. This should also clear up any confusion (including mine) of where each executable's main function is.While this slightly reduces build time, the main motivation is ccache. Using the hard_link option, building the object files for
llvm-tblgen
will result in a hard link to the same object file already used forllvm-min-tblgen
. To signal the build system that the file is new, ccache will update the file's time stamp. Unfortunately, time stamps are shared between all hard-linked files s.t. this will indirectly also update the time stamps for the object files used forllvm-tblgen
. At the next run, Ninja will recognize this time stamp discrepancy to the expected stamp recorded in.ninja_log
and rebuild those object files forllvm-min-tblgen
, which again will also update the stamp for thellvm-tblgen
... . This is especially annoying for tablegen because it means Ninja will re-run all tablegenning in every build.I am using the hard_link option because it reduces the cost of having multiple build-trees of the LLVM sources and reduces the wear to the SSD they are stored on.