Skip to content

[LLVM][Triple] Drop unknown object types from normalized triples #135571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions llvm/include/llvm/TargetParser/Triple.h
Original file line number Diff line number Diff line change
Expand Up @@ -370,8 +370,7 @@ class Triple {
enum class CanonicalForm {
ANY = 0,
THREE_IDENT = 3, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM
FOUR_IDENT = 4, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT
FIVE_IDENT = 5, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT-FORMAT
FOUR_IDENT = 4, // ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT(-FORMAT)
};

/// Turn an arbitrary machine specification into the canonical triple form (or
Expand Down
32 changes: 24 additions & 8 deletions llvm/lib/TargetParser/Triple.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1162,11 +1162,12 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {

// Note which components are already in their final position. These will not
// be moved.
bool Found[4];
bool Found[5];
Found[0] = Arch != UnknownArch;
Found[1] = Vendor != UnknownVendor;
Found[2] = OS != UnknownOS;
Found[3] = Environment != UnknownEnvironment;
Found[4] = ObjectFormat != UnknownObjectFormat;

// If they are not there already, permute the components into their canonical
// positions by seeing if they parse as a valid architecture, and if so moving
Expand Down Expand Up @@ -1202,10 +1203,10 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
case 3:
Environment = parseEnvironment(Comp);
Valid = Environment != UnknownEnvironment;
if (!Valid) {
ObjectFormat = parseFormat(Comp);
Valid = ObjectFormat != UnknownObjectFormat;
}
break;
case 4:
ObjectFormat = parseFormat(Comp);
Valid = ObjectFormat != UnknownObjectFormat;
break;
}
if (!Valid)
Expand Down Expand Up @@ -1265,6 +1266,14 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
}
}

// Environment "unknown-elf" is just "elf".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rest of the changes is fine, but this doesn't look right to me, especially looking at parseEnvironment, they are not the same. I agree that the object format is treated as part of the environment, but that doesn't mean that unknown-elf is elf. Also, I think unknown environment will yield UnknownEnvironment, based on parseEnvironment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseEnvironment("elf") will also yield UnknownEnvironment though.

Before this patch the following command
clang -### --target=aarch64-pc-linux--elf or clang -### --target=aarch64-pc-linux--elf gives an error due to bug in version parsing but the resultant triple is:
Target: aarch64-pc-linux-elf-unknown which seems wrong.

After this patch we get:
Target: aarch64-pc-linux-elf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is wrong right? We need to treat the root cause not the symptoms. Even if we take format as part of environment, aarch64-pc-linux--elf should also be a valid one, since -elf means the environment part is empty, and elf is the format part. We should not treat aarch64-pc-linux--elf as aarch64-pc-linux-elf-unknown. Instead, we should fix it such that it would be treated as aarch64-pc-linux-unknown-elf instead of aarch64-pc-linux-elf.

Copy link
Contributor Author

@UsmanNadeem UsmanNadeem Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your explanation makes sense to me but the problem is that goes against current toolchain expectations and will cause clang to not find libraries for some targets, e.g. some baremetal toolchains use triples like aarch64-none-elf and riscv64-unknown-elf.
These would normally be canonicalized to aarch64-unknown-none-elf and riscv-unknown-unknown-elf. Adding an extra unknown to account for the missing environment will break things.

Also, if you also look at some of the code in Baremetal.cpp they check for Triple.getEnvironmentName() == "elf", seems like both "elf" and "unknown-elf" should be synonymous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, their code is also part of LLVM code base such that they can be updated as well right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more reviewers to get additional opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context on five-part triples, see f80b49b / 3547633 / feb805f . Five-part triples are not commonly used for anything; it was just implemented that way because it was convenient at the time for the JIT. (So, for example, msvc-elf is a variant of the msvc environment, but using ELF object files.)

Everyone else wants nothing to do with five-part triples. "-elf" is just a generic baremetal ELF environment not tied to any particular operating system or libraries. Similar for "-coff" etc.

Changing the canonical form of "aarch64-pc-linux-elf" to "aarch64-pc-linux-unknown-elf" seems like it's making a lot of work without any real benefit.

if (ObjectFormat != UnknownObjectFormat &&
(Components[3] == "unknown" || Components[3].empty())) {
Found[3] = true;
std::swap(Components[3], Components[4]);
Components.pop_back();
}

// If "none" is in the middle component in a three-component triple, treat it
// as the OS (Components[2]) instead of the vendor (Components[1]).
if (Found[0] && !Found[1] && !Found[2] && Found[3] &&
Expand Down Expand Up @@ -1335,16 +1344,23 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
}
}

// Leave out unknown object format from the string representation.
if (ObjectFormat == UnknownObjectFormat && Components.size() == 5)
Components.pop_back();

// Canonicalize the components if necessary.
switch (Form) {
case CanonicalForm::ANY:
break;
case CanonicalForm::THREE_IDENT:
case CanonicalForm::FOUR_IDENT:
case CanonicalForm::FIVE_IDENT: {
case CanonicalForm::THREE_IDENT: {
Components.resize(static_cast<unsigned>(Form), "unknown");
break;
}
case CanonicalForm::FOUR_IDENT: {
if (Components.size() < 4)
Components.resize(static_cast<unsigned>(Form), "unknown");
break;
}
}

// Stick the corrected components back together to form the normalized string.
Expand Down
81 changes: 23 additions & 58 deletions llvm/unittests/TargetParser/TripleTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1384,8 +1384,7 @@ TEST(TripleTest, Normalization) {
EXPECT_EQ("unknown-unknown", Triple::normalize("-"));
EXPECT_EQ("unknown-unknown-unknown", Triple::normalize("--"));
EXPECT_EQ("unknown-unknown-unknown-unknown", Triple::normalize("---"));
EXPECT_EQ("unknown-unknown-unknown-unknown-unknown",
Triple::normalize("----"));
EXPECT_EQ("unknown-unknown-unknown-unknown", Triple::normalize("----"));

EXPECT_EQ("a", Triple::normalize("a"));
EXPECT_EQ("a-b", Triple::normalize("a-b"));
Expand All @@ -1403,7 +1402,7 @@ TEST(TripleTest, Normalization) {
EXPECT_EQ("a-pc-b-c", Triple::normalize("a-b-c-pc"));

EXPECT_EQ("a-b-linux", Triple::normalize("a-b-linux"));
EXPECT_EQ("unknown-unknown-linux-b-c", Triple::normalize("linux-b-c"));
EXPECT_EQ("unknown-unknown-linux-b-elf", Triple::normalize("linux-b-elf"));
EXPECT_EQ("a-unknown-linux-c", Triple::normalize("a-linux-c"));

EXPECT_EQ("i386-pc-a", Triple::normalize("a-pc-i386"));
Expand Down Expand Up @@ -1438,27 +1437,15 @@ TEST(TripleTest, Normalization) {
Triple::normalize("a-b-c-d", Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("a-b-c-d",
Triple::normalize("a-b-c-d-e", Triple::CanonicalForm::FOUR_IDENT));

EXPECT_EQ("a-unknown-unknown-unknown-unknown",
Triple::normalize("a", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("a-b-unknown-unknown-unknown",
Triple::normalize("a-b", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("a-b-c-unknown-unknown",
Triple::normalize("a-b-c", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("a-b-c-d-unknown",
Triple::normalize("a-b-c-d", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("a-b-c-d-e",
Triple::normalize("a-b-c-d-e", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ(
"a-b-c-d-elf",
Triple::normalize("a-b-c-d-elf", Triple::CanonicalForm::FOUR_IDENT));

EXPECT_EQ("i386-b-c-unknown",
Triple::normalize("i386-b-c", Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("i386-b-c-unknown-unknown",
Triple::normalize("i386-b-c", Triple::CanonicalForm::FIVE_IDENT));

EXPECT_EQ("i386-a-c-unknown",
Triple::normalize("a-i386-c", Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("i386-a-c-unknown-unknown",
Triple::normalize("a-i386-c", Triple::CanonicalForm::FIVE_IDENT));

EXPECT_EQ("i386-a-b-unknown",
Triple::normalize("a-b-i386", Triple::CanonicalForm::FOUR_IDENT));
Expand Down Expand Up @@ -1502,46 +1489,24 @@ TEST(TripleTest, Normalization) {
"x86_64-unknown-linux-gnu",
Triple::normalize("x86_64-gnu-linux", Triple::CanonicalForm::FOUR_IDENT));

EXPECT_EQ("i386-a-b-unknown-unknown",
Triple::normalize("a-b-i386", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("i386-a-b-c-unknown",
Triple::normalize("a-b-c-i386", Triple::CanonicalForm::FIVE_IDENT));

EXPECT_EQ("a-pc-c-unknown-unknown",
Triple::normalize("a-pc-c", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("unknown-pc-b-c-unknown",
Triple::normalize("pc-b-c", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("a-pc-b-unknown-unknown",
Triple::normalize("a-b-pc", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("a-pc-b-c-unknown",
Triple::normalize("a-b-c-pc", Triple::CanonicalForm::FIVE_IDENT));

EXPECT_EQ("a-b-linux-unknown-unknown",
Triple::normalize("a-b-linux", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("unknown-unknown-linux-b-c",
Triple::normalize("linux-b-c", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("a-unknown-linux-c-unknown",
Triple::normalize("a-linux-c", Triple::CanonicalForm::FIVE_IDENT));

EXPECT_EQ("i386-pc-a-unknown-unknown",
Triple::normalize("a-pc-i386", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("i386-pc-unknown-unknown-unknown",
Triple::normalize("-pc-i386", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("unknown-pc-linux-c-unknown",
Triple::normalize("linux-pc-c", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("unknown-pc-linux-unknown-unknown",
Triple::normalize("linux-pc-", Triple::CanonicalForm::FIVE_IDENT));

EXPECT_EQ("i386-unknown-unknown-unknown-unknown",
Triple::normalize("i386", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("unknown-pc-unknown-unknown-unknown",
Triple::normalize("pc", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("unknown-unknown-linux-unknown-unknown",
Triple::normalize("linux", Triple::CanonicalForm::FIVE_IDENT));

EXPECT_EQ(
"x86_64-unknown-linux-gnu-unknown",
Triple::normalize("x86_64-gnu-linux", Triple::CanonicalForm::FIVE_IDENT));
EXPECT_EQ("arm-unknown-linux-gnueabi",
Triple::normalize("arm-linux-gnueabi",
Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("arm-unknown-linux-gnueabi-elf",
Triple::normalize("arm-linux-gnueabi-elf",
Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("arm-unknown-linux-gnueabi",
Triple::normalize("arm-linux-gnueabi-xyz",
Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("arm-unknown-linux-gnueabi",
Triple::normalize("arm-linux-gnueabi-unknown",
Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("arm-unknown-linux-gnueabi",
Triple::normalize("arm-unknown-linux-gnueabi-unknown",
Triple::CanonicalForm::FOUR_IDENT));
EXPECT_EQ("arm-unknown-linux-gnueabi",
Triple::normalize("arm-unknown-linux-gnueabi-xyz",
Triple::CanonicalForm::ANY));

// Check that normalizing a permutated set of valid components returns a
// triple with the unpermuted components.
Expand Down