Skip to content

[clang] [ARM] Explicitly enable NEON for Windows/Darwin targets #122095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 16, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions clang/lib/Driver/ToolChains/Arch/ARM.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -659,13 +659,21 @@ llvm::ARM::FPUKind arm::getARMTargetFeatures(const Driver &D,
CPUArgFPUKind != llvm::ARM::FK_INVALID ? CPUArgFPUKind : ArchArgFPUKind;
(void)llvm::ARM::getFPUFeatures(FPUKind, Features);
} else {
bool Generic = true;
if (!ForAS) {
std::string CPU = arm::getARMTargetCPU(CPUName, ArchName, Triple);
if (CPU != "generic")
Generic = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I do feel that this logic where we explicitly look for "generic" is a bit kludgey. It would be slightly neater to just stick this into the existing Android case above (at lines 650-655) - but then the default-neon case overrides arch/os default CPUs from getLLVMArchKindForARM, which shows up as different output for %clang -target x86_64-apple-macosx10.10 -arch armv7s in arm-target-features.c below (where the default cpu for armv7s includes fp16 support).

I'm open to opinions on what's the least ugly compromise here. :-)

llvm::ARM::ArchKind ArchKind =
arm::getLLVMArchKindForARM(CPU, ArchName, Triple);
FPUKind = llvm::ARM::getDefaultFPU(CPU, ArchKind);
(void)llvm::ARM::getFPUFeatures(FPUKind, Features);
}
if (Generic && (Triple.isOSWindows() || Triple.isOSDarwin()) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the logic for the "generic" check is that clang -target windows will be using cpu generic, fine. If you were specifying the CPU of a real target device, then presumably that would have Neon and Clang would know that anyway.

Only if you're doing something strange with a CPU that's never had iOS or Windows run on it, only then does this code not apply. But given that it's a strange use case, it's best to leave it alone and just take the user's intent literally.

Correct?

Also I note that Android does not do this generic CPU check. I don't think making it do that should be part of this PR, and so, while I think this code is spaghetti, I wouldn't try to merge them here. Probably wouldn't break anything, but probably isn't zero chance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost; clang -target armv7-apple-darwin will return "generic" above, and we'll need to add in NEON explicitly to it. However armv7-windows will get a default CPU of cortex-a9 which has NEON (and more). This case isn't necessarily something we need to preserve much about, but a case of armv7s-apple-darwin does get hit by one of the testcases, and armv7s does also get a specific CPU (swift) which also has got NEON and FP16 - and the distinction from that is picked up by an existing testcase.

So only for the cases where arm::getARMTargetCPU() doesn't return a specific CPU for this arch/OS target combo, we hit the "generic" case, where we want to add in reasonable defaults.

Yes, the fact that Android has its own check further above, means that it doesn't hit the arm::getARMTargetCPU() case, and doesn't pick up defaults from there (where it can add more core specific things like FP16 etc) - and indeed, I'm trying to avoid changing anything wrt that, even if it probably could work with this new codepath as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mstorsjo, this breaks Darwin when compiling assembly with armv7s which is neon-vfp4. vfp4 instructions (e.g. vfma) are no longer recognized because you effectively downgraded neon-vfp4 to neon. When ForAS you assume a "generic" CPU unconditionally rather than "swift".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @anemet, sorry about this! While I did test some combinatons with armv7s (as exposed by other testcases), I clearly missed this case.

I tried poking around for different ways of fixing this, but they all more or less amount to the fix as is currently being done in #130623, to remove the ForAS check here. (I also felt that that condition was problematic here, but I was afraid that changing that would be a bigger policy change.)

What do you (and @davemgreen, @DavidSpickett an @Stylie777) think here, what's the best course of action for the current 20.x release branch? Should we split out the change to remove the ForAS condition from the rest of the changes in #130623, making that a smaller separate step which can be backported?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see folks being very happy about backporting anything that would add a potentially breaking change, even if it does fix another issue in the process.

If you were to split out the ForAS change, would that be targeted enough to fix @anemet 's issue, or would that backport be just as impactful as #134366 ?

(we can more likely justify a "worse" fix on 20.x given that main will get a comprehensive fix)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+    bool Generic = "generic";

Did you try this as bool Generic = CPU == "generic"? I am not 100% what the behaviour as it is in the diff would work, I feel that would set Generic to always be true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, that was a typo as I recreated the diff I used for testing (later I did find the one I actually did test with); yes I did use bool Generic = CPU == "generic"; for my testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, thought that might be the case 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the minimal fix in #134366 isn't really backportable, I'm not sure if I see many other reasonable ways to fix this breakage, so then we probably should revert this change for the 20.x branch. If we can land #130623 before 21.x (or at least the split out #134366) then both this and the armv7s/vfpv4 issue should be fixed for 21.x. And this PR in itself only attempted to fix things for downstream-patched Clang, so it's not strictly essential anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That PR is exactly the split out form of the targeted fix for @anemet's issue - but it does change whether FPU features are available in the assembler overall, so it is a bit of a potentially breaking change.

Right, my mistake.

I agree with the idea of reverting this on 20.x.

getARMSubArchVersionNumber(Triple) >= 7) {
FPUKind = llvm::ARM::parseFPU("neon");
(void)llvm::ARM::getFPUFeatures(FPUKind, Features);
}
}

// Now we've finished accumulating features from arch, cpu and fpu,
Expand Down
6 changes: 4 additions & 2 deletions clang/test/Driver/arm-mfpu.c
Original file line number Diff line number Diff line change
Expand Up @@ -356,8 +356,10 @@
// CHECK-HF-DAG: "-target-cpu" "arm1176jzf-s"

// RUN: %clang -target armv7-apple-darwin -x assembler %s -### -c 2>&1 \
// RUN: | FileCheck --check-prefix=ASM %s
// ASM-NOT: -target-feature
// RUN: | FileCheck --check-prefix=ASM-NEON %s
// RUN: %clang -target armv7-windows -x assembler %s -### -c 2>&1 \
// RUN: | FileCheck --check-prefix=ASM-NEON %s
// ASM-NEON: "-target-feature" "+neon"

// RUN: %clang -target armv8-linux-gnueabi -mfloat-abi=soft -mfpu=none %s -### -c 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-SOFT-ABI-FP %s
Expand Down
27 changes: 27 additions & 0 deletions clang/test/Preprocessor/arm-target-features.c
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,30 @@
// CHECK-V7VE-DEFAULT-ABI-SOFT: #define __ARM_ARCH_EXT_IDIV__ 1
// CHECK-V7VE-DEFAULT-ABI-SOFT: #define __ARM_FP 0xc

// RUN: %clang -target x86_64-apple-macosx10.10 -arch armv7 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-DARWIN-V7 %s
// CHECK-DARWIN-V7: #define __ARMEL__ 1
// CHECK-DARWIN-V7: #define __ARM_ARCH 7
// CHECK-DARWIN-V7: #define __ARM_ARCH_7A__ 1
// CHECK-DARWIN-V7-NOT: __ARM_FEATURE_CRC32
// CHECK-DARWIN-V7-NOT: __ARM_FEATURE_NUMERIC_MAXMIN
// CHECK-DARWIN-V7-NOT: __ARM_FEATURE_DIRECTED_ROUNDING
// CHECK-DARWIN-V7: #define __ARM_FP 0xc
// CHECK-DARWIN-V7: #define __ARM_NEON 1
// CHECK-DARWIN-V7: #define __ARM_NEON_FP 0x4
// CHECK-DARWIN-V7: #define __ARM_NEON__ 1

// RUN: %clang -target armv7-windows -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-WINDOWS-V7 %s
// CHECK-WINDOWS-V7: #define __ARMEL__ 1
// CHECK-WINDOWS-V7: #define __ARM_ARCH 7
// CHECK-WINDOWS-V7: #define __ARM_ARCH_7A__ 1
// CHECK-WINDOWS-V7-NOT: __ARM_FEATURE_CRC32
// CHECK-WINDOWS-V7-NOT: __ARM_FEATURE_NUMERIC_MAXMIN
// CHECK-WINDOWS-V7-NOT: __ARM_FEATURE_DIRECTED_ROUNDING
// CHECK-WINDOWS-V7: #define __ARM_FP 0xe
// CHECK-WINDOWS-V7: #define __ARM_NEON 1
// CHECK-WINDOWS-V7: #define __ARM_NEON_FP 0x6
// CHECK-WINDOWS-V7: #define __ARM_NEON__ 1

// RUN: %clang -target x86_64-apple-macosx10.10 -arch armv7s -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-V7S %s
// CHECK-V7S: #define __ARMEL__ 1
// CHECK-V7S: #define __ARM_ARCH 7
Expand All @@ -140,6 +164,9 @@
// CHECK-V7S-NOT: __ARM_FEATURE_NUMERIC_MAXMIN
// CHECK-V7S-NOT: __ARM_FEATURE_DIRECTED_ROUNDING
// CHECK-V7S: #define __ARM_FP 0xe
// CHECK-V7S: #define __ARM_NEON 1
// CHECK-V7S: #define __ARM_NEON_FP 0x6
// CHECK-V7S: #define __ARM_NEON__ 1

// RUN: %clang -target arm-arm-none-eabi -march=armv7-m -mfloat-abi=soft -x c -E -dM %s | FileCheck -match-full-lines --check-prefix=CHECK-VFP-FP %s
// RUN: %clang -target arm-arm-none-eabi -march=armv7-m -mfloat-abi=softfp -x c -E -dM %s | FileCheck -match-full-lines --check-prefix=CHECK-VFP-FP %s
Expand Down
Loading