WIP: R_BPF_64_ABS64 relocation fix #137

alessandrod · 2022-04-27T10:57:28Z

Functional but needs tests

Redefines NULL as nullptr instead of ((void*)0) in C++ for OpenCL. This internal representation of NULL provides compatibility with C++11 and later language standards. Patch by Topotuna (Justas Janickas)! Differential Revision: https://reviews.llvm.org/D105987
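The C/C++ difference behind this change can be shown with a small standalone C++ sketch. It illustrates general C++ semantics, not the OpenCL headers themselves, and the names `ok` and `which` are illustrative only:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// C lets void* convert implicitly to any object pointer type; C++ does not.
// A NULL defined as ((void*)0) therefore cannot initialize a typed pointer:
//   int* bad = ((void*)0);  // ill-formed in C++ (accepted in C)
int* ok = nullptr;  // nullptr converts to every pointer type

static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has its own type, std::nullptr_t");

// nullptr also selects pointer overloads rather than integer ones.
std::string which(int) { return "int"; }
std::string which(int*) { return "pointer"; }
```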

The order of testing in two sparse tensor ops was incorrect, which could cause an invalid cast (crashing the compiler instead of reporting the error). This revision fixes that bug. Reviewed By: gussmith23 Differential Revision: https://reviews.llvm.org/D106841

* Implements all of the discussed features:
  - Links against common CAPI libraries that are self-contained.
  - Stops using the 'python/' directory at the root for everything, opening the namespace up for multiple projects to embed the MLIR python API.
  - Separates declaration of sources (py and C++) needed to build the extension from building, allowing external projects to build custom assemblies from core parts of the API.
  - Makes the core python API relocatable (i.e. it could be embedded as something like 'npcomp.ir', 'npcomp.dialects', etc.). Still a bit more to do to make it truly isolated, but the main structural reset is done.
  - When building statically, installed python packages are completely self-contained, suitable for direct setup and upload to PyPI, et al.
  - Lets external projects assemble their own CAPI common runtime library that all extensions use. No more possibilities for TypeID issues.
  - Begins modularizing the API so that external projects that just include a piece pay only for what they use.
* I also rolled in a re-organization of the native libraries that matches how I was packaging these out of tree and is a better layering (i.e. all libraries go into a nested _mlir_libs package). There is some further cleanup that I resisted since it would have required source changes that I'd rather do in a followup once everything stabilizes.
* Note that I made a somewhat odd choice in choosing to recompile all extensions for each project they are included into (as opposed to compiling once and just linking). While not leveraged yet, this will let us set definitions controlling the namespacing of the extensions so that they can be made to not conflict across projects (with preprocessor definitions).
* This will be a relatively substantial breaking change for downstreams. I will handle the npcomp migration and will coordinate with the circt folks before landing. We should stage this and make sure it isn't causing problems before landing.
* Fixed a couple of absolute imports that were causing issues.

Differential Revision: https://reviews.llvm.org/D106520

…loops

Consider the following loop:

```
void foo(float *dst, float *src, int N) {
  for (int i = 0; i < N; i++) {
    dst[i] = 0.0;
    for (int j = 0; j < N; j++) {
      dst[i] += src[(i * N) + j];
    }
  }
}
```

When we are not building with -Ofast we may attempt to vectorise the inner loop using ordered reductions instead. In addition we also try to select an appropriate interleave count for the inner loop. However, when choosing a VF=1 the inner loop will be scalar and there is existing code in selectInterleaveCount that limits the interleave count to 2 for reductions due to concerns about increasing the critical path. For ordered reductions this problem is even worse due to the additional data dependency, and so I've added code to simply disable interleaving for scalar ordered reductions for now.

Test added here: Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646
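Why an ordered (strict) reduction gains little from interleaving: splitting a sum across independent accumulators, as interleaving an unordered reduction does, reassociates the additions and can change a floating-point result. The C++ sketch below (illustrative function names, IEEE single-precision evaluation assumed, no -ffast-math) shows two summation orders disagreeing. An interleaved ordered reduction therefore has to keep every add on one dependency chain, which is one way to read the "additional data dependency" mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Strict (ordered) reduction: one accumulator, additions in source order.
float orderedSum(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

// Interleave count 2: two independent accumulators combined at the end.
// This reassociates the additions, which is only legal under relaxed FP rules.
float interleavedSum(const std::vector<float>& v) {
  float acc0 = 0.0f, acc1 = 0.0f;
  std::size_t i = 0;
  for (; i + 1 < v.size(); i += 2) {
    acc0 += v[i];
    acc1 += v[i + 1];
  }
  if (i < v.size()) acc0 += v[i];  // odd-length tail
  return acc0 + acc1;
}
```

With {1e8f, 1.0f, -1e8f, 1.0f}, the ordered sum is 1.0f (the first 1.0f is absorbed when added to 1e8f), while the two-accumulator sum is 2.0f.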

Patch by Mohammad Fawaz. This patch allows lifetime calls to be ignored (and later erased) if we know that the copy-constant-to-alloca optimization is going to happen. The case that was missed is when the global variable is in a different address space than the alloca (as shown in the example added to the lit test). This used to work before llvm@6da31fa. Differential Revision: https://reviews.llvm.org/D106573

This expands the cost model test for min/max to many more types, including floating point minnum/maxnum and minimum/maximum, and FP16 with and without fullfp16. The old llc run lines are removed, as those are better tested by CodeGen tests.

…t ffp-contract=on Change -ffp-model=precise to enable -ffp-contract=on (previously -ffp-model=precise enabled -ffp-contract=fast). This is a follow-up to Andy Kaylor's comments in the llvm-dev discussion "Floating Point semantic modes". From the same email thread, I put Andy's distillation of floating point options and floating point modes into UsersManual.rst. Also fixes bugs.llvm.org/show_bug.cgi?id=50222. I had to revert this a few times because of failures on the x86-64 buildbot, but I think we finally have that fixed by LNT/79f2b03c51. Reviewed By: rjmccall, andrew.kaylor Differential Revision: https://reviews.llvm.org/D74436
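-ffp-contract decides whether the compiler may contract a multiply followed by an add into a single fused multiply-add (FMA), which rounds once instead of twice. As we understand Clang's documentation, `on` permits this only within a single source expression, while `fast` also allows it across statements. The C++ sketch below shows why the setting matters for a "precise" model: fused and unfused evaluation of the same inputs give different answers. `std::fma` stands in for a contracted expression, a `volatile` temporary keeps the unfused version from being contracted, and the function names are illustrative.

```cpp
#include <cmath>

// Unfused: the product is rounded to double before the add.
// The volatile temporary prevents the compiler from fusing p + c into an FMA.
double unfusedMulAdd(double a, double b, double c) {
  volatile double p = a * b;
  return p + c;
}

// Fused: std::fma computes a * b + c with a single final rounding.
double fusedMulAdd(double a, double b, double c) { return std::fma(a, b, c); }
```

For a = 1 + 2^-27 and c = -(1 + 2^-26), the exact value of a * a + c is 2^-54. The unfused version rounds a * a to 1 + 2^-26 first and returns 0, while the fused version returns 2^-54.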

Building the libraries with -fPIC ensures that we can link an executable against the static libraries with -fPIE. Furthermore, there is apparently no real downside to building the libraries with position-independent code, since modern toolchains are sufficiently clever. This commit enforces that we always build the runtime libraries with -fPIC. This is another take on D104327, which instead leaves the decision of whether to build with -fPIC or not to the build script that drives the runtimes' build. Fixes http://llvm.org/PR43604. Differential Revision: https://reviews.llvm.org/D104328

The current JumpThreading pass does not jump thread loops since it can result in irreducible control flow that harms other optimizations. This prevents switch statements inside a loop from being optimized to use unconditional branches. This code pattern occurs in the core_state_transition function of Coremark. The state machine can be implemented manually with goto statements resulting in a large runtime improvement, and this transform makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that have the opportunity to be threaded. Once it identifies an opportunity, it creates new paths that branch directly to the correct code block. For example, the left CFG could be transformed to the right CFG:

```
        sw.bb                         sw.bb
       /  |  \                       /  |  \
  case1 case2 case3             case1 case2 case3
       \  |  /                     /    |    \
      latch.bb               latch.2 latch.3 latch.1
      br sw.bb                  /       |       \
                          sw.bb.2   sw.bb.3   sw.bb.1
                          br case2  br case3  br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D99205
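The switch-versus-goto relationship described above can be sketched in C++. The state machine below is a hypothetical toy, not Coremark's core_state_transition: `runSwitch` is the loop-plus-switch shape the pass targets, and `runGoto` is a hand-threaded equivalent in which each state jumps straight to its successor, roughly the control flow the transform aims to produce.

```cpp
#include <cstddef>
#include <string>

// Hypothetical three-state machine: count characters read while in state C,
// where C is entered after seeing 'x' followed by 'y'.
enum class State { A, B, C };

// Loop + switch: every case assigns the next state, then control goes back
// through the loop latch to the switch for dispatch.
int runSwitch(const std::string& in) {
  State s = State::A;
  int count = 0;
  for (char ch : in) {
    switch (s) {
      case State::A: s = (ch == 'x') ? State::B : State::A; break;
      case State::B: s = (ch == 'y') ? State::C : State::A; break;
      case State::C: ++count; s = State::A; break;
    }
  }
  return count;
}

// Threaded form: each state branches directly to its successor block,
// so there is no dispatch through a switch.
int runGoto(const std::string& in) {
  const std::size_t n = in.size();
  std::size_t i = 0;
  int count = 0;
stateA:
  if (i == n) return count;
  if (in[i++] == 'x') goto stateB;
  goto stateA;
stateB:
  if (i == n) return count;
  if (in[i++] == 'y') goto stateC;
  goto stateA;
stateC:
  if (i == n) return count;
  ++i;  // consume the character read in state C
  ++count;
  goto stateA;
}
```

Both functions compute the same result for every input; the threaded version simply avoids re-testing the state variable on each iteration.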

…rtsShapeInfo

As an instruction is replaced in optimizeTransposes, RAUW will replace it in the ShapeMap (ShapeMap is a ValueMap so that uses are updated). In finalizeLowering, however, we skip updating uses if they are in the ShapeMap, since they will be lowered separately, at which point we pick up the lowered operands.

In the testcase, what happened was that since we replaced the doubled-transpose with the shuffle, it ended up in the ShapeMap. As we lowered the columnwise-load, the use in the shuffle was not updated. Then, as we removed the original columnwise-load, we changed that use to undef. I.e. we ended up with:

```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
        <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```

Besides the fix itself, I have fortified this last bit. As we change uses to undef when removing instructions, we track the undefed instructions to make sure we eventually remove those too. This would have caught the issue at compile time.

Differential Revision: https://reviews.llvm.org/D106714

`align_val_t` is not supported on z/OS and causes test failures there. Similarly to https://reviews.llvm.org/rGd0fe294729a2ac45625ed45a5619c8405a14db49 we need to disable those test cases on the z/OS platform. Differential Revision: https://reviews.llvm.org/D106810

* [SOL] Make lld thread-safe with llvm when used in-process
  Every time Solang tries to link a WebAssembly file in-process, the linker re-initializes LLVM, which is not thread-safe with respect to the rest of Solang.
  Signed-off-by: Sean Young <[email protected]>
* [SOL][BPF] Enable the _ExtInt extension on the BPF Target for Solana
  Signed-off-by: Sean Young <[email protected]>

Solana's extensions to BPF mean that struct type information is not fully supported in BTF. This leads to internal compiler errors (ICEs) and to unsupported relocations being emitted in binary files, which the linker then rejects. For now, debug information is simply disabled when compiling for Solana to avoid these errors in Debug builds.

[SOL] Introduce dynamic stack frames and the SBFv2 flag

Introduce dynamic stack frames, which are currently opt-in and enabled by setting cpu=sbfv2. When sbfv2 is used, ELF files are flagged with e_flags=EF_SBF_V2 so the runtime can detect it and react accordingly.

Co-authored-by: Dmitri Makarov <[email protected]>
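On the runtime side, detecting the flag amounts to reading the `e_flags` field of the ELF header. A minimal C++ sketch, assuming a 64-bit little-endian ELF image already in memory. The numeric value used for EF_SBF_V2 (0x20) is our assumption about this fork's headers and is not stated above; the helper names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Assumption: numeric value of EF_SBF_V2; verify against the fork's ELF.h.
constexpr std::uint32_t kEfSbfV2 = 0x20;

// Reads e_flags from a 64-bit little-endian ELF header held in memory.
// In Elf64_Ehdr, e_flags is the 4-byte field at offset 48 of the 64-byte header.
bool readElf64Flags(const std::vector<std::uint8_t>& buf, std::uint32_t& flags) {
  if (buf.size() < 64) return false;
  if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F') return false;
  if (buf[4] != 2 /* ELFCLASS64 */ || buf[5] != 1 /* ELFDATA2LSB */) return false;
  flags = std::uint32_t(buf[48]) | std::uint32_t(buf[49]) << 8 |
          std::uint32_t(buf[50]) << 16 | std::uint32_t(buf[51]) << 24;
  return true;
}

// A loader might enable dynamic stack frames only when this returns true.
bool usesSbfV2(const std::vector<std::uint8_t>& buf) {
  std::uint32_t flags = 0;
  return readElf64Flags(buf, flags) && (flags & kEfSbfV2) != 0;
}
```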

alessandrod · 2022-04-27T10:58:20Z

Oops, wrong repo. Apologies for the noise.

Anastasia Stulova and others added 30 commits July 27, 2021 16:33

Update isl to isl-0.24-69-g54aac5ac

ec3da1a

This is needed for having the functions isl_{set,map}_n_basic_{set,map} exported to the C++ interface. Some tests have been modified to reflect the isl changes.

[WebAssembly] Codegen for extmul SIMD instructions

3378657

Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions with normal codegen patterns. Differential Revision: https://reviews.llvm.org/D106724

AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source

b32d3d9

This is partially a workaround. SILowerI1Copies does not understand unstructured loops. This would result in inserting instructions to merge a mask register in the same block where it was defined in an unstructured loop.

AMDGPU: Update tests for lower i1 change

9b1bcae

I forgot to squash the test updates for b32d3d9

[mlir] Math: add algebraic simplification patterns to math transforms

d94426d

Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D106822

Update reduction test. Remove standalone test file

c78b954

Based on post commit review comments at 68ffed1.

[RISCV] Select vector shl by 1 to a vector add.

3852b8c

A vector add may be faster than a vector shift. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106689

[libc++] CI: Run -std=c++03 on Clang ToT

4547861

Differential Revision: https://reviews.llvm.org/D106104

[lld/mac] When loading reexports, look for basename in -F / -L first

8e8701a

Matches ld64 (cf Options::findIndirectDylib()), and fixes PR51218. Differential Revision: https://reviews.llvm.org/D106842

[Matrix] Fix shape for factored transpose

d87d361

The shape of the input is C x R. Differential Revision: https://reviews.llvm.org/D106722

[lld/mac] Fix application-extension.s failure after 8e8701a

e26356a

The test accidentally tested something else that makes lld fail with a different (correct-looking) error that wasn't the one the test tries to test for. (The test case before this change makes ld64 hang in an infinite loop.)

[gn build] Port 02077da

df95697

[dfsan][NFC] Update API interfaces

00411eb

Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D106895

[ASAN] NFC: Remove redundant variable

1ee6559

`StackAlignment` has only one use: `StackAlignment = std::max(StackAlignment, AI.getAlignment());` So it is redundant. Reviewed By: vitalybuka, MTC Differential Revision: https://reviews.llvm.org/D106741

[AArch64][GlobalISel] Fix constraining LDXPX intrinsic selection.

a11d9a1

Causes a fallback because of lack of regclasses on vregs, unless it's built without asserts, in which case we end up crashing later in codegen.

Add test update for a11d9a1 which disables fallbacks.

fac6c5c

Remove unused include that's also a layering violation. NFC.

05815c9

[lld/mac] Fix sub-library.s on Windows after 8e8701a

dd57915

The endswith() check for the framework name fails when the path is joined with the native path separator. As a fix, always use the POSIX separator.
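A minimal C++ sketch of the failure mode, using hypothetical helper names and paths (this is not lld's actual code): a suffix check written with '/' only matches when the path was also joined with '/', so joining with the Windows separator (a backslash) makes it fail.

```cpp
#include <string>

// Hypothetical helpers for illustration only.
bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

std::string joinPath(const std::string& dir, const std::string& name, char sep) {
  return dir + sep + name;
}
```

Always joining with '/' before the check, as the fix describes, makes the result independent of the host platform.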

[z/OS] Make MinGlobalAlign consistent with SystemZ

a2d4b06

Remove overriding MinGlobalAlign to 0 for z/OS target to be consistent with SystemZ. Reviewed By: abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D106890

dmakarov and others added 27 commits October 28, 2021 14:55

[SOL] Allow unaligned store operations

d6ca03d

[SOL] Adjust BPF tests

0dc7225

* [SOL][BPF] Adjust BPF tests
* [SOL][BPF] Improve reporting of "stack size is too large"
  - issue only one warning for each function
  - report the function location if debug information is available

[SOL] Allow misaligned loads

5926ecb

Co-authored-by: Jack May <[email protected]>

[SOL] Add BPF compiler-rt builtins

20fb1ce

[SOL] Enable Solana BPF extensions as subtarget feature

4ecd4a5

[SOL] Fix broken tests unrelated to Solana/BPF backend

e246def

- duplicate checks in stack-clash-medium removed
- align attribute is not supported by cmpxchg yet

[SOL] Set max stores per mem func depending on the target features

a5acca5

[SOL] Prevent breaking ISelDAG connectivity on replacing loads by constants

3cf76a6

[SOL] Adjust allowsMisalignedMemoryAccesses signature

5687133

[SOL] Adjust rust alloc routines declarations for Library Info test

57a0303

[SOL] Override default getImplicitAddend implementation for BPF arch

3000af0

[SOL] Add R_BPF_64_ABS64 relocation handling in lld

37cfe04

[SOL] Revert to R_BPF_64_64 for global data object relocations

9743d18

RBPF doesn't support R_BPF_64_ABS64, whereas R_BPF_64_64 correctly represents relocation information for global data objects, so we can use it without breaking the correctness of generated object files.
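The two relocation types differ in where the relocated value lives. Below is a minimal C++ sketch of how a linker or loader could apply each one, assuming little-endian objects and REL-style implicit addends. The ABS64 layout (a full 64-bit word at r_offset) and the R_BPF_64_64 layout (an ld_imm64 instruction whose implicit addend is the 32-bit immediate at r_offset + 4) follow the Linux kernel's BPF LLVM relocation documentation. Writing the high half into the second immediate at r_offset + 12 is our assumption about how a loader such as RBPF patches a full 64-bit address. All function names are hypothetical.

```cpp
#include <cstdint>

// Little-endian accessors (assumption: the BPF/SBF objects here are little-endian).
static std::uint32_t read32le(const std::uint8_t* p) {
  return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
         std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}
static void write32le(std::uint8_t* p, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = std::uint8_t(v >> (8 * i));
}
static std::uint64_t read64le(const std::uint8_t* p) {
  return std::uint64_t(read32le(p)) | std::uint64_t(read32le(p + 4)) << 32;
}
static void write64le(std::uint8_t* p, std::uint64_t v) {
  write32le(p, std::uint32_t(v));
  write32le(p + 4, std::uint32_t(v >> 32));
}

// R_BPF_64_ABS64: a plain 64-bit data word at r_offset.
// The implicit addend is the word already stored there.
void applyAbs64(std::uint8_t* sec, std::uint64_t rOffset, std::uint64_t symValue) {
  std::uint64_t addend = read64le(sec + rOffset);
  write64le(sec + rOffset, symValue + addend);
}

// R_BPF_64_64: a 16-byte ld_imm64 instruction at r_offset. The implicit addend
// is the 32-bit imm of the first slot (r_offset + 4); the result is split
// across both imm fields (low half at +4, high half at +12; see assumption above).
void applyLdImm64(std::uint8_t* sec, std::uint64_t rOffset, std::uint64_t symValue) {
  std::uint64_t value = symValue + read32le(sec + rOffset + 4);
  write32le(sec + rOffset + 4, std::uint32_t(value));
  write32le(sec + rOffset + 12, std::uint32_t(value >> 32));
}
```

The opcode and register bytes of the ld_imm64 instruction are left untouched; only the immediate fields change.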

[SOL] Revert to R_BPF_64_32 until support for R_BPF_64_ABS32 added

d9d227e

[SOL] Add sbf-solana-solana target triplet

b045e55

[SOL] Turn on solana feature for SBF target by default

c9316dd

[SOL] Register SBF asm parser

e535821

[SOL] Add SBF compiler-rt builtins

9a51f40

[SOL] Add missing SBF conditions to match BPFEL target

9f7a8c2

[SOL] add support for (pseudo) atomics to SBF (rust-lang#23)

421b949

Lower atomic operations to their regular non-atomic equivalents. Lowering for all operations except atomic fence is done at DAG legalization time. Fences are removed at instruction emission time.
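What "lowering to the regular non-atomic equivalent" means for two common operations, as a C++ sketch. The helper names are hypothetical, and the code is only correct if nothing else can access the memory concurrently; that single-threaded execution is the rationale is our assumption, not something stated above. `<atomic>` is included so the lowered forms can be compared against `std::atomic` semantics.

```cpp
#include <atomic>   // for comparing against std::atomic semantics
#include <cstdint>

// atomicrmw add -> plain load, add, plain store; returns the previous value.
std::uint64_t loweredFetchAdd(std::uint64_t* p, std::uint64_t v) {
  std::uint64_t old = *p;
  *p = old + v;
  return old;
}

// cmpxchg -> compare and conditionally store. Mirrors compare_exchange_strong:
// on failure, `expected` receives the current value and nothing is stored.
bool loweredCmpXchg(std::uint64_t* p, std::uint64_t& expected, std::uint64_t desired) {
  if (*p == expected) {
    *p = desired;
    return true;
  }
  expected = *p;
  return false;
}
```

An atomic fence has no such data-flow equivalent, which fits the description above of fences simply being removed at instruction emission time.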

[SOL] disable llvm.bpf.load.* intrinsics on SBF (rust-lang#24)

321d9c0

[SOL] native support for signed division in SBF

e8ecac3

Adds BPF_SDIV, which is enabled only for the SBF subtarget.
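Signed and unsigned division disagree on negative operands, so without a signed-division instruction a compiler has to either reject sdiv or expand it into a longer sequence. The C++ sketch below shows one such expansion on top of unsigned division. It illustrates what a native BPF_SDIV avoids and is not the backend's actual lowering; the name `sdivViaUdiv` is hypothetical.

```cpp
#include <cstdint>

// Signed division via unsigned division: divide the magnitudes, then negate
// the quotient if the operand signs differ (C/C++ truncation toward zero).
// Preconditions: b != 0 and not (a == INT64_MIN && b == -1).
std::int64_t sdivViaUdiv(std::int64_t a, std::int64_t b) {
  // 0 - x on uint64_t is well-defined and yields |x|, even for INT64_MIN.
  std::uint64_t ua = a < 0 ? 0 - static_cast<std::uint64_t>(a) : static_cast<std::uint64_t>(a);
  std::uint64_t ub = b < 0 ? 0 - static_cast<std::uint64_t>(b) : static_cast<std::uint64_t>(b);
  std::uint64_t q = ua / ub;
  bool negative = (a < 0) != (b < 0);
  // Converting back relies on two's-complement wrap-around (guaranteed since
  // C++20, and what mainstream compilers do in earlier modes).
  return static_cast<std::int64_t>(negative ? 0 - q : q);
}
```

Dividing the raw two's-complement bits with plain unsigned division gives a very different answer: (uint64_t)-7 / 2 is a huge positive number, not -3.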

[SOL] report exceeded stack size as a warning if dynamic frames are off

bce8ee7

7b107c accidentally reverted it back to a hard error.

WIP: fix 64bit data relocs

e7e8a08

alessandrod closed this Apr 27, 2022

vext01 pushed a commit to vext01/llvm-project that referenced this pull request Apr 25, 2024

Merge pull request rust-lang#137 from vext01/more-flexible-lowering

9cb7ad9

Make serialising Yk IR instructions more flexible.
