Skip to content

WIP: R_BPF_64_ABS64 relocation fix #137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10,000 commits into from

Conversation

alessandrod
Copy link

Functional but needs tests

Anastasia Stulova and others added 30 commits July 27, 2021 16:33
Redefines NULL as nullptr instead of ((void*)0)
in C++ for OpenCL.

Such internal representation of NULL provides
compatibility with C++11 and later language
standards.

Patch by Topotuna (Justas Janickas)!

Differential Revision: https://reviews.llvm.org/D105987
This is needed for having the functions isl_{set,map}_n_basic_{set,map}
exported to the C++ interface.
Some tests have been modified to reflect the isl changes.
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
The order of testing in two sparse tensor ops was incorrect,
which could cause an invalid cast (crashing the compiler instead
of reporting the error). This revision fixes that bug.

Reviewed By: gussmith23

Differential Revision: https://reviews.llvm.org/D106841
* Implements all of the discussed features:
  - Links against common CAPI libraries that are self contained.
  - Stops using the 'python/' directory at the root for everything, opening the namespace up for multiple projects to embed the MLIR python API.
  - Separates declaration of sources (py and C++) needed to build the extension from building, allowing external projects to build custom assemblies from core parts of the API.
  - Makes the core python API relocatable (i.e. it could be embedded as something like 'npcomp.ir', 'npcomp.dialects', etc). Still a bit more to do to make it truly isolated but the main structural reset is done.
  - When building statically, installed python packages are completely self contained, suitable for direct setup and upload to PyPi, et al.
  - Lets external projects assemble their own CAPI common runtime library that all extensions use. No more possibilities for TypeID issues.
  - Begins modularizing the API so that external projects that just include a piece pay only for what they use.
* I also rolled in a re-organization of the native libraries that matches how I was packaging these out of tree and is a better layering (i.e. all libraries go into a nested _mlir_libs package). There is some further cleanup that I resisted since it would have required source changes that I'd rather do in a followup once everything stabilizes.
* Note that I made a somewhat odd choice in choosing to recompile all extensions for each project they are included into (as opposed to compiling once and just linking). While not leveraged yet, this will let us set definitions controlling the namespacing of the extensions so that they can be made to not conflict across projects (with preprocessor definitions).
* This will be a relatively substantial breaking change for downstreams. I will handle the npcomp migration and will coordinate with the circt folks before landing. We should stage this and make sure it isn't causing problems before landing.
* Fixed a couple of absolute imports that were causing issues.

Differential Revision: https://reviews.llvm.org/D106520
I forgot to squash the test updates for b32d3d9
Based on post commit review comments at 68ffed1.
…loops

Consider the following loop:

  void foo(float *dst, float *src, int N) {
    for (int i = 0; i < N; i++) {
      dst[i] = 0.0;
      for (int j = 0; j < N; j++) {
        dst[i] += src[(i * N) + j];
      }
    }
  }

When we are not building with -Ofast we may attempt to vectorise the
inner loop using ordered reductions instead. In addition we also try
to select an appropriate interleave count for the inner loop. However,
when choosing a VF=1 the inner loop will be scalar and there is existing
code in selectInterleaveCount that limits the interleave count to 2
for reductions due to concerns about increasing the critical path.
For ordered reductions this problem is even worse due to the additional
data dependency, and so I've added code to simply disable interleaving
for scalar ordered reductions for now.

Test added here:

  Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646
Patch by Mohammad Fawaz

This patch allows lifetime calls to be ignored (and later erased) if we
know that the copy-constant-to-alloca optimization is going to happen.
The case that is missed is when the global variable is in a different address
space than the alloca (as shown in the example added to the lit test.)

This used to work before llvm@6da31fa

Differential Revision: https://reviews.llvm.org/D106573
This expands the cost model test for min/max to many more types,
including floating point minnum/maxnum and minimum/maximum, and FP16
with and without fullfp16.  The old llc run lines are removed, as those
are better tested by CodeGen tests.
…t ffp-contract=on

Change the ffp-model=precise to enables -ffp-contract=on (previously
-ffp-model=precise enabled -ffp-contract=fast). This is a follow-up
to Andy Kaylor's comments in the llvm-dev discussion "Floating Point
semantic modes". From the same email thread, I put Andy's distillation
of floating point options and floating point modes into UsersManual.rst
Also fixes bugs.llvm.org/show_bug.cgi?id=50222

I had to revert this a few times because of failures on the x86-64
buildbot but I think we finally have that fixed by LNT/79f2b03c51.

Reviewed By: rjmccall, andrew.kaylor

Differential Revision: https://reviews.llvm.org/D74436
A vector add may be faster than a vector shift.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106689
Building the libraries with -fPIC ensures that we can link an executable
against the static libraries with -fPIE. Furthermore, there is apparently
basically no downside to building the libraries with position independent
code, since modern toolchains are sufficiently clever.

This commit enforces that we always build the runtime libraries with -fPIC.
This is another take on D104327, which instead makes the decision of whether
to build with -fPIC or not to the build script that drives the runtimes'
build.

Fixes http://llvm.org/PR43604.

Differential Revision: https://reviews.llvm.org/D104328
Matches ld64 (cf Options::findIndirectDylib()), and fixes PR51218.

Differential Revision: https://reviews.llvm.org/D106842
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205
…rtsShapeInfo

As an instruction is replaced in optimizeTransposes RAUW will replace it in
the ShapeMap (ShapeMap is ValueMap so that uses are updated).  In
finalizeLowering however we skip updating uses if they are in the ShapeMap
since they will be lowered separately at which point we pick up the lowered
operands.

In the testcase what happened was that since we replaced the doubled-transpose
with the shuffle, it ended up in the ShapeMap.  As we lowered the
columnwise-load the use in the shuffle was not updated.  Then as we removed
the original columnwise-load we changed that to an undef.  I.e. we ended up
with:

```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
                                  <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```

Besides the fix itself, I have fortified this last bit.  As we change uses to
undef when removing instruction we track the undefed instruction to make sure
we eventually remove those too.  This would have caught the issue at compile
time.

Differential Revision: https://reviews.llvm.org/D106714
The shape of the input is C x R.

Differential Revision: https://reviews.llvm.org/D106722
The test accidentally tested something else that makes lld fail
with a different (correct-looking) error that wasn't the one the
test tries to test for. (The test case before this change makes
ld64 hang in an infinite loop.)
Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D106895
`StackAlignment` has only one use: `StackAlignment = std::max(StackAlignment, AI.getAlignment());` So it is redundant.

Reviewed By: vitalybuka, MTC

Differential Revision: https://reviews.llvm.org/D106741
Causes a fallback because of lack of regclasses on vregs, unless its without
asserts, where we end up crashing later in codegen.
align_val_t is not supported on z/OS, it causes failure on z/OS. similar to https://reviews.llvm.org/rGd0fe294729a2ac45625ed45a5619c8405a14db49 , we will need to disable those test cases on z/OS platform.

Differential Revision: https://reviews.llvm.org/D106810
The endswith() check for the framework name fails when joining
with the native path separator. Always use the posix separator as fix.
Remove overriding MinGlobalAlign to 0 for z/OS target to be consistent with SystemZ.

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D106890
dmakarov and others added 27 commits October 28, 2021 14:55
* [SOL][BPF] Adjust BPF tests

* [SOL][BPF] Improve reporting of stack size is too large

- issue only one warning for each function
- report the function location if debug information is available
* [SOL] Make lld thread-safe with llvm when used in-process

Every time Solang tries to link a web-assembly file in-process, the linker
re-inits llvm which is not thread-safe with the rest of solang.

Signed-off-by: Sean Young <[email protected]>

* [SOL][BPF] Enable the _ExtInt extension on the BPF Target for Solana

Signed-off-by: Sean Young <[email protected]>
Solana extends BPF so that structs type information is not fully
supported in BTF.  This leads to ICE crashes and some unsupported
relocations being emitted in binary files that linker errors on.
For, now the debug information is simply disabled when compiling
for Solana to avoid the errors in Debug builds.
- duplicate checks in stack-clash-medium removed
- align attribute is not supported by cmpxchg yet
RBPF doesn't support R_BPF_64_ABS64 and R_BPF_64_64 correctly
represents relocation information for global data objects, so we can
use them without breaking the correctness of generated object files.
Lower atomic operations to their regular non-atomic equivalents. Lowering for
all operations except atomic fence is done at DAG legalization time. Fences are
removed at instruction emission time.
[SOL] Introduce dynamic stack frames and the SBFv2 flag

Introduce dynamic stack frames, which are currently opt-in and enabled setting cpu=sbfv2.
When sbfv2 is used, ELF files are flagged with e_flags=EF_SBF_V2 so the runtime can detect it
and react accordingly. 

Co-authored-by: Dmitri Makarov <[email protected]>
Adds BPF_SDIV, which is enabled only for the SBF subtarget.
7b107c accidentally reverted it back to an hard error.
@alessandrod
Copy link
Author

oops wrong repo apologies for the noise

vext01 pushed a commit to vext01/llvm-project that referenced this pull request Apr 25, 2024
Make serialising Yk IR instructions more flexible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.