forked from llvm/llvm-project
WIP: R_BPF_64_ABS64 relocation fix #137
Closed: alessandrod wants to merge 10,000 commits into rust-lang:master from alessandrod:sol/reloc-abs64-fix
Redefines NULL as nullptr instead of ((void*)0) in C++ for OpenCL. Such internal representation of NULL provides compatibility with C++11 and later language standards. Patch by Topotuna (Justas Janickas)! Differential Revision: https://reviews.llvm.org/D105987
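The compatibility point in the commit above can be illustrated in plain C++ (not C++ for OpenCL itself). This is a minimal sketch under that assumption: `nullptr` converts implicitly to any object pointer type, while a `void *` value does not, and `nullptr` selects a pointer overload where a literal `0` selects the integer one. The helper names `convertsImplicitly` and `picksPointerOverload` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: true if a value of type From converts implicitly to To.
template <typename From, typename To>
constexpr bool convertsImplicitly() {
  return std::is_convertible<From, To>::value;
}

// Hypothetical overload pair used to observe which overload a null constant picks.
inline bool picksPointerOverload(int) { return false; }
inline bool picksPointerOverload(int *) { return true; }

// nullptr has its own type, std::nullptr_t, which converts to any pointer type.
static_assert(convertsImplicitly<std::nullptr_t, int *>(),
              "nullptr converts to int*");
// In C++, void* does not convert implicitly to int*, so a ((void*)0) style
// NULL cannot initialise a typed pointer without a cast.
static_assert(!convertsImplicitly<void *, int *>(),
              "void* does not convert to int* in C++");
```

With these definitions, `picksPointerOverload(nullptr)` selects the `int *` overload and `picksPointerOverload(0)` selects the `int` overload.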
This is needed for having the functions isl_{set,map}_n_basic_{set,map} exported to the C++ interface. Some tests have been modified to reflect the isl changes.
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions with normal codegen patterns. Differential Revision: https://reviews.llvm.org/D106724
This is partially a workaround. SILowerI1Copies does not understand unstructured loops. This would result in inserting instructions to merge a mask register in the same block where it was defined in an unstructured loop.
The order of testing in two sparse tensor ops was incorrect, which could cause an invalid cast (crashing the compiler instead of reporting the error). This revision fixes that bug. Reviewed By: gussmith23 Differential Revision: https://reviews.llvm.org/D106841
* Implements all of the discussed features:
  - Links against common CAPI libraries that are self contained.
  - Stops using the 'python/' directory at the root for everything, opening the namespace up for multiple projects to embed the MLIR python API.
  - Separates declaration of sources (py and C++) needed to build the extension from building, allowing external projects to build custom assemblies from core parts of the API.
  - Makes the core python API relocatable (i.e. it could be embedded as something like 'npcomp.ir', 'npcomp.dialects', etc). Still a bit more to do to make it truly isolated but the main structural reset is done.
  - When building statically, installed python packages are completely self contained, suitable for direct setup and upload to PyPi, et al.
  - Lets external projects assemble their own CAPI common runtime library that all extensions use. No more possibilities for TypeID issues.
  - Begins modularizing the API so that external projects that just include a piece pay only for what they use.
* I also rolled in a re-organization of the native libraries that matches how I was packaging these out of tree and is a better layering (i.e. all libraries go into a nested _mlir_libs package). There is some further cleanup that I resisted since it would have required source changes that I'd rather do in a followup once everything stabilizes.
* Note that I made a somewhat odd choice in choosing to recompile all extensions for each project they are included into (as opposed to compiling once and just linking). While not leveraged yet, this will let us set definitions controlling the namespacing of the extensions so that they can be made to not conflict across projects (with preprocessor definitions).
* This will be a relatively substantial breaking change for downstreams. I will handle the npcomp migration and will coordinate with the circt folks before landing. We should stage this and make sure it isn't causing problems before landing.
* Fixed a couple of absolute imports that were causing issues. Differential Revision: https://reviews.llvm.org/D106520
I forgot to squash the test updates for b32d3d9
Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D106822
Based on post commit review comments at 68ffed1.
…loops

Consider the following loop:

    void foo(float *dst, float *src, int N) {
      for (int i = 0; i < N; i++) {
        dst[i] = 0.0;
        for (int j = 0; j < N; j++) {
          dst[i] += src[(i * N) + j];
        }
      }
    }

When we are not building with -Ofast we may attempt to vectorise the inner loop using ordered reductions instead. In addition we also try to select an appropriate interleave count for the inner loop. However, when choosing a VF=1 the inner loop will be scalar and there is existing code in selectInterleaveCount that limits the interleave count to 2 for reductions due to concerns about increasing the critical path. For ordered reductions this problem is even worse due to the additional data dependency, and so I've added code to simply disable interleaving for scalar ordered reductions for now.

Test added here: Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646
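To make the "additional data dependency" of an ordered reduction concrete, here is a minimal sketch in plain C++. It is not the LoopVectorize code. It compares a strictly in-order float sum with a sum split across two independent accumulators, which is the shape interleaving by 2 would produce if reassociation were allowed. The function names `orderedSum` and `interleavedSum` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Strict, in-order reduction: every add depends on the previous one.
float orderedSum(const std::vector<float> &values) {
  float acc = 0.0f;
  for (float v : values)
    acc += v;
  return acc;
}

// Two independent accumulators (even and odd elements), combined at the end.
// This shortens the dependency chain but reassociates the additions, so it is
// only legal when floating-point reassociation is permitted.
float interleavedSum(const std::vector<float> &values) {
  float even = 0.0f;
  float odd = 0.0f;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i % 2 == 0)
      even += values[i];
    else
      odd += values[i];
  }
  return even + odd;
}
```

Assuming IEEE-754 single precision and no fast-math flags, the input `{1e8f, 1.0f, -1e8f, 1.0f}` gives `1.0f` from `orderedSum` and `2.0f` from `interleavedSum`. An ordered reduction therefore cannot use independent partial sums and has to keep the single serial chain of adds.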
Patch by Mohammad Fawaz This patch allows lifetime calls to be ignored (and later erased) if we know that the copy-constant-to-alloca optimization is going to happen. The case that is missed is when the global variable is in a different address space than the alloca (as shown in the example added to the lit test.) This used to work before llvm@6da31fa Differential Revision: https://reviews.llvm.org/D106573
This expands the cost model test for min/max to many more types, including floating point minnum/maxnum and minimum/maximum, and FP16 with and without fullfp16. The old llc run lines are removed, as those are better tested by CodeGen tests.
…t ffp-contract=on Change -ffp-model=precise to enable -ffp-contract=on (previously -ffp-model=precise enabled -ffp-contract=fast). This is a follow-up to Andy Kaylor's comments in the llvm-dev discussion "Floating Point semantic modes". From the same email thread, I put Andy's distillation of floating point options and floating point modes into UsersManual.rst. Also fixes bugs.llvm.org/show_bug.cgi?id=50222. I had to revert this a few times because of failures on the x86-64 buildbot, but I think we finally have that fixed by LNT/79f2b03c51. Reviewed By: rjmccall, andrew.kaylor Differential Revision: https://reviews.llvm.org/D74436
A vector add may be faster than a vector shift. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106689
Differential Revision: https://reviews.llvm.org/D106104
Building the libraries with -fPIC ensures that we can link an executable against the static libraries with -fPIE. Furthermore, there is apparently basically no downside to building the libraries with position independent code, since modern toolchains are sufficiently clever. This commit enforces that we always build the runtime libraries with -fPIC. This is another take on D104327, which instead makes the decision of whether to build with -fPIC or not to the build script that drives the runtimes' build. Fixes http://llvm.org/PR43604. Differential Revision: https://reviews.llvm.org/D104328
Matches ld64 (cf Options::findIndirectDylib()), and fixes PR51218. Differential Revision: https://reviews.llvm.org/D106842
The current JumpThreading pass does not jump thread loops since it can result in irreducible control flow that harms other optimizations. This prevents switch statements inside a loop from being optimized to use unconditional branches.

This code pattern occurs in the core_state_transition function of Coremark. The state machine can be implemented manually with goto statements resulting in a large runtime improvement, and this transform makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that have the opportunity to be threaded. Once it identifies an opportunity, it creates new paths that branch directly to the correct code block. For example, the left CFG could be transformed to the right CFG:
```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```
Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D99205
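The two source-level shapes that the commit above compares can be sketched as follows. This is a hypothetical state machine, not the Coremark code: it counts runs of decimal digits in a string, once as a switch inside a loop and once with explicit goto statements, where each transition jumps straight to the next state's code. The names `countDigitRunsSwitch` and `countDigitRunsGoto` are made up for this illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

static bool isDigitAt(const std::string &s, std::size_t i) {
  return std::isdigit(static_cast<unsigned char>(s[i])) != 0;
}

// Switch-in-a-loop form: every iteration re-dispatches on the state variable.
int countDigitRunsSwitch(const std::string &s) {
  enum State { Outside, InRun, Done };
  State state = Outside;
  std::size_t i = 0;
  int runs = 0;
  while (true) {
    switch (state) {
    case Outside:
      if (i == s.size()) { state = Done; break; }
      if (isDigitAt(s, i)) { ++runs; state = InRun; }
      ++i;
      break;
    case InRun:
      if (i == s.size()) { state = Done; break; }
      if (!isDigitAt(s, i)) state = Outside;
      ++i;
      break;
    case Done:
      return runs;
    }
  }
}

// Hand-threaded form: each transition branches directly to its target state.
int countDigitRunsGoto(const std::string &s) {
  std::size_t i = 0;
  int runs = 0;
outside:
  if (i == s.size()) goto done;
  if (isDigitAt(s, i)) { ++runs; ++i; goto in_run; }
  ++i;
  goto outside;
in_run:
  if (i == s.size()) goto done;
  if (!isDigitAt(s, i)) { ++i; goto outside; }
  ++i;
  goto in_run;
done:
  return runs;
}
```

Both functions compute the same result. The difference is only in control flow: the goto version has no central dispatch block, which is the shape the commit message describes the transform producing from the switch version.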
…rtsShapeInfo

As an instruction is replaced in optimizeTransposes RAUW will replace it in the ShapeMap (ShapeMap is ValueMap so that uses are updated). In finalizeLowering however we skip updating uses if they are in the ShapeMap since they will be lowered separately at which point we pick up the lowered operands.

In the testcase what happened was that since we replaced the doubled-transpose with the shuffle, it ended up in the ShapeMap. As we lowered the columnwise-load the use in the shuffle was not updated. Then as we removed the original columnwise-load we changed that to an undef. I.e. we ended up with:
```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
                                    <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```
Besides the fix itself, I have fortified this last bit. As we change uses to undef when removing instruction we track the undefed instruction to make sure we eventually remove those too. This would have caught the issue at compile time.

Differential Revision: https://reviews.llvm.org/D106714
The shape of the input is C x R. Differential Revision: https://reviews.llvm.org/D106722
The test accidentally tested something else that makes lld fail with a different (correct-looking) error that wasn't the one the test tries to test for. (The test case before this change makes ld64 hang in an infinite loop.)
Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D106895
`StackAlignment` has only one use: `StackAlignment = std::max(StackAlignment, AI.getAlignment());` So it is redundant. Reviewed By: vitalybuka, MTC Differential Revision: https://reviews.llvm.org/D106741
Causes a fallback because of lack of regclasses on vregs, unless it's a build without asserts, in which case we end up crashing later in codegen.
align_val_t is not supported on z/OS, which causes test failures there. Similar to https://reviews.llvm.org/rGd0fe294729a2ac45625ed45a5619c8405a14db49 , we need to disable those test cases on the z/OS platform. Differential Revision: https://reviews.llvm.org/D106810
The endswith() check for the framework name fails when joining with the native path separator. As a fix, always use the POSIX separator.
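The failure mode described above can be sketched in a few lines. This is a hypothetical reconstruction, not the code the commit touched: the path components, the expected suffix, and the helpers `joinPath` and `endsWith` are all assumptions made for illustration. The idea is that a suffix written with `/` only matches if the path was also joined with `/`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join path components with the given separator.
std::string joinPath(const std::vector<std::string> &parts, char separator) {
  std::string joined;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i != 0)
      joined += separator;
    joined += parts[i];
  }
  return joined;
}

// Hypothetical helper: suffix check equivalent to a string endswith().
bool endsWith(const std::string &text, const std::string &suffix) {
  return text.size() >= suffix.size() &&
         text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

With the made-up components `{"Frameworks", "Example.framework", "Example"}`, joining with a backslash (a native separator on Windows) yields a string that does not end with `Example.framework/Example`, while joining with `/` does.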
Remove overriding MinGlobalAlign to 0 for z/OS target to be consistent with SystemZ. Reviewed By: abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D106890
* [SOL][BPF] Adjust BPF tests
* [SOL][BPF] Improve reporting of stack size is too large
  - issue only one warning for each function
  - report the function location if debug information is available
Co-authored-by: Jack May <[email protected]>
* [SOL] Make lld thread-safe with llvm when used in-process
  Every time Solang tries to link a web-assembly file in-process, the linker re-inits llvm which is not thread-safe with the rest of solang.
  Signed-off-by: Sean Young <[email protected]>
* [SOL][BPF] Enable the _ExtInt extension on the BPF Target for Solana
  Signed-off-by: Sean Young <[email protected]>
Solana extends BPF in ways that mean struct type information is not fully supported in BTF. This leads to ICEs and to some unsupported relocations being emitted in binary files, which the linker then rejects. For now, debug information is simply disabled when compiling for Solana to avoid these errors in Debug builds.
- duplicate checks in stack-clash-medium removed
- align attribute is not supported by cmpxchg yet
RBPF doesn't support R_BPF_64_ABS64, and R_BPF_64_64 correctly represents relocation information for global data objects, so we can use R_BPF_64_64 relocations without breaking the correctness of generated object files.
Lower atomic operations to their regular non-atomic equivalents. Lowering for all operations except atomic fence is done at DAG legalization time. Fences are removed at instruction emission time.
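As a rough picture of what "regular non-atomic equivalent" means for a read-modify-write operation, here is a minimal sketch in plain C++. It is not the DAG legalization code. It assumes nothing else can touch the memory concurrently, which is the only situation in which the two forms behave the same; that assumption belongs to this sketch, not to the commit message. The function names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Atomic form: a single indivisible fetch-and-add that returns the old value.
std::int64_t atomicFetchAdd(std::atomic<std::int64_t> &cell, std::int64_t delta) {
  return cell.fetch_add(delta);
}

// Non-atomic equivalent: a plain load, add, and store.
// ASSUMPTION: no other thread accesses `cell` while this runs.
std::int64_t loweredFetchAdd(std::int64_t &cell, std::int64_t delta) {
  std::int64_t old = cell; // load
  cell = old + delta;      // add + store
  return old;              // same result the atomic form would return
}
```

In single-threaded use both functions return the previous value and leave the same sum in memory. A fence has no such data effect, which is consistent with the commit removing fences outright instead of rewriting them.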
[SOL] Introduce dynamic stack frames and the SBFv2 flag

Introduce dynamic stack frames, which are currently opt-in and enabled by setting cpu=sbfv2. When sbfv2 is used, ELF files are flagged with e_flags=EF_SBF_V2 so the runtime can detect it and react accordingly.

Co-authored-by: Dmitri Makarov <[email protected]>
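A loader-side check of that flag could look roughly like the sketch below. It reads `e_flags` from a little-endian ELF64 header, where the field sits at byte offset 48, and tests it against a flag constant. The numeric value used for `EF_SBF_V2` (0x20), the bit-test comparison, and the helper names are assumptions made for illustration; the commit message does not state them, and this is not the runtime's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: illustrative value only; the commit message does not give it.
constexpr std::uint32_t kAssumedEfSbfV2 = 0x20;

// In an ELF64 header, e_flags is a 4-byte field at offset 48.
constexpr std::size_t kElf64EFlagsOffset = 48;
constexpr std::size_t kElf64HeaderSize = 64;

// Reads e_flags from a little-endian ELF64 image. Returns false if the buffer
// is too small or is not a little-endian ELF64 file.
bool readElf64EFlags(const std::vector<std::uint8_t> &image, std::uint32_t &flags) {
  if (image.size() < kElf64HeaderSize)
    return false;
  if (image[0] != 0x7f || image[1] != 'E' || image[2] != 'L' || image[3] != 'F')
    return false;
  if (image[4] != 2 /* ELFCLASS64 */ || image[5] != 1 /* ELFDATA2LSB */)
    return false;
  flags = 0;
  for (std::size_t i = 0; i < 4; ++i)
    flags |= static_cast<std::uint32_t>(image[kElf64EFlagsOffset + i]) << (8 * i);
  return true;
}

// Hypothetical runtime check: does the file carry the (assumed) SBFv2 flag?
bool hasSbfV2Flag(const std::vector<std::uint8_t> &image) {
  std::uint32_t flags = 0;
  return readElf64EFlags(image, flags) && (flags & kAssumedEfSbfV2) != 0;
}
```

A runtime that sees the flag could then switch to the dynamic stack frame behaviour, and fall back to the existing behaviour for files without it.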
Adds BPF_SDIV, which is enabled only for the SBF subtarget.
7b107c accidentally reverted it back to a hard error.
oops wrong repo, apologies for the noise
vext01 pushed a commit to vext01/llvm-project that referenced this pull request Apr 25, 2024
Make serialising Yk IR instructions more flexible.
Functional but needs tests