-
Notifications
You must be signed in to change notification settings - Fork 0
[MLIR][Bufferization] BufferResultsToOutParams: Add an option to eliminate AllocOp and Copy #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
||
/// If true, the pass eliminates the memref.alloc and memcpy if the returned | ||
/// memref is allocated in the current function. | ||
bool eliminateAllocCopy = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe eliminateResultCopy
is more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It eliminates the allocation as well. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocation is eliminated because it's dead code. The key is eliminate result copy. Also there's another allocation in the caller, eliminateAllocCopy
is a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eliminate
usually uses for removing something from existing IR. But result copy doesn't exist before this pass, the actual behaviour of this option is to prevent generating extra copy. Thus suggest to rename as avoidResultCopy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides avoid copying, it also replaces the uses of the allocated memref with the memref in args. avoidResultCopy
may surprise the user that it also replaces the memref and removes the AllocOp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace the allocated memref by the memref in arg list is quite straight-forward from the pass name itself, if we have to emphasize removing the alloc and extra copy, how about avoidBufferResultAllocAndCopy
?
Skipping dynamic shape is due to the original pass doesn't well support this scenario, right? |
Shall we discuss on the public PR? llvm#90011 BTW, The original pass partially support dynamic shape. |
9170bed
to
5a15f77
Compare
Builder alerted me to the failing test, attempt #1 in the blind.
…_64 for now (llvm#90750) Disabling this test on all x86_64 to unblock CI. rdar://125052424
These two are, from a semantic checking perspective, identical to first-private/private/etc, other than appertainment. This patch implements both.
The support for interleaved accesses for scalable vector with a factor of 2 is enabled in vectorizer. Therefore, the patch removed the restriction for scalable vector with a factor of 2.
Like present, no_create, and first_private, copy is a clause that takes just a var-list, and follows the same rules as the others. The one unique part of this clause is that it ALSO supports two deprecated/backwards-compatibility spellings, so this patch adds them and implements them.
…#89848) Make BasicTTIImplBase's `isTypeLegal` check handle unknown types. Current behavior is aborting. Motivated by a use case in SimplifyCFG, where `isTypeLegal` is called on a struct type and dies, when it could be treated as illegal and skipped. In general it could make sense for unknown types to be allowed, and by default just considered not legal, but the behavior can of course be overriden.
…vm#90915) Followup to llvm#90759. Instead of just returning null when the caller scope is not translatable, "jump over" the current caller scope and use the outer scope as the caller if that is available. This means that in an inlined call stack if there are frames without debug scope, those frames are skipped to preserve what is available. In the original example where ``` func A { foo loc(fused<#A>["a":1:1]) } func B { call @A loc("b":1:1) } func C { call @b loc(fused<#C>["c":1:1]) } ``` is inlined into ``` func C { foo loc(callsite( callsite(fused<#A>["a":1:1] at loc("b":1:1)) at fused<#C>["c":1:1])) } ``` The translated result would be `!1`: ``` !0 = !DILocation(line: 1, column: 1, scope: !C) !1 = !DILocation(line: 1, column: 1, scope: !A, inlinedAt: !0) ``` This has a neat benefit in maintaining callsite associativity: No matter if we have `callsite(callsite(A at B) at C)` or `callsite(A at callsite(B at C))`, the translation now is the same. The previous solution did not provide this guarantee, which meant the callsite construction would somehow impact this translation.
After a48ebb8, the function is no longer used. Remove it.
…ruct Like 'copy', these also have alternate names, so this implements that as well. Additionally, these have an optional tag of either 'readonly' or 'zero' depending on the clause. Otherwise, this is a pretty rote implementation of the clause, as there aren't any special rules for it.
Summary: 'test_exit_status_message_sigterm' is failing due to 'psutil' dependency introduced in PR llvm#89405. This fix removes 'deque' dependency and checks if 'psutil' can be imported before running the test. If 'psutil' cannot be imported, it emits a warning and skips the test. Test Plan: ./bin/llvm-lit -sv /path-to-llvm-project/lldb/test/API/tools/lldb-dap/console/TestDAP_console.py --filter=tools/lldb-dap/console/TestDAP_console.py Reviewers: @jeffreytan81,@clayborg,@kusmour, @JDevlieghere,@walter-erquinigo Subscribers: Tasks: lldb-dap Tags:
The vector crypto instructions may have different scheduling behavior compared to VALU operations. Instead of using scheduling resources that describe VALU operations, we give these instructions their own scheduling resources. This is similar to what we did for Zb* instructions. The sifive-p670 has vector crypto, so we model behavior for these instructions in the P600SchedModel. The numbers are based off of measurements collected internally. These numbers are a bit old and new measurements show that they may not be fully accurate. It is likely that we will refine these numbers in a follow up patch(s) based on new measurements. This PR is stacked on llvm#89256.
The m68k backend will always emit external calls (including libcalls) with PC-relative PLT relocations, even when in non-pic mode or -fno-plt is used. This is unexpected, as other function calls are emitted with absolute addressing, and a static code modes suggests that there is no PLT. It also leads to a miscompilation where the call instruction emitted expects an immediate address, while the relocation emitted for that instruction is PC-relative. This miscompilation can even be seen in the default C function in godbolt: https://godbolt.org/z/zEoazovzo Fix the issue by classifying external function references based upon the pic mode. This triggers a change in the static code model, making it more in line with the expected behaviour and allowing use of this backend in more bare-metal situations where a PLT does not exist. The change avoids the issue where we emit a PLT32 relocation for an absolute call, and makes libcalls and other external calls use absolute addressing modes when a static code model is desired. Further work should be done in instruction lowering and validation to ensure that miscompilations of the same type don't occur.
This reverts commit 02660e2. The tests do not pass on AIX, the buildkite precommit CI fails on these tests. For example, https://buildkite.com/llvm-project/libcxx-ci/builds/35184
This attribute is used in the headers. Not using this in the modules has led to several issues. Add them to the modules to avoid these errors in other placed.
Enables vectorization of unpack op in the case of unknown vector size. The vector sizes are determined by the result's shape.
…m tweak (llvm#88737) Clangd already implements some utility functions for converting between `SourceLocation`s, `Position`s and `Offset`s into a buffer.
We have 3 different enums all expressing severity (info, warning, error). Remove all uses with a new Severity enum in lldb-enumerations.h.
…arseNormalizedArchString (llvm#90895) If 'z' is given as the complete extension name or with a digit after it, it will crash in the extension map compare function. Check for these cases and give an error.
into the current module Following of llvm#86912. After llvm#86912, with reduced BMI, the BMI can keep unchange if the dependent modules only changes the implementation (without introduing new decls). However, this is not strictly correct. For example: ``` // a.cppm export module a; export inline int a() { ... } // b.cppm export module b; import a; export inline int b() { return a(); } ``` Since both `a()` and `b()` are inline, we need to make sure the BMI of `b.pcm` will change after the implementation of `a()` changes. We can't get that naturally since we won't record the body of `a()` during the writing process. We can't reuse ODRHash here since ODRHash won't calculate the called function recursively. So ODRHash will be problematic if `a()` calls other inline functions. Probably we can solve this by a new hash mechanism. But the safety and efficiency may a problem too. Here we just combine the hash value of the used modules conservatively.
… helper function (llvm#81024) * This way the helper function could be re-used by indirect-call-promotion pass to find out the vtable for an indirect call and extract the value profiles if any. * The parent patch is llvm#80762
look through shuffle vectors
Demonstrate that this isn't yet working right.
... even if the storage types are different.
…f.forall (llvm#90189) -- This commit adds a canonicalization pattern to fold away iter args of scf.forall if :- a. The corresponding tied result has no use. b. It is not being modified within the loop. Signed-off-by: Abhishek Varma <[email protected]>
… VarDecls (llvm#90948) With the commit d530894, we now preserve the initializer for invalid decls with the recovery-expr. However there is a chance that the original init expr is a typo-expr, we should not preserve it in the final AST, as typo-expr is an internal AST node. We should use the one after the typo correction. This is spotted by a clangd hover crash on the testcase.
Cycle is associated with construct-names and not labels. Change name of a few variables to reflect this. Also add appropriate comment to describe the else case of error checking.
Only VRs should use $noreg, this GPR was accidentally changed in d392520
…inate AllocOp and avoid Copy Add an option hoist-static-allocs to remove the unnecessary memref.alloc and memref.copy after this pass, when the memref in ReturnOp is allocated by memref.alloc and is statically shaped. Instead, it replaces the uses of the allocated memref with the memref in the out argument. By default, BufferResultsToOutParams will result in a memcpy operation to copy the originally returned memref to the output argument memref. This is inefficient when the source of memcpy (the returned memref in the original ReturnOp) is from a local AllocOp. The pass can use the output argument memref to replace the locally allocated memref for better performance. elim-alloc-copy avoids dynamic allocation and memory movement. This option will be critical for performance-sensivtive applications, which require BufferResultsToOutParams pass for a caller-owned output buffer calling convension.
5a15f77
to
acdbdbf
Compare
…e exception specification of a function (llvm#90760) [temp.deduct.general] p6 states: > At certain points in the template argument deduction process it is necessary to take a function type that makes use of template parameters and replace those template parameters with the corresponding template arguments. This is done at the beginning of template argument deduction when any explicitly specified template arguments are substituted into the function type, and again at the end of template argument deduction when any template arguments that were deduced or obtained from default arguments are substituted. [temp.deduct.general] p7 goes on to say: > The _deduction substitution loci_ are > - the function type outside of the _noexcept-specifier_, > - the explicit-specifier, > - the template parameter declarations, and > - the template argument list of a partial specialization > > The substitution occurs in all types and expressions that are used in the deduction substitution loci. [...] Consider the following: ```cpp struct A { static constexpr bool x = true; }; template<typename T, typename U> void f(T, U) noexcept(T::x); // #1 template<typename T, typename U> void f(T, U*) noexcept(T::y); // #2 template<> void f<A>(A, int*) noexcept; // clang currently accepts, GCC and EDG reject ``` Currently, `Sema::SubstituteExplicitTemplateArguments` will substitute into the _noexcept-specifier_ when deducing template arguments from a function declaration or when deducing template arguments for taking the address of a function template (and the substitution is treated as a SFINAE context). In the above example, `#1` is selected as the primary template because substitution of the explicit template arguments into the _noexcept-specifier_ of `#2` failed, which resulted in the candidate being ignored. This behavior is incorrect ([temp.deduct.general] note 4 says as much), and this patch corrects it by deferring all substitution into the _noexcept-specifier_ until it is instantiated. As part of the necessary changes to make this patch work, the instantiation of the exception specification of a function template specialization when taking the address of a function template is changed to only occur for the function selected by overload resolution per [except.spec] p13.1 (as opposed to being instantiated for every candidate).
…ined member functions & member function templates (llvm#88963) Consider the following snippet from the discussion of CWG2847 on the core reflector: ``` template<typename T> concept C = sizeof(T) <= sizeof(long); template<typename T> struct A { template<typename U> void f(U) requires C<U>; // #1, declares a function template void g() requires C<T>; // #2, declares a function template<> void f(char); // #3, an explicit specialization of a function template that declares a function }; template<> template<typename U> void A<short>::f(U) requires C<U>; // #4, an explicit specialization of a function template that declares a function template template<> template<> void A<int>::f(int); // llvm#5, an explicit specialization of a function template that declares a function template<> void A<long>::g(); // llvm#6, an explicit specialization of a function that declares a function ``` A number of problems exist: - Clang rejects `#4` because the trailing _requires-clause_ has `U` substituted with the wrong template parameter depth when `Sema::AreConstraintExpressionsEqual` is called to determine whether it matches the trailing _requires-clause_ of the implicitly instantiated function template. - Clang rejects `llvm#5` because the function template specialization instantiated from `A<int>::f` has a trailing _requires-clause_, but `llvm#5` does not (nor can it have one as it isn't a templated function). - Clang rejects `llvm#6` for the same reasons it rejects `llvm#5`. This patch resolves these issues by making the following changes: - To fix `#4`, `Sema::AreConstraintExpressionsEqual` is passed `FunctionTemplateDecl`s when comparing the trailing _requires-clauses_ of `#4` and the function template instantiated from `#1`. - To fix `llvm#5` and `llvm#6`, the trailing _requires-clauses_ are not compared for explicit specializations that declare functions. In addition to these changes, `CheckMemberSpecialization` now considers constraint satisfaction/constraint partial ordering when determining which member function is specialized by an explicit specialization of a member function for an implicit instantiation of a class template (we previously would select the first function that has the same type as the explicit specialization). With constraints taken under consideration, we match EDG's behavior for these declarations.
...which caused issues like > ==42==ERROR: AddressSanitizer failed to deallocate 0x32 (50) bytes at address 0x117e0000 (error code: 28) > ==42==Cannot dump memory map on emscriptenAddressSanitizer: CHECK failed: sanitizer_common.cpp:81 "((0 && "unable to unmmap")) != (0)" (0x0, 0x0) (tid=288045824) > #0 0x14f73b0c in __asan::CheckUnwind()+0x14f73b0c (this.program+0x14f73b0c) > #1 0x14f8a3c2 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long)+0x14f8a3c2 (this.program+0x14f8a3c2) > #2 0x14f7d6e1 in __sanitizer::ReportMunmapFailureAndDie(void*, unsigned long, int, bool)+0x14f7d6e1 (this.program+0x14f7d6e1) > #3 0x14f81fbd in __sanitizer::UnmapOrDie(void*, unsigned long)+0x14f81fbd (this.program+0x14f81fbd) > #4 0x14f875df in __sanitizer::SuppressionContext::ParseFromFile(char const*)+0x14f875df (this.program+0x14f875df) > llvm#5 0x14f74eab in __asan::InitializeSuppressions()+0x14f74eab (this.program+0x14f74eab) > llvm#6 0x14f73a1a in __asan::AsanInitInternal()+0x14f73a1a (this.program+0x14f73a1a) when trying to use an ASan suppressions file under Emscripten: Even though it would be considered OK by SUSv4, the Emscripten runtime states "We don't support partial munmapping" (see <emscripten-core/emscripten@f4115eb> "Implement MAP_ANONYMOUS on top of malloc in STANDALONE_WASM mode (llvm#16289)"). Co-authored-by: Stephan Bergmann <[email protected]>
…ication as used during partial ordering (llvm#91534) We do not deduce template arguments from the exception specification when determining the primary template of a function template specialization or when taking the address of a function template. Therefore, this patch changes `isAtLeastAsSpecializedAs` such that we do not mark template parameters in the exception specification as 'used' during partial ordering (per [temp.deduct.partial] p12) to prevent the following from being ambiguous: ``` template<typename T, typename U> void f(U) noexcept(noexcept(T())); // #1 template<typename T> void f(T*) noexcept; // #2 template<> void f<int>(int*) noexcept; // currently ambiguous, selects #2 with this patch applied ``` Although there is no corresponding wording in the standard (see core issue filed here cplusplus/CWG#537), this seems to be the intended behavior given the definition of _deduction substitution loci_ in [temp.deduct.general] p7 (and EDG does the same thing).
…erSize (llvm#67657)" This reverts commit f0b3654. This commit triggers UB by reading an uninitialized variable. `UP.PartialThreshold` is used uninitialized in `getUnrollingPreferences()` when it is called from `LoopVectorizationPlanner::executePlan()`. In this case the `UP` variable is created on the stack and its fields are not initialized. ``` ==8802==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x557c0b081b99 in llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h #1 0x557c0b07a40c in llvm::TargetTransformInfo::Model<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2277:17 #2 0x557c0f5d69ee in llvm::TargetTransformInfo::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) const llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:387:19 #3 0x557c0e6b96a0 in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7624:7 #4 0x557c0e6e4b63 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10253:13 llvm#5 0x557c0e6f2429 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10344:30 llvm#6 0x557c0e6f2f97 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10383:9 [...] Uninitialized value was created by an allocation of 'UP' in the stack frame #0 0x557c0e6b961e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:3 ```
…vm#90820) This solves some ambuguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default. When performing template argument deduction, a template template parameter containing no packs should be more specialized than one that does. Given the following example: ```C++ template<class T2> struct A; template<template<class ...T3s> class TT1, class T4> struct A<TT1<T4>>; // #1 template<template<class T5 > class TT2, class T6> struct A<TT2<T6>>; // #2 template<class T1> struct B; template struct A<B<char>>; ``` Prior to P0522, candidate `#2` would be more specialized. After P0522, neither is more specialized, so this becomes ambiguous. With this change, `#2` becomes more specialized again, maintaining compatibility with pre-P0522 implementations. The problem is that in P0522, candidates are at least as specialized when matching packs to fixed-size lists both ways, whereas before, a fixed-size list is more specialized. This patch keeps the original behavior when checking template arguments outside deduction, but restores this aspect of pre-P0522 matching during deduction. --- Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.
…llvm#92855) This solves some ambuguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default. When performing template argument deduction, we extend the provisional wording introduced in llvm#89807 so it also covers deduction of class templates. Given the following example: ```C++ template <class T1, class T2 = float> struct A; template <class T3> struct B; template <template <class T4> class TT1, class T5> struct B<TT1<T5>>; // #1 template <class T6, class T7> struct B<A<T6, T7>>; // #2 template struct B<A<int>>; ``` Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This patch restores the pre-P0522 behavior, `#2` is picked again. This has the beneficial side effect of making the following code valid: ```C++ template<class T, class U> struct A {}; A<int, float> v; template<template<class> class TT> void f(TT<int>); // OK: TT picks 'float' as the default argument for the second parameter. void g() { f(v); } ``` --- Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.
The problematic program is as follows: ```shell #define pre_a 0 #define PRE(x) pre_##x void f(void) { PRE(a) && 0; } int main(void) { return 0; } ``` in which after token concatenation (`##`), there's another nested macro `pre_a`. Currently only the outer expansion region will be produced. ([compiler explorer link](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:29,endLineNumber:8,positionColumn:29,positionLineNumber:8,selectionStartColumn:29,selectionStartLineNumber:8,startColumn:29,startLineNumber:8),source:'%23define+pre_a+0%0A%23define+PRE(x)+pre_%23%23x%0A%0Avoid+f(void)+%7B%0A++++PRE(a)+%26%26+0%3B%0A%7D%0A%0Aint+main(void)+%7B+return+0%3B+%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:51.69491525423727,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:cclang_assertions_trunk,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'1',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:___c,libs:!(),options:'-fprofile-instr-generate+-fcoverage-mapping+-fcoverage-mcdc+-Xclang+-dump-coverage-mapping+',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+clang+(assertions+trunk)+(Editor+%231)',t:'0')),k:34.5741843594503,l:'4',m:28.903654485049834,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compilerName:'x86-64+clang+(trunk)',editorid:1,fontScale:14,fontUsePx:'0',j:2,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+clang+(assertions+trunk)+(Compiler+%232)',t:'0')),header:(),l:'4',m:71.09634551495017,n:'0',o:'',s:0,t:'0')),k:48.30508474576271,l:'3',n:'0',o:'',t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4)) ```text f: File 0, 4:14 -> 6:2 = #0 Decision,File 0, 5:5 -> 5:16 = M:0, C:2 Expansion,File 0, 5:5 -> 5:8 = #0 (Expanded file = 1) File 0, 5:15 -> 5:16 = #1 Branch,File 0, 5:15 -> 5:16 = 0, 0 [2,0,0] File 1, 2:16 -> 2:23 = #0 File 2, 1:15 -> 1:16 = #0 File 2, 1:15 -> 1:16 = #0 Branch,File 2, 1:15 -> 1:16 = 0, 0 [1,2,0] ``` The inner expansion region isn't produced because: 1. In the range-based for loop quoted below, each sloc is processed and possibly emit a corresponding expansion region. 2. For our sloc in question, its direct parent returned by `getIncludeOrExpansionLoc()` is a `<scratch space>`, because that's how `##` is processed. https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L518-L520 3. This `<scratch space>` cannot be found in the FileID mapping so `ParentFileID` will be assigned an `std::nullopt` https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L521-L526 4. As a result this iteration of for loop finishes early and no expansion region is added for the sloc. This problem gets worse with MC/DC: as the example shows, there's a branch from File 2 but File 2 itself is missing. This will trigger assertion failures. The fix is more or less a workaround and takes a similar approach as llvm#89573. ~~Depends on llvm#89573.~~ This includes llvm#89573. Kudos to @chapuni! This and llvm#89573 together fix llvm#87000: I tested locally, both the reduced program and my original use case (fwiw, Linux kernel) can run successfully. --------- Co-authored-by: NAKAMURA Takumi <[email protected]>
…des (llvm#94453) LSR will generate chains of related instructions with a known increment between them. With SVE, in the case of the test case, this can include increments like 'vscale * 16 + 8'. The idea of this patch is if we have a '+8' increment already calculated in the chain, we can generate a (legal) '+ vscale*16' addressing mode from it, allowing us to use the '[x16, #1, mul vl]' addressing mode instructions. In order to do this we keep track of the known 'bases' when generating chains in GenerateIVChain, checking for each if the accumulated increment expression from the base neatly folds into a legal addressing mode. If they do not we fall back to the existing LeftOverExpr, whether it is legal or not. This is mostly orthogonal to llvm#88124, dealing with the generation of chains as opposed to rest of LSR. The existing vscale addressing mode work has greatly helped compared to the last time I looked at this, allowing us to check that the addressing modes are indeed legal.
…lvm#104148) `hasOperands` does not always execute matchers in the order they are written. This can cause issue in code using bindings when one operand matcher is relying on a binding set by the other. With this change, the first matcher present in the code is always executed first and any binding it sets are available to the second matcher. Simple example with current version (1 match) and new version (2 matches): ```bash > cat tmp.cpp int a = 13; int b = ((int) a) - a; int c = a - ((int) a); > clang-query tmp.cpp clang-query> set traversal IgnoreUnlessSpelledInSource clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d")))))) Match #1: tmp.cpp:1:1: note: "d" binds here int a = 13; ^~~~~~~~~~ tmp.cpp:2:9: note: "root" binds here int b = ((int)a) - a; ^~~~~~~~~~~~ 1 match. > ./build/bin/clang-query tmp.cpp clang-query> set traversal IgnoreUnlessSpelledInSource clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d")))))) Match #1: tmp.cpp:1:1: note: "d" binds here 1 | int a = 13; | ^~~~~~~~~~ tmp.cpp:2:9: note: "root" binds here 2 | int b = ((int)a) - a; | ^~~~~~~~~~~~ Match #2: tmp.cpp:1:1: note: "d" binds here 1 | int a = 13; | ^~~~~~~~~~ tmp.cpp:3:9: note: "root" binds here 3 | int c = a - ((int)a); | ^~~~~~~~~~~~ 2 matches. ``` If this should be documented or regression tested anywhere please let me know where.
…104523) Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`. This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden. My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB. rdar://126629381 Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example). before: ``` (lldb) thread backtrace --filtered=false * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10 frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25 frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12 frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12 frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10 frame llvm#5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12 frame llvm#6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10 frame llvm#7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10 frame llvm#8: 0x0000000183cdf154 dyld`start + 2476 (lldb) ``` after ``` (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10 frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25 frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12 frame llvm#6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10 frame llvm#7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10 frame llvm#8: 0x0000000183cdf154 dyld`start + 2476 Note: Some frames were hidden by frame recognizers ```
`JITDylibSearchOrderResolver` local variable can be destroyed before completion of all callbacks. Capture it together with `Deps` in `OnEmitted` callback. Original error: ``` ==2035==ERROR: AddressSanitizer: stack-use-after-return on address 0x7bebfa155b70 at pc 0x7ff2a9a88b4a bp 0x7bec08d51980 sp 0x7bec08d51978 READ of size 8 at 0x7bebfa155b70 thread T87 (tf_xla-cpu-llvm) #0 0x7ff2a9a88b49 in operator() llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:58 #1 0x7ff2a9a88b49 in __invoke<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:149:25 #2 0x7ff2a9a88b49 in __call<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:224:5 #3 0x7ff2a9a88b49 in operator() libcxx/include/__functional/function.h:210:12 #4 0x7ff2a9a88b49 in void std::__u::__function::__policy_invoker<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, ```
Static destructor can race with calls to notify and trigger tsan warning. ``` WARNING: ThreadSanitizer: data race (pid=5787) Write of size 1 at 0x55bec9df8de8 by thread T23: #0 pthread_mutex_destroy [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1344](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1344&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12affb) (BuildId: ff25ace8b17d9863348bb1759c47246c) #1 __libcpp_recursive_mutex_destroy [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:91](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=91&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9) (BuildId: ff25ace8b17d9863348bb1759c47246c) #2 std::__tsan::recursive_mutex::~recursive_mutex() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:52](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=52&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9) #3 ~SmartMutex [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:28](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=28&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe) (BuildId: ff25ace8b17d9863348bb1759c47246c) #4 (anonymous namespace)::PerfJITEventListener::~PerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:65](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=65&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe) llvm#5 cxa_at_exit_callback_installed_at(void*) [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:437](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=437&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b172cb9) (BuildId: ff25ace8b17d9863348bb1759c47246c) llvm#6 llvm::JITEventListener::createPerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:496](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=496&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcad8f5) (BuildId: ff25ace8b17d9863348bb1759c47246c) ``` ``` Previous atomic read of size 1 at 0x55bec9df8de8 by thread T192 (mutexes: write M0, write M1): #0 pthread_mutex_unlock [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1387](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1387&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12b6bb) (BuildId: ff25ace8b17d9863348bb1759c47246c) #1 __libcpp_recursive_mutex_unlock [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:87](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=87&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589) (BuildId: ff25ace8b17d9863348bb1759c47246c) #2 std::__tsan::recursive_mutex::unlock() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:64](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=64&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589) #3 unlock [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:47](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=47&cl=669089572):16 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) (BuildId: ff25ace8b17d9863348bb1759c47246c) #4 ~lock_guard [third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h:39](third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h?l=39&cl=669089572):101 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) llvm#5 (anonymous namespace)::PerfJITEventListener::notifyObjectLoaded(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:290](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=290&cl=669089572):1 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) llvm#6 llvm::orc::RTDyldObjectLinkingLayer::onObjEmit(llvm::orc::MaterializationResponsibility&, llvm::object::OwningBinary<llvm::object::ObjectFile>, std::__tsan::unique_ptr<llvm::RuntimeDyld::MemoryManager, std::__tsan::default_delete<llvm::RuntimeDyld::MemoryManager>>, std::__tsan::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo, std::__tsan::default_delete<llvm::RuntimeDyld::LoadedObjectInfo>>, std::__tsan::unique_ptr<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>, std::__tsan::default_delete<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>>>, llvm::Error) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:386](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp?l=386&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bc404a8) (BuildId: ff25ace8b17d9863348bb1759c47246c) ```
Add an option
elim-alloc-copy
to remove the unnecessarymemref.alloc
andmemref.copy
after this pass, when the memref inReturnOp
is allocated bymemref.alloc
. Instead, it replaces the uses of the allocated memref with the memref in the out argument.By default, BufferResultsToOutParams will result in a memcpy operation to copy the originally returned memref to the output argument memref. This is inefficient when the source of memcpy (the returned memref in the original ReturnOp) is from a local
AllocOp
. The pass can use the output argument memref to replace the locally allocated memref for better performance.elim-alloc-copy
avoids dynamic allocation and memory movement.This option will be critical for performance-sensivtive applications, which require BufferResultsToOutParams pass for a caller-owned output buffer ABI.