[MLIR][Bufferization] BufferResultsToOutParams: Add an option to eliminate AllocOp and Copy #1

Menooker · 2024-04-18T08:16:22Z

Add an option elim-alloc-copy to remove the unnecessary memref.alloc and memref.copy after this pass, when the memref in ReturnOp is allocated by memref.alloc. Instead, it replaces the uses of the allocated memref with the memref in the out argument.
By default, BufferResultsToOutParams will result in a memcpy operation to copy the originally returned memref to the output argument memref. This is inefficient when the source of memcpy (the returned memref in the original ReturnOp) is from a local AllocOp. The pass can use the output argument memref to replace the locally allocated memref for better performance. elim-alloc-copy avoids dynamic allocation and memory movement.
This option will be critical for performance-sensitive applications that require the BufferResultsToOutParams pass for a caller-owned output buffer ABI.
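As a rough illustration of the difference described above, here is a plain C++ analogy (not MLIR, and not the pass's implementation). The function names, the buffer size `kN`, and the fill pattern are hypothetical; `std::vector` stands in for `memref.alloc` and `std::memcpy` for `memref.copy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical fixed (static) buffer size for the sketch.
constexpr std::size_t kN = 4;

// Default behaviour (analogy): the callee still allocates a local buffer,
// fills it, and then copies it into the caller-owned out buffer.
inline void computeWithCopy(float *out) {
  assert(out != nullptr);
  std::vector<float> local(kN); // stands in for memref.alloc
  for (std::size_t i = 0; i < kN; ++i)
    local[i] = static_cast<float>(i) * 2.0f;
  // stands in for the memref.copy inserted by the pass
  std::memcpy(out, local.data(), kN * sizeof(float));
}

// With the option enabled (analogy): every use of the local buffer is
// replaced by the out argument, so no allocation and no copy remain.
inline void computeInPlace(float *out) {
  assert(out != nullptr);
  for (std::size_t i = 0; i < kN; ++i)
    out[i] = static_cast<float>(i) * 2.0f;
}
```

Both functions leave the same contents in the caller's buffer; the second simply skips the temporary and the copy.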

ciyongch

LGTM.

ciyongch · 2024-04-18T08:40:47Z

mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h

+
+  /// If true, the pass eliminates the memref.alloc and memcpy if the returned
+  /// memref is allocated in the current function.
+  bool eliminateAllocCopy = false;


Maybe eliminateResultCopy is more appropriate?

It eliminates the allocation as well. :)

The allocation is eliminated because it's dead code. The key point is eliminating the result copy. Also, since there's another allocation in the caller, eliminateAllocCopy is a bit confusing.

"Eliminate" is usually used for removing something from existing IR. But the result copy doesn't exist before this pass; the actual behaviour of this option is to prevent generating an extra copy. Thus I suggest renaming it to avoidResultCopy.

Besides avoiding the copy, it also replaces the uses of the allocated memref with the memref in the args. avoidResultCopy may surprise users, since it also replaces the memref and removes the AllocOp.

Replacing the allocated memref with the memref in the arg list is quite straightforward from the pass name itself. If we have to emphasize removing the alloc and the extra copy, how about avoidBufferResultAllocAndCopy?

ciyongch · 2024-04-28T08:11:12Z

Dynamic shapes are skipped because the original pass doesn't support this scenario well, right?
BTW, the original pass always requires the caller to allocate the buffer for the result, so it might be a bit confusing that we're running the "buffer-results-to-out-params" pass with "hoist-static-allocs=0".

Menooker · 2024-04-28T11:23:37Z

> Dynamic shapes are skipped because the original pass doesn't support this scenario well, right? BTW, the original pass always requires the caller to allocate the buffer for the result, so it might be a bit confusing that we're running the "buffer-results-to-out-params" pass with "hoist-static-allocs=0".

Shall we discuss on the public PR? llvm#90011

BTW, the original pass partially supports dynamic shapes.

Builder alerted me to the failing test, attempt #1 in the blind.

…_64 for now (llvm#90750) Disabling this test on all x86_64 to unblock CI. rdar://125052424

These two are, from a semantic checking perspective, identical to first-private/private/etc, other than appertainment. This patch implements both.

The support for interleaved accesses for scalable vector with a factor of 2 is enabled in vectorizer. Therefore, the patch removed the restriction for scalable vector with a factor of 2.

Like present, no_create, and first_private, copy is a clause that takes just a var-list, and follows the same rules as the others. The one unique part of this clause is that it ALSO supports two deprecated/backwards-compatibility spellings, so this patch adds them and implements them.

…#89848) Make BasicTTIImplBase's `isTypeLegal` check handle unknown types. The current behavior is to abort. Motivated by a use case in SimplifyCFG, where `isTypeLegal` is called on a struct type and dies, when it could be treated as illegal and skipped. In general it could make sense for unknown types to be allowed, and by default just considered not legal, but the behavior can of course be overridden.

@A

…vm#90915) Followup to llvm#90759. Instead of just returning null when the caller scope is not translatable, "jump over" the current caller scope and use the outer scope as the caller if that is available. This means that in an inlined call stack if there are frames without debug scope, those frames are skipped to preserve what is available. In the original example where

```
func A {
  foo loc(fused<#A>["a":1:1])
}
func B {
  call @A loc("b":1:1)
}
func C {
  call @B loc(fused<#C>["c":1:1])
}
```

is inlined into

```
func C {
  foo loc(callsite(
    callsite(fused<#A>["a":1:1] at loc("b":1:1))
    at fused<#C>["c":1:1]))
}
```

The translated result would be `!1`:

```
!0 = !DILocation(line: 1, column: 1, scope: !C)
!1 = !DILocation(line: 1, column: 1, scope: !A, inlinedAt: !0)
```

This has a neat benefit in maintaining callsite associativity: No matter if we have `callsite(callsite(A at B) at C)` or `callsite(A at callsite(B at C))`, the translation now is the same. The previous solution did not provide this guarantee, which meant the callsite construction would somehow impact this translation.
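The associativity claim can be checked on a toy model. The sketch below is not the MLIR translation code: `Loc`, `leaf`, `callsite`, and `translate` are hypothetical names, and a frame is reduced to a `scope@position` string. It only shows that flattening a nested callsite innermost-first and dropping frames without a scope gives the same frame list for both nestings.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified location model (not the MLIR API).
struct Loc {
  std::string pos;                  // leaf: source position
  std::optional<std::string> scope; // leaf: debug scope, if any
  std::shared_ptr<Loc> callee;      // callsite: inlined location
  std::shared_ptr<Loc> caller;      // callsite: where it was inlined
  bool isCallSite() const { return callee && caller; }
};
using LocPtr = std::shared_ptr<Loc>;

LocPtr leaf(std::string pos, std::optional<std::string> scope = std::nullopt) {
  auto loc = std::make_shared<Loc>();
  loc->pos = std::move(pos);
  loc->scope = std::move(scope);
  return loc;
}

LocPtr callsite(LocPtr callee, LocPtr caller) {
  auto loc = std::make_shared<Loc>();
  loc->callee = std::move(callee);
  loc->caller = std::move(caller);
  return loc;
}

// Collect frames innermost-first; frames without a scope are skipped,
// so the next outer frame that has a scope acts as the caller.
void flatten(const LocPtr &loc, std::vector<std::string> &frames) {
  if (loc->isCallSite()) {
    flatten(loc->callee, frames);
    flatten(loc->caller, frames);
  } else if (loc->scope) {
    frames.push_back(*loc->scope + "@" + loc->pos);
  }
}

std::vector<std::string> translate(const LocPtr &loc) {
  std::vector<std::string> frames;
  flatten(loc, frames);
  return frames;
}
```

With `A` and `C` carrying scopes and `B` carrying none, `callsite(callsite(A, B), C)` and `callsite(A, callsite(B, C))` both flatten to the frames for `A` and `C`, in that order.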

After a48ebb8, the function is no longer used. Remove it.

…ruct Like 'copy', these also have alternate names, so this implements that as well. Additionally, these have an optional tag of either 'readonly' or 'zero' depending on the clause. Otherwise, this is a pretty rote implementation of the clause, as there aren't any special rules for it.

@jeffreytan81

Summary:
'test_exit_status_message_sigterm' is failing due to 'psutil' dependency introduced in PR llvm#89405. This fix removes 'deque' dependency and checks if 'psutil' can be imported before running the test. If 'psutil' cannot be imported, it emits a warning and skips the test.

Test Plan:
./bin/llvm-lit -sv /path-to-llvm-project/lldb/test/API/tools/lldb-dap/console/TestDAP_console.py --filter=tools/lldb-dap/console/TestDAP_console.py

Reviewers:
@jeffreytan81,@clayborg,@kusmour, @JDevlieghere,@walter-erquinigo

Subscribers:

Tasks:
lldb-dap

Tags:

The vector crypto instructions may have different scheduling behavior compared to VALU operations. Instead of using scheduling resources that describe VALU operations, we give these instructions their own scheduling resources. This is similar to what we did for Zb* instructions. The sifive-p670 has vector crypto, so we model behavior for these instructions in the P600SchedModel. The numbers are based off of measurements collected internally. These numbers are a bit old and new measurements show that they may not be fully accurate. It is likely that we will refine these numbers in a follow up patch(s) based on new measurements. This PR is stacked on llvm#89256.

The m68k backend will always emit external calls (including libcalls) with PC-relative PLT relocations, even in non-pic mode or when -fno-plt is used. This is unexpected, as other function calls are emitted with absolute addressing, and a static code model suggests that there is no PLT. It also leads to a miscompilation where the call instruction emitted expects an immediate address, while the relocation emitted for that instruction is PC-relative. This miscompilation can even be seen in the default C function in godbolt: https://godbolt.org/z/zEoazovzo Fix the issue by classifying external function references based upon the pic mode. This triggers a change in the static code model, making it more in line with the expected behaviour and allowing use of this backend in more bare-metal situations where a PLT does not exist. The change avoids the issue where we emit a PLT32 relocation for an absolute call, and makes libcalls and other external calls use absolute addressing modes when a static code model is desired. Further work should be done in instruction lowering and validation to ensure that miscompilations of the same type don't occur.

This reverts commit 02660e2. The tests do not pass on AIX, the buildkite precommit CI fails on these tests. For example, https://buildkite.com/llvm-project/libcxx-ci/builds/35184

This attribute is used in the headers. Not using it in the modules has led to several issues. Add it to the modules to avoid these errors in other places.

Enables vectorization of unpack op in the case of unknown vector size. The vector sizes are determined by the result's shape.

…m tweak (llvm#88737) Clangd already implements some utility functions for converting between `SourceLocation`s, `Position`s and `Offset`s into a buffer.

We have 3 different enums all expressing severity (info, warning, error). Replace all uses with a new Severity enum in lldb-enumerations.h.

…arseNormalizedArchString (llvm#90895) If 'z' is given as the complete extension name or with a digit after it, it will crash in the extension map compare function. Check for these cases and give an error.

into the current module Follow-up to llvm#86912. After llvm#86912, with reduced BMI, the BMI can stay unchanged if the dependent modules only change their implementation (without introducing new decls). However, this is not strictly correct. For example:

```
// a.cppm
export module a;
export inline int a() { ... }

// b.cppm
export module b;
import a;
export inline int b() { return a(); }
```

Since both `a()` and `b()` are inline, we need to make sure the BMI of `b.pcm` will change after the implementation of `a()` changes. We can't get that naturally since we won't record the body of `a()` during the writing process. We can't reuse ODRHash here since ODRHash won't calculate the called function recursively. So ODRHash will be problematic if `a()` calls other inline functions. Probably we can solve this by a new hash mechanism. But the safety and efficiency may be a problem too. Here we just combine the hash value of the used modules conservatively.
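The conservative approach in the last sentence can be pictured with a small sketch. This is not Clang's serialization code: `hashCombine` and `moduleHash` are hypothetical helpers, and hashes are modelled as plain 64-bit integers. The idea is only that folding each used module's hash into the current module's hash makes the result change whenever any used module's hash changes.

```cpp
#include <cstdint>
#include <vector>

// Boost-style mixing step (an assumption for this sketch, not what Clang uses).
inline std::uint64_t hashCombine(std::uint64_t seed, std::uint64_t value) {
  return seed ^ (value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

// Hypothetical: fold the hashes of all used modules into this module's hash.
inline std::uint64_t moduleHash(std::uint64_t ownHash,
                                const std::vector<std::uint64_t> &usedModules) {
  std::uint64_t result = ownHash;
  for (std::uint64_t used : usedModules)
    result = hashCombine(result, used);
  return result;
}
```

In the example above, a change to the body of `a()` would change the hash of module `a`, and therefore the combined hash of module `b`, even though `b`'s own declarations are untouched.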

… helper function (llvm#81024) * This way the helper function could be re-used by indirect-call-promotion pass to find out the vtable for an indirect call and extract the value profiles if any. * The parent patch is llvm#80762

look through shuffle vectors

…7ff9 The test fails after dfa7ff9. I didn't find this locally due to cache.

Demonstrate that this isn't yet working right.

... even if the storage types are different.

…f.forall (llvm#90189)

This commit adds a canonicalization pattern to fold away iter args of scf.forall if:
a. The corresponding tied result has no use.
b. It is not being modified within the loop.

Signed-off-by: Abhishek Varma <[email protected]>

… VarDecls (llvm#90948) With the commit d530894, we now preserve the initializer for invalid decls with the recovery-expr. However there is a chance that the original init expr is a typo-expr, we should not preserve it in the final AST, as typo-expr is an internal AST node. We should use the one after the typo correction. This is spotted by a clangd hover crash on the testcase.

Cycle is associated with construct-names and not labels. Change name of a few variables to reflect this. Also add appropriate comment to describe the else case of error checking.

Only VRs should use $noreg, this GPR was accidentally changed in d392520

…inate AllocOp and avoid Copy Add an option hoist-static-allocs to remove the unnecessary memref.alloc and memref.copy after this pass, when the memref in ReturnOp is allocated by memref.alloc and is statically shaped. Instead, it replaces the uses of the allocated memref with the memref in the out argument. By default, BufferResultsToOutParams will result in a memcpy operation to copy the originally returned memref to the output argument memref. This is inefficient when the source of memcpy (the returned memref in the original ReturnOp) is from a local AllocOp. The pass can use the output argument memref to replace the locally allocated memref for better performance. hoist-static-allocs avoids dynamic allocation and memory movement. This option will be critical for performance-sensitive applications that require the BufferResultsToOutParams pass for a caller-owned output buffer calling convention.

…e exception specification of a function (llvm#90760)

[temp.deduct.general] p6 states:

> At certain points in the template argument deduction process it is necessary to take a function type that makes use of template parameters and replace those template parameters with the corresponding template arguments. This is done at the beginning of template argument deduction when any explicitly specified template arguments are substituted into the function type, and again at the end of template argument deduction when any template arguments that were deduced or obtained from default arguments are substituted.

[temp.deduct.general] p7 goes on to say:

> The _deduction substitution loci_ are
> - the function type outside of the _noexcept-specifier_,
> - the explicit-specifier,
> - the template parameter declarations, and
> - the template argument list of a partial specialization
>
> The substitution occurs in all types and expressions that are used in the deduction substitution loci. [...]

Consider the following:

```cpp
struct A {
  static constexpr bool x = true;
};

template<typename T, typename U>
void f(T, U) noexcept(T::x); // #1

template<typename T, typename U>
void f(T, U*) noexcept(T::y); // #2

template<>
void f<A>(A, int*) noexcept; // clang currently accepts, GCC and EDG reject
```

Currently, `Sema::SubstituteExplicitTemplateArguments` will substitute into the _noexcept-specifier_ when deducing template arguments from a function declaration or when deducing template arguments for taking the address of a function template (and the substitution is treated as a SFINAE context). In the above example, `#1` is selected as the primary template because substitution of the explicit template arguments into the _noexcept-specifier_ of `#2` failed, which resulted in the candidate being ignored. This behavior is incorrect ([temp.deduct.general] note 4 says as much), and this patch corrects it by deferring all substitution into the _noexcept-specifier_ until it is instantiated.
As part of the necessary changes to make this patch work, the instantiation of the exception specification of a function template specialization when taking the address of a function template is changed to only occur for the function selected by overload resolution per [except.spec] p13.1 (as opposed to being instantiated for every candidate).

…ined member functions & member function templates (llvm#88963)

Consider the following snippet from the discussion of CWG2847 on the core reflector:

```
template<typename T>
concept C = sizeof(T) <= sizeof(long);

template<typename T>
struct A {
  template<typename U>
  void f(U) requires C<U>; // #1, declares a function template

  void g() requires C<T>; // #2, declares a function

  template<>
  void f(char); // #3, an explicit specialization of a function template that declares a function
};

template<>
template<typename U>
void A<short>::f(U) requires C<U>; // #4, an explicit specialization of a function template that declares a function template

template<>
template<>
void A<int>::f(int); // #5, an explicit specialization of a function template that declares a function

template<>
void A<long>::g(); // #6, an explicit specialization of a function that declares a function
```

A number of problems exist:
- Clang rejects `#4` because the trailing _requires-clause_ has `U` substituted with the wrong template parameter depth when `Sema::AreConstraintExpressionsEqual` is called to determine whether it matches the trailing _requires-clause_ of the implicitly instantiated function template.
- Clang rejects `#5` because the function template specialization instantiated from `A<int>::f` has a trailing _requires-clause_, but `#5` does not (nor can it have one as it isn't a templated function).
- Clang rejects `#6` for the same reasons it rejects `#5`.

This patch resolves these issues by making the following changes:
- To fix `#4`, `Sema::AreConstraintExpressionsEqual` is passed `FunctionTemplateDecl`s when comparing the trailing _requires-clauses_ of `#4` and the function template instantiated from `#1`.
- To fix `#5` and `#6`, the trailing _requires-clauses_ are not compared for explicit specializations that declare functions.

In addition to these changes, `CheckMemberSpecialization` now considers constraint satisfaction/constraint partial ordering when determining which member function is specialized by an explicit specialization of a member function for an implicit instantiation of a class template (we previously would select the first function that has the same type as the explicit specialization). With constraints taken under consideration, we match EDG's behavior for these declarations.

...which caused issues like

> ==42==ERROR: AddressSanitizer failed to deallocate 0x32 (50) bytes at address 0x117e0000 (error code: 28)
> ==42==Cannot dump memory map on emscriptenAddressSanitizer: CHECK failed: sanitizer_common.cpp:81 "((0 && "unable to unmmap")) != (0)" (0x0, 0x0) (tid=288045824)
>     #0 0x14f73b0c in __asan::CheckUnwind()+0x14f73b0c (this.program+0x14f73b0c)
>     #1 0x14f8a3c2 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long)+0x14f8a3c2 (this.program+0x14f8a3c2)
>     #2 0x14f7d6e1 in __sanitizer::ReportMunmapFailureAndDie(void*, unsigned long, int, bool)+0x14f7d6e1 (this.program+0x14f7d6e1)
>     #3 0x14f81fbd in __sanitizer::UnmapOrDie(void*, unsigned long)+0x14f81fbd (this.program+0x14f81fbd)
>     #4 0x14f875df in __sanitizer::SuppressionContext::ParseFromFile(char const*)+0x14f875df (this.program+0x14f875df)
>     #5 0x14f74eab in __asan::InitializeSuppressions()+0x14f74eab (this.program+0x14f74eab)
>     #6 0x14f73a1a in __asan::AsanInitInternal()+0x14f73a1a (this.program+0x14f73a1a)

when trying to use an ASan suppressions file under Emscripten: Even though it would be considered OK by SUSv4, the Emscripten runtime states "We don't support partial munmapping" (see <emscripten-core/emscripten@f4115eb> "Implement MAP_ANONYMOUS on top of malloc in STANDALONE_WASM mode (llvm#16289)").

Co-authored-by: Stephan Bergmann <[email protected]>

…ication as used during partial ordering (llvm#91534) We do not deduce template arguments from the exception specification when determining the primary template of a function template specialization or when taking the address of a function template. Therefore, this patch changes `isAtLeastAsSpecializedAs` such that we do not mark template parameters in the exception specification as 'used' during partial ordering (per [temp.deduct.partial] p12) to prevent the following from being ambiguous: ``` template<typename T, typename U> void f(U) noexcept(noexcept(T())); // #1 template<typename T> void f(T*) noexcept; // #2 template<> void f<int>(int*) noexcept; // currently ambiguous, selects #2 with this patch applied ``` Although there is no corresponding wording in the standard (see core issue filed here cplusplus/CWG#537), this seems to be the intended behavior given the definition of _deduction substitution loci_ in [temp.deduct.general] p7 (and EDG does the same thing).

…erSize (llvm#67657)"

This reverts commit f0b3654.

This commit triggers UB by reading an uninitialized variable. `UP.PartialThreshold` is used uninitialized in `getUnrollingPreferences()` when it is called from `LoopVectorizationPlanner::executePlan()`. In this case the `UP` variable is created on the stack and its fields are not initialized.

```
==8802==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x557c0b081b99 in llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h
    #1 0x557c0b07a40c in llvm::TargetTransformInfo::Model<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2277:17
    #2 0x557c0f5d69ee in llvm::TargetTransformInfo::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) const llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:387:19
    #3 0x557c0e6b96a0 in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7624:7
    #4 0x557c0e6e4b63 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10253:13
    #5 0x557c0e6f2429 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10344:30
    #6 0x557c0e6f2f97 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10383:9
[...]

  Uninitialized value was created by an allocation of 'UP' in the stack frame
    #0 0x557c0e6b961e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:3
```
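For readers less familiar with this class of bug, the following generic C++ sketch shows why a stack-allocated struct with scalar fields is safe to read only after it has been initialized. `UnrollPrefs`, `makeZeroedPrefs`, and `effectiveThreshold` are hypothetical stand-ins, not LLVM types or functions, and the revert above does not say how the original change would eventually be fixed.

```cpp
#include <cassert>

// Hypothetical stand-in for a preferences struct with plain scalar fields.
struct UnrollPrefs {
  unsigned PartialThreshold;
  bool Partial;
};

// Value-initialization (`{}`) zeroes every field, so later reads are defined.
// Writing `UnrollPrefs UP;` instead would leave the fields indeterminate,
// and reading them would be the kind of UB MemorySanitizer reports.
inline UnrollPrefs makeZeroedPrefs() {
  UnrollPrefs UP{};
  return UP;
}

// A callee that reads a field before deciding whether to override it.
inline unsigned effectiveThreshold(const UnrollPrefs &UP, unsigned Fallback) {
  return UP.PartialThreshold != 0 ? UP.PartialThreshold : Fallback;
}
```

Any callee that reads a field before writing it, as `effectiveThreshold` does here, relies on the caller having initialized the struct.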

…vm#90820) This solves some ambiguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default.

When performing template argument deduction, a template template parameter containing no packs should be more specialized than one that does.

Given the following example:

```C++
template<class T2> struct A;
template<template<class ...T3s> class TT1, class T4> struct A<TT1<T4>>; // #1
template<template<class T5 > class TT2, class T6> struct A<TT2<T6>>; // #2

template<class T1> struct B;
template struct A<B<char>>;
```

Prior to P0522, candidate `#2` would be more specialized. After P0522, neither is more specialized, so this becomes ambiguous. With this change, `#2` becomes more specialized again, maintaining compatibility with pre-P0522 implementations.

The problem is that in P0522, candidates are at least as specialized when matching packs to fixed-size lists both ways, whereas before, a fixed-size list is more specialized.

This patch keeps the original behavior when checking template arguments outside deduction, but restores this aspect of pre-P0522 matching during deduction.

---

Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.

…llvm#92855) This solves some ambiguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default.

When performing template argument deduction, we extend the provisional wording introduced in llvm#89807 so it also covers deduction of class templates.

Given the following example:

```C++
template <class T1, class T2 = float> struct A;
template <class T3> struct B;

template <template <class T4> class TT1, class T5> struct B<TT1<T5>>; // #1
template <class T6, class T7> struct B<A<T6, T7>>; // #2

template struct B<A<int>>;
```

Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This patch restores the pre-P0522 behavior, `#2` is picked again.

This has the beneficial side effect of making the following code valid:

```C++
template<class T, class U> struct A {};
A<int, float> v;
template<template<class> class TT> void f(TT<int>);

// OK: TT picks 'float' as the default argument for the second parameter.
void g() { f(v); }
```

---

Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.

@chapuni

The problematic program is as follows: ```shell #define pre_a 0 #define PRE(x) pre_##x void f(void) { PRE(a) && 0; } int main(void) { return 0; } ``` in which after token concatenation (`##`), there's another nested macro `pre_a`. Currently only the outer expansion region will be produced. ([compiler explorer link](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:29,endLineNumber:8,positionColumn:29,positionLineNumber:8,selectionStartColumn:29,selectionStartLineNumber:8,startColumn:29,startLineNumber:8),source:'%23define+pre_a+0%0A%23define+PRE(x)+pre_%23%23x%0A%0Avoid+f(void)+%7B%0A++++PRE(a)+%26%26+0%3B%0A%7D%0A%0Aint+main(void)+%7B+return+0%3B+%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:51.69491525423727,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:cclang_assertions_trunk,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'1',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:___c,libs:!(),options:'-fprofile-instr-generate+-fcoverage-mapping+-fcoverage-mcdc+-Xclang+-dump-coverage-mapping+',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+clang+(assertions+trunk)+(Editor+%231)',t:'0')),k:34.5741843594503,l:'4',m:28.903654485049834,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compilerName:'x86-64+clang+(trunk)',editorid:1,fontScale:14,fontUsePx:'0',j:2,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+clang+(assertions+trunk)+(Compiler+%232)',t:'0')),header:(),l:'4',m:71.09634551495017,n:'0',o:'',s:0,t:'0')),k:48.30508474576271,l:'3',n:'0',o:'',t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4)) ```text f: File 0, 4:14 -> 6:2 = #0 Decision,File 0, 5:5 -> 5:16 = M:0, C:2 Expansion,File 0, 5:5 -> 5:8 = #0 
(Expanded file = 1) File 0, 5:15 -> 5:16 = #1 Branch,File 0, 5:15 -> 5:16 = 0, 0 [2,0,0] File 1, 2:16 -> 2:23 = #0 File 2, 1:15 -> 1:16 = #0 File 2, 1:15 -> 1:16 = #0 Branch,File 2, 1:15 -> 1:16 = 0, 0 [1,2,0] ``` The inner expansion region isn't produced because: 1. In the range-based for loop quoted below, each sloc is processed and possibly emit a corresponding expansion region. 2. For our sloc in question, its direct parent returned by `getIncludeOrExpansionLoc()` is a `<scratch space>`, because that's how `##` is processed. https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L518-L520 3. This `<scratch space>` cannot be found in the FileID mapping so `ParentFileID` will be assigned an `std::nullopt` https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L521-L526 4. As a result this iteration of for loop finishes early and no expansion region is added for the sloc. This problem gets worse with MC/DC: as the example shows, there's a branch from File 2 but File 2 itself is missing. This will trigger assertion failures. The fix is more or less a workaround and takes a similar approach as llvm#89573. ~~Depends on llvm#89573.~~ This includes llvm#89573. Kudos to @chapuni! This and llvm#89573 together fix llvm#87000: I tested locally, both the reduced program and my original use case (fwiw, Linux kernel) can run successfully. --------- Co-authored-by: NAKAMURA Takumi <[email protected]>

…des (llvm#94453) LSR will generate chains of related instructions with a known increment between them. With SVE, in the case of the test case, this can include increments like 'vscale * 16 + 8'. The idea of this patch is if we have a '+8' increment already calculated in the chain, we can generate a (legal) '+ vscale*16' addressing mode from it, allowing us to use the '[x16, #1, mul vl]' addressing mode instructions. In order to do this we keep track of the known 'bases' when generating chains in GenerateIVChain, checking for each if the accumulated increment expression from the base neatly folds into a legal addressing mode. If they do not we fall back to the existing LeftOverExpr, whether it is legal or not. This is mostly orthogonal to llvm#88124, dealing with the generation of chains as opposed to rest of LSR. The existing vscale addressing mode work has greatly helped compared to the last time I looked at this, allowing us to check that the addressing modes are indeed legal.

…lvm#104148) `hasOperands` does not always execute matchers in the order they are written. This can cause issues in code using bindings when one operand matcher is relying on a binding set by the other. With this change, the first matcher present in the code is always executed first and any binding it sets is available to the second matcher.

Simple example with current version (1 match) and new version (2 matches):

```bash
> cat tmp.cpp
int a = 13;
int b = ((int) a) - a;
int c = a - ((int) a);

> clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
int a = 13;
^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
int b = ((int)a) - a;
        ^~~~~~~~~~~~
1 match.

> ./build/bin/clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
    2 | int b = ((int)a) - a;
      |         ^~~~~~~~~~~~

Match #2:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:3:9: note: "root" binds here
    3 | int c = a - ((int)a);
      |         ^~~~~~~~~~~~
2 matches.
```

If this should be documented or regression tested anywhere please let me know where.

…104523) Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`. This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden. My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB. rdar://126629381 Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example). 
before:

```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```

after

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```

`JITDylibSearchOrderResolver` local variable can be destroyed before completion of all callbacks. Capture it together with `Deps` in `OnEmitted` callback.

Original error:
```
==2035==ERROR: AddressSanitizer: stack-use-after-return on address 0x7bebfa155b70 at pc 0x7ff2a9a88b4a bp 0x7bec08d51980 sp 0x7bec08d51978
READ of size 8 at 0x7bebfa155b70 thread T87 (tf_xla-cpu-llvm)
    #0 0x7ff2a9a88b49 in operator() llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:58
    #1 0x7ff2a9a88b49 in __invoke<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:149:25
    #2 0x7ff2a9a88b49 in __call<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:224:5
    #3 0x7ff2a9a88b49 in operator() libcxx/include/__functional/function.h:210:12
    #4 0x7ff2a9a88b49 in void std::__u::__function::__policy_invoker<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr,
```
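The lifetime problem described above can be illustrated with a minimal sketch. It is not the code from the patch: `Resolver`, `registerLookup`, and `drain` are hypothetical names, and holding the object in a `std::shared_ptr` is just one way to let a deferred callback own what it needs. The point is that state used by a callback that runs after the registering function returns has to be captured by value (or moved into the callback), not referenced on the stack.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that a deferred callback needs later.
struct Resolver {
  std::string name;
  std::string lookup(const std::string &sym) const { return name + "::" + sym; }
};

// Callbacks stored here run after the registering function has returned.
using Callback = std::function<std::string()>;

// Registers a deferred callback. The resolver is owned by a shared_ptr that is
// captured by value, so it stays alive as long as the callback does. Capturing
// a stack-local Resolver by reference instead would dangle once this function
// returns, which is the kind of bug the sanitizer report above shows.
inline void registerLookup(std::vector<Callback> &pending, std::string sym) {
  auto resolver = std::make_shared<Resolver>(Resolver{"main"});
  pending.push_back([resolver, sym = std::move(sym)] {
    return resolver->lookup(sym);
  });
}

// Runs every pending callback, collects the results, and clears the queue.
inline std::vector<std::string> drain(std::vector<Callback> &pending) {
  std::vector<std::string> results;
  for (auto &cb : pending) {
    assert(cb && "callback must be callable");
    results.push_back(cb());
  }
  pending.clear();
  return results;
}
```

Calling `registerLookup` twice and then `drain` yields one result per registered symbol, even though the functions that created the resolvers have long since returned.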

Static destructor can race with calls to notify and trigger tsan warning.

```
WARNING: ThreadSanitizer: data race (pid=5787)
  Write of size 1 at 0x55bec9df8de8 by thread T23:
    #0 pthread_mutex_destroy [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1344](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1344&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12affb) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #1 __libcpp_recursive_mutex_destroy [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:91](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=91&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #2 std::__tsan::recursive_mutex::~recursive_mutex() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:52](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=52&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9)
    #3 ~SmartMutex [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:28](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=28&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #4 (anonymous namespace)::PerfJITEventListener::~PerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:65](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=65&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe)
    #5 cxa_at_exit_callback_installed_at(void*) [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:437](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=437&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b172cb9) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #6 llvm::JITEventListener::createPerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:496](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=496&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcad8f5) (BuildId: ff25ace8b17d9863348bb1759c47246c)
```

```
  Previous atomic read of size 1 at 0x55bec9df8de8 by thread T192 (mutexes: write M0, write M1):
    #0 pthread_mutex_unlock [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1387](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1387&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12b6bb) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #1 __libcpp_recursive_mutex_unlock [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:87](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=87&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #2 std::__tsan::recursive_mutex::unlock() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:64](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=64&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589)
    #3 unlock [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:47](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=47&cl=669089572):16 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #4 ~lock_guard [third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h:39](third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h?l=39&cl=669089572):101 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968)
    #5 (anonymous namespace)::PerfJITEventListener::notifyObjectLoaded(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:290](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=290&cl=669089572):1 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968)
    #6 llvm::orc::RTDyldObjectLinkingLayer::onObjEmit(llvm::orc::MaterializationResponsibility&, llvm::object::OwningBinary<llvm::object::ObjectFile>, std::__tsan::unique_ptr<llvm::RuntimeDyld::MemoryManager, std::__tsan::default_delete<llvm::RuntimeDyld::MemoryManager>>, std::__tsan::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo, std::__tsan::default_delete<llvm::RuntimeDyld::LoadedObjectInfo>>, std::__tsan::unique_ptr<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>, std::__tsan::default_delete<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>>>, llvm::Error) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:386](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp?l=386&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bc404a8) (BuildId: ff25ace8b17d9863348bb1759c47246c)
```

ciyongch reviewed Apr 18, 2024


Menooker force-pushed the fix_buffer_results_to_arg branch 2 times, most recently from 9170bed to 5a15f77 Compare April 29, 2024 02:45

Menooker pushed a commit that referenced this pull request Apr 29, 2024

[llvm-mca] Fix -skip-unsupported-instruction tests on Windows

a19a411

Builder alerted me to the failing test, attempt #1 in the blind.

thetruestblue and others added 24 commits May 3, 2024 06:47

[NFC][x86_64][Test Only] Disable for san coverage for lsan on all x86…

3fffe6c

…_64 for now (llvm#90750) Disabling this test on all x86_64 to unblock CI. rdar://125052424

[LV][NFC]Address last comments from llvm#88025.

6517c5b

[OpenACC] Implement no_create and present clauses on compute constructs

bd909d2

These two are, from a semantic checking perspective, identical to first-private/private/etc, other than appertainment. This patch implements both.

[RISCV] Support interleaved accesses for scalable vector. (llvm#90583)

3f1fef3

Support for interleaved accesses on scalable vectors with a factor of 2 is enabled in the vectorizer, so this patch removes the restriction for scalable vectors with a factor of 2.

[AMDGPU] Remove unneeded calls to setInstrAndDebugLoc in matchers. NFC.

99ca408

[AMDGPU] Use replaceOpcodeWith instead of applyCombine_s_mul_u64. NFC.

1cde124

[VPlan] Remove unused VPWidenCanonicalIVRecipe::getScalarType (NFCI).

40cc96e

After a48ebb8, the function is no longer used. Remove it.

[GlobalISel] Use some standard matchinfo defs. NFC.

692e887

[RISCV] Use Sched*MC for Zvk MC instructions

d13f635

[RISCV][llvm-mca] Add vector crypto llvm-mca tests for P600

4821882

Revert "[NFC] Enable atomic tests on AIX"

a06c1fe

This reverts commit 02660e2. The tests do not pass on AIX, the buildkite precommit CI fails on these tests. For example, https://buildkite.com/llvm-project/libcxx-ci/builds/35184

[libc++][modules] Uses _LIBCPP_USING_IF_EXISTS. (llvm#90409)

6c4dedd

This attribute is used in the headers. Not using this in the modules has led to several issues. Add them to the modules to avoid these errors in other places.

[mlir][linalg] Vectorize unpack op without masking (llvm#89067)

2755c69

Enables vectorization of unpack op in the case of unknown vector size. The vector sizes are determined by the result's shape.

[clangd] use existing functions for code locations in the scopify enu…

8d946c7

…m tweak (llvm#88737) Clangd already implements some utility functions for converting between `SourceLocation`s, `Position`s and `Offset`s into a buffer.

[lldb] Create a single Severity enum in lldb-enumerations (llvm#90917)

528f5ba

We have 3 different enums all expressing severity (info, warning, error). Replace all uses with a new Severity enum in lldb-enumerations.h.

[lld,test] Convert text files from CRLF to LF

55ad294

[RISCV] Add partial validation of Z extension name to RISCVISAInfo::p…

7a6847e

…arseNormalizedArchString (llvm#90895) If 'z' is given as the complete extension name or with a digit after it, it will crash in the extension map compare function. Check for these cases and give an error.
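The two rejected cases can be sketched as a small standalone check. This is an illustration under stated assumptions, not the code in `RISCVISAInfo`: `checkZExtensionName` is a hypothetical helper and the error text is made up. It only covers the cases named above (a bare `z`, or `z` directly followed by a digit) and accepts everything else.

```cpp
#include <cassert>  // only needed for the usage checks described below
#include <cctype>
#include <string>

// Hypothetical helper, not the RISCVISAInfo API. Returns an empty string when
// the name passes this partial check, otherwise a short error message.
inline std::string checkZExtensionName(const std::string &ext) {
  // Names that do not start with 'z' are outside the scope of this check.
  if (ext.empty() || ext[0] != 'z')
    return "";
  // Reject a bare "z" and a 'z' immediately followed by a digit (e.g. "z1p0").
  if (ext.size() == 1 ||
      std::isdigit(static_cast<unsigned char>(ext[1])))
    return "'z' must be followed by a letter";
  return "";
}
```

With this helper, `checkZExtensionName("zba")` returns an empty string, while `"z"` and `"z1p0"` both produce the error message instead of reaching a later lookup that assumes a well-formed name.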

ChuanqiXu9 and others added 14 commits May 7, 2024 11:41

[NFC][OpenMP][OMPX] Move declare variant up

02ce822

[GlobalIsel] Combine extract vector element (llvm#90339)

b42f553

look through shuffle vectors

[NFC] Fix Modules/no-transitive-source-location-change.cppm after dfa…

ad9f38d

…7ff9 The test fails after dfa7ff9. I didn't find this locally due to cache.

[clang][Interp][NFC] Add eval-order test

05f4448

Demonstrate that this isn't yet working right.

[clang][Interp][NFC] Allow Pointer assignment if both are zero

5f2f390

... even if the storage types are different.

[Flang][OpenMP] NFC: Trivial changes in OmpCycleChecker (llvm#91024)

6ad37a4

Cycle is associated with construct-names and not labels. Change name of a few variables to reflect this. Also add appropriate comment to describe the else case of error checking.

[mlir][math] Add expand patterns for acosh, asinh, atanh (llvm#90718)

a62a702
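For context, the inverse hyperbolic functions have standard closed forms in terms of logarithm and square root, which is the kind of identity an expansion pattern can lower to. The commit message does not say which forms the patch uses, so the identities below are the textbook ones and only an assumption about the approach.

```latex
\begin{aligned}
\operatorname{asinh}(x) &= \ln\!\left(x + \sqrt{x^{2} + 1}\right) \\
\operatorname{acosh}(x) &= \ln\!\left(x + \sqrt{x^{2} - 1}\right), \quad x \ge 1 \\
\operatorname{atanh}(x) &= \tfrac{1}{2}\,\ln\!\left(\frac{1 + x}{1 - x}\right), \quad |x| < 1
\end{aligned}
```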

[RISCV] Use IMPLICIT_DEF for undef GPR reg in vsetvli test. NFC

ebde770

Only VRs should use $noreg; this GPR was accidentally changed in d392520.

[LoongArch] Optimize codegen for ISD::{ROTL,ROTR} (llvm#91174)

ad59967

Menooker force-pushed the fix_buffer_results_to_arg branch from 5a15f77 to acdbdbf Compare May 7, 2024 07:54

Menooker closed this May 8, 2024

Menooker deleted the fix_buffer_results_to_arg branch May 8, 2024 06:25


Menooker commented Apr 18, 2024 (edited)

ciyongch left a comment

ciyongch Apr 18, 2024

Menooker Apr 19, 2024

ZhennanQin Apr 19, 2024

ZhennanQin Apr 19, 2024

Menooker Apr 19, 2024

ciyongch Apr 22, 2024

ciyongch commented Apr 28, 2024

Menooker commented Apr 28, 2024 (edited)
