[Clang] Allow parsing arbitrary order of attributes for declarations #133107
@llvm/pr-subscribers-clang
Author: Denis.G (DenisGZM)
Changes
Enable parsing the alignas attribute after GNU attributes, before ParseDeclaration.
This might be useful for CUDA code where shared and other specifiers may be mixed with align.
I'd be glad to hear if there is a better place or another technique for processing this attribute without interrupting the current flow of parsing.
Full diff: https://github.com/llvm/llvm-project/pull/133107.diff
2 Files Affected:
diff --git a/clang/lib/Parse/ParseStmt.cpp b/clang/lib/Parse/ParseStmt.cpp
index 150b2879fc94f..33b9f63bcfa08 100644
--- a/clang/lib/Parse/ParseStmt.cpp
+++ b/clang/lib/Parse/ParseStmt.cpp
@@ -296,6 +296,11 @@ StmtResult Parser::ParseStatementOrDeclarationAfterAttributes(
     goto Retry;
   }
 
+  case tok::kw_alignas: {
+    ParseAlignmentSpecifier(CXX11Attrs);
+    goto Retry;
+  }
+
   case tok::kw_template: {
     SourceLocation DeclEnd;
     ParseTemplateDeclarationOrSpecialization(DeclaratorContext::Block, DeclEnd,
diff --git a/clang/test/SemaCUDA/cuda-attr-order.cu b/clang/test/SemaCUDA/cuda-attr-order.cu
new file mode 100644
index 0000000000000..d3bf5b014d1c6
--- /dev/null
+++ b/clang/test/SemaCUDA/cuda-attr-order.cu
@@ -0,0 +1,15 @@
+// Verify that we can parse a simple CUDA file with different attributes order.
+// RUN: %clang_cc1 "-triple" "nvptx-nvidia-cuda" -fsyntax-only -verify %s
+// expected-no-diagnostics
+#include "Inputs/cuda.h"
+
+struct alignas(16) float4 {
+  float x, y, z, w;
+};
+
+__attribute__((device)) float func() {
+  __shared__ alignas(alignof(float4)) float As[4][4]; // Both combinations
+  alignas(alignof(float4)) __shared__ float Bs[4][4]; // must be legal
+
+  return As[0][0] + Bs[0][0];
+}
At a glance this does seem like the right place to do this, but this is still missing a release note.
It seems like GCC allows e.g. __attribute__(()) alignas(16) int x in any case, so I don’t see why we shouldn’t allow this too. Can you also add some tests that use __attribute__(()) directly and which aren’t CUDA-specific?
Oh, and can you add something like this as a test as well:
struct S { __attribute__((deprecated)) alignas(16) int x; };
CC @erichkeane in case there’s a specific reason I’m not aware of as to why we currently don’t allow this.
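The struct member case requested above can also be written as a standalone, non-CUDA check. What follows is a minimal sketch, not the test that was added to the PR. ATTR_ORDER_RELAXED is a hypothetical opt-in macro (not a real predefined or feature-test macro) and the struct names are made up for illustration. The member that puts the GNU attribute first is compiled only when the macro is set, since a compiler without this change is expected to reject it.

```cpp
#include <cstddef>

// Hypothetical opt-in switch, not a real predefined macro: set it to 1 only
// for compilers known to accept a GNU attribute placed before alignas.
#ifndef ATTR_ORDER_RELAXED
#define ATTR_ORDER_RELAXED 0
#endif

// alignas first, GNU attribute second.
struct AlignasFirst {
  alignas(16) __attribute__((deprecated)) int x;
};
static_assert(alignof(AlignasFirst) == 16, "alignas must raise the alignment");

#if ATTR_ORDER_RELAXED
// GNU attribute first, alignas second: the member form from the review comment.
struct AttrFirst {
  __attribute__((deprecated)) alignas(16) int x;
};
static_assert(alignof(AttrFirst) == alignof(AlignasFirst),
              "both orders should produce the same alignment");
#endif
```

Building once with -DATTR_ORDER_RELAXED=1 and once without gives a quick way to see whether a given compiler accepts the relaxed order.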
Actually, this test doesn't work with this patch... In this case all attributes are processed in In ParseDecl.cpp
And AttrsLastTime is always false in declarations of the form: Another approach I tried is to add processing of alignas-cxx11 just like it is done for C: kw__Alignas and kw_alignas (c23).
Hmm, @erichkeane probably knows where this needs to be parsed then; I might take another look at this myself later (because I’m not sure either off the top of my head), but I’m rather busy today unfortunately... |
…MemberDeclaration
I added parsing of all attributes in ParseCXXClassMemberDeclaration before calling ParseDeclarationSpecifiers, and it seems to solve the problem, but it also changes annotation ranges for struct and class members.
Seems reasonable to me, but I’d still like @erichkeane to take a look at this as the attributes code owner
Parsing of attributes is admittedly the part I'm least comfortable with here. I would love tests for how this interacts with our __declspec spelling attributes though, and to help determine why we wouldn't parse all 3 together here.
As a followup/future direction for someone, there is perhaps value in a MaybeParseAnyAttributes that does all 3 in a loop.
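The loop suggested above can be pictured with a toy model. This is only a sketch of the control flow under simplifying assumptions: Tok, ToyParser, and maybeParse are invented for illustration and are not Clang's parser API, and each attribute is modeled as a single token.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy token kinds; these are NOT Clang's tok:: enumerators.
enum class Tok { GnuAttr, Cxx11Attr, Declspec, Other };

struct ToyParser {
  std::vector<Tok> tokens;
  std::size_t pos = 0;
  std::vector<Tok> attrs;  // attributes collected so far, in source order

  // Consume the next token if it is an attribute of the given spelling.
  bool maybeParse(Tok kind) {
    if (pos >= tokens.size() || tokens[pos] != kind)
      return false;
    attrs.push_back(tokens[pos++]);
    return true;
  }

  // Hypothetical helper in the spirit of the suggested MaybeParseAnyAttributes:
  // keep trying all three spellings until a full pass consumes nothing.
  // Returns the number of attributes consumed by this call.
  std::size_t maybeParseAnyAttributes() {
    const std::size_t before = attrs.size();
    bool progressed = true;
    while (progressed) {
      progressed = false;
      for (Tok kind : {Tok::GnuAttr, Tok::Cxx11Attr, Tok::Declspec})
        while (maybeParse(kind))
          progressed = true;
    }
    return attrs.size() - before;
  }
};
```

Because the outer loop repeats until no spelling makes progress, the three kinds can appear in any order and any number of times before the first non-attribute token.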
I think this looks reasonable? I would like @AaronBallman to stop by though, he might think of some reason why this isn't right per-grammar.
Thanks for the fix! The changes should come with a release note in clang/docs/ReleaseNotes.rst so users know about the fix.
Reworked approach for parsing. We needed to support arbitrary attribute parsing rather than just |
✅ With the latest revision this PR passed the C/C++ code formatter. |
@@ -24,7 +24,7 @@ int templateFunction(T value) __attribute__((annotate("works")));

 // CHECK: ClassDecl=Test:3:7 (Definition) Extent=[3:1 - 17:2]
 // CHECK-NEXT: CXXAccessSpecifier=:4:1 (Definition) Extent=[4:1 - 4:8]
-// CHECK-NEXT: CXXMethod=aMethod:5:51 Extent=[5:3 - 5:60]
+// CHECK-NEXT: CXXMethod=aMethod:5:51 Extent=[5:46 - 5:60]
This means we went from pointing to the start of __attribute__ to pointing to the start of void, which is a bit unfortunate.
Do you think we should avoid it somehow? Or just accept it as is?
I'm on the fence. It's not the worst regression in behavior, but it does make the diagnostic slightly harder for users to reason about. WDYT @erichkeane ?
It's really quite unfortunate... I think it is at least worth seeing how much work needs to be done to get this 'right', and see if it is worth the effort.
The main problem here is to determine 'right' :)
Now I set the annotated declaration range at the beginning of the first parsed attribute, so the range may contain more than just DeclSpecAttrs.
Earlier we parsed only CXX attrs before ParseDeclarationSpecifiers, and then the annotated range could only contain DeclSpecAttrs.
Examples:
class Test {
public:
  __attribute__((annotate("spiffy_method"))) [[deprecated]] void aMethod(); // Error before, now: Extent=[5:3 - 5:75]
};
class Test {
public:
  [[deprecated]] __attribute__((annotate("spiffy_method"))) void aMethod(); // Before: Extent=[5:18 - 5:75], now: Extent=[5:3 - 5:75]
};
Is this what is expected?
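The column numbers in the second example can be checked with a small string model. This is a sketch under loud assumptions: it looks only at the text of one line, the helper names are invented, and it has nothing to do with how Clang actually computes extents. The "before" rule is modeled only for the ordering that was already accepted (the [[...]] attribute first).

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// 1-based column of the first occurrence of needle in line, or 0 if absent.
inline unsigned columnOf(const std::string &line, const std::string &needle) {
  const std::string::size_type pos = line.find(needle);
  return pos == std::string::npos ? 0u : static_cast<unsigned>(pos) + 1u;
}

// "Now" rule from the comment: the extent starts at whichever attribute
// spelling appears first on the line.
inline unsigned extentStartNow(const std::string &line) {
  const unsigned gnu = columnOf(line, "__attribute__");
  const unsigned cxx = columnOf(line, "[[");
  if (gnu == 0) return cxx;
  if (cxx == 0) return gnu;
  return std::min(gnu, cxx);
}

// "Before" rule for the previously accepted ordering: the leading [[...]]
// was parsed separately, so the extent started at the GNU attribute.
inline unsigned extentStartBefore(const std::string &line) {
  return columnOf(line, "__attribute__");
}
```

Applied to the second member declaration above (with its two-space indent), this model gives column 18 for the old start and column 3 for the new one, matching the Extent values quoted in the comment.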
I think the underlying issue is that we're using in-band information about source ranges that's no longer true. We used to be able to rely on the source range because the order was more strict, but as we've relaxed it, you can now mix declaration and decl specifier attributes in more exotic ways.
However, addressing that may be quite involved. So I think we should probably accept this as-is; pointing to the start of the list is better than pointing to the type.
What do we think @AaronBallman ? I think the diagnostic column is as best effort as we are going to get, so I'm OK with this as-is. We could perhaps improve that, but I don't think doing that here is worth the effort.
I'll approve, but I want to make sure Aaron has a chance to say otherwise before merging.
Yeah, I think I can live with this. I think not supporting the arbitrary order is more annoying to users than a slight degradation in source location reporting. |
Do you need us to land the changes on your behalf, btw? |
Yeah, why not |
I'd like to chime in for the issue with source locations and its implications. Seems like it has been brought up already along the thread. These might not be as crucial for diagnostic locations (or other clang purposes) when they're slightly off, but they actually result in big changes in most of the source tools, especially ones that rewrite source code. After this patch we now have a discrepancy in source ranges associated with member vs non-member declarations. It's also popping up in test cases updated in this patch, we're updating some golden tests but only for member declarations. I think getting rid of this discrepancy would prevent a lot of churn in each individual source tool; either by keeping source ranges as-is for member decls, or having a similar update for non-member declarations? That way source tools only need to special case attributes, and not |
The point @kadircet brings up about tooling is a good one, though. I'm not certain we need to revert the patch, but breaking a bunch of tools and making them cope with the source location change is pretty disruptive. I think we may need to consider refactoring source location handling for attributes more broadly to solve the underlying concerns. Thoughts @erichkeane? Related, this issue just was filed today: #140020 |
I wouldn't be opposed to SOME sort of improvement in attributes handling. We flatten the list of attributes unfortunately, so we couldn't add source location to each group in any way.... My one immediate thought is: What if we added ONE extra source-location to WDYT? I don't really have the time/ability to do so, but am definitely willing to review something like this. |
I think that's still going to be too clever for tooling because tools will do stuff like use AST matchers to say "does this declaration have this attribute? Cool, let's add a fix-it to remove that range of characters." and that's going to fail for them when the attribute is in the middle of a list of three. I think what we need is:
I think this ends up getting full fidelity for tools to use without having to store too many locations, but the problem with (2) and (3) is that we still need those helpers to be smart about inherited attributes. e.g.,
so those helpers really only make sense on a single declaration, not for inherited attributes. WDYT?
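The fix-it concern raised in this comment (removing an attribute that sits in the middle of a list) can be illustrated with plain strings. This is a toy sketch, not a proposal for the actual helpers: the function names are invented, attributes are located by simple substring search, and only a ", " separator is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive fix-it: erase only the attribute's own characters.
inline std::string removeAttrNaive(std::string text, const std::string &attr) {
  const std::size_t begin = text.find(attr);
  if (begin != std::string::npos)
    text.erase(begin, attr.size());
  return text;
}

// List-aware fix-it: also swallow one adjacent ", " separator so the
// remaining attribute list stays well-formed.
inline std::string removeAttrListAware(std::string text, const std::string &attr) {
  std::size_t begin = text.find(attr);
  if (begin == std::string::npos)
    return text;
  std::size_t end = begin + attr.size();
  if (text.compare(end, 2, ", ") == 0)
    end += 2;    // first or middle element: drop the trailing separator
  else if (begin >= 2 && text.compare(begin - 2, 2, ", ") == 0)
    begin -= 2;  // last element: drop the leading separator
  text.erase(begin, end - begin);
  return text;
}
```

The naive version leaves a stray comma behind when the removed attribute is the middle one of three, which is the failure mode described above. A tool needs to know about the surrounding list, not just the attribute's own range, to avoid it.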
I'm in the same boat. |
For #2 above, we'd not be able to figure out But otherwise I think this is reasonable. |
I'd like to highlight that many tools still only care about declarations themselves and not the attributes. The ones that fiddle with attributes need to do it in a special and complicated way already. Making this less complicated definitely creates some value. But I think the immediate issue isn't about making that easy, but rather keeping most tools that don't care about attributes simple. Hence I think having consistency in source ranges associated with declarations to eliminate special cases in most tools is more beneficial (and possibly simpler). I'd probably lean towards keeping them out of the source ranges associated with decls, as that's the current state of the world. Unless we're deliberately going to include them in declaration source ranges systematically, I think we're more likely to regress users down the line.
Good to know.
We're currently inconsistent regarding attributes; sometimes we include the attributes in the range, sometimes we don't. But declaration source ranges are tough to reason about in general because there's many different moving parts. Consider:
If we include attributes in the declaration, shouldn't we also include pragmas which impact the declaration like above (this one adds an implicit attribute)? I think where we've traditionally fallen on this is that the source range for any given AST node covers all of the tokens used to produce that node. This means leading or trailing tokens are the awkward cases; are they used to produce the node or not? So currently, for So we could either improve the current approach of using the source range which encompasses the maximum contiguous sequence of tokens comprising the full definition. Or we could consider changing it so that the declaration source range is the minimum contiguous sequence of tokens to form a valid declaration. e.g., |
I totally agree. I was asking to change them deliberately and systematically for the same reason. Such changes, even if they make sense and look good in isolation, introduce more chaos into the system. As they change one set of random behavior with another and each of these changes need to be handled by source tooling consumers.
I'd rather not dive into the weeds here and make a general decision around whether we should refrain from such changes to semantics of source locations. I am happy to discuss how we can improve things systematically in a separate medium.
I am also leaning towards this direction, having all that information available at least gives the tools always something to build on top (rather than lacking information, which is a lot harder to recover). But I think we should be tackling that as a separate problem, if we want to reduce churn and regressions for source tools. E.g. if we decide that we're going to fix source ranges involving attribute tokens, we should try our best to cover all the places that parse/contain attributes. Instead of improving things here and there as we're making other changes. That's just death by a thousand cuts, and most of the time these will be small enough regressions that no one prioritizes/fixes.
+1
I think we need to understand what we want before we can make decisions on what needs changing, though. Are there invariants we want to introduce, like the source range for the AST node should encompass the source locations tracked within the AST node? Or are we fine with the AST node tracking source locations which exist outside of the source range for the node itself? How do we want users of the AST to understand what the source range represents?
I think there are separate problems here. We are missing source location information, definitely. For example, we do not track source locations for storage class specifiers. But that's orthogonal to answering the question of whether the source range for a
+1, that's why I'm trying to understand what the underlying goals are for the source range information itself. Historically, source location tracking has primarily been about diagnostic quality. We've been shifting that focus organically over the years to be about diagnostic quality as well as AST fidelity for tooling and now there's some tension from that shift. So if we're going to add some rigor, we need to figure out what makes sense. I am currently leaning towards the idea that the source range for an AST node should be the union of the ranges tracked by the node and its children. e.g., the declaration |
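The "union of the ranges tracked by the node and its children" idea mentioned above can be sketched with a toy tree. Range and Node are invented stand-ins with plain character offsets. Nothing here reflects Clang's SourceRange or AST classes, and the sketch sidesteps the macro, pragma, and nesting complications raised elsewhere in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy half-open character range; a stand-in for a real source range.
struct Range {
  unsigned begin;
  unsigned end;
};

// Toy AST node: the locations it tracks itself plus its child nodes.
struct Node {
  std::vector<Range> own;
  std::vector<Node> children;
};

// Union of everything tracked by the node and, recursively, its children.
// Precondition: at least one range exists somewhere in the subtree;
// otherwise the result is the empty sentinel {~0u, 0u}.
inline Range unionRange(const Node &n) {
  Range result{~0u, 0u};
  for (const Range &r : n.own) {
    result.begin = std::min(result.begin, r.begin);
    result.end = std::max(result.end, r.end);
  }
  for (const Node &child : n.children) {
    const Range sub = unionRange(child);
    result.begin = std::min(result.begin, sub.begin);
    result.end = std::max(result.end, sub.end);
  }
  return result;
}
```

Under this rule, a declaration whose attribute child starts earlier than its own tokens automatically reports the attribute's start as the beginning of its range.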
I think semantically connecting all the requirements of source tools to AST nodes is difficult.
I think this makes sense and sounds principled on paper, but achieving that isn't feasible without breaking users every time we take a step towards it.
I think that would be a breaking change for downstream consumers once again. People will rely on those changes not being part of the outer decls, and when we expand them, we'll regress those (e.g. tools that just rewrite types, will start dropping storage specifiers all of a sudden). I really don't know the answer here, and the more I think the harder it feels. Even if we didn't have this "incremental issues result in regressions for the users" issue, as you've also pointed out, I don't think modelling AST nodes as "properly nested" works for C++ :/ Ignoring macros, pragmas, comments; declarator syntax itself means type and name of a declaration might be nested, instead of being siblings. So I'd probably just keep the source range definition for composite definitions as-is today, claim them deprecated and just provide "self" locations for ast nodes, possibly making it multiple locations to model any discontinuities. e.g. a |
Agreed, but any changes to source location fidelity will break users unless it's exposing a new source location we didn't previously expose. Tooling gets ABI stability guarantees, it does not get behavioral stability guarantees; we still need to be able to evolve the compiler as languages and the state of the art move forward.
Same!
Yeah, that's a fair point.
I think this may be the most principled way forward, but there are some moving parts. This means we need to track a lot more source locations in the AST, which comes with overhead. We don't know the impacts of that increased overhead, some of it may be too painful for us to want to bear. Once we've started tracking enough source location information for an AST node that we no longer need to track the range, we can deprecate the internal range API (since we now have something else we can switch to). Once we've replaced all the range uses with the source location uses, we can get rid of the range API for that node. But this leaves the question about what to do for tooling. We have |