
[Clang] Allow parsing arbitrary order of attributes for declarations #133107


Merged: 16 commits, May 9, 2025


DenisGZM
Contributor

Enable parsing of the alignas attribute after GNU attributes, before ParseDeclaration.

This might be useful for CUDA code, where __shared__ and other specifiers may be mixed with align.

I'd be glad to hear of any better place or other technique for processing this attribute without interrupting the current flow of parsing.

@llvmbot added the clang and clang:frontend labels on Mar 26, 2025
@llvmbot
Member

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-clang

Author: Denis.G (DenisGZM)

Full diff: https://github.com/llvm/llvm-project/pull/133107.diff

2 Files Affected:

  • (modified) clang/lib/Parse/ParseStmt.cpp (+5)
  • (added) clang/test/SemaCUDA/cuda-attr-order.cu (+15)
diff --git a/clang/lib/Parse/ParseStmt.cpp b/clang/lib/Parse/ParseStmt.cpp
index 150b2879fc94f..33b9f63bcfa08 100644
--- a/clang/lib/Parse/ParseStmt.cpp
+++ b/clang/lib/Parse/ParseStmt.cpp
@@ -296,6 +296,11 @@ StmtResult Parser::ParseStatementOrDeclarationAfterAttributes(
     goto Retry;
   }
 
+  case tok::kw_alignas: {
+    ParseAlignmentSpecifier(CXX11Attrs);
+    goto Retry;
+  }
+
   case tok::kw_template: {
     SourceLocation DeclEnd;
     ParseTemplateDeclarationOrSpecialization(DeclaratorContext::Block, DeclEnd,
diff --git a/clang/test/SemaCUDA/cuda-attr-order.cu b/clang/test/SemaCUDA/cuda-attr-order.cu
new file mode 100644
index 0000000000000..d3bf5b014d1c6
--- /dev/null
+++ b/clang/test/SemaCUDA/cuda-attr-order.cu
@@ -0,0 +1,15 @@
+// Verify that we can parse a simple CUDA file with different attributes order.
+// RUN: %clang_cc1 "-triple" "nvptx-nvidia-cuda"  -fsyntax-only -verify %s
+// expected-no-diagnostics
+#include "Inputs/cuda.h"
+
+struct alignas(16) float4 {
+    float x, y, z, w;
+};
+
+__attribute__((device)) float func() {
+    __shared__ alignas(alignof(float4)) float As[4][4];  // Both combinations
+    alignas(alignof(float4)) __shared__  float Bs[4][4]; // must be legal
+
+    return As[0][0] + Bs[0][0];
+}

Member

@Sirraide Sirraide left a comment

At a glance this does seem like the right place to do this, but this is still missing a release note.

It seems like GCC allows e.g. __attribute__(()) alignas(16) int x in any case, so I don’t see why we shouldn’t allow this too. Can you also add some tests that use __attribute__(()) directly and which aren’t CUDA-specific?

Oh, and can you add something like this as a test as well:

struct S { __attribute__((deprecated)) alignas(16) int x; };
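For contrast with the requested test, here is a minimal sketch of the opposite ordering, with alignas written before the GNU attribute. It assumes a GCC- or Clang-compatible compiler, and the names `AlignedFirst` and `alignedFirstAlignment` are hypothetical, not part of the patch. This ordering should not depend on the parser change under review, so it is a handy baseline for what the new test is expected to match.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct: same attributes as the requested test, but with
// alignas first and the GNU attribute second.
struct AlignedFirst {
  alignas(16) __attribute__((deprecated)) int x;
};

// alignas(16) on the member raises the alignment of the whole struct,
// and the size is padded up to that alignment.
static_assert(alignof(AlignedFirst) == 16, "member alignas raises struct alignment");
static_assert(sizeof(AlignedFirst) == 16, "size is padded to the alignment");

// Small accessor so the property can also be checked at run time.
constexpr std::size_t alignedFirstAlignment() { return alignof(AlignedFirst); }
```

With the patch applied, the member declaration from the comment above should yield the same alignment once the two attributes are swapped.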

@Sirraide
Member

CC @erichkeane in case there’s a specific reason I’m not aware of as to why we currently don’t allow this.

@DenisGZM
Contributor Author

DenisGZM commented Mar 28, 2025

At a glance this does seem like the right place to do this, but this is still missing a release note.

It seems like GCC allows e.g. __attribute__(()) alignas(16) int x in any case, so I don’t see why we shouldn’t allow this too. Can you also add some tests that use __attribute__(()) directly and which aren’t CUDA-specific?

Oh, and can you add something like this as a test as well:

struct S { __attribute__((deprecated)) alignas(16) int x; };

Actually, this test doesn't work with this patch...

In this case all attributes are processed in ParseDeclarationSpecifiers, which at first looked like the right place for the fix, but its logic is much more complicated and its diagnostics are easy to break.

In ParseDeclarationSpecifiers we parse GNU attributes (kw___attribute) and CXX11 attributes, and set bool AttrsLastTime = true to record that the last parsed piece was an attribute. Later, this block prohibits attributes when AttrsLastTime is false:

ParseDecl.cpp

    DoneWithDeclSpec:
      if (!AttrsLastTime)
        ProhibitAttributes(attrs);

AttrsLastTime is always false in declarations of the form <attributes> <type> <identifier>, because the last token we parse is the type.
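The flag logic described above can be sketched as a toy model. This is not Clang's code: `Piece` and `wouldProhibitAttributes` are hypothetical names, and the real loop in ParseDeclarationSpecifiers handles many more token kinds and applies the check to a specific attribute list.

```cpp
#include <cassert>
#include <vector>

// Toy classification of what the decl-specifier loop just consumed.
enum class Piece { Attribute, TypeSpecifier };

// Toy stand-in for the check at DoneWithDeclSpec: returns true when
// ProhibitAttributes would be called, i.e. when the last piece consumed
// was not an attribute.
bool wouldProhibitAttributes(const std::vector<Piece>& declSpecs) {
  bool AttrsLastTime = false;
  for (Piece p : declSpecs) {
    // Each iteration records whether the piece just parsed was an attribute.
    AttrsLastTime = (p == Piece::Attribute);
  }
  return !AttrsLastTime;
}
```

In this model, a sequence of the form <attributes> <type> ends on a type specifier, so the flag is false and the attributes are prohibited, which matches the behavior described in the comment.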

Another approach I tried was to process the C++11 alignas just as it is done for C: kw__Alignas and kw_alignas (C23).
That does the parsing, but later the CXX11 attributes are skipped when correcting the declaration type (the code assumes the attributes have already been processed).

@Sirraide
Member

Hmm, @erichkeane probably knows where this needs to be parsed then; I might take another look at this myself later (because I’m not sure either off the top of my head), but I’m rather busy today unfortunately...

@DenisGZM
Contributor Author

I added parsing of all attributes in ParseCXXClassMemberDeclaration before the call to ParseDeclarationSpecifiers. It seems to solve the problem, but it also changes the annotation ranges for struct and class members.

Member

@Sirraide Sirraide left a comment

Seems reasonable to me, but I’d still like @erichkeane to take a look at this as the attributes code owner

Collaborator

@erichkeane erichkeane left a comment

Parsing of attributes is admittedly the part I'm least comfortable with here. I would love tests for how this interacts with our __declspec spelling attributes though, and to help determine why we wouldn't parse all 3 together here.

As a follow-up/future direction for someone, there is perhaps value in a MaybeParseAnyAttributes that does all 3 in a loop.
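A rough sketch of what such a loop could look like, using a toy token stream in which each token stands for one whole attribute group. The `Tok` enum and this `maybeParseAnyAttributes` signature are assumptions made for illustration; no such helper is confirmed to exist in Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy token kinds: each value stands for one complete attribute group.
enum class Tok { CXX11Attr, GNUAttr, DeclspecAttr, Other };

// Hypothetical helper: consumes attribute groups of any of the three
// spellings, in any order, starting at `pos`. Returns the index of the
// first token that is not an attribute group.
std::size_t maybeParseAnyAttributes(const std::vector<Tok>& toks, std::size_t pos) {
  for (; pos < toks.size(); ++pos) {
    switch (toks[pos]) {
    case Tok::CXX11Attr:    // [[...]] or alignas(...)
      break;                // a real parser would dispatch per spelling here
    case Tok::GNUAttr:      // __attribute__((...))
      break;
    case Tok::DeclspecAttr: // __declspec(...)
      break;
    case Tok::Other:
      return pos;           // first non-attribute token ends the loop
    }
  }
  return pos;
}
```

The point of the single loop is that no spelling has to come first: the caller gets back the position of the first real declaration token regardless of how the groups were interleaved.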

Collaborator

@erichkeane erichkeane left a comment

I think this looks reasonable? I would like @AaronBallman to stop by though, he might think of some reason why this isn't right per-grammar.

Collaborator

@AaronBallman AaronBallman left a comment

Thanks for the fix! The changes should come with a release note in clang/docs/ReleaseNotes.rst so users know about the fix.

@DenisGZM
Contributor Author

DenisGZM commented Apr 8, 2025

Reworked the parsing approach.

We needed to support parsing attributes in arbitrary order rather than just alignas for C++, so please check the new commit.

@DenisGZM DenisGZM requested a review from erichkeane April 8, 2025 18:40

@@ -24,7 +24,7 @@ int templateFunction(T value) __attribute__((annotate("works")));

 // CHECK: ClassDecl=Test:3:7 (Definition) Extent=[3:1 - 17:2]
 // CHECK-NEXT: CXXAccessSpecifier=:4:1 (Definition) Extent=[4:1 - 4:8]
-// CHECK-NEXT: CXXMethod=aMethod:5:51 Extent=[5:3 - 5:60]
+// CHECK-NEXT: CXXMethod=aMethod:5:51 Extent=[5:46 - 5:60]
Collaborator

This means we went from pointing to the start of __attribute__ to pointing to the start of void which is a bit unfortunate.

Contributor Author

@DenisGZM DenisGZM Apr 16, 2025

Do you think we should avoid it somehow? Or just accept it as is?

Collaborator

I'm on the fence. It's not the worst regression in behavior, but it does make the diagnostic slightly harder for users to reason about. WDYT @erichkeane ?

Collaborator

It's really quite unfortunate... I think it is at least worth seeing how much work is needed to get this 'right', and whether it is worth the effort.

Contributor Author

The main problem here is to determine what 'right' is :)
Now I set the annotated declaration range to start at the beginning of the first parsed attribute, which might not be a DeclSpecAttr.
Earlier we parsed only CXX attrs before ParseDeclarationSpecifiers, so the annotated range could only contain DeclSpecAttrs.

Examples:

class Test {
public:
  __attribute__((annotate("spiffy_method"))) [[deprecated]] void aMethod();  // Error before, now: Extent=[5:3 - 5:75]
};

class Test {
public:
  [[deprecated]] __attribute__((annotate("spiffy_method")))  void aMethod(); // Before: Extent=[5:18 - 5:75], now: Extent=[5:3 - 5:75]
};

Is this what is expected?

Collaborator

I think the underlying issue is that we're using in-band information about source ranges that's no longer true. We used to be able to rely on the source range because the order was more strict, but as we've relaxed it, you can now mix declaration and decl specifier attributes in more exotic ways.

However, addressing that may be quite involved. So I think we should probably accept this as-is; pointing to the start of the list is better than pointing to the type.

@DenisGZM changed the title from "[CLANG] Enable alignas after GNU attributes" to "[CLANG] Allow parsing arbitrary order of attributes for declarations" on Apr 26, 2025
Collaborator

@erichkeane erichkeane left a comment

What do we think @AaronBallman ? I think the diagnostic column is about the best we are going to get, so I'm OK with this as-is. We could perhaps improve that, but I don't think doing that here is worth the effort.

I'll approve, but I want to make sure Aaron has a chance to say otherwise before merging.

@AaronBallman
Collaborator

Yeah, I think I can live with this. I think not supporting the arbitrary order is more annoying to users than a slight degradation in source location reporting.

@AaronBallman
Collaborator

Do you need us to land the changes on your behalf, btw?

@DenisGZM
Contributor Author

DenisGZM commented May 9, 2025

Do you need us to land the changes on your behalf, btw?

Yeah, why not

@cor3ntin changed the title from "[CLANG] Allow parsing arbitrary order of attributes for declarations" to "[Clang] Allow parsing arbitrary order of attributes for declarations" on May 9, 2025
@AaronBallman AaronBallman merged commit b3a6d43 into llvm:main May 9, 2025
7 of 10 checks passed
@kadircet
Member

kadircet commented May 15, 2025

I'd like to chime in for the issue with source locations and its implications. Seems like it has been brought up already along the thread.

These might not be as crucial for diagnostic locations (or other clang purposes) when they're slightly off, but they actually result in big changes in most of the source tools, especially ones that rewrite source code.
Many tools already operate with the assumption that source locations provided by the AST are a best-effort service, but subtle edge cases like the one in this PR make tools really complicated and cause regressions that nobody notices, as these edge cases are usually not tested.

After this patch we now have a discrepancy in the source ranges associated with member vs non-member declarations. It also shows up in the test cases updated in this patch: we're updating some golden tests, but only for member declarations. I think getting rid of this discrepancy would prevent a lot of churn in each individual source tool, either by keeping source ranges as-is for member decls, or by making a similar update for non-member declarations. That way source tools only need to special-case attributes, and not FieldDecls on top of that.

@AaronBallman
Collaborator

AaronBallman commented May 15, 2025

Yeah, I think I can live with this. I think not supporting the arbitrary order is more annoying to users than a slight degradation in source location reporting.

The point @kadircet brings up about tooling is a good one, though. I'm not certain we need to revert the patch, but breaking a bunch of tools and making them cope with the source location change is pretty disruptive. I think we may need to consider refactoring source location handling for attributes more broadly to solve the underlying concerns. Thoughts @erichkeane?

Related, this issue just was filed today: #140020

@erichkeane
Collaborator

I wouldn't be opposed to SOME sort of improvement in attributes handling. We flatten the list of attributes unfortunately, so we couldn't add source location to each group in any way....

My one immediate thought is: What if we added ONE extra source-location to AttrCommonInfo? The FIRST in each group gets the 'begin' location, and the 'LAST' in each group gets the 'end' location. Everyone else gets an empty source location. We could make it private, then just accessible via functions from Decl. Since we maintain order of attributes, we'd get them reasonably well.

WDYT?

I don't really have the time/ability to do so, but am definitely willing to review something like this.

@AaronBallman
Collaborator

I wouldn't be opposed to SOME sort of improvement in attributes handling. We flatten the list of attributes unfortunately, so we couldn't add source location to each group in any way....

My one immediate thought is: What if we added ONE extra source-location to AttrCommonInfo? The FIRST in each group gets the 'begin' location, and the 'LAST' in each group gets the 'end' location. Everyone else gets an empty source location. We could make it private, then just accessible via functions from Decl. Since we maintain order of attributes, we'd get them reasonably well.

WDYT?

I think that's still going to be too clever for tooling because tools will do stuff like use AST matchers to say "does this declaration have this attribute? Cool, let's add a fix-it to remove that range of characters." and that's going to fail for them when the attribute is in the middle of a list of three.

I think what we need is:

  1. Each attribute stores the source range for that one attribute. The beginning of the range is the attribute token itself. The end of the range is either that same token, when there's no argument list, or the closing paren of the attribute argument list.
  2. We introduce a helper method which gets you the full source range of all attributes within one syntactic group for a declaration. e.g., [[foo, bar(12), baz]]; each attribute tracks its source information, and the helper function gets you the range starting from [ and ending with ].
  3. We introduce another helper method which gets you the full source range of all attributes within all syntactic groups for a declaration. e.g., [[foo, bar]] __attribute__((baz(12))) __declspec(no_bad); each attribute tracks its source information, we use the helper from (2) to get the range for each syntax group, and we extend the returned range to cover those ranges. In this example, the start of the range is [ and the end of the range is ) from the __declspec.
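The three items above can be sketched with a toy model. Everything here is an assumption for illustration: `Range` uses inclusive character offsets rather than Clang's token-based SourceRange, and `Attr`, `AttrGroup`, `locate`, `groupRange`, `allGroupsRange` and `exampleGroups` are hypothetical names, not existing Clang APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Inclusive character offsets into the source text (toy model).
struct Range { std::size_t begin, end; };

struct Attr { std::string name; Range range; };             // item (1)
struct AttrGroup { Range whole; std::vector<Attr> attrs; }; // one syntactic group

// Locates `piece` in `text`; assumes it is present.
Range locate(const std::string& text, const std::string& piece) {
  std::size_t pos = text.find(piece);
  return {pos, pos + piece.size() - 1};
}

// Item (2): range of one syntactic group, delimiters included.
Range groupRange(const AttrGroup& g) { return g.whole; }

// Item (3): range covering every group; assumes `groups` is non-empty.
Range allGroupsRange(const std::vector<AttrGroup>& groups) {
  Range r = groupRange(groups.front());
  for (const AttrGroup& g : groups) {
    r.begin = std::min(r.begin, groupRange(g).begin);
    r.end = std::max(r.end, groupRange(g).end);
  }
  return r;
}

// Builds the model for the text from item (3):
// [[foo, bar]] __attribute__((baz(12))) __declspec(no_bad)
std::vector<AttrGroup> exampleGroups(const std::string& text) {
  AttrGroup cxx11{locate(text, "[[foo, bar]]"),
                  {{"foo", locate(text, "foo")}, {"bar", locate(text, "bar")}}};
  AttrGroup gnu{locate(text, "__attribute__((baz(12)))"),
                {{"baz", locate(text, "baz(12)")}}};
  AttrGroup ms{locate(text, "__declspec(no_bad)"),
               {{"no_bad", locate(text, "no_bad")}}};
  return {cxx11, gnu, ms};
}
```

In this model a fix-it that removes one attribute would use the per-attribute range, while a tool that wants to skip all attributes would use the result of `allGroupsRange`. The sketch only describes a single declaration and does not attempt to model inherited attributes.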

I think this ends up getting full fidelity for tools to use without having to store too many locations, but the problem with (2) and (3) is that we still need those helpers to be smart about inherited attributes. e.g.,

__attribute__((foo)) void func();
[[bar]] void func(); // Inherits __attribute__((foo))

so those helpers really only make sense on a single declaration, not for inherited attributes.

WDYT?

I don't really have the time/ability to do so, but am definitely willing to review something like this.

I'm in the same boat.

@erichkeane
Collaborator

For #2 above, we'd not be able to tell [[foo, bar]] from [[foo]][[bar]], but maybe that's OK? It gets WEIRD with using-namespaces though.

But otherwise I think this is reasonable.

@kadircet
Member

I'd like to highlight that many tools still only care about declarations themselves and not the attributes.

The ones that fiddle with attributes need to do it in a special and complicated way already. Making this less complicated definitely creates some value. But I think the immediate issue isn't about making that easy, but rather keeping most tools that don't care about attributes simple.

Hence I think having consistency in the source ranges associated with declarations, to eliminate special cases in most tools, is more beneficial (and possibly simpler). I'd probably lean towards keeping attributes out of the source ranges associated with decls, as that's the current state of the world. Unless we're deliberately going to include them in declarations' source ranges systematically, I think we're more likely to regress users down the line.

@AaronBallman
Collaborator

I'd like to highlight that many tools still only care about declarations themselves and not the attributes.

Good to know.

The ones that fiddle with attributes need to do it in a special and complicated way already. Making this less complicated definitely creates some value. But I think the immediate issue isn't about making that easy, but rather keeping most tools that don't care about attributes simple.

Hence I think having consistency in the source ranges associated with declarations, to eliminate special cases in most tools, is more beneficial (and possibly simpler). I'd probably lean towards keeping attributes out of the source ranges associated with decls, as that's the current state of the world. Unless we're deliberately going to include them in declarations' source ranges systematically, I think we're more likely to regress users down the line.

We're currently inconsistent regarding attributes; sometimes we include the attributes in the range, sometimes we don't. But declaration source ranges are tough to reason about in general because there's many different moving parts. Consider:

#pragma clang section bss = "test"
static int i = 12;

If we include attributes in the declaration, shouldn't we also include pragmas which impact the declaration like above (this one adds an implicit attribute)?

I think where we've traditionally fallen on this is that the source range for any given AST node covers all of the tokens used to produce that node. This means leading or trailing tokens are the awkward cases; are they used to produce the node or not?

So currently, for static int i = 12, j = 100 /*NOLINT*/;, the source range for the VarDecl for j starts at static and ends at 100. Perhaps surprising, but understandable. But what about: /* NOLINT */ [[]] static int i = 12, j = 100 /* NOLINT */;? Should that include the NOLINT comments? Tools often associate comments with declarations for special handling, so I think there are arguments either way. But what about the empty attribute specifier? Is that salient to the declaration? In the earlier example, should the pragma be included because it impacts the declaration? What about a #pragma clang attribute slapping attributes onto the declaration? Then there are problems like unknown attributes; should the source range for [[unknown::unknown]] int i; include the attribute? And so on.

So we could either improve the current approach of using the source range which encompasses the maximum contiguous sequence of tokens comprising the full definition. Or we could consider changing it so that the declaration source range is the minimum contiguous sequence of tokens to form a valid declaration. e.g., static int i = 100; could have the range cover int i and static auto i = 100; could have the range cover auto i = 100. But I suspect the least amount of churn for tools is to continue to go with the maximal range and improve what we track at the edges. But I also suspect that will always be best-effort and tools are going to have to handle those on a case-by-case basis sometimes.

@kadircet
Member

We're currently inconsistent regarding attributes; sometimes we include the attributes in the range, sometimes we don't. But declaration source ranges are tough to reason about in general because there's many different moving parts.

I totally agree. I was asking to change them deliberately and systematically for the same reason. Such changes, even if they make sense and look good in isolation, introduce more chaos into the system, as they replace one set of random behavior with another, and each of these changes needs to be handled by source tooling consumers.

Consider: #pragma clang section bss = "test" .. static int i = 12;

I'd rather not dive into the weeds here and make a general decision around whether we should refrain from such changes to semantics of source locations. I am happy to discuss how we can improve things systematically in a separate medium.

I suspect the least amount of churn for tools is to continue to go with the maximal range and improve what we track at the edges. But I also suspect that will always be best-effort and tools are going to have to handle those on a case-by-case basis sometimes.

I am also leaning in this direction; having all that information available at least always gives tools something to build on (rather than lacking information, which is a lot harder to recover from).

But I think we should be tackling that as a separate problem, if we want to reduce churn and regressions for source tools. E.g. if we decide that we're going to fix source ranges involving attribute tokens, we should try our best to cover all the places that parse/contain attributes, instead of improving things here and there as we're making other changes. That's just death by a thousand cuts, and most of the time these will be small enough regressions that no one prioritizes/fixes.

@AaronBallman
Collaborator

We're currently inconsistent regarding attributes; sometimes we include the attributes in the range, sometimes we don't. But declaration source ranges are tough to reason about in general because there's many different moving parts.

I totally agree. I was asking to change them deliberately and systematically for the same reason. Such changes, even if they make sense and look good in isolation, introduce more chaos into the system, as they replace one set of random behavior with another, and each of these changes needs to be handled by source tooling consumers.

+1

Consider: #pragma clang section bss = "test" .. static int i = 12;

I'd rather not dive into the weeds here and make a general decision around whether we should refrain from such changes to semantics of source locations. I am happy to discuss how we can improve things systematically in a separate medium.

I think we need to understand what we want before we can make decisions on what needs changing, though. Are there invariants we want to introduce, like the source range for the AST node should encompass the source locations tracked within the AST node? Or are we fine with the AST node tracking source locations which exist outside of the source range for the node itself? How do we want users of the AST to understand what the source range represents?

I suspect the least amount of churn for tools is to continue to go with the maximal range and improve what we track at the edges. But I also suspect that will always be best-effort and tools are going to have to handle those on a case-by-case basis sometimes.

I am also leaning towards this direction, having all that information available at least gives the tools always something to build on top (rather than lacking information, which is a lot harder to recover).

I think there are separate problems here. We are missing source location information, definitely. For example, we do not track source locations for storage class specifiers. But that's orthogonal to answering the question of whether the source range for a VarDecl node should include or exclude storage class specifiers.

But I think we should be tackling that as a separate problem, if we want to reduce churn and regressions for source tools. E.g. if we decide that we're going to fix source ranges involving attribute tokens, we should try our best to cover all the places that parse/contain attributes, instead of improving things here and there as we're making other changes. That's just death by a thousand cuts, and most of the time these will be small enough regressions that no one prioritizes/fixes.

+1, that's why I'm trying to understand what the underlying goals are for the source range information itself. Historically, source location tracking has primarily been about diagnostic quality. We've been shifting that focus organically over the years to be about diagnostic quality as well as AST fidelity for tooling and now there's some tension from that shift. So if we're going to add some rigor, we need to figure out what makes sense.

I am currently leaning towards the idea that the source range for an AST node should be the union of the ranges tracked by the node and its children. e.g., the declaration static constexpr int i = 100; would have the source range covering int i = 100 but not static, constexpr, or ;. Someday, if we start tracking source locations for the storage class specifiers, the range may expand to include those. So we can mechanically build up the source range by using the tracked information from the AST rather than manually deciding the range when parsing and storing it in the AST. But then: macros make things discontiguous, so maybe that's not a good model?
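As a rough illustration of that union model, the sketch below computes the range for static constexpr int i = 100; from three tracked pieces (type, name, initializer). It is a toy model under stated assumptions: `Range` holds inclusive character offsets rather than Clang source locations, `unionOf` and `coveredText` are hypothetical names, and macros (the discontiguous case raised above) are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Inclusive character offsets into the declaration text (toy model).
struct Range { std::size_t begin, end; };

// Union of the ranges a node tracks; assumes `tracked` is non-empty.
Range unionOf(const std::vector<Range>& tracked) {
  Range r = tracked.front();
  for (const Range& t : tracked) {
    r.begin = std::min(r.begin, t.begin);
    r.end = std::max(r.end, t.end);
  }
  return r;
}

// Text covered by the union of the tracked pieces of a declaration shaped
// like `static constexpr int i = 100;`. The storage class specifiers and
// the semicolon are deliberately not tracked here.
std::string coveredText(const std::string& decl) {
  std::size_t type = decl.find("int");
  std::size_t name = decl.find(" i ") + 1; // skip the leading space
  std::size_t init = decl.find("100");
  Range r = unionOf({{type, type + 2}, {name, name}, {init, init + 2}});
  return decl.substr(r.begin, r.end - r.begin + 1);
}
```

For the declaration above this yields int i = 100. Adding a tracked range for static to the list would widen the result mechanically, which is the "range may expand someday" behavior described in the comment.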

@kadircet
Member

I think we need to understand what we want before we can make decisions on what needs changing, though. Are there invariants we want to introduce, like the source range for the AST node should encompass the source locations tracked within the AST node? Or are we fine with the AST node tracking source locations which exist outside of the source range for the node itself? How do we want users of the AST to understand what the source range represents?

I think semantically connecting all the requirements of source tools to AST nodes is difficult.

I am currently leaning towards the idea that the source range for an AST node should be the union of the ranges tracked by the node and its children.

I think this makes sense and sounds principled on paper, but achieving that isn't feasible without breaking users every time we take a step towards it.

Someday, if we start tracking source locations for the storage class specifiers, the range may expand to include those.

I think that would be a breaking change for downstream consumers once again. People will rely on those changes not being part of the outer decls, and when we expand them, we'll regress those (e.g. tools that just rewrite types, will start dropping storage specifiers all of a sudden).

I really don't know the answer here, and the more I think the harder it feels. Even if we didn't have this "incremental issues result in regressions for the users" issue, as you've also pointed out, I don't think modelling AST nodes as "properly nested" works for C++ :/ Ignoring macros, pragmas, comments; declarator syntax itself means type and name of a declaration might be nested, instead of being siblings.

So I'd probably just keep the source range definition for composite definitions as it is today, mark it deprecated, and just provide "self" locations for AST nodes, possibly making it multiple locations to model any discontinuities. E.g. a VarDecl would only point to its name; you'd need to drill down into specific parts of it if your tool is interested in the rest. This sounds like a much simpler contract, and it can be implemented incrementally as well. Moreover it will be stable: no matter what changes we make to the AST, as long as an AST node is there, the source location associated with it will stay there.

@AaronBallman
Collaborator

I think we need to understand what we want before we can make decisions on what needs changing, though. Are there invariants we want to introduce, like the source range for the AST node should encompass the source locations tracked within the AST node? Or are we fine with the AST node tracking source locations which exist outside of the source range for the node itself? How do we want users of the AST to understand what the source range represents?

I think semantically connecting all the requirements of source tools to AST nodes is difficult.

I am currently leaning towards the idea that the source range for an AST node should be the union of the ranges tracked by the node and its children.

I think this makes sense and sounds principled on paper, but achieving that isn't feasible without breaking users every time we take a step towards it.

Agreed, but any changes to source location fidelity will break users unless it's exposing a new source location we didn't previously expose. Tooling gets ABI stability guarantees, it does not get behavioral stability guarantees; we still need to be able to evolve the compiler as languages and the state of the art move forward.

Someday, if we start tracking source locations for the storage class specifiers, the range may expand to include those.

I think that would be a breaking change for downstream consumers once again. People will rely on those changes not being part of the outer decls, and when we expand them, we'll regress those (e.g. tools that just rewrite types, will start dropping storage specifiers all of a sudden).

I really don't know the answer here, and the more I think the harder it feels.

Same!

Even if we didn't have this "incremental issues result in regressions for the users" issue, as you've also pointed out, I don't think modelling AST nodes as "properly nested" works for C++ :/ Ignoring macros, pragmas, comments; declarator syntax itself means type and name of a declaration might be nested, instead of being siblings.

Yeah, that's a fair point.

So I'd probably just keep the source range definition for composite definitions as it is today, mark it deprecated, and just provide "self" locations for AST nodes, possibly making it multiple locations to model any discontinuities. E.g. a VarDecl would only point to its name; you'd need to drill down into specific parts of it if your tool is interested in the rest. This sounds like a much simpler contract, and it can be implemented incrementally as well. Moreover it will be stable: no matter what changes we make to the AST, as long as an AST node is there, the source location associated with it will stay there.

I think this may be the most principled way forward, but there are some moving parts.

This means we need to track a lot more source locations in the AST, which comes with overhead. We don't know the impacts of that increased overhead, some of it may be too painful for us to want to bear.

Once we've started tracking enough source location information for an AST node that we no longer need to track the range, we can deprecate the internal range API (since we now have something else we can switch to).

Once we've replaced all the range uses with the source location uses, we can get rid of the range API for that node.

But this leaves the question about what to do for tooling. We have clang_getCursorExtent() (and others) as exposed APIs. We promise people ABI stability, so deprecating an API in libclang is kind of an academic exercise because we can't remove the interface. So do we mark it as deprecated and update the comments to explain it no longer returns a range, just a single location? Or do we try to put range logic into libclang to try to keep some of the cursors limping along with the previous behavior? Something else?
