Commit 6a9f79e

[pseudo] Eliminate the type-name identifier ambiguities in the grammar.
See https://reviews.llvm.org/D130626 for motivation. An identifier in the grammar can belong to different categories (type-name, template-name, namespace-name), and telling them apart requires semantic information. This patch eliminates the "local" ambiguities in type-name and namespace-name, which gives the parser a performance boost:

- eliminate the separate type rules (class-name, enum-name, typedef-name) and fold them into a unified type-name; this removes the #1 source of type-name ambiguity and gives a big performance boost;
- remove the namespace-alias rules, as they're hard and uninteresting.

Note that we could eliminate more and gain more performance (e.g. by folding template-name, type-name and namespace-name together), but at the current stage we'd like to keep all existing identifier categories, as they might assist in correlated disambiguation and keep the representation of important concepts uniform.

| file | ambiguous nodes | forest size | glrParse performance |
|---|---|---|---|
| SemaCodeComplete.cpp | 11K -> 5.7K | 10.4MB -> 7.9MB | 7.1MB/s -> 9.98MB/s |
| AST.cpp | 1.3K -> 0.73K | 0.99MB -> 0.77MB | 6.7MB/s -> 8.4MB/s |

Differential Revision: https://reviews.llvm.org/D130747
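To see why folding the type rules removes ambiguity, the following is a hypothetical toy model (not the clang-pseudo API): it counts how many distinct derivations let `type-name` match a single token. Every derivation beyond the first is an extra alternative that a GLR parser has to keep under an ambiguous forest node. The names `Grammar`, `countDerivations`, `oldTypeNameRules` and `newTypeNameRules` are illustrative assumptions, and `simple-template-id` is treated as a plain terminal for simplicity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, not the clang-pseudo API: each nonterminal maps to
// its single-symbol alternatives (unit rules). Any symbol without an entry,
// including simple-template-id here, is treated as a terminal.
using Grammar = std::map<std::string, std::vector<std::string>>;

// Counts the distinct derivations of `Sym` that yield exactly the token `Tok`.
// A count greater than 1 means a GLR parser keeps an ambiguous node for it.
int countDerivations(const Grammar &G, const std::string &Sym,
                     const std::string &Tok, int Depth = 0) {
  assert(Depth < 32 && "unexpected cycle of unit rules");
  auto It = G.find(Sym);
  if (It == G.end())
    return Sym == Tok ? 1 : 0; // terminal: either matches the token or not
  int Total = 0;
  for (const std::string &Alt : It->second)
    Total += countDerivations(G, Alt, Tok, Depth + 1);
  return Total;
}

// type-name rules as they were before this commit (namespace rules omitted).
Grammar oldTypeNameRules() {
  Grammar G;
  G["type-name"] = {"class-name", "enum-name", "typedef-name"};
  G["class-name"] = {"IDENTIFIER", "simple-template-id"};
  G["enum-name"] = {"IDENTIFIER"};
  G["typedef-name"] = {"IDENTIFIER", "simple-template-id"};
  return G;
}

// type-name rules after this commit.
Grammar newTypeNameRules() {
  Grammar G;
  G["type-name"] = {"IDENTIFIER", "simple-template-id"};
  return G;
}
```

With the old rules, `countDerivations(oldTypeNameRules(), "type-name", "IDENTIFIER")` returns 3, which lines up with the three `type-name~IDENTIFIER` alternatives in the old glr.cpp expectations; with the new rules it returns 1, so no ambiguous node is needed.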
1 parent d7e06d5 commit 6a9f79e

2 files changed: +12 -18 lines changed


clang-tools-extra/pseudo/lib/cxx/cxx.bnf

Lines changed: 9 additions & 11 deletions
@@ -34,14 +34,9 @@ _ := statement-seq
 _ := declaration-seq
 
 # gram.key
-typedef-name := IDENTIFIER
-typedef-name := simple-template-id
+#! we don't distinguish between namespaces and namespace aliases, as it's hard
+#! and uninteresting.
 namespace-name := IDENTIFIER
-namespace-name := namespace-alias
-namespace-alias := IDENTIFIER
-class-name := IDENTIFIER
-class-name := simple-template-id
-enum-name := IDENTIFIER
 template-name := IDENTIFIER
 
 # gram.basic
@@ -391,9 +386,12 @@ builtin-type := INT
 builtin-type := FLOAT
 builtin-type := DOUBLE
 builtin-type := VOID
-type-name := class-name
-type-name := enum-name
-type-name := typedef-name
+#! Unlike C++ standard grammar, we don't distinguish the underlying type (class,
+#! enum, typedef) of the IDENTIFIER, as these ambiguities are "local" and don't
+#! affect the final parse tree. Eliminating them gives a significant performance
+#! boost to the parser.
+type-name := IDENTIFIER
+type-name := simple-template-id
 elaborated-type-specifier := class-key nested-name-specifier_opt IDENTIFIER
 elaborated-type-specifier := class-key simple-template-id
 elaborated-type-specifier := class-key nested-name-specifier TEMPLATE_opt simple-template-id
@@ -551,7 +549,7 @@ private-module-fragment := module-keyword : PRIVATE ; declaration-seq_opt
 class-specifier := class-head { member-specification_opt [recover=Brackets] }
 class-head := class-key class-head-name class-virt-specifier_opt base-clause_opt
 class-head := class-key base-clause_opt
-class-head-name := nested-name-specifier_opt class-name
+class-head-name := nested-name-specifier_opt type-name
 class-virt-specifier := contextual-final
 class-key := CLASS
 class-key := STRUCT

clang-tools-extra/pseudo/test/glr.cpp

Lines changed: 3 additions & 7 deletions
@@ -12,23 +12,19 @@ void foo() {
 // CHECK-NEXT: │ └─; := tok[8]
 // CHECK-NEXT: └─statement~simple-declaration := decl-specifier-seq init-declarator-list ;
 // CHECK-NEXT: ├─decl-specifier-seq~simple-type-specifier := <ambiguous>
-// CHECK-NEXT: │ ├─simple-type-specifier~type-name := <ambiguous>
-// CHECK-NEXT: │ │ ├─type-name~IDENTIFIER := tok[5]
-// CHECK-NEXT: │ │ ├─type-name~IDENTIFIER := tok[5]
-// CHECK-NEXT: │ │ └─type-name~IDENTIFIER := tok[5]
+// CHECK-NEXT: │ ├─simple-type-specifier~IDENTIFIER := tok[5]
 // CHECK-NEXT: │ └─simple-type-specifier~IDENTIFIER := tok[5]
 // CHECK-NEXT: ├─init-declarator-list~ptr-declarator := ptr-operator ptr-declarator
 // CHECK-NEXT: │ ├─ptr-operator~* := tok[6]
 // CHECK-NEXT: │ └─ptr-declarator~id-expression =#1
 // CHECK-NEXT: └─; := tok[8]
 }
 
-// CHECK: 3 Ambiguous nodes:
+// CHECK: 2 Ambiguous nodes:
 // CHECK-NEXT: 1 simple-type-specifier
 // CHECK-NEXT: 1 statement
-// CHECK-NEXT: 1 type-name
 // CHECK-EMPTY:
 // CHECK-NEXT: 0 Opaque nodes:
 // CHECK-EMPTY:
-// CHECK-NEXT: Ambiguity: 0.40 misparses/token
+// CHECK-NEXT: Ambiguity: 0.20 misparses/token
 // CHECK-NEXT: Unparsed: 0.00%
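The drop from 0.40 to 0.20 misparses/token can be reproduced by hand under two assumptions. First, that the statistic sums (alternatives - 1) over all ambiguous forest nodes and divides by the token count. Second, that the parsed input is a 10-token function such as `void foo() { T* a; }` (the hunk header shows `void foo() {`, and the CHECK lines place the identifier at tok[5] and `;` at tok[8]; the names `T` and `a` are hypothetical). The old forest has ambiguous statement (2 alternatives), simple-type-specifier (2) and type-name (3) nodes, giving (1 + 1 + 2) / 10 = 0.40. Dropping the type-name node leaves (1 + 1) / 10 = 0.20. The function below is an illustrative sketch of that arithmetic, not the tool's code.

```cpp
#include <cassert>
#include <vector>

// Sketch of the "Ambiguity: N misparses/token" statistic printed by the test.
// ASSUMPTION: each ambiguous forest node contributes (alternatives - 1)
// misparses, and the total is divided by the number of input tokens.
double misparsesPerToken(const std::vector<int> &AlternativesPerAmbiguousNode,
                         int TokenCount) {
  assert(TokenCount > 0 && "need at least one token");
  int Misparses = 0;
  for (int Alternatives : AlternativesPerAmbiguousNode)
    Misparses += Alternatives - 1; // every extra alternative is one misparse
  return static_cast<double>(Misparses) / TokenCount;
}
```

Calling it with `{2, 2, 3}` (statement, simple-type-specifier, type-name) and 10 tokens gives 0.40; calling it with `{2, 2}` gives 0.20, matching the old and new CHECK lines.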
