Commit 8eb3470

[SpecialCaseList] Add option to use Globs instead of Regex to match patterns
Add an option in `SpecialCaseList` to use Globs instead of Regex to match patterns. `GlobPattern` was extended in https://reviews.llvm.org/D153587 to support brace expansions which allows us to use patterns like `*/src/foo.{c,cpp}`. It turns out that most patterns only take advantage of `*` so using Regex was overkill and required lots of escaping in practice. This often led to bugs due to forgetting to escape special characters.

Since this would be a breaking change, we temporarily support Regex by default and use Globs when `#!special-case-list-v2` is the first line in the file. Users should switch to the glob format described in https://llvm.org/doxygen/classllvm_1_1GlobPattern.html. For example, `(abc|def)` should become `{abc,def}`.

See discussion in https://reviews.llvm.org/D152762 and https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D154014
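The opt-in rule from the commit message (globs only when `#!special-case-list-v2` is the first line, regexes otherwise) can be sketched as follows. This is a minimal standalone illustration, not the code from this commit: `PatternMode`, `detectPatternMode`, and `demoDetectPatternMode` are hypothetical names, and the exact-match treatment of the first line (no tolerance for trailing whitespace or `\r`) is an assumption.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical names for illustration only; not part of LLVM's API.
enum class PatternMode { Regex, Glob };

// Returns Glob only when the very first line is exactly the v2 marker;
// any other input falls back to the legacy Regex behavior.
// Assumption: the marker must match the whole first line exactly.
inline PatternMode detectPatternMode(std::string_view Contents) {
  constexpr std::string_view Marker = "#!special-case-list-v2";
  // The first line ends at the first newline, or at the end of the input.
  std::string_view FirstLine = Contents.substr(0, Contents.find('\n'));
  return FirstLine == Marker ? PatternMode::Glob : PatternMode::Regex;
}

// Usage sketch: the marker only counts when it is the first line.
inline void demoDetectPatternMode() {
  assert(detectPatternMode("#!special-case-list-v2\nfun:bad_{foo,bar}\n") ==
         PatternMode::Glob);
  assert(detectPatternMode("fun:bad_(foo|bar)\n") == PatternMode::Regex);
  assert(detectPatternMode("# comment\n#!special-case-list-v2\n") ==
         PatternMode::Regex);
}
```

Keeping Regex as the default means existing ignorelists keep their old meaning until their authors add the marker and convert patterns such as `(abc|def)` to `{abc,def}`.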
1 parent b6310e6 commit 8eb3470

6 files changed

+276
-195
lines changed

clang/docs/SanitizerSpecialCaseList.rst

Lines changed: 27 additions & 17 deletions
@@ -15,7 +15,7 @@ file at compile-time.
 Goal and usage
 ==============

-User of sanitizer tools, such as :doc:`AddressSanitizer`, :doc:`ThreadSanitizer`
+Users of sanitizer tools, such as :doc:`AddressSanitizer`, :doc:`ThreadSanitizer`
 or :doc:`MemorySanitizer` may want to disable or alter some checks for
 certain source-level entities to:

@@ -54,48 +54,58 @@ Format
 Ignorelists consist of entries, optionally grouped into sections. Empty lines
 and lines starting with "#" are ignored.

-Section names are regular expressions written in square brackets that denote
+.. note::
+
+  In `D154014 <https://reviews.llvm.org/D154014>`_ we transitioned to using globs instead
+  of regexes to match patterns in special case lists. Since this was a
+  breaking change, we will temporarily support the original behavior using
+  regexes. If ``#!special-case-list-v2`` is the first line of the file, then
+  we will use the new behavior using globs. For more details, see
+  `this discourse post <https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666>`_.
+
+
+Section names are globs written in square brackets that denote
 which sanitizer the following entries apply to. For example, ``[address]``
-specifies AddressSanitizer while ``[cfi-vcall|cfi-icall]`` specifies Control
+specifies AddressSanitizer while ``[{cfi-vcall,cfi-icall}]`` specifies Control
 Flow Integrity virtual and indirect call checking. Entries without a section
 will be placed under the ``[*]`` section applying to all enabled sanitizers.

-Entries contain an entity type, followed by a colon and a regular expression,
+Entries contain an entity type, followed by a colon and a glob,
 specifying the names of the entities, optionally followed by an equals sign and
-a tool-specific category, e.g. ``fun:*ExampleFunc=example_category``. The
-meaning of ``*`` in regular expression for entity names is different - it is
-treated as in shell wildcarding. Two generic entity types are ``src`` and
+a tool-specific category, e.g. ``fun:*ExampleFunc=example_category``.
+Two generic entity types are ``src`` and
 ``fun``, which allow users to specify source files and functions, respectively.
 Some sanitizer tools may introduce custom entity types and categories - refer to
 tool-specific docs.

 .. code-block:: bash

+    #!special-case-list-v2
+    # The line above is explained in the note above
     # Lines starting with # are ignored.
-    # Turn off checks for the source file (use absolute path or path relative
-    # to the current working directory):
-    src:/path/to/source/file.c
+    # Turn off checks for the source file
+    # Entries without sections are placed into [*] and apply to all sanitizers
+    src:path/to/source/file.c
+    src:*/source/file.c
     # Turn off checks for this main file, including files included by it.
     # Useful when the main file instead of an included file should be ignored.
     mainfile:file.c
     # Turn off checks for a particular functions (use mangled names):
-    fun:MyFooBar
     fun:_Z8MyFooBarv
-    # Extended regular expressions are supported:
-    fun:bad_(foo|bar)
+    # Glob brace expansions and character ranges are supported
+    fun:bad_{foo,bar}
     src:bad_source[1-9].c
-    # Shell like usage of * is supported (* is treated as .*):
+    # "*" matches zero or more characters
     src:bad/sources/*
     fun:*BadFunction*
     # Specific sanitizer tools may introduce categories.
     src:/special/path/*=special_sources
     # Sections can be used to limit ignorelist entries to specific sanitizers
     [address]
     fun:*BadASanFunc*
-    # Section names are regular expressions
-    [cfi-vcall|cfi-icall]
+    # Section names are globs
+    [{cfi-vcall,cfi-icall}]
     fun:*BadCfiCall
-    # Entries without sections are placed into [*] and apply to all sanitizers

 ``mainfile`` is similar to applying ``-fno-sanitize=`` to a set of files but
 does not need plumbing into the build system. This works well for internal

clang/lib/Basic/ProfileList.cpp

Lines changed: 2 additions & 2 deletions
@@ -36,8 +36,8 @@ class ProfileSpecialCaseList : public llvm::SpecialCaseList {
   bool isEmpty() const { return Sections.empty(); }

   bool hasPrefix(StringRef Prefix) const {
-    for (auto &SectionIter : Sections)
-      if (SectionIter.Entries.count(Prefix) > 0)
+    for (const auto &It : Sections)
+      if (It.second.Entries.count(Prefix) > 0)
         return true;
     return false;
   }

clang/lib/Basic/SanitizerSpecialCaseList.cpp

Lines changed: 2 additions & 1 deletion
@@ -37,7 +37,8 @@ SanitizerSpecialCaseList::createOrDie(const std::vector<std::string> &Paths,
 }

 void SanitizerSpecialCaseList::createSanitizerSections() {
-  for (auto &S : Sections) {
+  for (auto &It : Sections) {
+    auto &S = It.second;
     SanitizerMask Mask;

 #define SANITIZER(NAME, ID) \

llvm/include/llvm/Support/SpecialCaseList.h

Lines changed: 52 additions & 52 deletions
@@ -5,54 +5,15 @@
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //===----------------------------------------------------------------------===//
 //
-// This is a utility class used to parse user-provided text files with
-// "special case lists" for code sanitizers. Such files are used to
-// define an "ABI list" for DataFlowSanitizer and allow/exclusion lists for
-// sanitizers like AddressSanitizer or UndefinedBehaviorSanitizer.
-//
-// Empty lines and lines starting with "#" are ignored. Sections are defined
-// using a '[section_name]' header and can be used to specify sanitizers the
-// entries below it apply to. Section names are regular expressions, and
-// entries without a section header match all sections (e.g. an '[*]' header
-// is assumed.)
-// The remaining lines should have the form:
-// prefix:wildcard_expression[=category]
-// If category is not specified, it is assumed to be empty string.
-// Definitions of "prefix" and "category" are sanitizer-specific. For example,
-// sanitizer exclusion support prefixes "src", "mainfile", "fun" and "global".
-// Wildcard expressions define, respectively, source files, main files,
-// functions or globals which shouldn't be instrumented.
-// Examples of categories:
-// "functional": used in DFSan to list functions with pure functional
-// semantics.
-// "init": used in ASan exclusion list to disable initialization-order bugs
-// detection for certain globals or source files.
-// Full special case list file example:
-// ---
-// [address]
-// # Excluded items:
-// fun:*_ZN4base6subtle*
-// global:*global_with_bad_access_or_initialization*
-// global:*global_with_initialization_issues*=init
-// type:*Namespace::ClassName*=init
-// src:file_with_tricky_code.cc
-// src:ignore-global-initializers-issues.cc=init
-// mainfile:main_file.cc
-//
-// [dataflow]
-// # Functions with pure functional semantics:
-// fun:cos=functional
-// fun:sin=functional
-// ---
-// Note that the wild card is in fact an llvm::Regex, but * is automatically
-// replaced with .*
+// This file implements a Special Case List for code sanitizers.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_SUPPORT_SPECIALCASELIST_H
 #define LLVM_SUPPORT_SPECIALCASELIST_H

 #include "llvm/ADT/StringMap.h"
+#include "llvm/Support/GlobPattern.h"
 #include "llvm/Support/Regex.h"
 #include <memory>
 #include <string>
@@ -66,6 +27,45 @@ namespace vfs {
 class FileSystem;
 }

+/// This is a utility class used to parse user-provided text files with
+/// "special case lists" for code sanitizers. Such files are used to
+/// define an "ABI list" for DataFlowSanitizer and allow/exclusion lists for
+/// sanitizers like AddressSanitizer or UndefinedBehaviorSanitizer.
+///
+/// Empty lines and lines starting with "#" are ignored. Sections are defined
+/// using a '[section_name]' header and can be used to specify sanitizers the
+/// entries below it apply to. Section names are globs, and
+/// entries without a section header match all sections (e.g. an '[*]' header
+/// is assumed.)
+/// The remaining lines should have the form:
+/// prefix:glob_pattern[=category]
+/// If category is not specified, it is assumed to be empty string.
+/// Definitions of "prefix" and "category" are sanitizer-specific. For example,
+/// sanitizer exclusion support prefixes "src", "mainfile", "fun" and "global".
+/// "glob_pattern" defines source files, main files, functions or globals which
+/// shouldn't be instrumented.
+/// Examples of categories:
+/// "functional": used in DFSan to list functions with pure functional
+/// semantics.
+/// "init": used in ASan exclusion list to disable initialization-order bugs
+/// detection for certain globals or source files.
+/// Full special case list file example:
+/// ---
+/// [address]
+/// # Excluded items:
+/// fun:*_ZN4base6subtle*
+/// global:*global_with_bad_access_or_initialization*
+/// global:*global_with_initialization_issues*=init
+/// type:*Namespace::ClassName*=init
+/// src:file_with_tricky_code.cc
+/// src:ignore-global-initializers-issues.cc=init
+/// mainfile:main_file.cc
+///
+/// [dataflow]
+/// # Functions with pure functional semantics:
+/// fun:cos=functional
+/// fun:sin=functional
+/// ---
 class SpecialCaseList {
 public:
   /// Parses the special case list entries from files. On failure, returns
@@ -88,7 +88,7 @@ class SpecialCaseList {
   /// \code
   /// @Prefix:<E>=@Category
   /// \endcode
-  /// where @Query satisfies wildcard expression <E> in a given @Section.
+  /// where @Query satisfies the glob <E> in a given @Section.
   bool inSection(StringRef Section, StringRef Prefix, StringRef Query,
                  StringRef Category = StringRef()) const;

@@ -97,7 +97,7 @@ class SpecialCaseList {
   /// \code
   /// @Prefix:<E>=@Category
   /// \endcode
-  /// where @Query satisfies wildcard expression <E> in a given @Section.
+  /// where @Query satisfies the glob <E> in a given @Section.
   /// Returns zero if there is no exclusion entry corresponding to this
   /// expression.
   unsigned inSectionBlame(StringRef Section, StringRef Prefix, StringRef Query,
@@ -114,36 +114,36 @@ class SpecialCaseList {
   SpecialCaseList(SpecialCaseList const &) = delete;
   SpecialCaseList &operator=(SpecialCaseList const &) = delete;

-  /// Represents a set of regular expressions. Regular expressions which are
-  /// "literal" (i.e. no regex metacharacters) are stored in Strings. The
-  /// reason for doing so is efficiency; StringMap is much faster at matching
-  /// literal strings than Regex.
+  /// Represents a set of globs and their line numbers
   class Matcher {
   public:
-    bool insert(std::string Regexp, unsigned LineNumber, std::string &REError);
+    Error insert(StringRef Pattern, unsigned LineNumber, bool UseRegex);
     // Returns the line number in the source file that this query matches to.
     // Returns zero if no match is found.
     unsigned match(StringRef Query) const;

   private:
-    StringMap<unsigned> Strings;
+    StringMap<std::pair<GlobPattern, unsigned>> Globs;
     std::vector<std::pair<std::unique_ptr<Regex>, unsigned>> RegExes;
   };

   using SectionEntries = StringMap<StringMap<Matcher>>;

   struct Section {
     Section(std::unique_ptr<Matcher> M) : SectionMatcher(std::move(M)){};
+    Section() : Section(std::make_unique<Matcher>()) {}

     std::unique_ptr<Matcher> SectionMatcher;
     SectionEntries Entries;
   };

-  std::vector<Section> Sections;
+  StringMap<Section> Sections;
+
+  Expected<Section *> addSection(StringRef SectionStr, unsigned LineNo,
+                                 bool UseGlobs = true);

   /// Parses just-constructed SpecialCaseList entries from a memory buffer.
-  bool parse(const MemoryBuffer *MB, std::string &Error);
+  bool parse(const MemoryBuffer *MB, std::string &Error);

   // Helper method for derived classes to search by Prefix, Query, and Category
   // once they have already resolved a section entry.