Commit 5249baa

[IR] Introduce captures attribute
This introduces the `captures` attribute as described in:
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420

This initial patch only introduces the IR/bitcode support for the attribute and its in-memory representation as `CaptureInfo`. This will be followed by a patch to remove (and upgrade) the `nocapture` attribute, and then by actual inference/analysis support.

Based on the RFC feedback, I've used a syntax similar to the `memory` attribute, though the only "location" that can be specified right now is `ret`.

I've added some pretty extensive documentation to LangRef on the semantics. One non-obvious bit here is that using ptrtoint will not result in a "return-only" capture, even if the ptrtoint result is only used in the return value. Without this requirement we wouldn't be able to continue ordinary capture analysis on the return value.
1 parent 2906232 commit 5249baa

19 files changed, +469 -9 lines changed

llvm/docs/LangRef.rst

Lines changed: 120 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1379,6 +1379,36 @@ Currently, only the following parameter attributes are defined:
13791379
function, returning a pointer to allocated storage disjoint from the
13801380
storage for any other object accessible to the caller.
13811381

1382+
``captures(...)``
1383+
This attribute restricts the ways in which the callee may capture the
1384+
pointer. This is not a valid attribute for return values. This attribute
1385+
applies only to the particular copy of the pointer passed in this argument.
1386+
1387+
The arguments of ``captures`` are a list of captured pointer components,
1388+
which may be ``none``, or a combination of:
1389+
1390+
- ``address``: The integral address of the pointer.
1391+
- ``provenance``: The ability to access the pointer for both read and write
1392+
after the function returns.
1393+
- ``read_provenance``: The ability to access the pointer only for reads
1394+
after the function returns.
1395+
1396+
Additionally, it is possible to specify that the pointer is captured via
1397+
the return value only, by using ``captures(ret: ...)``.
1398+
1399+
The :ref:`pointer capture section <pointercapture>` discusses these semantics
1400+
in more detail.
1401+
1402+
Some examples of how to use the attribute:
1403+
1404+
- ``captures(none)``: Pointer not captured.
1405+
- ``captures(address, provenance)``: Equivalent to omitting the attribute.
1406+
- ``captures(address)``: Address may be captured, but not provenance.
1407+
- ``captures(address, read_provenance)``: Both address and provenance
1408+
captured, but only for read-only access.
1409+
- ``captures(ret: address, provenance)``: Pointer captured through return
1410+
value only.
1411+
13821412
.. _nocapture:
13831413

13841414
``nocapture``
@@ -3318,10 +3348,91 @@ Pointer Capture
33183348
---------------
33193349

33203350
Given a function call and a pointer that is passed as an argument or stored in
3321-
the memory before the call, a pointer is *captured* by the call if it makes a
3322-
copy of any part of the pointer that outlives the call.
3323-
To be precise, a pointer is captured if one or more of the following conditions
3324-
hold:
3351+
memory before the call, the call may capture two components of the pointer:
3352+
3353+
* The address of the pointer, which is its integral value. This also includes
3354+
parts of the address or any information about the address, including the
3355+
fact that it does not equal one specific value.
3356+
* The provenance of the pointer, which is the ability to perform memory
3357+
accesses through the pointer, in the sense of the :ref:`pointer aliasing
3358+
rules <pointeraliasing>`. We further distinguish whether only read accesses
3359+
are allowed, or both reads and writes.
3360+
3361+
For example, the following function captures the address of ``%a``, because
3362+
it is compared to a pointer, leaking information about the identity of the
3363+
pointer:
3364+
3365+
.. code-block:: llvm
3366+
3367+
@glb = global i8 0
3368+
3369+
define i1 @f(ptr %a) {
3370+
%c = icmp eq ptr %a, @glb
3371+
ret i1 %c
3372+
}
3373+
3374+
The function does not capture the provenance of the pointer, because the
3375+
``icmp`` instruction only operates on the pointer address. The following
3376+
function captures both the address and provenance of the pointer, as both
3377+
may be read from ``@glb`` after the function returns:
3378+
3379+
.. code-block:: llvm
3380+
3381+
@glb = global ptr null
3382+
3383+
define void @f(ptr %a) {
3384+
store ptr %a, ptr @glb
3385+
ret void
3386+
}
3387+
3388+
The following function captures *neither* the address nor the provenance of
3389+
the pointer:
3390+
3391+
.. code-block:: llvm
3392+
3393+
define i32 @f(ptr %a) {
3394+
%v = load i32, ptr %a
3395+
ret i32 %v
3396+
}
3397+
3398+
While address capture includes uses of the address within the body of the
3399+
function, provenance capture refers exclusively to the ability to perform
3400+
accesses *after* the function returns. Memory accesses within the function
3401+
itself are not considered pointer captures.
3402+
3403+
We can further say that the capture only occurs through a specific location.
3404+
In the following example, the pointer (both address and provenance) is captured
3405+
through the return value only:
3406+
3407+
.. code-block:: llvm
3408+
3409+
define ptr @f(ptr %a) {
3410+
%gep = getelementptr i8, ptr %a, i64 4
3411+
ret ptr %gep
3412+
}
3413+
3414+
However, we always consider direct inspection of the pointer address
3415+
(e.g. using ``ptrtoint``) to be location-independent. The following example
3416+
is *not* considered a return-only capture, even though the ``ptrtoint``
3417+
ultimately only contributes to the return value:
3418+
3419+
.. code-block:: llvm
3420+
3421+
@lookup = constant [4 x i8] [i8 0, i8 1, i8 2, i8 3]
3422+
3423+
define ptr @f(ptr %a) {
3424+
%a.addr = ptrtoint ptr %a to i64
3425+
%mask = and i64 %a.addr, 3
3426+
%gep = getelementptr i8, ptr @lookup, i64 %mask
3427+
ret ptr %gep
3428+
}
3429+
3430+
This definition is chosen to allow capture analysis to continue with the return
3431+
value in the usual fashion.
3432+
3433+
The following describes possible ways to capture a pointer in more detail,
3434+
where unqualified uses of the word "capture" refer to capturing both address
3435+
and provenance.
33253436

33263437
1. The call stores any bit of the pointer carrying information into a place,
33273438
and the stored bits can be read from the place by the caller after this call
@@ -3360,30 +3471,30 @@ hold:
33603471
@lock = global i1 true
33613472

33623473
define void @f(ptr %a) {
3363-
store ptr %a, ptr* @glb
3474+
store ptr %a, ptr @glb
33643475
store atomic i1 false, ptr @lock release ; %a is captured because another thread can safely read @glb
33653476
store ptr null, ptr @glb
33663477
ret void
33673478
}
33683479

3369-
3. The call's behavior depends on any bit of the pointer carrying information.
3480+
3. The call's behavior depends on any bit of the pointer carrying information
3481+
(address capture only).
33703482

33713483
.. code-block:: llvm
33723484

33733485
@glb = global i8 0
33743486

33753487
define void @f(ptr %a) {
33763488
%c = icmp eq ptr %a, @glb
3377-
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; escapes %a
3489+
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; captures address of %a only
33783490
BB_EXIT:
33793491
call void @exit()
33803492
unreachable
33813493
BB_CONTINUE:
33823494
ret void
33833495
}
33843496

3385-
4. The pointer is used in a volatile access as its address.
3386-
3497+
4. The pointer is used as the pointer operand of a volatile access.
33873498

33883499
.. _volatile:
33893500

llvm/include/llvm/AsmParser/LLParser.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -376,6 +376,7 @@ namespace llvm {
376376
bool inAttrGrp, LocTy &BuiltinLoc);
377377
bool parseRangeAttr(AttrBuilder &B);
378378
bool parseInitializesAttr(AttrBuilder &B);
379+
bool parseCapturesAttr(AttrBuilder &B);
379380
bool parseRequiredTypeAttr(AttrBuilder &B, lltok::Kind AttrToken,
380381
Attribute::AttrKind AttrKind);
381382

llvm/include/llvm/AsmParser/LLToken.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -207,6 +207,11 @@ enum Kind {
207207
kw_inaccessiblememonly,
208208
kw_inaccessiblemem_or_argmemonly,
209209

210+
// Captures attribute:
211+
kw_address,
212+
kw_provenance,
213+
kw_read_provenance,
214+
210215
// nofpclass attribute:
211216
kw_all,
212217
kw_nan,

llvm/include/llvm/Bitcode/LLVMBitCodes.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -787,6 +787,7 @@ enum AttributeKindCodes {
787787
ATTR_KIND_CORO_ELIDE_SAFE = 98,
788788
ATTR_KIND_NO_EXT = 99,
789789
ATTR_KIND_NO_DIVERGENCE_SOURCE = 100,
790+
ATTR_KIND_CAPTURES = 101,
790791
};
791792

792793
enum ComdatSelectionKindCodes {

llvm/include/llvm/IR/Attributes.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -284,6 +284,9 @@ class Attribute {
284284
/// Returns memory effects.
285285
MemoryEffects getMemoryEffects() const;
286286

287+
/// Returns information from captures attribute.
288+
CaptureInfo getCaptureInfo() const;
289+
287290
/// Return the FPClassTest for nofpclass
288291
FPClassTest getNoFPClass() const;
289292

@@ -436,6 +439,7 @@ class AttributeSet {
436439
UWTableKind getUWTableKind() const;
437440
AllocFnKind getAllocKind() const;
438441
MemoryEffects getMemoryEffects() const;
442+
CaptureInfo getCaptureInfo() const;
439443
FPClassTest getNoFPClass() const;
440444
std::string getAsString(bool InAttrGrp = false) const;
441445

@@ -1260,6 +1264,9 @@ class AttrBuilder {
12601264
/// Add memory effect attribute.
12611265
AttrBuilder &addMemoryAttr(MemoryEffects ME);
12621266

1267+
/// Add captures attribute.
1268+
AttrBuilder &addCapturesAttr(CaptureInfo CI);
1269+
12631270
// Add nofpclass attribute
12641271
AttrBuilder &addNoFPClassAttr(FPClassTest NoFPClassMask);
12651272

llvm/include/llvm/IR/Attributes.td

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -183,6 +183,9 @@ def NoCallback : EnumAttr<"nocallback", IntersectAnd, [FnAttr]>;
183183
/// Function creates no aliases of pointer.
184184
def NoCapture : EnumAttr<"nocapture", IntersectAnd, [ParamAttr]>;
185185

186+
/// Specify how the pointer may be captured.
187+
def Captures : IntAttr<"captures", IntersectCustom, [ParamAttr]>;
188+
186189
/// Function is not a source of divergence.
187190
def NoDivergenceSource : EnumAttr<"nodivergencesource", IntersectAnd, [FnAttr]>;
188191

llvm/include/llvm/Support/ModRef.h

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -273,6 +273,93 @@ raw_ostream &operator<<(raw_ostream &OS, MemoryEffects RMRB);
273273
// Legacy alias.
274274
using FunctionModRefBehavior = MemoryEffects;
275275

276+
/// Components of the pointer that may be captured.
277+
enum class CaptureComponents : uint8_t {
278+
None = 0,
279+
Address = (1 << 0),
280+
ReadProvenance = (1 << 1),
281+
Provenance = (1 << 2) | ReadProvenance,
282+
All = Address | Provenance,
283+
LLVM_MARK_AS_BITMASK_ENUM(Provenance),
284+
};
285+
286+
inline bool capturesNothing(CaptureComponents CC) {
287+
return CC == CaptureComponents::None;
288+
}
289+
290+
inline bool capturesAnything(CaptureComponents CC) {
291+
return CC != CaptureComponents::None;
292+
}
293+
294+
inline bool capturesAddress(CaptureComponents CC) {
295+
return (CC & CaptureComponents::Address) != CaptureComponents::None;
296+
}
297+
298+
inline bool capturesReadProvenanceOnly(CaptureComponents CC) {
299+
return (CC & CaptureComponents::Provenance) ==
300+
CaptureComponents::ReadProvenance;
301+
}
302+
303+
inline bool capturesFullProvenance(CaptureComponents CC) {
304+
return (CC & CaptureComponents::Provenance) == CaptureComponents::Provenance;
305+
}
306+
307+
raw_ostream &operator<<(raw_ostream &OS, CaptureComponents CC);
308+
309+
/// Represents which components of the pointer may be captured and whether
310+
/// the capture is via the return value only. This represents the captures(...)
311+
/// attribute in IR.
312+
///
313+
/// For more information on the precise semantics see LangRef.
314+
class CaptureInfo {
315+
CaptureComponents Components;
316+
bool ReturnOnly;
317+
318+
public:
319+
CaptureInfo(CaptureComponents Components, bool ReturnOnly = false)
320+
: Components(Components),
321+
ReturnOnly(capturesAnything(Components) && ReturnOnly) {}
322+
323+
/// Create CaptureInfo that may capture all components of the pointer.
324+
static CaptureInfo all() { return CaptureInfo(CaptureComponents::All); }
325+
326+
/// Get the potentially captured components of the pointer.
327+
operator CaptureComponents() const { return Components; }
328+
329+
/// Whether the pointer is captured through the return value only.
330+
bool isReturnOnly() const { return ReturnOnly; }
331+
332+
bool operator==(CaptureInfo Other) const {
333+
return Components == Other.Components && ReturnOnly == Other.ReturnOnly;
334+
}
335+
336+
bool operator!=(CaptureInfo Other) const { return !(*this == Other); }
337+
338+
/// Compute union of CaptureInfos.
339+
CaptureInfo operator|(CaptureInfo Other) const {
340+
return CaptureInfo(Components | Other.Components,
341+
ReturnOnly && Other.ReturnOnly);
342+
}
343+
344+
/// Compute intersection of CaptureInfos.
345+
CaptureInfo operator&(CaptureInfo Other) const {
346+
return CaptureInfo(Components & Other.Components,
347+
ReturnOnly || Other.ReturnOnly);
348+
}
349+
350+
static CaptureInfo createFromIntValue(uint32_t Data) {
351+
return CaptureInfo(CaptureComponents(Data >> 1), Data & 1);
352+
}
353+
354+
/// Convert CaptureInfo into an encoded integer value (used by captures
355+
/// attribute).
356+
uint32_t toIntValue() const {
357+
return (uint32_t(Components) << 1) | ReturnOnly;
358+
}
359+
};
360+
361+
raw_ostream &operator<<(raw_ostream &OS, CaptureInfo Info);
362+
276363
} // namespace llvm
277364

278365
#endif
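
To make the integer encoding and the union/intersection rules in the `ModRef.h` hunk easier to follow, here is a minimal standalone sketch. It mirrors the bit layout and the `ReturnOnly` handling shown in the diff, but it is not the LLVM header: the names `Components`, `Info`, `toInt`, `fromInt`, `readProvenanceOnly`, `fullProvenance` and `checkRoundTrip` are hypothetical stand-ins chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Standalone mirror of the bit layout in the diff (assumed names, not LLVM's).
enum class Components : uint8_t {
  None = 0,
  Address = 1 << 0,
  ReadProvenance = 1 << 1,
  Provenance = (1 << 2) | ReadProvenance, // full provenance includes the read bit
  All = Address | Provenance,
};

constexpr Components operator|(Components A, Components B) {
  return Components(uint8_t(A) | uint8_t(B));
}
constexpr Components operator&(Components A, Components B) {
  return Components(uint8_t(A) & uint8_t(B));
}

// True if the read bit is set but the write-capable provenance bit is not.
constexpr bool readProvenanceOnly(Components C) {
  return (C & Components::Provenance) == Components::ReadProvenance;
}
// True if both provenance bits are set.
constexpr bool fullProvenance(Components C) {
  return (C & Components::Provenance) == Components::Provenance;
}

struct Info {
  Components C;
  bool ReturnOnly;

  // As in the diff, "return only" is dropped when nothing is captured.
  constexpr Info(Components C, bool ReturnOnly = false)
      : C(C), ReturnOnly(C != Components::None && ReturnOnly) {}

  // Bit 0 holds the return-only flag, the remaining bits hold the components.
  constexpr uint32_t toInt() const {
    return (uint32_t(C) << 1) | uint32_t(ReturnOnly);
  }
  static constexpr Info fromInt(uint32_t Data) {
    return Info(Components(Data >> 1), (Data & 1) != 0);
  }

  constexpr bool operator==(Info O) const {
    return C == O.C && ReturnOnly == O.ReturnOnly;
  }
  // Union: return-only survives only if both sides are return-only.
  constexpr Info operator|(Info O) const {
    return Info(C | O.C, ReturnOnly && O.ReturnOnly);
  }
  // Intersection: return-only holds if either side guarantees it.
  constexpr Info operator&(Info O) const {
    return Info(C & O.C, ReturnOnly || O.ReturnOnly);
  }
};

// captures(ret: address, provenance) encodes as (7 << 1) | 1.
static_assert(Info(Components::All, true).toInt() == 15, "encoding");

// Every (component bits, return-only) pair survives an encode/decode cycle.
inline void checkRoundTrip() {
  for (uint32_t Bits = 0; Bits < 8; ++Bits)
    for (int RetOnly = 0; RetOnly < 2; ++RetOnly) {
      Info I(Components(Bits), RetOnly != 0);
      assert(Info::fromInt(I.toInt()) == I);
    }
}
```

Because the constructor normalizes the flag, the raw value `1` (no components, return-only bit set) decodes to the same object as `0`, so decoding followed by encoding is not the identity for that one input, while encoding followed by decoding always is.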

llvm/lib/AsmParser/LLLexer.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -704,6 +704,9 @@ lltok::Kind LLLexer::LexIdentifier() {
704704
KEYWORD(argmemonly);
705705
KEYWORD(inaccessiblememonly);
706706
KEYWORD(inaccessiblemem_or_argmemonly);
707+
KEYWORD(address);
708+
KEYWORD(provenance);
709+
KEYWORD(read_provenance);
707710

708711
// nofpclass attribute
709712
KEYWORD(all);
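
The three keywords added to the lexer are the component names used in the textual `captures(...)` syntax documented in the LangRef hunk. As a rough illustration of how a component set maps to that syntax, the sketch below renders the forms listed among the LangRef examples. `formatCaptures` and the `Component` enum are hypothetical helpers written for this note; the printer that the commit actually adds is not shown in this excerpt and may differ in ordering or spacing.

```cpp
#include <string>

// Assumed bit values, matching the enum in the ModRef.h hunk.
enum Component : unsigned {
  Address = 1,
  ReadProvenance = 2,
  Provenance = 4 | ReadProvenance, // full provenance includes the read bit
};

// Hypothetical formatter: builds the textual attribute from a component mask.
inline std::string formatCaptures(unsigned Mask, bool ReturnOnly) {
  if (Mask == 0)
    return "captures(none)"; // a location prefix is meaningless with no components

  std::string Out = "captures(";
  if (ReturnOnly)
    Out += "ret: ";

  bool First = true;
  auto Add = [&](const char *Name) {
    if (!First)
      Out += ", ";
    Out += Name;
    First = false;
  };

  if (Mask & Address)
    Add("address");
  if ((Mask & Provenance) == Provenance)
    Add("provenance");
  else if (Mask & ReadProvenance)
    Add("read_provenance");

  return Out + ")";
}
```

For instance, `formatCaptures(Address | Provenance, true)` yields `captures(ret: address, provenance)`, the last entry in the LangRef example list.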
