Skip to content

[PAC] Implement function pointer type discrimination #96992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 11, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions clang/include/clang/AST/ASTContext.h
Original file line number Diff line number Diff line change
Expand Up @@ -1283,6 +1283,9 @@ class ASTContext : public RefCountedBase<ASTContext> {
uint16_t
getPointerAuthVTablePointerDiscriminator(const CXXRecordDecl *RD);

/// Return the "other" type-specific discriminator for the given type.
uint16_t getPointerAuthTypeDiscriminator(QualType T) const;

/// Apply Objective-C protocol qualifiers to the given type.
/// \param allowOnPointerType specifies if we can apply protocol
/// qualifiers on ObjCObjectPointerType. It can be set to true when
Expand Down
3 changes: 3 additions & 0 deletions clang/include/clang/AST/Type.h
Original file line number Diff line number Diff line change
Expand Up @@ -2506,6 +2506,7 @@ class alignas(TypeAlignment) Type : public ExtQualsTypeCommonBase {
bool isFunctionNoProtoType() const { return getAs<FunctionNoProtoType>(); }
bool isFunctionProtoType() const { return getAs<FunctionProtoType>(); }
bool isPointerType() const;
bool isSignableType() const;
bool isAnyPointerType() const; // Any C pointer or ObjC object pointer
bool isCountAttributedType() const;
bool isBlockPointerType() const;
Expand Down Expand Up @@ -8001,6 +8002,8 @@ inline bool Type::isAnyPointerType() const {
return isPointerType() || isObjCObjectPointerType();
}

inline bool Type::isSignableType() const { return isPointerType(); }

inline bool Type::isBlockPointerType() const {
return isa<BlockPointerType>(CanonicalType);
}
Expand Down
1 change: 1 addition & 0 deletions clang/include/clang/Basic/Features.def
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,7 @@ FEATURE(ptrauth_vtable_pointer_address_discrimination, LangOpts.PointerAuthVTPtr
FEATURE(ptrauth_vtable_pointer_type_discrimination, LangOpts.PointerAuthVTPtrTypeDiscrimination)
FEATURE(ptrauth_member_function_pointer_type_discrimination, LangOpts.PointerAuthCalls)
FEATURE(ptrauth_init_fini, LangOpts.PointerAuthInitFini)
FEATURE(ptrauth_function_pointer_type_discrimination, LangOpts.PointerAuthFunctionTypeDiscrimination)
EXTENSION(swiftcc,
PP.getTargetInfo().checkCallingConvention(CC_Swift) ==
clang::TargetInfo::CCCR_OK)
Expand Down
2 changes: 2 additions & 0 deletions clang/include/clang/Basic/LangOptions.def
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,8 @@ LANGOPT(PointerAuthAuthTraps, 1, 0, "pointer authentication failure traps")
LANGOPT(PointerAuthVTPtrAddressDiscrimination, 1, 0, "incorporate address discrimination in authenticated vtable pointers")
LANGOPT(PointerAuthVTPtrTypeDiscrimination, 1, 0, "incorporate type discrimination in authenticated vtable pointers")
LANGOPT(PointerAuthInitFini, 1, 0, "sign function pointers in init/fini arrays")
BENIGN_LANGOPT(PointerAuthFunctionTypeDiscrimination, 1, 0,
"Use type discrimination when signing function pointers")

LANGOPT(DoubleSquareBracketAttributes, 1, 0, "'[[]]' attributes extension for all language standard modes")
LANGOPT(ExperimentalLateParseAttributes, 1, 0, "experimental late parsing of attributes")
Expand Down
4 changes: 4 additions & 0 deletions clang/include/clang/CodeGen/CodeGenABITypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,10 @@ unsigned getLLVMFieldNumber(CodeGenModule &CGM,
/// Return a declaration discriminator for the given global decl.
uint16_t getPointerAuthDeclDiscriminator(CodeGenModule &CGM, GlobalDecl GD);

/// Return a type discriminator for the given function type.
uint16_t getPointerAuthTypeDiscriminator(CodeGenModule &CGM,
QualType FunctionType);

/// Given the language and code-generation options that Clang was configured
/// with, set the default LLVM IR attributes for a function definition.
/// The attributes set here are mostly global target-configuration and
Expand Down
2 changes: 2 additions & 0 deletions clang/include/clang/Driver/Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -4228,6 +4228,8 @@ defm ptrauth_vtable_pointer_address_discrimination :
defm ptrauth_vtable_pointer_type_discrimination :
OptInCC1FFlag<"ptrauth-vtable-pointer-type-discrimination", "Enable type discrimination of vtable pointers">;
defm ptrauth_init_fini : OptInCC1FFlag<"ptrauth-init-fini", "Enable signing of function pointers in init/fini arrays">;
defm ptrauth_function_pointer_type_discrimination : OptInCC1FFlag<"ptrauth-function-pointer-type-discrimination",
"Enable type discrimination on C function pointers">;
}

def fenable_matrix : Flag<["-"], "fenable-matrix">, Group<f_Group>,
Expand Down
263 changes: 263 additions & 0 deletions clang/lib/AST/ASTContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3140,6 +3140,269 @@ ASTContext::getPointerAuthVTablePointerDiscriminator(const CXXRecordDecl *RD) {
return llvm::getPointerAuthStableSipHash(Str);
}

/// Encode a function type for use in the discriminator of a function pointer
/// type. We can't use the itanium scheme for this since C has quite permissive
/// rules for type compatibility that we need to be compatible with.
///
/// Formally, this function associates every function pointer type T with an
/// encoded string E(T). Let the equivalence relation T1 ~ T2 be defined as
/// E(T1) == E(T2). E(T) is part of the ABI of values of type T. C type
/// compatibility requires equivalent treatment under the ABI, so
/// CCompatible(T1, T2) must imply E(T1) == E(T2), that is, CCompatible must be
/// a subset of ~. Crucially, however, it must be a proper subset because
/// CCompatible is not an equivalence relation: for example, int[] is compatible
/// with both int[1] and int[2], but the latter are not compatible with each
/// other. Therefore this encoding function must be careful to only distinguish
/// types if there is no third type with which they are both required to be
/// compatible.
static void encodeTypeForFunctionPointerAuth(const ASTContext &Ctx,
raw_ostream &OS, QualType QT) {
// FIXME: Consider address space qualifiers.
const Type *T = QT.getCanonicalType().getTypePtr();

// FIXME: Consider using the C++ type mangling when we encounter a construct
// that is incompatible with C.

switch (T->getTypeClass()) {
case Type::Atomic:
return encodeTypeForFunctionPointerAuth(
Ctx, OS, cast<AtomicType>(T)->getValueType());

case Type::LValueReference:
OS << "R";
encodeTypeForFunctionPointerAuth(Ctx, OS,
cast<ReferenceType>(T)->getPointeeType());
return;
case Type::RValueReference:
OS << "O";
encodeTypeForFunctionPointerAuth(Ctx, OS,
cast<ReferenceType>(T)->getPointeeType());
return;

case Type::Pointer:
// C11 6.7.6.1p2:
// For two pointer types to be compatible, both shall be identically
// qualified and both shall be pointers to compatible types.
// FIXME: we should also consider pointee types.
OS << "P";
return;

case Type::ObjCObjectPointer:
case Type::BlockPointer:
OS << "P";
return;

case Type::Complex:
OS << "C";
return encodeTypeForFunctionPointerAuth(
Ctx, OS, cast<ComplexType>(T)->getElementType());

case Type::VariableArray:
case Type::ConstantArray:
case Type::IncompleteArray:
case Type::ArrayParameter:
// C11 6.7.6.2p6:
// For two array types to be compatible, both shall have compatible
// element types, and if both size specifiers are present, and are integer
// constant expressions, then both size specifiers shall have the same
// constant value [...]
//
// So since ElemType[N] has to be compatible ElemType[], we can't encode the
// width of the array.
OS << "A";
return encodeTypeForFunctionPointerAuth(
Ctx, OS, cast<ArrayType>(T)->getElementType());

case Type::ObjCInterface:
case Type::ObjCObject:
OS << "<objc_object>";
return;

case Type::Enum:
// C11 6.7.2.2p4:
// Each enumerated type shall be compatible with char, a signed integer
// type, or an unsigned integer type.
//
// So we have to treat enum types as integers.
OS << "i";
return;

case Type::FunctionNoProto:
case Type::FunctionProto: {
// C11 6.7.6.3p15:
// For two function types to be compatible, both shall specify compatible
// return types. Moreover, the parameter type lists, if both are present,
// shall agree in the number of parameters and in the use of the ellipsis
// terminator; corresponding parameters shall have compatible types.
//
// That paragraph goes on to describe how unprototyped functions are to be
// handled, which we ignore here. Unprototyped function pointers are hashed
// as though they were prototyped nullary functions since thats probably
// what the user meant. This behavior is non-conforming.
// FIXME: If we add a "custom discriminator" function type attribute we
// should encode functions as their discriminators.
OS << "F";
const auto *FuncType = cast<FunctionType>(T);
encodeTypeForFunctionPointerAuth(Ctx, OS, FuncType->getReturnType());
if (const auto *FPT = dyn_cast<FunctionProtoType>(FuncType)) {
for (QualType Param : FPT->param_types()) {
Param = Ctx.getSignatureParameterType(Param);
encodeTypeForFunctionPointerAuth(Ctx, OS, Param);
}
if (FPT->isVariadic())
OS << "z";
}
OS << "E";
return;
}

case Type::MemberPointer: {
OS << "M";
const auto *MPT = T->getAs<MemberPointerType>();
encodeTypeForFunctionPointerAuth(Ctx, OS, QualType(MPT->getClass(), 0));
encodeTypeForFunctionPointerAuth(Ctx, OS, MPT->getPointeeType());
return;
}
case Type::ExtVector:
case Type::Vector:
OS << "Dv" << Ctx.getTypeSizeInChars(T).getQuantity();
break;

// Don't bother discriminating based on these types.
case Type::Pipe:
case Type::BitInt:
case Type::ConstantMatrix:
OS << "?";
return;

case Type::Builtin: {
const auto *BTy = T->getAs<BuiltinType>();
switch (BTy->getKind()) {
#define SIGNED_TYPE(Id, SingletonId) \
case BuiltinType::Id: \
OS << "i"; \
return;
#define UNSIGNED_TYPE(Id, SingletonId) \
case BuiltinType::Id: \
OS << "i"; \
return;
#define PLACEHOLDER_TYPE(Id, SingletonId) case BuiltinType::Id:
#define BUILTIN_TYPE(Id, SingletonId)
#include "clang/AST/BuiltinTypes.def"
llvm_unreachable("placeholder types should not appear here.");

case BuiltinType::Half:
OS << "Dh";
return;
case BuiltinType::Float:
OS << "f";
return;
case BuiltinType::Double:
OS << "d";
return;
case BuiltinType::LongDouble:
OS << "e";
return;
case BuiltinType::Float16:
OS << "DF16_";
return;
case BuiltinType::Float128:
OS << "g";
return;

case BuiltinType::Void:
OS << "v";
return;

case BuiltinType::ObjCId:
case BuiltinType::ObjCClass:
case BuiltinType::ObjCSel:
case BuiltinType::NullPtr:
OS << "P";
return;

// Don't bother discriminating based on OpenCL types.
case BuiltinType::OCLSampler:
case BuiltinType::OCLEvent:
case BuiltinType::OCLClkEvent:
case BuiltinType::OCLQueue:
case BuiltinType::OCLReserveID:
case BuiltinType::BFloat16:
case BuiltinType::VectorQuad:
case BuiltinType::VectorPair:
OS << "?";
return;

// Don't bother discriminating based on these seldom-used types.
case BuiltinType::Ibm128:
return;
#define IMAGE_TYPE(ImgType, Id, SingletonId, Access, Suffix) \
case BuiltinType::Id: \
return;
#include "clang/Basic/OpenCLImageTypes.def"
#define EXT_OPAQUE_TYPE(ExtType, Id, Ext) \
case BuiltinType::Id: \
return;
#include "clang/Basic/OpenCLExtensionTypes.def"
#define SVE_TYPE(Name, Id, SingletonId) \
case BuiltinType::Id: \
return;
#include "clang/Basic/AArch64SVEACLETypes.def"
case BuiltinType::Dependent:
llvm_unreachable("should never get here");
case BuiltinType::AMDGPUBufferRsrc:
case BuiltinType::WasmExternRef:
#define RVV_TYPE(Name, Id, SingletonId) case BuiltinType::Id:
#include "clang/Basic/RISCVVTypes.def"
llvm_unreachable("not yet implemented");
}
}
case Type::Record: {
const RecordDecl *RD = T->getAs<RecordType>()->getDecl();
const IdentifierInfo *II = RD->getIdentifier();
if (!II)
if (const TypedefNameDecl *Typedef = RD->getTypedefNameForAnonDecl())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getTypedefNameForAnonDecl() doesn't actually have any semantic significance in C; it's a C++ thing. Not sure how much we care.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this code is needed to add more diversification in the following case:

typedef struct {
  int a;
} S;

void f(S);
void (*fp)(S) = f;

Function type discrimination is used in C++ mode too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the following two files, you have an ODR violation in C++... but you're not violating the equivalent C type compatibility rule. So this will reject code which is technically valid C.

If this is an intentional deviation from the C standard, that might be fine, but it needs an explicit comment explaining that.

typedef struct {
  int a;
} S1;
void f(S1);
typedef struct {
  int a;
} S2;
void f(S2);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, that exact code will work as usual because function symbols are not themselves signed. A call to f under one function type will work correctly even if it is defined using the other. However, you are still generally correct.

If you take the address of a function, the resulting function pointer value is signed in that context according to that context's concept of its type. If you then convert it, it will be authenticated for its old type and re-signed for its new type. There will only be a problem if the function pointer value is reinterpreted as a new type without such a conversion.

Therefore, if f is instead declared as taking a function pointer value:

void f(void (*fp)(S1));
void f(void (*fp)(S2));

A call under the first signature will sign the argument differently from how a definition under the second signature will expect it, and code that is technically valid C will fail to work.

This deviation is indeed intentional. We felt that typedefs of anonymous structs were too common of a pattern in C to simply completely fail to distinguish them, and we felt that a structural hash of the members was likely to be problematic in other ways. It is reasonable to ask that we document this, though. Akira, I suggest:

      // In C++, an immediate typedef of an anonymous struct or union
      // is considered to name it for ODR purposes, but C's specification
      // of type compatibility does not have a similar rule.  Using the typedef
      // name in function type discriminators anyway, as we do here,
      // therefore technically violates the C standard: two function pointer
      // types defined in terms of two typedef'd anonymous structs with
      // different names are formally still compatible, but we are assigning
      // them different discriminators and therefore incompatible ABIs.
      //
      // This is a relatively minor violation that significantly improves
      // discrimination in some cases and has not caused problems in
      // practice.  Regardless, it is now part of the ABI in places where
      // function type discrimination is used, and it can no longer be
      // changed except on new platforms.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this patch doesn't have the code to resign function pointers, which is needed for type conversions. That will be implemented in subsequent patches.

II = Typedef->getDeclName().getAsIdentifierInfo();

if (!II) {
OS << "<anonymous_record>";
return;
}
OS << II->getLength() << II->getName();
return;
}
case Type::DeducedTemplateSpecialization:
case Type::Auto:
#define NON_CANONICAL_TYPE(Class, Base) case Type::Class:
#define DEPENDENT_TYPE(Class, Base) case Type::Class:
#define NON_CANONICAL_UNLESS_DEPENDENT_TYPE(Class, Base) case Type::Class:
#define ABSTRACT_TYPE(Class, Base)
#define TYPE(Class, Base)
#include "clang/AST/TypeNodes.inc"
llvm_unreachable("unexpected non-canonical or dependent type!");
return;
}
}

uint16_t ASTContext::getPointerAuthTypeDiscriminator(QualType T) const {
assert(!T->isDependentType() &&
"cannot compute type discriminator of a dependent type");

SmallString<256> Str;
llvm::raw_svector_ostream Out(Str);

if (T->isFunctionPointerType() || T->isFunctionReferenceType())
T = T->getPointeeType();

if (T->isFunctionType())
encodeTypeForFunctionPointerAuth(*this, Out, T);
else
llvm_unreachable(
"type discrimination of non-function type not implemented yet");

return llvm::getPointerAuthStableSipHash(Str);
}

QualType ASTContext::getObjCGCQualType(QualType T,
Qualifiers::GC GCAttr) const {
QualType CanT = getCanonicalType(T);
Expand Down
5 changes: 5 additions & 0 deletions clang/lib/CodeGen/CGExprConstant.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2220,6 +2220,11 @@ llvm::Constant *ConstantLValueEmitter::emitPointerAuthPointer(const Expr *E) {

// The assertions here are all checked by Sema.
assert(Result.Val.isLValue());
const auto *Base = Result.Val.getLValueBase().get<const ValueDecl *>();
if (const auto *Decl = dyn_cast_or_null<FunctionDecl>(Base)) {
assert(Result.Val.getLValueOffset().isZero());
return CGM.getRawFunctionPointer(Decl);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code not part of ConstantEmitter()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called to return an unsigned pointer. The returned pointer is used to create a signed pointer for __builtin_ptrauth_sign_constant in emitPointerAuthSignConstant. ConstantEmitter returns a pointer that is signed with zero or a non-zero discriminator that is based on the function type etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, why are you special-casing functions here? This looks like emitPointerAuthPointer can return either a signed pointer or an unsigned pointer, depending on what exactly the expression is referring to, which seems bad.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checks in SemaChecking.cpp and the assertion in emitPointerAuthPointer ensure that the expression of the first argument to __builtin_ptrauth_sign_constant evaluates to an LValue with a ValueDecl base. With that restriction, a call to emitAbstract returns a signed pointer only when the argument points to a function definition (see ConstantLValueEmitter::tryEmitBase). Since we are special-casing functions here, the call to emitAbstract returns an unsigned pointer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that there's an unnecessary null check in the code above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I follow what this is doing, then... but it seems like a terrible way to structure the code; it's repeating code, and the relevant logic is split over multiple disconnected functions.

Can we add an argument to ConstantLValueEmitter() so users can request that functions are either signed or unsigned, or something like that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean to ConstantEmitter? I'd be okay with a flag that specifically requests a raw pointer; that doesn't seem too error-prone.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised code looks fine.

}
return ConstantEmitter(CGM, Emitter.CGF)
.emitAbstract(E->getExprLoc(), Result.Val, E->getType());
}
Expand Down
Loading
Loading