-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[DataLayout] Introduce DataLayout::getPointerAddressSize(AS) #137412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataLayout] Introduce DataLayout::getPointerAddressSize(AS) #137412
Conversation
Created using spr 1.3.6-beta.1
What's the difference between |
Some pointer types can be larger than the underlying address space (e.g. CHERI capabilities where the size of the pointer is 2*addr) since there is additional metadata. For most "normal" architectures both of these will be the same. |
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-llvm-ir Author: Alexander Richardson (arichardson) ChangesThis function can be used to retrieve the number of bits that can be used Importantly, this is not always the same as the index width since some Currently, this new API is not used anywhere, but the plan is to introduce Originally uploaded as https://reviews.llvm.org/D135158 Full diff: https://github.com/llvm/llvm-project/pull/137412.diff 4 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 33c85c7ba9d29..213f3f7e0e5f5 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3134,16 +3134,24 @@ as follows:
``A<address space>``
Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
-``p[n]:<size>:<abi>[:<pref>][:<idx>]``
+``p[n]:<size>:<abi>[:<pref>][:<idx>][:<addr>]``
This specifies the *size* of a pointer and its ``<abi>`` and
``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
- and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
- index that used for address calculation, which must be less than or equal
- to the pointer size. If not
- specified, the default index size is equal to the pointer size. All sizes
- are in bits. The address space, ``n``, is optional, and if not specified,
+ and defaults to ``<abi>``.
+ The fourth parameter ``<idx>`` is the size of the index that used for
+ address calculations such as :ref:`getelementptr <i_getelementptr>`.
+ It must be less than or equal to the pointer size. If not specified, the
+ default index size is equal to the pointer size.
+ The fifth parameter ``<addr>`` specifies the width of addresses in this
+ address space. If not specified, the default address size is equal to the
+ index size. The address size may be wider than the index size as it could be
+ calculated relative to a base address. For example AMDGPU buffer fat
+ pointers use a 48-bit address range, but only allow for 32 bits of indexing.
+ All sizes are in bits.
+ The address space, ``n``, is optional, and if not specified,
denotes the default address space 0. The value of ``n`` must be
in the range [1,2^24).
+
``i<size>:<abi>[:<pref>]``
This specifies the alignment for an integer type of a given bit
``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -12996,9 +13004,9 @@ This instruction requires several arguments:
- Caller and callee both have the calling convention ``fastcc`` or ``tailcc``.
- The call is in tail position (ret immediately follows call and ret
uses value of call or is void).
- - Option ``-tailcallopt`` is enabled, ``llvm::GuaranteedTailCallOpt`` is
+ - Option ``-tailcallopt`` is enabled, ``llvm::GuaranteedTailCallOpt`` is
``true``, or the calling convention is ``tailcc``.
- - `Platform-specific constraints are met.
+ - `Platform-specific constraints are met.
<CodeGenerator.html#tail-call-optimization>`_
#. The optional ``notail`` marker indicates that the optimizers should not add
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 2ad080e6d0cd2..3e7f3b79c0a8c 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -78,6 +78,7 @@ class DataLayout {
Align ABIAlign;
Align PrefAlign;
uint32_t IndexBitWidth;
+ uint32_t AddressBitWidth;
/// Pointers in this address space don't have a well-defined bitwise
/// representation (e.g. may be relocated by a copying garbage collector).
/// Additionally, they may also be non-integral (i.e. containing additional
@@ -148,7 +149,7 @@ class DataLayout {
/// Sets or updates the specification for pointer in the given address space.
void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
Align PrefAlign, uint32_t IndexBitWidth,
- bool IsNonIntegral);
+ uint32_t AddressBitWidth, bool IsNonIntegral);
/// Internal helper to get alignment for integer of given bitwidth.
Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -324,12 +325,26 @@ class DataLayout {
/// the backends/clients are updated.
Align getPointerPrefAlignment(unsigned AS = 0) const;
- /// Layout pointer size in bytes, rounded up to a whole
- /// number of bytes.
+ /// Layout pointer size in bytes, rounded up to a whole number of bytes. The
+ /// difference between this function and getPointerAddressSize() is this one
+ /// returns the size of the entire pointer type (this includes metadata bits
+ /// for fat pointers) and the latter only returns the number of address bits.
+ /// \sa DataLayout::getPointerAddressSizeInBits
/// FIXME: The defaults need to be removed once all of
/// the backends/clients are updated.
unsigned getPointerSize(unsigned AS = 0) const;
+ /// Returns the integral size of a pointer in a given address space in bytes.
+ /// For targets that store bits in pointers that are not part of the address,
+ /// this returns the number of bits that can be manipulated using operations
+ /// that change the address (e.g. addition/subtraction).
+ /// For example, a 64-bit CHERI-enabled target has 128-bit pointers of which
+ /// only 64 are used to represent the address and the remaining ones are used
+ /// for metadata such as bounds and access permissions. In this case
+ /// getPointerSize() returns 16, but getPointerAddressSize() returns 8.
+ /// \sa DataLayout::getPointerSize
+ unsigned getPointerAddressSize(unsigned AS) const;
+
// Index size in bytes used for address calculation,
/// rounded up to a whole number of bytes.
unsigned getIndexSize(unsigned AS) const;
@@ -365,6 +380,14 @@ class DataLayout {
return getPointerSpec(AS).BitWidth;
}
+ /// The size of an address in for the given AS. This is usually the same size
+ /// as the index width but in same cases (e.g. AMDGPU buffer fat pointers with
+ /// 48-bit addresses and 32-bit offsets), the address size can be larger than
+ /// the valid range of indexing.
+ unsigned getPointerAddressSizeInBits(unsigned AS) const {
+ return getPointerSpec(AS).AddressBitWidth;
+ }
+
/// Size in bits of index used for address calculation in getelementptr.
unsigned getIndexSizeInBits(unsigned AS) const {
return getPointerSpec(AS).IndexBitWidth;
diff --git a/llvm/lib/IR/DataLayout.cpp b/llvm/lib/IR/DataLayout.cpp
index 0cf0bfc9702d3..d818ae8c42371 100644
--- a/llvm/lib/IR/DataLayout.cpp
+++ b/llvm/lib/IR/DataLayout.cpp
@@ -152,6 +152,7 @@ bool DataLayout::PointerSpec::operator==(const PointerSpec &Other) const {
return AddrSpace == Other.AddrSpace && BitWidth == Other.BitWidth &&
ABIAlign == Other.ABIAlign && PrefAlign == Other.PrefAlign &&
IndexBitWidth == Other.IndexBitWidth &&
+ AddressBitWidth == Other.AddressBitWidth &&
IsNonIntegral == Other.IsNonIntegral;
}
@@ -208,7 +209,7 @@ constexpr DataLayout::PrimitiveSpec DefaultVectorSpecs[] = {
// Default pointer type specifications.
constexpr DataLayout::PointerSpec DefaultPointerSpecs[] = {
// p0:64:64:64:64
- {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false},
+ {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, 64, false},
};
DataLayout::DataLayout()
@@ -414,8 +415,9 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
assert(Spec.front() == 'p');
Spec.drop_front().split(Components, ':');
- if (Components.size() < 3 || Components.size() > 5)
- return createSpecFormatError("p[<n>]:<size>:<abi>[:<pref>[:<idx>]]");
+ if (Components.size() < 3 || Components.size() > 6)
+ return createSpecFormatError(
+ "p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<addr>]]]");
// Address space. Optional, defaults to 0.
unsigned AddrSpace = 0;
@@ -454,8 +456,17 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
return createStringError(
"index size cannot be larger than the pointer size");
+ unsigned AddressBitWidth = IndexBitWidth;
+ if (Components.size() > 5)
+ if (Error Err = parseSize(Components[5], AddressBitWidth, "address size"))
+ return Err;
+
+ if (AddressBitWidth > BitWidth)
+ return createStringError(
+ "address size cannot be larger than the pointer size");
+
setPointerSpec(AddrSpace, BitWidth, ABIAlign, PrefAlign, IndexBitWidth,
- false);
+ AddressBitWidth, false);
return Error::success();
}
@@ -631,7 +642,7 @@ Error DataLayout::parseLayoutString(StringRef LayoutString) {
// the spec for AS0, and we then update that to mark it non-integral.
const PointerSpec &PS = getPointerSpec(AS);
setPointerSpec(AS, PS.BitWidth, PS.ABIAlign, PS.PrefAlign, PS.IndexBitWidth,
- true);
+ PS.AddressBitWidth, true);
}
return Error::success();
@@ -679,16 +690,19 @@ DataLayout::getPointerSpec(uint32_t AddrSpace) const {
void DataLayout::setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth,
Align ABIAlign, Align PrefAlign,
- uint32_t IndexBitWidth, bool IsNonIntegral) {
+ uint32_t IndexBitWidth,
+ uint32_t AddressBitWidth, bool IsNonIntegral) {
auto I = lower_bound(PointerSpecs, AddrSpace, LessPointerAddrSpace());
if (I == PointerSpecs.end() || I->AddrSpace != AddrSpace) {
PointerSpecs.insert(I, PointerSpec{AddrSpace, BitWidth, ABIAlign, PrefAlign,
- IndexBitWidth, IsNonIntegral});
+ IndexBitWidth, AddressBitWidth,
+ IsNonIntegral});
} else {
I->BitWidth = BitWidth;
I->ABIAlign = ABIAlign;
I->PrefAlign = PrefAlign;
I->IndexBitWidth = IndexBitWidth;
+ I->AddressBitWidth = AddressBitWidth;
I->IsNonIntegral = IsNonIntegral;
}
}
@@ -728,6 +742,10 @@ const StructLayout *DataLayout::getStructLayout(StructType *Ty) const {
return L;
}
+unsigned DataLayout::getPointerAddressSize(unsigned AS) const {
+ return divideCeil(getPointerAddressSizeInBits(AS), 8);
+}
+
Align DataLayout::getPointerABIAlignment(unsigned AS) const {
return getPointerSpec(AS).ABIAlign;
}
diff --git a/llvm/unittests/IR/DataLayoutTest.cpp b/llvm/unittests/IR/DataLayoutTest.cpp
index 16a603ff6416f..02a87a2a77d80 100644
--- a/llvm/unittests/IR/DataLayoutTest.cpp
+++ b/llvm/unittests/IR/DataLayoutTest.cpp
@@ -312,12 +312,12 @@ TEST(DataLayout, ParsePointerSpec) {
"p16777215:32:32:64:8", "p16777215:16777215:32768:32768:16777215"})
EXPECT_THAT_EXPECTED(DataLayout::parse(Str), Succeeded());
- for (StringRef Str :
- {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32", "p0:32:32:32:32:32"})
+ for (StringRef Str : {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32:32",
+ "p0:32:32:32:32:32:32"})
EXPECT_THAT_EXPECTED(
DataLayout::parse(Str),
FailedWithMessage("malformed specification, must be of the form "
- "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>]]\""));
+ "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<addr>]]]\""));
// address space
for (StringRef Str : {"p0x0:32:32", "px:32:32:32", "p16777216:32:32:32:32"})
@@ -401,6 +401,27 @@ TEST(DataLayout, ParsePointerSpec) {
EXPECT_THAT_EXPECTED(
DataLayout::parse(Str),
FailedWithMessage("index size cannot be larger than the pointer size"));
+
+ // address size
+ for (StringRef Str : {"p:64:32:32:64:", "p0:64:32:32:64:"})
+ EXPECT_THAT_EXPECTED(
+ DataLayout::parse(Str),
+ FailedWithMessage("address size component cannot be empty"));
+
+ // Note: in the future we might allow 0 for address size to indicate pointers
+ // that do not have a meaning full address (e.g. relocatable GC pointers).
+ for (StringRef Str :
+ {"p:32:32:32:32:0", "p0:32:32:32:32:0x20", "p42:32:32:32:32:16777216"})
+ EXPECT_THAT_EXPECTED(
+ DataLayout::parse(Str),
+ FailedWithMessage("address size must be a non-zero 24-bit integer"));
+
+ for (StringRef Str :
+ {"p:16:16:16:16:17", "p0:32:64:64:32:64", "p42:16:64:64:16:32"})
+ EXPECT_THAT_EXPECTED(
+ DataLayout::parse(Str),
+ FailedWithMessage(
+ "address size cannot be larger than the pointer size"));
}
TEST(DataLayoutTest, ParseNativeIntegersSpec) {
@@ -523,6 +544,40 @@ TEST(DataLayout, GetIndexSize) {
}
}
+TEST(DataLayout, GetAddressSizeInBits) {
+ // Address size defaults to index size
+ std::tuple<StringRef, unsigned, unsigned, unsigned> Cases[] = {
+ {"", 64, 64, 64},
+ {"p:16:32", 16, 16, 16},
+ {"p0:32:64", 32, 32, 32},
+ {"p1:16:32:32:10", 64, 10, 64},
+ {"p1:31:32:64:10:20-p2:17:16:16:16:15", 64, 20, 15},
+ };
+ for (auto [Layout, V0, V1, V2] : Cases) {
+ DataLayout DL = cantFail(DataLayout::parse(Layout));
+ EXPECT_EQ(DL.getPointerAddressSizeInBits(0), V0) << Layout;
+ EXPECT_EQ(DL.getPointerAddressSizeInBits(1), V1) << Layout;
+ EXPECT_EQ(DL.getPointerAddressSizeInBits(2), V2) << Layout;
+ }
+}
+
+TEST(DataLayout, GetAddressSize) {
+ // Address size defaults to index size
+ std::tuple<StringRef, unsigned, unsigned, unsigned> Cases[] = {
+ {"", 8, 8, 8},
+ {"p:16:32", 2, 2, 2},
+ {"p0:27:64", 4, 4, 4},
+ {"p1:19:32:64:19:5", 8, 1, 8},
+ {"p1:33:32:64:33:23-p2:21:8:16:21:13", 8, 3, 2},
+ };
+ for (auto [Layout, V0, V1, V2] : Cases) {
+ DataLayout DL = cantFail(DataLayout::parse(Layout));
+ EXPECT_EQ(DL.getPointerAddressSize(0), V0) << Layout;
+ EXPECT_EQ(DL.getPointerAddressSize(1), V1) << Layout;
+ EXPECT_EQ(DL.getPointerAddressSize(2), V2) << Layout;
+ }
+}
+
TEST(DataLayout, GetPointerABIAlignment) {
std::tuple<StringRef, unsigned, unsigned, unsigned> Cases[] = {
{"", 8, 8, 8},
|
llvm/docs/LangRef.rst
Outdated
address space. If not specified, the default address size is equal to the | ||
index size. The address size may be wider than the index size as it could be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not 100% sure about this choice: should we use the index size or the pointer size as the default? In most case those will be the same so I am not sure it really matters.
Created using spr 1.3.6-beta.1
address space. If not specified, the default address size is equal to the | ||
index size. The address size may be wider than the index size as it could be | ||
calculated relative to a base address. For example AMDGPU buffer fat | ||
pointers use a 48-bit address range, but only allow for 32 bits of indexing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful to explicitly document and enforce a constraint of the form size >= addr >= idx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good - If we ever end up supporting a target where that does not hold we can always adjust it again. E.g. some hypothetical target where addresses are 64 bits but the pointer representation is a 32-bit offset relative to some base address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... Isn't that the AMDGPU stack pointer in some of its implementations, even though you're not supposed to think of it like that? (Opinions have varied over the generations on if it's "32-bit offset from buffer resource the compiler creates" to "32-bit offset from implicit scratch base" and so on)? Though I'd need more consensus from AMDGPU forks on if we ever want to expose the lie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine that within the stack address space you have 32-bit range and the 32-bit offset from the fixed base uniquely identifies the underlying address?
In that case I don't think you'd need to expose the underlying "real address" for ptrtoaddr
. I'd view that as being similar to running a 32-bit program on 64-bit kernel where the virtual memory system translates those 32-bit addresses to a final 64 (well likely less) bit physical address.
I'm not sure if there is any advantage in exposing the 48-bit (?) final address, but if you think there is we can definitely relax this restriction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's much virtue to it, I'm just asking annoying questions
address calculations such as :ref:`getelementptr <i_getelementptr>`. | ||
It must be less than or equal to the pointer size. If not specified, the | ||
default index size is equal to the pointer size. | ||
The fifth parameter ``<addr>`` specifies the width of addresses in this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to get too bikeshed-y on this, but I find the pointer size vs address size terminology confusing at a glance, and I worry that we'll end up with a lot of pieces in the code base inadvertently using the wrong one.
If you feel strongly that this is the best terminology, I won't hold it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this could be confusing but I haven't been able to come up with a better name. Maybe pointer size should be "representation size"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally used pointer range in CHERI LLVM, but I wasn't happy with it because it's actually log2(pointer range). Address size is more correct, at least.
Created using spr 1.3.6-beta.1
FailedWithMessage("address size component cannot be empty")); | ||
|
||
// Note: in the future we might allow 0 for address size to indicate pointers | ||
// that do not have a meaning full address (e.g. relocatable GC pointers). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// that do not have a meaning full address (e.g. relocatable GC pointers). | |
// that do not have a meaningful address (e.g. relocatable GC pointers). |
Closing this PR that introduces a new component since the conclusion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/49 was to define it to be the same as the index width. Uploaded #139347 instead for the new version. |
This function can be used to retrieve the number of bits that can be used
for arithmetic in a given address space (i.e. the range of the address
space). For most in-tree targets this should not make any difference
but differentiating between the size of a pointer in bits and the address
range is extremely important e.g. for CHERI-enabled targets, where pointers
carry additional metadata such as bounds and permissions and only a subset
of the pointer bits is used as the address.
Importantly, this is not always the same as the index width since some
architectures have an address space that is wider than the GEP indexing
allows (e.g. because pointers are represented as base+offset).
The only in-tree target that this currently affects are AMDGPU buffer
fat pointers, which consist of a 48 bit base address and a 32-bit offset.
Currently, this new API is not used anywhere, but the plan is to introduce
a ptrtoaddr operation in the future. See the discussion on
https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/38
Originally uploaded as https://reviews.llvm.org/D135158