Skip to content

[BOLT] Support instrumentation hook via DT_FINI_ARRAY #67348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 8, 2023

Conversation

mtvec
Copy link
Contributor

@mtvec mtvec commented Sep 25, 2023

BOLT currently hooks its its instrumentation finalization function via DT_FINI. However, this method of calling finalization routines is not supported anymore on newer ABIs like RISC-V. DT_FINI_ARRAY is preferred there.

This patch adds support for hooking into DT_FINI_ARRAY instead if the binary does not have a DT_FINI entry. If it does, DT_FINI takes precedence so this patch should not change how the currently supported instrumentation targets behave.

DT_FINI_ARRAY points to an array in memory of DT_FINI_ARRAYSZ bytes. It consists of pointer-length entries that contain the addresses of finalization functions. However, the addresses are only filled-in by the dynamic linker at load time using relative relocations. This makes hooking via DT_FINI_ARRAY a bit more complicated than via DT_FINI.

The implementation works as follows:

  • While scanning the binary: find the section where DT_FINI_ARRAY points to, read its first dynamic relocation and use its addend to find the address of the fini function we will use to hook;
  • While writing the output file: overwrite the addend of the dynamic relocation with the address of the runtime library's fini function.

Updating the dynamic relocation required a bit of boiler plate: since dynamic relocations are stored in a std::multiset which doesn't support getting mutable references to its items, functions were added to BinarySection to take an existing relocation and insert a new one.

Note on testing: I can currently only test this on RISC-V, but its instrumentation support hasn't been upstreamed yet as it depends on this patch which I would like to review separately. I cannot get this patch to work on x86, even when forcing the linker to omit DT_FINI. The reason is that, even though DT_FINI_ARRAY exists and has valid entries, the functions do not show up in the symbols table which triggers some asserts later in BOLT. I would propose to not add tests right now but wait until RISC-V instrumentation is upstreamed which will automatically test this patch.

Copy link
Member

@yota9 yota9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! These are my first impressions, I would probably return to re-reviewing this patch in a few days :)

mtvec added a commit to mtvec/llvm-project that referenced this pull request Sep 27, 2023
Relocation for 64-bit absolute values.

Note that this patch also implements two `Relocation` methods
(`encodeValue` and `getAbs64`) that will become necessary to support
instrumentation of non-PIE binaries (see llvm#67348). However, I didn't find
a way to test those independent of instrumentation.
@mtvec
Copy link
Contributor Author

mtvec commented Oct 5, 2023

Thanks for the fix! It seems to be good now. Only the tests seems to left. Just a hint, I would build the single test with 3 different states: only DT_FINI, only DT_FINI_ARRAY, and both for pie and non-pie cases

Added tests for the cases you suggested. Note that I tested on AArch64 because of the issues on x86 as explained in the description of this PR. This meant I had to update encodeValueAArch64 to accept R_AARCH64_ABS64 relocs.

each time checking the static and if needed dynamic relocs were updated.

Instead of doing this, I simply run the instrumented binary in the tests and verify the generated profile makes sense. I feel this should be enough because it implicitly verifies the fini function was hooked correctly. What do you think?

@yota9
Copy link
Member

yota9 commented Oct 5, 2023

Instead of doing this, I simply run the instrumented binary in the tests and verify the generated profile makes sense. I feel this should be enough because it implicitly verifies the fini function was hooked correctly. What do you think?

Thanks, I would check the test a bit later. But in BOLT we are trying to avoid runtime tests where it is possible, for example because there are no aarch64 tests machines right now. I'm OK with adding both runtime and non-runtime tests, but I think runtime-only for this case is not enough, sorry :(

@mtvec
Copy link
Contributor Author

mtvec commented Oct 5, 2023

But in BOLT we are trying to avoid runtime tests where it is possible, for example because there are no aarch64 tests machines right now.

With "runtime tests" you mean tests where a binary is executed? Or tests that trigger runtime instrumentation?

As far as I can tell, the only way to trigger this new functionality is to instrument the binary. And the only way to instrument an AArch64 binary, is to compile and run BOLT on an AArch64 machine (the runtime library cannot currently be cross-compiled). Or am I missing something?

@yota9
Copy link
Member

yota9 commented Oct 5, 2023

I mean that tests in runtime folder are going to be executed. So ideally we would like the non-runtime test without binary execution to check the functionality. It is true that currently we're not supporting cross-compilation for runtime binary, that is why for non-runtime tests we have extra arch requirement. But we have this task in our backlog, so hopefully it would be changed and we would be able to run these tests too.

@mtvec
Copy link
Contributor Author

mtvec commented Oct 5, 2023

I mean that tests in runtime folder are going to be executed. So ideally we would like the non-runtime test without binary execution to check the functionality. It is true that currently we're not supporting cross-compilation for runtime binary, that is why for non-runtime tests we have extra arch requirement. But we have this task in our backlog, so hopefully it would be changed and we would be able to run these tests too.

Just to be sure: you also don't see another way to test this so we keep the current test?

@yota9
Copy link
Member

yota9 commented Oct 5, 2023

I mean that tests in runtime folder are going to be executed. So ideally we would like the non-runtime test without binary execution to check the functionality. It is true that currently we're not supporting cross-compilation for runtime binary, that is why for non-runtime tests we have extra arch requirement. But we have this task in our backlog, so hopefully it would be changed and we would be able to run these tests too.

Just to be sure: you also don't see another way to test this so we keep the current test?

We can keep the current test I guess, but it would be better to add non-runtime.. Since currently it relies on the binary successfully exit - no, I don't. The non-runtime test should be easy to do, but it is just monotonic job, although the filecheck prefixes would be repeated across the multiple tests, so I guess it won't be too hard to find the needed symbol and check relocation(s) against it..

@mtvec
Copy link
Contributor Author

mtvec commented Oct 6, 2023

We can keep the current test I guess, but it would be better to add non-runtime.. Since currently it relies on the binary successfully exit

Just to understand things better: why is this a problem?

The non-runtime test should be easy to do, but it is just monotonic job, although the filecheck prefixes would be repeated across the multiple tests, so I guess it won't be too hard to find the needed symbol and check relocation(s) against it..

As far as I understand, the problem is not writing the test but actually running it since we currently need an AArch64 host for that. And if we have an AArch64 host, we might as well just execute the instrumented binary for testing purposes. Or am I missing something?

@yota9
Copy link
Member

yota9 commented Oct 6, 2023

As far as I understand, the problem is not writing the test but actually running it since we currently need an AArch64 host for that. And if we have an AArch64 host, we might as well just execute the instrumented binary for testing purposes. Or am I missing something?

No, you're right about it :)

Just to understand things better: why is this a problem?

I think you might misunderstood me. It is not a problem to have this runtime test, I only say that we prefer the non runtime tests in BOLT. We still can have this one, it is fine, but ideally since it is easy to arrange I would suggest to add non runtime test too.

@mtvec
Copy link
Contributor Author

mtvec commented Oct 6, 2023

I think you might misunderstood me. It is not a problem to have this runtime test, I only say that we prefer the non runtime tests in BOLT.

I guess I wanted to know why non-runtime tests are preferred. Is it so that we can run them once we have a way to cross-compile the runtime library?

I tried adding non-runtime tests but the problem is that __bolt_instr_fini (which ends up in DT_FINI or DT_FINI_ARRAY) is not exposed in the binary's symbol table. This means that the tests would need to hard-code an address that can change at any time. This feels extremely fragile.

@yota9
Copy link
Member

yota9 commented Oct 6, 2023

Is it so that we can run them once we have a way to cross-compile the runtime library?

True. Also in this case you would check your changes directly, which is nice too :)

I tried adding non-runtime tests but the problem is that __bolt_instr_fini (which ends up in DT_FINI or DT_FINI_ARRAY) is not exposed in the binary's symbol table. This means that the tests would need to hard-code an address that can change at any time. This feels extremely fragile.

I see. Yes hardcode is not a good option. Let me think a bit about it too. A guess it would be enough to check that the address is somewhere in inserted section, but it might be tricky to implement this too.

@yota9
Copy link
Member

yota9 commented Oct 8, 2023

Hi @mtvec . Sorry for late response, didn't have time to study the problem. Anyway I came up that adding proper INIT/FINI symbols to symtab would be the most right way to deal with your test problems. Fill free to rebase your patch on top of the #68505 , I think it would eliminate your problem with creating a test. Thanks!

yota9 added a commit to yota9/llvm-project that referenced this pull request Oct 9, 2023
Add absent start & fini symbols, currently setted by bolt for runtime
libraries at DT_INIT and DT_FINI. The proper tests would be added by the
llvm#67348 PR.
@mtvec
Copy link
Contributor Author

mtvec commented Oct 10, 2023

Thanks for that @yota9! I added non-runtime tests.

yota9 added a commit that referenced this pull request Oct 10, 2023
Add absent start & fini symbols, currently setted by bolt for runtime
libraries at DT_INIT and DT_FINI. The proper tests would be added by the
#67348 PR.
Copy link
Member

@yota9 yota9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test! I've leaved couple of comments, would return to this review in a couple of days to re-review it last time :)

Copy link
Member

@yota9 yota9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase the changes, I've committed adding symbols today.

@mtvec
Copy link
Contributor Author

mtvec commented Oct 11, 2023

Please rebase the changes, I've committed adding symbols today.

Sure, I was postponing this to avoid comments getting messed up (which apparently happens sometime when force-pushing). Let's see if it is ok :)

@mtvec mtvec force-pushed the bolt-instr-hook-fini-array branch from 624d9be to ab11fbc Compare October 11, 2023 12:16
Copy link
Member

@yota9 yota9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch :)

@mtvec
Copy link
Contributor Author

mtvec commented Oct 16, 2023

Thanks a lot for your help reviewing this patch @yota9!

@maksfb @rafaelauler @aaupov: does this look good for you as well?

@mtvec
Copy link
Contributor Author

mtvec commented Oct 30, 2023

@maksfb @rafaelauler @aaupov: does this look good for you as well?

Ping.

@rafaelauler
Copy link
Contributor

rafaelauler commented Oct 31, 2023

Overall looks good to me.

BOLT currently hooks its its instrumentation finalization function via DT_FINI. However, this method of calling finalization routines is not supported anymore on newer ABIs like RISC-V. DT_FINI_ARRAY is preferred there.

What about DT_INIT, what does the ABI say? Usually there is symmetry between INIT and FINI, INITARRAY and FINIARRAY.

BTW ideally we would like to add a new entry to dynamic table and insert ourselves into DT_PREINIT_ARRAY, but this is outside the scope of this patch. We don't currently rewrite the dynamic table so we are not doing that for this reason. We need to run our code before every constructor, that's why PREINIT is more adequate.

EDIT: PREINIT_ARRAY is only for binaries, they're not used for DSOs. The context on this issue is that we currently might crash with instrumentation if your binary runs constructors in a dependent lib, and that dependent lib runs instrumented code from your binary. PREINIT_ARRAY will guarantee that we run before any constructors in DSOs linked in DT_NEEDED.

While scanning the binary: find the section where DT_FINI_ARRAY points to, read its first dynamic relocation and use its addend to find the address of the fini function we will use to hook;

If we're being pedantic, I think we need to patch the last entry of FINI_ARRAY, not the first. If you patch the first entry, you will still run destructors after your runtime lib. But I don't think we're going to crash because of this (at least for instrumentation), since instrumentation finalization code will just dump profile. No need to implement this, but we should at least have a comment documenting this behavior, so nobody is surprised in the future if they try to implement some runtime lib that expects to run its finalization code after application code.

@mtvec
Copy link
Contributor Author

mtvec commented Nov 2, 2023

What about DT_INIT, what does the ABI say? Usually there is symmetry between INIT and FINI, INITARRAY and FINIARRAY.

At least on RISC-V, DT_INIT is also not used. However, BOLT patches the ELF entry point for runtime initialization in executables. I see it also patches DT_INIT for, iiuc, libraries, so we might want to support DT_INIT_ARRAY for this in the future.

BTW ideally we would like to add a new entry to dynamic table and insert ourselves into DT_PREINIT_ARRAY, but this is outside the scope of this patch. We don't currently rewrite the dynamic table so we are not doing that for this reason. We need to run our code before every constructor, that's why PREINIT is more adequate.

EDIT: PREINIT_ARRAY is only for binaries, they're not used for DSOs. The context on this issue is that we currently might crash with instrumentation if your binary runs constructors in a dependent lib, and that dependent lib runs instrumented code from your binary. PREINIT_ARRAY will guarantee that we run before any constructors in DSOs linked in DT_NEEDED.

This indeed makes sense for a future enhancement.

If we're being pedantic, I think we need to patch the last entry of FINI_ARRAY, not the first. If you patch the first entry, you will still run destructors after your runtime lib. But I don't think we're going to crash because of this (at least for instrumentation), since instrumentation finalization code will just dump profile. No need to implement this, but we should at least have a comment documenting this behavior, so nobody is surprised in the future if they try to implement some runtime lib that expects to run its finalization code after application code.

The entries in DT_FINI_ARRAY are run in reverse order so I think using the first entry is actually the correct thing to do (although admittedly, I didn't think about this before you mentioned it :)).

mtvec added a commit to mtvec/llvm-project that referenced this pull request Nov 2, 2023
BOLT currently hooks its its instrumentation finalization function via
`DT_FINI`. However, this method of calling finalization routines is not
supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is
preferred there.

This patch adds support for hooking into `DT_FINI_ARRAY` instead if the
binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes
precedence so this patch should not change how the currently supported
instrumentation targets behave.

`DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes.
It consists of pointer-length entries that contain the addresses of
finalization functions. However, the addresses are only filled-in by the
dynamic linker at load time using relative relocations. This makes
hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`.

The implementation works as follows:
- While scanning the binary: find the section where `DT_FINI_ARRAY`
  points to, read its first dynamic relocation and use its addend to
  find the address of the fini function we will use to hook;
- While writing the output file: overwrite the addend of the dynamic
  relocation with the address of the runtime library's fini function.

Updating the dynamic relocation required a bit of boiler plate: since
dynamic relocations are stored in a `std::multiset` which doesn't
support getting mutable references to its items, functions were added to
`BinarySection` to take an existing relocation and insert a new one.

Note on testing: I can currently only test this on RISC-V, but its
instrumentation support hasn't been upstreamed yet as it depends on this
patch which I would like to review separately. I cannot get this patch
to work on x86, even when forcing the linker to omit `DT_FINI`. The
reason is that, even though `DT_FINI_ARRAY` exists and has valid
entries, the functions do not show up in the symbols table which
triggers some asserts later in BOLT. I would propose to not add tests
right now but wait until RISC-V instrumentation is upstreamed which will
automatically test this patch.
Copy link
Contributor

@rafaelauler rafaelauler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

BOLT currently hooks its its instrumentation finalization function via
`DT_FINI`. However, this method of calling finalization routines is not
supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is
preferred there.

This patch adds support for hooking into `DT_FINI_ARRAY` instead if the
binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes
precedence so this patch should not change how the currently supported
instrumentation targets behave.

`DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes.
It consists of pointer-length entries that contain the addresses of
finalization functions. However, the addresses are only filled-in by the
dynamic linker at load time using relative relocations. This makes
hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`.

The implementation works as follows:
- While scanning the binary: find the section where `DT_FINI_ARRAY`
  points to, read its first dynamic relocation and use its addend to
  find the address of the fini function we will use to hook;
- While writing the output file: overwrite the addend of the dynamic
  relocation with the address of the runtime library's fini function.

Updating the dynamic relocation required a bit of boiler plate: since
dynamic relocations are stored in a `std::multiset` which doesn't
support getting mutable references to its items, functions were added to
`BinarySection` to take an existing relocation and insert a new one.

Note on testing: I can currently only test this on RISC-V, but its
instrumentation support hasn't been upstreamed yet as it depends on this
patch which I would like to review separately. I cannot get this patch
to work on x86, even when forcing the linker to omit `DT_FINI`. The
reason is that, even though `DT_FINI_ARRAY` exists and has valid
entries, the functions do not show up in the symbols table which
triggers some asserts later in BOLT. I would propose to not add tests
right now but wait until RISC-V instrumentation is upstreamed which will
automatically test this patch.
@mtvec mtvec force-pushed the bolt-instr-hook-fini-array branch from e67bb1a to 823bd8a Compare November 8, 2023 10:49
@mtvec mtvec merged commit 96b5e09 into llvm:main Nov 8, 2023
@mtvec mtvec deleted the bolt-instr-hook-fini-array branch November 8, 2023 11:01
@aaupov
Copy link
Contributor

aaupov commented Nov 8, 2023

Please check test failure on aarch64 buildbot: https://lab.llvm.org/buildbot/#/builders/221/builds/20084

@mtvec
Copy link
Contributor Author

mtvec commented Nov 9, 2023

Please check test failure on aarch64 buildbot: https://lab.llvm.org/buildbot/#/builders/221/builds/20084

Woops, should be fixed by c4b096a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants