Description
(As observed on rustc 1.40.0-nightly (4a8c5b20c 2019-10-23) targeting thumbv7em-none-eabihf; CC @tmandry)
I'm making my first aggressive use of async fn in an application. It's a deeply embedded, performance-sensitive application, and I wind up inspecting the disassembly output a lot (using objdump).
This is complicated by the fact that basically all of my functions are named poll_with_tls_context. (Some of them aren't -- some of them are named after future combinators.)
For example, here is one function named poll_with_tls_context calling another, also named poll_with_tls_context:
; This is an ARMv7-M Thumb-2 listing.
080003b8 <core::future::poll_with_tls_context>:
80003b8: b570 push {r4, r5, r6, lr}
80003ba: 4604 mov r4, r0
; irrelevant setup omitted...
80003f4: f000 fa3c bl 8000870 <core::future::poll_with_tls_context> ; note different addr
80003f8: 2101 movs r1, #1
80003fa: 2800 cmp r0, #0
; ...and so on
(The observant reader will note poll_with_tls_context does not appear in libcore. That's correct -- I've hacked async in a #[no_std] environment. I'm pretty sure the hack is not the problem.)
I understand why this is happening: poll_with_tls_context is an implementation detail of the current lowering of async fn, and it is specialized to the future type it's given, so there is one copy per future type and they all share a name. But I also don't think it's ideal.
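To illustrate the specialization, here is a minimal sketch on a hosted target. It is not the actual libcore/libstd code: poll_helper, read_sensor, and write_log are hypothetical names made up for this example. It only shows that each async fn yields its own anonymous future type, so a generic, non-inlined helper gets instantiated once per type while keeping the same base name.

```rust
use std::any::type_name;
use std::future::Future;

// Hypothetical stand-in for a generic polling helper; NOT the real
// poll_with_tls_context. Because it is generic over F and not inlined,
// the compiler may emit one copy of it per concrete future type.
#[inline(never)]
fn poll_helper<F: Future>(_fut: &F) -> &'static str {
    // Report which concrete type this instantiation was built for.
    type_name::<F>()
}

// Two hypothetical async fns; each returns a distinct anonymous future type.
async fn read_sensor() -> u32 {
    42
}

async fn write_log() {}

fn main() {
    let a = read_sensor();
    let b = write_log();
    // Same helper name in the source, two different instantiations.
    println!("{}", poll_helper(&a));
    println!("{}", poll_helper(&b));
}
```

In a disassembly, both instantiations would plausibly show up under the helper's base name, which is the effect described above. Whether a symbol listing includes the generic parameters depends on the mangling scheme and the tool doing the demangling.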
(For what it's worth, I can change the situation by forcing poll_with_tls_context to inline, though this produces unacceptable code bloat in my application, and the option isn't available to people who aren't open to using a patched libstd. By default, poll_with_tls_context doesn't inline but get_task_context does, which seems like the right result for size/speed.)
I am compiling at opt-level = 3 with an override for debug = true in my release profile.