Large files containing many tokens of const data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Open
@Manishearth

Description

ICU4X has a concept of "baked data", a way of "baking" locale data into the source of a program in the form of consts. This has a bunch of performance benefits: loading data from the binary is essentially free and doesn't involve any sort of deserialization.
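To make the idea concrete, here is a minimal sketch of what "baked" const data can look like. The `DisplayName` struct, the `BAKED` table, the `load` function, and the strings are hypothetical stand-ins for illustration, not actual ICU4X types or data.

```rust
use std::borrow::Cow;

// Hypothetical data struct for illustration; not an actual ICU4X type.
#[derive(Debug, PartialEq)]
pub struct DisplayName<'data> {
    pub pattern: Cow<'data, str>,
}

// A "baked" table: a const built entirely at compile time.
// Cow::Borrowed points at string literals stored in the binary, so reading
// an entry needs no allocation and no deserialization step.
// Entries are sorted by locale so that lookup can use binary search.
pub const BAKED: &[(&str, DisplayName<'static>)] = &[
    ("en", DisplayName { pattern: Cow::Borrowed("{0} acre") }),
    ("fr", DisplayName { pattern: Cow::Borrowed("{0} acre") }),
];

// Hypothetical lookup: returns a reference straight into the const table.
pub fn load(locale: &str) -> Option<&'static DisplayName<'static>> {
    BAKED
        .binary_search_by(|(key, _)| key.cmp(&locale))
        .ok()
        .map(|index| &BAKED[index].1)
}

fn main() {
    let en = load("en").expect("en is present in the table");
    // The data is still borrowed from the binary; nothing was copied.
    assert!(matches!(en.pattern, Cow::Borrowed(_)));
    println!("{}", en.pattern);
}
```

The cost shows up at compile time instead: every entry is ordinary Rust source that the compiler has to parse, expand, type check, and borrow check.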

However, we have been facing issues with cases where a single crate contains a lot of data.

I have a minimal testcase here: https://github.com/Manishearth/icu4x_compile_sample. It removes most of the cruft whilst still having an interesting-enough AST in the const data. cargo build in the demo folder takes 51s, using almost a gigabyte of RAM. Removing the macro does improve things slightly, but not by much.

Some interesting snippets of time-passes:

...
time:   1.194; rss:   52MB ->  595MB ( +543MB)	expand_crate
time:   1.194; rss:   52MB ->  595MB ( +543MB)	macro_expand_crate
...
time:   3.720; rss:  682MB ->  837MB ( +155MB)	type_check_crate
...
time:  55.505; rss:  837MB -> 1058MB ( +221MB)	MIR_borrow_checking
...
time:   0.124; rss: 1080MB ->  624MB ( -456MB)	free_global_ctxt
Full time-passes
time:   0.001; rss:   47MB ->   49MB (   +1MB)	parse_crate
time:   0.001; rss:   50MB ->   50MB (   +0MB)	incr_comp_prepare_session_directory
time:   0.000; rss:   50MB ->   51MB (   +1MB)	setup_global_ctxt
time:   0.000; rss:   52MB ->   52MB (   +0MB)	crate_injection
time:   1.194; rss:   52MB ->  595MB ( +543MB)	expand_crate
time:   1.194; rss:   52MB ->  595MB ( +543MB)	macro_expand_crate
time:   0.013; rss:  595MB ->  595MB (   +0MB)	AST_validation
time:   0.008; rss:  595MB ->  597MB (   +1MB)	finalize_macro_resolutions
time:   0.285; rss:  597MB ->  642MB (  +45MB)	late_resolve_crate
time:   0.012; rss:  642MB ->  642MB (   +0MB)	resolve_check_unused
time:   0.020; rss:  642MB ->  642MB (   +0MB)	resolve_postprocess
time:   0.326; rss:  595MB ->  642MB (  +46MB)	resolve_crate
time:   0.011; rss:  610MB ->  610MB (   +0MB)	write_dep_info
time:   0.011; rss:  610MB ->  611MB (   +0MB)	complete_gated_feature_checking
time:   0.058; rss:  765MB ->  729MB (  -35MB)	drop_ast
time:   1.213; rss:  610MB ->  681MB (  +71MB)	looking_for_derive_registrar
time:   1.421; rss:  610MB ->  682MB (  +72MB)	misc_checking_1
time:   0.086; rss:  682MB ->  690MB (   +8MB)	coherence_checking
time:   3.720; rss:  682MB ->  837MB ( +155MB)	type_check_crate
time:   0.000; rss:  837MB ->  837MB (   +0MB)	MIR_coroutine_by_move_body
time:  55.505; rss:  837MB -> 1058MB ( +221MB)	MIR_borrow_checking
time:   1.571; rss: 1058MB -> 1068MB (  +10MB)	MIR_effect_checking
time:   0.217; rss: 1068MB -> 1067MB (   -1MB)	module_lints
time:   0.217; rss: 1068MB -> 1067MB (   -1MB)	lint_checking
time:   0.311; rss: 1067MB -> 1068MB (   +0MB)	privacy_checking_modules
time:   0.607; rss: 1068MB -> 1068MB (   +0MB)	misc_checking_3
time:   0.000; rss: 1136MB -> 1137MB (   +1MB)	monomorphization_collector_graph_walk
time:   0.778; rss: 1068MB -> 1064MB (   -4MB)	generate_crate_metadata
time:   0.005; rss: 1064MB -> 1085MB (  +22MB)	codegen_to_LLVM_IR
time:   0.007; rss: 1076MB -> 1085MB (  +10MB)	LLVM_passes
time:   0.014; rss: 1064MB -> 1085MB (  +22MB)	codegen_crate
time:   0.257; rss: 1084MB -> 1080MB (   -4MB)	encode_query_results
time:   0.270; rss: 1084MB -> 1080MB (   -4MB)	incr_comp_serialize_result_cache
time:   0.270; rss: 1084MB -> 1080MB (   -4MB)	incr_comp_persist_result_cache
time:   0.271; rss: 1084MB -> 1080MB (   -4MB)	serialize_dep_graph
time:   0.124; rss: 1080MB ->  624MB ( -456MB)	free_global_ctxt
time:   0.000; rss:  624MB ->  624MB (   +0MB)	finish_ongoing_codegen
time:   0.127; rss:  624MB ->  653MB (  +29MB)	link_rlib
time:   0.135; rss:  624MB ->  653MB (  +29MB)	link_binary
time:   0.138; rss:  624MB ->  618MB (   -6MB)	link_crate
time:   0.139; rss:  624MB ->  618MB (   -6MB)	link
time:  65.803; rss:   32MB ->  187MB ( +155MB)	total
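For anyone who wants to rank the passes in a dump like the one above, a small parser is enough. This is only a sketch: the `Pass` struct and the `parse_line` and `slowest` helpers are hypothetical names, and the line format is assumed to be exactly what is shown in this issue (output of the nightly `-Z time-passes` flag).

```rust
/// One parsed line of time-passes output.
#[derive(Debug, PartialEq)]
struct Pass {
    name: String,
    seconds: f64,
    rss_delta_mb: i64,
}

/// Parses a line such as
/// `time:   1.194; rss:   52MB ->  595MB ( +543MB)\texpand_crate`.
/// Returns None for lines that do not match (for example `...`).
fn parse_line(line: &str) -> Option<Pass> {
    let rest = line.trim().strip_prefix("time:")?;
    let (secs, rest) = rest.split_once(';')?;
    let seconds = secs.trim().parse().ok()?;
    // The RSS delta sits between '(' and "MB)"; the pass name follows it.
    let (_, after_paren) = rest.split_once('(')?;
    let (delta, name) = after_paren.split_once("MB)")?;
    let rss_delta_mb = delta.trim().parse().ok()?;
    Some(Pass { name: name.trim().to_string(), seconds, rss_delta_mb })
}

/// Returns all parsed passes, slowest first.
fn slowest(dump: &str) -> Vec<Pass> {
    let mut passes: Vec<Pass> = dump.lines().filter_map(parse_line).collect();
    passes.sort_by(|a, b| b.seconds.total_cmp(&a.seconds));
    passes
}

fn main() {
    // A few lines copied from the dump above.
    let dump = "\
time:   1.194; rss:   52MB ->  595MB ( +543MB)\texpand_crate
time:   3.720; rss:  682MB ->  837MB ( +155MB)\ttype_check_crate
time:  55.505; rss:  837MB -> 1058MB ( +221MB)\tMIR_borrow_checking
time:   0.124; rss: 1080MB ->  624MB ( -456MB)\tfree_global_ctxt";
    for pass in slowest(dump) {
        println!("{:>8.3}s {:>+6}MB  {}", pass.seconds, pass.rss_delta_mb, pass.name);
    }
}
```

Note that some entries are nested (expand_crate and macro_expand_crate report the same work), so the times should not simply be summed.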

Even without the intermediate macro, expand_crate still increases RAM significantly, though the increase is halved:

time:   0.715; rss:   52MB ->  254MB ( +201MB)	expand_crate
time:   0.715; rss:   52MB ->  254MB ( +201MB)	macro_expand_crate

I understand that to some extent, we are simply feeding Rust a file that is megabytes in size and we cannot expect it to be too fast. It's interesting that MIR borrow checking is slowed down so much by this: there is relatively little to borrow check, so I suspect there is MIR construction happening here too. The RAM usage is also somewhat concerning; the problematic source file is 7MB, but compilation takes about a gigabyte of RAM, which is quite significant. Pair this with the fact that we have many such data files per crate (some of which are large), and we end up hitting CI limits.

With the actual problem we were facing (unicode-org/icu4x#5230 (comment)), our time-passes numbers were:

...
time:   1.013; rss:   51MB -> 1182MB (+1130MB)	expand_crate
time:   1.013; rss:   51MB -> 1182MB (+1131MB)	macro_expand_crate
...
time:   6.609; rss: 1308MB -> 1437MB ( +128MB)	type_check_crate
time:  36.802; rss: 1437MB -> 2248MB ( +811MB)	MIR_borrow_checking
time:   2.214; rss: 2248MB -> 2270MB (  +22MB)	MIR_effect_checking
...

I'm hoping there is at least some low-hanging fruit that can be improved here, or advice on how to avoid this problem. So far we've managed to stay within CI limits by reducing the number of tokens, converting stuff like

icu::experimental::dimension::provider::units::UnitsDisplayNameV1 { patterns: icu::experimental::relativetime::provider::PluralPatterns { strings: icu::plurals::provider::PluralElementsPackedCow { elements: alloc::borrow::Cow::Borrowed(unsafe { icu::plurals::provider::PluralElementsPackedULE::from_byte_slice_unchecked(b"\0\x01 acre") }) }, _phantom: core::marker::PhantomData } },

into

icu::experimental::dimension::provider::units::UnitsDisplayNameV1::new_baked(b"\0\x01 acre").

This works to some extent, but the problems remain in the same order of magnitude and can recur as we add more data.
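The token-reduction workaround can be sketched roughly as follows. All types here are simplified, hypothetical stand-ins (the real ICU4X structs and their new_baked constructors differ, and the real code goes through an unsafe unchecked conversion that is omitted here); the point is only that a const fn constructor lets each baked entry shrink from a nested struct literal to a single call.

```rust
use std::borrow::Cow;
use std::marker::PhantomData;

// Hypothetical, simplified stand-ins for the nested data structs.
pub struct PackedElements<'data> {
    pub bytes: Cow<'data, [u8]>,
}

pub struct PluralPatterns<'data, T> {
    pub strings: PackedElements<'data>,
    pub _phantom: PhantomData<T>,
}

pub struct UnitsDisplayName<'data> {
    pub patterns: PluralPatterns<'data, ()>,
}

impl UnitsDisplayName<'static> {
    // The nesting is written once here instead of once per baked entry.
    pub const fn new_baked(bytes: &'static [u8]) -> Self {
        UnitsDisplayName {
            patterns: PluralPatterns {
                strings: PackedElements { bytes: Cow::Borrowed(bytes) },
                _phantom: PhantomData,
            },
        }
    }
}

// Verbose form: every entry spells out the full struct literal.
pub const ACRE_VERBOSE: UnitsDisplayName<'static> = UnitsDisplayName {
    patterns: PluralPatterns {
        strings: PackedElements { bytes: Cow::Borrowed(b"\0\x01 acre") },
        _phantom: PhantomData,
    },
};

// Compact form: same value, far fewer tokens per entry.
pub const ACRE_COMPACT: UnitsDisplayName<'static> =
    UnitsDisplayName::new_baked(b"\0\x01 acre");

fn main() {
    // Both forms produce identical borrowed data.
    assert_eq!(
        ACRE_VERBOSE.patterns.strings.bytes,
        ACRE_COMPACT.patterns.strings.bytes
    );
    println!("{:?}", ACRE_COMPACT.patterns.strings.bytes);
}
```

This reduces what the parser and macro expander have to process per entry, but each entry is still a const expression the compiler must check and evaluate, which fits the observation that the improvement is limited.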

Metadata


    Labels

    C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
    I-compilemem: Issue: Problems and improvements with respect to memory usage during compilation.
    I-slow: Issue: Problems and improvements with respect to performance of generated code.
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
