Skip to content

CXX-2665 audit and refactor EVG matrices to use supported distros and extended coverage #1328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
Feb 12, 2025

Conversation

eramongodb
Copy link
Contributor

@eramongodb eramongodb commented Feb 7, 2025

Summary

Resolves CXX-2665. Verified by this patch (normal patch) and this patch (limited distros).

Refactors Evergreen task matrices to use a (more) consistent selection of distros and task coverage patterns.

Distro Selection

CXX-2665 recommends using RHEL distros by default, as it has the highest availability of MongoDB Server versions to test against. This PR therefore uses the rhel80 distro as "The Default Distro" (Linux x86_64) on which most tasks are executed. Any task which requires using a different distro is accompanied by a comment documenting the reason for requiring the chosen distro (e.g. packaging tasks).

Important

The rhel80 distro is now "The Default Distro" for all Evergreen tasks.

This PR attempts to preserve task coverage for the following OS+arch combinations:

  • Linux x86_64 ("The Default")
  • Linux ARM64
  • Linux Power (aka ppc64le)
  • Linux zSeries (aka s390x)
  • MacOS ARM64
  • MacOS x86_64
  • Windows x86_64

For these additional OS+arch combinations, distros providing the (nearly) highest host availability (capacity to scale and execute multiple tasks in parallel) are chosen instead. According to this metric, the selected distros for each OS+arch listed above is:

  • rhel80 (Linux x86_64)
  • ubuntu2004-arm64 (Linux ARM64)
  • rhel8-power (Linux Power)
  • rhel8-zseries (Linux zSeries)
  • macos-14-arm64 (MacOS ARM64)
  • macos-14 (MacOS x86_64)
  • windows-vsCurrent (Windows x86_64)

Some of the (non-resource-limited) chosen distros are capable of executing up to several hundred tasks simultaneously. This greatly increases task execution throughput and reduces overall build times for a given patch.

Note

The average host availability and scalability for a given distro may change over time.

Matrix Updates

The increased throughput of tasks allowed auditing and extending coverage for our task matrices to provide a much more consistent and comprehensive coverage of all related features. Most notably, the total task count for the following matrices have been extended to:

  • Integration: 150 tasks -> 464 tasks.
  • Compile Only: 14 tasks -> 136 tasks.
  • Sanitizers: 10 tasks -> 45 tasks.
  • Valgrind: 8 tasks -> 18 tasks.

In total, the C++ Driver EVG configuration has been extended to 230 tasks -> 716 tasks total (including non-patchable tasks). Despite the greater number of tasks, due to the selection of high-availability distros, the actual real-world time taken to finish executing all tasks is actually less than before (with the exception of the MacOS x86_64 distro, which is not a "limited" distro but is not as scalable).

Due to the large number of tasks in the integration matrix (which can lead to performance issues in the Spruce UI), it has been split into five separate matrices (Linux, MacOS, Windows, mongocryptd, and extra alignment), each with its own display task (excluding limited distros due to DEVPROD-13331). This reduces the number of tasks listed on the build page down to just 23 tasks in total.

Drive-By Improvements

  • RHEL uses a different CMAKE_INSTALL_LIBDIR (lib64) than what is typical for other platforms. Moved all lib64 vs. lib handling out of the EVG configuration and into scripts $LIB_DIR for consistency.
  • The generate-uninstall.sh misbehaved on RHEL due to \n not being expanded in echo "${dirs}" as expected. This was changed to printf "${dirs}" instead.
  • Ensured the API examples runner prints the exception message at the component level (not just in main()) to ensure the message is always emitted even when more than one component may throw an exception (due to threads and/or forking). This revealed the cause of an API examples runner failure on Windows due to the client::options::auto_encryption example failing to spawn a mongocryptd instance. This example was fixed to silence the warning should it be emitted.
  • Numerous extended-alignment-related warnings are addressed to support extended compilation task coverage.
    • -Wover-aligned and -Walign-mismatch warnings are silenced in cases where alignment is ensured by external factors (either manually or by the bson/mongoc library).
    • std::align and an extended storage buffer size is used to ensure data members which require extended alignment are always aligned properly regardless of the alignment of the parent impl object. (Aligned operator new is only available in C++17 and newer.)
    • Use of std::aligned_storage is replaced with alignas to avoid deprecation warnings as well as _ENABLE_EXTENDED_ALIGNED_STORAGE errors with VS 2017 and newer.
  • Replaced DYLD_LIBRARY_PATH with DYLD_FALLBACK_LIBRARY_PATH to avoid potential undesirable library linkage behavior. (See: Dyamic Library Programming Topics)

Unresolved Work

  • Failed to extend MacOS and Windows integration matrix coverage to support static builds due to errors emitted during the build-example-projects.sh step in test.sh. Filed CXX-3217 (Windows) and CXX-3218 (MacOS).
  • Failed to avoid use of explicit -rpath linker flags on MacOS despite setting DYLD_FALLBACK_LIBRARY_PATH and related environment variables and CMake configuration variables. Filed CXX-3219.
  • Unclear how to address -Walign-mismatch warnings when test_util::mock<T> is used for functions returning a pointer to an over-aligned object (e.g. bson_t* foo() -> bson_t* ptr = foo(); -> bar(ptr) -> warning: ...). Using the assume_aligned attribute on test_util::mock<T>::operator() does not seem to be enough to convince the compiler that the resulting pointed-to object is correcty aligned. Warnings are left unaddressed for now.

@eramongodb eramongodb requested a review from kevinAlbs February 7, 2025 17:39
@eramongodb eramongodb self-assigned this Feb 7, 2025
@kevinAlbs kevinAlbs requested review from rcsanchez97 and a user and removed request for kevinAlbs February 7, 2025 19:28
@eramongodb
Copy link
Contributor Author

Reduced task coverage on Power and zSeries to csfle + latest + replica (same as for the mongocryptd and extra alignment matrices) due to observing frequent task timeouts for sharded clusters + likely minimal value testing plain or single topology coverage on these distros.

@eramongodb
Copy link
Contributor Author

Reverted changes pertaining to addressing extra alignment warnings and UBSan errors. These will be addressed in a separate, followup PR.

Nevertheless, as additional drive-by improvements, references to _GLIBCXX_USE_CXX11_ABI=0 are removed (it is unclear why this is present; rationale is not given by 89a48a4) and the path to llvm-symbolizer is updated from /usr/lib/llvm-3.8/bin to /opt/mongodbtoolchain/v4/bin (required to produce meaningful stack trace output on errors).

Copy link
Contributor

@rcsanchez97 rcsanchez97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with the one suggested minor wording change.

elif command -v apt-get >/dev/null; then
sudo apt-get install -q -y valgrind
else
echo "Unknown how to valgrind on this distro: ${distro_id:?}" 1>&2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo "Unknown how to valgrind on this distro: ${distro_id:?}" 1>&2
echo "Unknown how to install valgrind on this distro: ${distro_id:?}" 1>&2

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@eramongodb eramongodb merged commit c99d928 into mongodb:master Feb 12, 2025
18 of 21 checks passed
@eramongodb eramongodb deleted the cxx-2665 branch February 12, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants