-
Notifications
You must be signed in to change notification settings - Fork 545
CXX-2665 audit and refactor EVG matrices to use supported distros and extended coverage #1328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reduced task coverage on Power and zSeries to csfle + latest + replica (same as for the mongocryptd and extra alignment matrices) due to observing frequent task timeouts for sharded clusters + likely minimal value testing plain or single topology coverage on these distros. |
Reverted changes pertaining to addressing extra alignment warnings and UBSan errors. These will be addressed in a separate, followup PR. Nevertheless, as additional drive-by improvements, references to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the one suggested minor wording change.
.evergreen/scripts/test.sh
Outdated
elif command -v apt-get >/dev/null; then | ||
sudo apt-get install -q -y valgrind | ||
else | ||
echo "Unknown how to valgrind on this distro: ${distro_id:?}" 1>&2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo "Unknown how to valgrind on this distro: ${distro_id:?}" 1>&2 | |
echo "Unknown how to install valgrind on this distro: ${distro_id:?}" 1>&2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Summary
Resolves CXX-2665. Verified by this patch (normal patch) and this patch (limited distros).
Refactors Evergreen task matrices to use a (more) consistent selection of distros and task coverage patterns.
Distro Selection
CXX-2665 recommends using RHEL distros by default, as it has the highest availability of MongoDB Server versions to test against. This PR therefore uses the
rhel80
distro as "The Default Distro" (Linux x86_64) on which most tasks are executed. Any task which requires using a different distro is accompanied by a comment documenting the reason for requiring the chosen distro (e.g. packaging tasks).Important
The
rhel80
distro is now "The Default Distro" for all Evergreen tasks.This PR attempts to preserve task coverage for the following OS+arch combinations:
For these additional OS+arch combinations, distros providing the (nearly) highest host availability (capacity to scale and execute multiple tasks in parallel) are chosen instead. According to this metric, the selected distros for each OS+arch listed above is:
rhel80
(Linux x86_64)ubuntu2004-arm64
(Linux ARM64)rhel8-power
(Linux Power)rhel8-zseries
(Linux zSeries)macos-14-arm64
(MacOS ARM64)macos-14
(MacOS x86_64)windows-vsCurrent
(Windows x86_64)Some of the (non-resource-limited) chosen distros are capable of executing up to several hundred tasks simultaneously. This greatly increases task execution throughput and reduces overall build times for a given patch.
Note
The average host availability and scalability for a given distro may change over time.
Matrix Updates
The increased throughput of tasks allowed auditing and extending coverage for our task matrices to provide a much more consistent and comprehensive coverage of all related features. Most notably, the total task count for the following matrices have been extended to:
In total, the C++ Driver EVG configuration has been extended to 230 tasks -> 716 tasks total (including non-patchable tasks). Despite the greater number of tasks, due to the selection of high-availability distros, the actual real-world time taken to finish executing all tasks is actually less than before (with the exception of the MacOS x86_64 distro, which is not a "limited" distro but is not as scalable).
Due to the large number of tasks in the integration matrix (which can lead to performance issues in the Spruce UI), it has been split into five separate matrices (Linux, MacOS, Windows, mongocryptd, and extra alignment), each with its own display task (excluding limited distros due to DEVPROD-13331). This reduces the number of tasks listed on the build page down to just 23 tasks in total.
Drive-By Improvements
CMAKE_INSTALL_LIBDIR
(lib64) than what is typical for other platforms. Moved alllib64
vs.lib
handling out of the EVG configuration and into scripts$LIB_DIR
for consistency.generate-uninstall.sh
misbehaved on RHEL due to\n
not being expanded inecho "${dirs}"
as expected. This was changed toprintf "${dirs}"
instead.main()
) to ensure the message is always emitted even when more than one component may throw an exception (due to threads and/or forking). This revealed the cause of an API examples runner failure on Windows due to theclient::options::auto_encryption
example failing to spawn amongocryptd
instance. This example was fixed to silence the warning should it be emitted.-Wover-aligned
and-Walign-mismatch
warnings are silenced in cases where alignment is ensured by external factors (either manually or by the bson/mongoc library).std::align
and an extended storage buffer size is used to ensure data members which require extended alignment are always aligned properly regardless of the alignment of the parentimpl
object. (Alignedoperator new
is only available in C++17 and newer.)std::aligned_storage
is replaced withalignas
to avoid deprecation warnings as well as_ENABLE_EXTENDED_ALIGNED_STORAGE
errors with VS 2017 and newer.DYLD_LIBRARY_PATH
withDYLD_FALLBACK_LIBRARY_PATH
to avoid potential undesirable library linkage behavior. (See: Dyamic Library Programming Topics)Unresolved Work
build-example-projects.sh
step intest.sh
. Filed CXX-3217 (Windows) and CXX-3218 (MacOS).-rpath
linker flags on MacOS despite settingDYLD_FALLBACK_LIBRARY_PATH
and related environment variables and CMake configuration variables. Filed CXX-3219.-Walign-mismatch
warnings whentest_util::mock<T>
is used for functions returning a pointer to an over-aligned object (e.g.bson_t* foo()
->bson_t* ptr = foo();
->bar(ptr)
->warning: ...
). Using theassume_aligned
attribute ontest_util::mock<T>::operator()
does not seem to be enough to convince the compiler that the resulting pointed-to object is correcty aligned. Warnings are left unaddressed for now.