Skip to content

CDRIVER-4500: SDAM structured log and unified test support #1842

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
81 commits merged into from
Feb 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
25d922c
sync SDAM spec tests
Dec 5, 2024
133c251
Add server_discovery_and_monitoring/unified tests
Dec 5, 2024
4d25c8b
Add skips for SDAM unified tests that require pool support
Dec 5, 2024
f149d2d
unified: error instead of segfault when checking events/logs on missi…
Dec 6, 2024
e518e64
unified entities: make 'close' distinct from 'delete'
Dec 6, 2024
ccceffe
log-and-monitor type for apm monitoring state plus structured log ins…
Dec 6, 2024
183fa8d
structured_log: add 'oid' item
Dec 9, 2024
bede749
structured log: "Starting topology monitoring"
Dec 9, 2024
068f9ff
Debug logging about suppressed logs in unified tests
Dec 9, 2024
e66620f
td logging: monitor_changed even when apm monitoring callbacks unset
Dec 9, 2024
05cf0ae
structured log items for topology description from topology or standa…
Dec 9, 2024
9f8fa6e
structured log: "Topology description changed"
Dec 9, 2024
988946b
unified tests: improve error logging about structured logs
Dec 9, 2024
c9b5f20
structured log: "Starting server monitoring"
Dec 9, 2024
8633fe5
monitoring: reset 'opened' state when instance changes
Dec 9, 2024
6073776
structured log: "Stopped server monitoring"
Dec 9, 2024
43215fe
structured log: "Stopped topology monitoring"
Jan 16, 2025
9f136fd
unified test runner: heartbeat started/succeeded/failed events
Jan 18, 2025
7dd15d9
entity-map: pass server monitoring mode URI option
Jan 18, 2025
ce7bf62
unified tests: runCommand should report reply as result for matching
Jan 18, 2025
1016f19
unified tests: allow sdam eventType
Jan 18, 2025
415d3a7
server-monitor heartbeat structured log messages
Jan 21, 2025
7b2abe4
heartbeat structured logs for topology scanner
Jan 21, 2025
f4b2fd5
waitForEvent: do an entire stream-selection, to advance blocking topo…
Jan 21, 2025
dafbf39
I think we need to skip serverMonitoringMode unified tests until we g…
Jan 21, 2025
e99cd54
unified waitForEvent: Do not skip structured logs from synthetic stre…
Jan 22, 2025
2db0f72
Skip heartbeat logging tests that depend on CMAP connection ID
Jan 22, 2025
86be66f
unified runner: topology open/close APM events
Jan 22, 2025
1bb0ae6
Several more skips for tests that depend on CMAP features
Jan 22, 2025
5f94175
Comment clarification
Jan 22, 2025
7c793dd
Additional test failures diagnosed as requiring skips
Jan 22, 2025
937d550
unified tests: assertEventCount operation
Jan 22, 2025
b06e760
unified runner: eventType filtering
Jan 22, 2025
08466ff
Additional test skips for CMAP dependency
Jan 22, 2025
d0b35e8
Flexible filtering for structured logs in tests
Jan 23, 2025
3fa42a2
Require test_structured_log_filter_pop to confirm a match
Jan 23, 2025
423fbe6
Resume skipping cmap expectEvents checks, to preserve existing test c…
Jan 23, 2025
54ffda3
Loud skip log, for consistency
Jan 23, 2025
d1f67e5
Skip unsupported assertEventCount
Jan 23, 2025
747abd1
Move structured log filter from global to per-entity
Jan 24, 2025
729b93e
Another test failure traced to missing optional CRUD feature
Jan 25, 2025
cce7f95
Document more test skips with CDRIVER-5870
Jan 27, 2025
f291dca
Simplify _entity_log_filter_accepts
Jan 27, 2025
93bf1e2
New comment for topology lifecycle test skip
Jan 27, 2025
5ae6672
Fix entity-map destructor for findcursor without cursor
Jan 28, 2025
05946d8
Additional CDRIVER-5870 skip
Jan 28, 2025
01e778c
Fix test skip for interruptInUse-pool-clear file
Jan 28, 2025
0309537
CDRIVER-4758 /server_discovery_and_monitoring/prose/heartbeat
Jan 28, 2025
e023d22
Additional debug output from loadbalancer tests
Jan 28, 2025
97fab0e
Modify loadbalancer test to match expected specified SDAM events
Jan 28, 2025
cdd4801
More test skips for driverConnectionId
Jan 28, 2025
11a3f31
Test skip for multithreaded runner support
Jan 28, 2025
3d596e8
Implement recordTopologyDescription, assertTopologyType, waitForPrima…
Jan 28, 2025
5a13330
Move test runner operations that were miscategorized as client operat…
Jan 28, 2025
fc08332
Formatting and debugging for waitForPrimaryChange
Jan 28, 2025
7e49fc5
Test skip for streaming heartbeats
Jan 28, 2025
dea4adb
Fix TSAN violation by enforcing documented mongoc_client_pool_set_apm…
Jan 28, 2025
626086c
Header style changes
Jan 28, 2025
95e0257
Update error message text in ASSERT_CAPTURED_LOG
Jan 28, 2025
d2181e4
Fix header style for log-and-monitor prelude
Jan 29, 2025
de89e87
NEWS: unreleased version note
Jan 29, 2025
722475c
suggested comment
Jan 30, 2025
f69df45
suggested "static" keyword for test function
Jan 30, 2025
64861c6
suggested INT64_C macros
Jan 30, 2025
6c70af7
fix NEWS entry (#1844)
kevinAlbs Jan 29, 2025
be245b4
suggested additional whitespace for skipped tests list
Jan 30, 2025
87fbfab
Fix documented param name
Jan 30, 2025
a5162c1
Move up comment about closed entities
Jan 30, 2025
34b647b
Additional comment requested in _mongoc_topology_description_monitor_…
Jan 30, 2025
d658671
Prefer unlocked td access during topology destroy
Jan 30, 2025
8d86d15
Allow unsafe mongoc_client_pool_set_apm_callbacks usage
Jan 30, 2025
10a3bd2
Replace mongoc_log_and_monitor_serial_t with bson_oid_t
Jan 30, 2025
87e953d
Expanded comment about apm_mutex
Jan 30, 2025
494b888
Merge branch 'master' into CDRIVER-4500
Jan 30, 2025
91ccb6a
Additional comment explaining the SDAM lifecycle workaround
Jan 31, 2025
1300364
Private header shouldn't be in HEADERS
Feb 3, 2025
a7404ca
const parameter for mongoc_log_and_monitor_instance_set_apm_callbacks
Feb 3, 2025
fcc1446
Add NEWS about topology event changes
Feb 3, 2025
19ab55d
Simplify topology 'opening' state
Feb 4, 2025
ec2009c
Defer server opening until after topology is opened
Feb 4, 2025
ec86c49
Update comment about lazy server opening in mongoc_topology_descripti…
Feb 4, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
@@ -1,6 +1,12 @@
libmongoc 1.30.0 (Unreleased)
=============================

Fixes:

* Additional APM events required by the SDAM specification will now be delivered:
* Servers that have seen `server_opening` will now see a `server_closed` prior to `topology_closed`.
* Before `topology_closed`, a `topology_changed` event will transition to `Unknown` topology type.

Deprecated:

* Support for Debian 9 and Debian 10.
Expand Down
1 change: 1 addition & 0 deletions src/libmongoc/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -621,6 +621,7 @@ set (MONGOC_SOURCES
${PROJECT_SOURCE_DIR}/src/mongoc/mongoc-list.c
${PROJECT_SOURCE_DIR}/src/mongoc/mongoc-linux-distro-scanner.c
${PROJECT_SOURCE_DIR}/src/mongoc/mongoc-log.c
${PROJECT_SOURCE_DIR}/src/mongoc/mongoc-log-and-monitor-private.c
${PROJECT_SOURCE_DIR}/src/mongoc/mongoc-matcher.c
${PROJECT_SOURCE_DIR}/src/mongoc/mongoc-matcher-op.c
${PROJECT_SOURCE_DIR}/src/mongoc/mongoc-memcmp.c
Expand Down
49 changes: 20 additions & 29 deletions src/libmongoc/src/mongoc/mongoc-client-pool.c
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
#include <mongoc/mongoc-client-pool.h>
#include <mongoc/mongoc-client-private.h>
#include <mongoc/mongoc-client-side-encryption-private.h>
#include <mongoc/mongoc-log-and-monitor-private.h>
#include <mongoc/mongoc-queue-private.h>
#include <mongoc/mongoc-thread-private.h>
#include <mongoc/mongoc-topology-private.h>
Expand Down Expand Up @@ -54,8 +55,6 @@ struct _mongoc_client_pool_t {
bool error_api_set;
bool structured_log_opts_set;
bool client_initialized;
mongoc_apm_callbacks_t apm_callbacks;
void *apm_context;
int32_t error_api_version;
mongoc_server_api_t *api;
// `last_known_serverids` is a sorted array of uint32_t.
Expand Down Expand Up @@ -283,7 +282,6 @@ _initialize_new_client (mongoc_client_pool_t *pool, mongoc_client_t *client)

pool->client_initialized = true;
client->error_api_version = pool->error_api_version;
_mongoc_client_set_apm_callbacks_private (client, &pool->apm_callbacks, pool->apm_context);

client->api = mongoc_server_api_copy (pool->api);

Expand Down Expand Up @@ -577,35 +575,28 @@ bool
mongoc_client_pool_set_apm_callbacks (mongoc_client_pool_t *pool, mongoc_apm_callbacks_t *callbacks, void *context)
{
BSON_ASSERT_PARAM (pool);
BSON_OPTIONAL_PARAM (callbacks);
BSON_OPTIONAL_PARAM (context);

mongoc_topology_t *const topology = BSON_ASSERT_PTR_INLINE (pool)->topology;
mc_tpld_modification tdmod = mc_tpld_modify_begin (topology);

// Prevent setting callbacks more than once
// Enforce documented thread-safety restrictions
if (pool->apm_callbacks_set) {
mc_tpld_modify_drop (tdmod);
MONGOC_ERROR ("Can only set callbacks once");
MONGOC_ERROR ("mongoc_client_pool_set_apm_callbacks can only be called once per pool");
return false;
}
} else if (pool->client_initialized) {
MONGOC_ERROR ("mongoc_client_pool_set_apm_callbacks can only be called before mongoc_client_pool_pop");
/* @todo Since 2017 this requirement has been documented but not actually enforced. For now we are leaving it
* unenforced, for backward compatibility. This usage remains unsafe and incorrect. When possible, this should be
* modified to return false without modifying the APM callbacks. */
mongoc_log_and_monitor_instance_set_apm_callbacks (&pool->topology->log_and_monitor, callbacks, context);
pool->apm_callbacks_set = true;
return true;

// Update callbacks on the pool
if (callbacks) {
pool->apm_callbacks = *callbacks;
} else {
pool->apm_callbacks = (mongoc_apm_callbacks_t){0};
// Now we can be sure no other threads are relying on concurrent access to the instance yet.
mongoc_log_and_monitor_instance_set_apm_callbacks (&pool->topology->log_and_monitor, callbacks, context);
pool->apm_callbacks_set = true;
return true;
}
pool->apm_context = context;

// Update callbacks on the topology
mongoc_topology_set_apm_callbacks (topology, tdmod.new_td, callbacks, context);

// Signal that we have already set the callbacks
pool->apm_callbacks_set = true;

// Save our updated topology
mc_tpld_modify_commit (tdmod);

return true;
}

bool
Expand All @@ -618,14 +609,14 @@ mongoc_client_pool_set_structured_log_opts (mongoc_client_pool_t *pool, const mo
* and only before the first client is initialized. Structured logging is generally
* expected to warn but not quit when encountering initialization errors. */
if (pool->structured_log_opts_set) {
MONGOC_WARNING ("mongoc_client_pool_set_structured_log_opts can only be called once per pool");
MONGOC_ERROR ("mongoc_client_pool_set_structured_log_opts can only be called once per pool");
return false;
} else if (pool->client_initialized) {
MONGOC_WARNING ("mongoc_client_pool_set_structured_log_opts can only be called before mongoc_client_pool_pop");
MONGOC_ERROR ("mongoc_client_pool_set_structured_log_opts can only be called before mongoc_client_pool_pop");
return false;
} else {
// Now we can be sure no other threads are relying on concurrent access to the instance yet.
mongoc_topology_set_structured_log_opts (pool->topology, opts);
mongoc_log_and_monitor_instance_set_structured_log_opts (&pool->topology->log_and_monitor, opts);
pool->structured_log_opts_set = true;
return true;
}
Expand Down
6 changes: 0 additions & 6 deletions src/libmongoc/src/mongoc/mongoc-client-private.h
Original file line number Diff line number Diff line change
Expand Up @@ -107,9 +107,6 @@ struct _mongoc_client_t {
mongoc_read_concern_t *read_concern;
mongoc_write_concern_t *write_concern;

mongoc_apm_callbacks_t apm_callbacks;
void *apm_context;

int32_t error_api_version;
bool error_api_set;

Expand Down Expand Up @@ -151,9 +148,6 @@ _mongoc_client_get_rr (const char *hostname,
mongoc_client_t *
_mongoc_client_new_from_topology (mongoc_topology_t *topology);

bool
_mongoc_client_set_apm_callbacks_private (mongoc_client_t *client, mongoc_apm_callbacks_t *callbacks, void *context);

mongoc_stream_t *
mongoc_client_default_stream_initiator (const mongoc_uri_t *uri,
const mongoc_host_list_t *host,
Expand Down
76 changes: 27 additions & 49 deletions src/libmongoc/src/mongoc/mongoc-client.c
Original file line number Diff line number Diff line change
Expand Up @@ -1122,7 +1122,7 @@ _mongoc_client_new_from_topology (mongoc_topology_t *topology)
}
#endif

mongoc_structured_log (topology->structured_log,
mongoc_structured_log (topology->log_and_monitor.structured_log,
MONGOC_STRUCTURED_LOG_LEVEL_DEBUG,
MONGOC_STRUCTURED_LOG_COMPONENT_CONNECTION,
"Client created");
Expand Down Expand Up @@ -1646,8 +1646,6 @@ _mongoc_client_retryable_read_command_with_stream (mongoc_client_t *client,
* server does not support retryable reads, fall through and allow the
* original error to be reported. */
if (is_retryable && _mongoc_read_error_get_type (ret, error, reply) == MONGOC_READ_ERR_RETRY) {
bson_error_t ignored_error;

/* each read command may be retried at most once */
is_retryable = false;

Expand All @@ -1667,8 +1665,13 @@ _mongoc_client_retryable_read_command_with_stream (mongoc_client_t *client,
.has_operation_id = true,
.operation_id = parts->assembled.operation_id,
};
retry_server_stream = mongoc_cluster_stream_for_reads (
&client->cluster, &ss_log_context, parts->read_prefs, parts->assembled.session, ds, NULL, &ignored_error);
retry_server_stream = mongoc_cluster_stream_for_reads (&client->cluster,
&ss_log_context,
parts->read_prefs,
parts->assembled.session,
ds,
NULL /* reply */,
NULL /* error */);

mongoc_deprioritized_servers_destroy (ds);
}
Expand Down Expand Up @@ -2165,14 +2168,14 @@ _mongoc_client_monitor_op_killcursors (mongoc_cluster_t *cluster,
const char *collection)
{
bson_t doc;
mongoc_client_t *client;
mongoc_apm_command_started_t event;

ENTRY;

client = cluster->client;
mongoc_client_t *client = cluster->client;
const mongoc_log_and_monitor_instance_t *log_and_monitor = &client->topology->log_and_monitor;

if (!client->apm_callbacks.started) {
if (!log_and_monitor->apm_callbacks.started) {
return;
}

Expand All @@ -2189,9 +2192,9 @@ _mongoc_client_monitor_op_killcursors (mongoc_cluster_t *cluster,
&server_stream->sd->service_id,
server_stream->sd->server_connection_id,
NULL,
client->apm_context);
log_and_monitor->apm_context);

client->apm_callbacks.started (&event);
log_and_monitor->apm_callbacks.started (&event);
mongoc_apm_command_started_cleanup (&event);
bson_destroy (&doc);

Expand All @@ -2207,16 +2210,16 @@ _mongoc_client_monitor_op_killcursors_succeeded (mongoc_cluster_t *cluster,
int64_t operation_id,
const char *db)
{
mongoc_client_t *client;
bson_t doc;
bson_array_builder_t *cursors_unknown;
mongoc_apm_command_succeeded_t event;

ENTRY;

client = cluster->client;
mongoc_client_t *client = cluster->client;
const mongoc_log_and_monitor_instance_t *log_and_monitor = &client->topology->log_and_monitor;

if (!client->apm_callbacks.succeeded) {
if (!log_and_monitor->apm_callbacks.succeeded) {
EXIT;
}

Expand All @@ -2239,9 +2242,9 @@ _mongoc_client_monitor_op_killcursors_succeeded (mongoc_cluster_t *cluster,
&server_stream->sd->service_id,
server_stream->sd->server_connection_id,
false,
client->apm_context);
log_and_monitor->apm_context);

client->apm_callbacks.succeeded (&event);
log_and_monitor->apm_callbacks.succeeded (&event);

mongoc_apm_command_succeeded_cleanup (&event);
bson_destroy (&doc);
Expand All @@ -2256,15 +2259,15 @@ _mongoc_client_monitor_op_killcursors_failed (mongoc_cluster_t *cluster,
int64_t operation_id,
const char *db)
{
mongoc_client_t *client;
bson_t doc;
mongoc_apm_command_failed_t event;

ENTRY;

client = cluster->client;
mongoc_client_t *client = cluster->client;
const mongoc_log_and_monitor_instance_t *log_and_monitor = &client->topology->log_and_monitor;

if (!client->apm_callbacks.failed) {
if (!log_and_monitor->apm_callbacks.failed) {
EXIT;
}

Expand All @@ -2285,9 +2288,9 @@ _mongoc_client_monitor_op_killcursors_failed (mongoc_cluster_t *cluster,
&server_stream->sd->service_id,
server_stream->sd->server_connection_id,
false,
client->apm_context);
log_and_monitor->apm_context);

client->apm_callbacks.failed (&event);
log_and_monitor->apm_callbacks.failed (&event);

mongoc_apm_command_failed_cleanup (&event);
bson_destroy (&doc);
Expand Down Expand Up @@ -2576,33 +2579,6 @@ mongoc_client_set_stream_initiator (mongoc_client_t *client, mongoc_stream_initi
}


bool
_mongoc_client_set_apm_callbacks_private (mongoc_client_t *client, mongoc_apm_callbacks_t *callbacks, void *context)
{
BSON_ASSERT_PARAM (client);

if (callbacks) {
memcpy (&client->apm_callbacks, callbacks, sizeof (mongoc_apm_callbacks_t));
} else {
memset (&client->apm_callbacks, 0, sizeof (mongoc_apm_callbacks_t));
}

client->apm_context = context;

/* A client pool sets APM callbacks for the entire pool. */
if (client->topology->single_threaded) {
mongoc_topology_set_apm_callbacks (client->topology,
/* We are safe to modify the shared_descr directly, since we are
* single-threaded */
mc_tpld_unsafe_get_mutable (client->topology),
callbacks,
context);
}

return true;
}


bool
mongoc_client_set_apm_callbacks (mongoc_client_t *client, mongoc_apm_callbacks_t *callbacks, void *context)
{
Expand All @@ -2614,7 +2590,9 @@ mongoc_client_set_apm_callbacks (mongoc_client_t *client, mongoc_apm_callbacks_t
return false;
}

return _mongoc_client_set_apm_callbacks_private (client, callbacks, context);
mongoc_log_and_monitor_instance_set_apm_callbacks (&client->topology->log_and_monitor, callbacks, context);

return true;
}


Expand All @@ -2625,7 +2603,7 @@ mongoc_client_set_structured_log_opts (mongoc_client_t *client, const mongoc_str
BSON_OPTIONAL_PARAM (opts);

if (client->topology->single_threaded) {
mongoc_topology_set_structured_log_opts (client->topology, opts);
mongoc_log_and_monitor_instance_set_structured_log_opts (&client->topology->log_and_monitor, opts);
return true;
} else {
MONGOC_WARNING ("Cannot set structured log options on a pooled client, use "
Expand Down
Loading