Skip to content

[CDRIVER-6017] Tweak visitor behavior when finding corrupt BSON #2019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 21 additions & 9 deletions src/libbson/doc/bson_validate_flags_t.rst
Original file line number Diff line number Diff line change
Expand Up @@ -19,27 +19,39 @@ Synopsis
BSON_VALIDATE_DOT_KEYS = (1 << 2),
BSON_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
BSON_VALIDATE_EMPTY_KEYS = (1 << 4),
BSON_VALIDATE_CORRUPT = (1 << 30),
} bson_validate_flags_t;

Description
-----------

``bson_validate_flags_t`` is a set of binary flags which may be combined to specify a level of BSON document validation.
``bson_validate_flags_t`` is a set of binary flags which may be combined to
specify a level of BSON document validation.

A value of ``0``, ``false``, or ``BSON_VALIDATE_NONE`` equivalently requests the minimum applicable level of validation.
A value of ``0``, ``false``, or ``BSON_VALIDATE_NONE`` equivalently requests the
minimum applicable level of validation.

In the context of validation APIs :symbol:`bson_validate()`, :symbol:`bson_validate_with_error()`, and :symbol:`bson_validate_with_error_and_offset()` the minimum validation still guarantees that a document can be successfully traversed by :symbol:`bson_iter_visit_all()`.
In the context of validation APIs :symbol:`bson_validate()`,
:symbol:`bson_validate_with_error()`, and
:symbol:`bson_validate_with_error_and_offset()` the minimum validation still
guarantees that a document can be successfully traversed by
:symbol:`bson_iter_visit_all()`.

Higher level APIs using this type may have different minimum validation levels. For example, ``libmongoc`` functions that take ``bson_validate_flags_t`` use ``0`` to mean the document contents are not visited and malformed headers will not be detected by the client.
Higher level APIs using this type may have different minimum validation levels.
For example, ``libmongoc`` functions that take ``bson_validate_flags_t`` use
``0`` to mean the document contents are not visited and malformed headers will
not be detected by the client.

Each defined flag aside from ``BSON_VALIDATE_NONE`` describes an optional validation feature that may be enabled, alone or in combination with other features:

* ``BSON_VALIDATE_NONE`` Minimum level of validation; in ``libbson``, validates element headers.
* ``BSON_VALIDATE_UTF8`` All keys and string values are checked for invalid UTF-8.
* ``BSON_VALIDATE_UTF8_ALLOW_NULL`` String values are allowed to have embedded NULL bytes.
* ``BSON_VALIDATE_NONE`` Minimum level of validation; in ``libbson``, validates
element headers and UTF-8 strings.
* ``BSON_VALIDATE_UTF8`` Deprecated. (All text is unconditionally checked for UTF-8 validity.)
* ``BSON_VALIDATE_UTF8_ALLOW_NULL`` UTF-8 string values are allowed to have NULL characters.
* ``BSON_VALIDATE_DOLLAR_KEYS`` Prohibit keys that start with ``$`` outside of a "DBRef" subdocument.
* ``BSON_VALIDATE_DOT_KEYS`` Prohibit keys that contain ``.`` anywhere in the string.
* ``BSON_VALIDATE_EMPTY_KEYS`` Prohibit zero-length keys.
* ``BSON_VALIDATE_CORRUPT`` - This is not a validation flag, but will appear as
an error code if validation fails for some other reason not listed above.
Checks for BSON corruption cannot be disabled.

.. seealso::

Expand Down
21 changes: 16 additions & 5 deletions src/libbson/src/bson/bson-iter.c
Original file line number Diff line number Diff line change
Expand Up @@ -2025,6 +2025,7 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */
while (_bson_iter_next_internal (iter, 0, &key, &bson_type, &unsupported)) {
if (*key && !bson_utf8_validate (key, strlen (key), false)) {
iter->err_off = iter->off;
VISIT_CORRUPT (iter, data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take this opportunity to extend test suite coverage of the .visit_corrupt callback function to account for these new checks? atm there only seems to be a single test case. Perhaps the VALIDATE_TEST macro can be extended to also check .visit_corrupt callback behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is a good idea.

break;
}

Expand All @@ -2048,7 +2049,8 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

if (!bson_utf8_validate (utf8, utf8_len, true)) {
iter->err_off = iter->off;
return true;
VISIT_CORRUPT (iter, data);
break;
}

if (VISIT_UTF8 (iter, key, utf8_len, utf8, data)) {
Expand All @@ -2064,6 +2066,7 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

if (!bson_init_static (&b, docbuf, doclen)) {
iter->err_off = iter->off;
VISIT_CORRUPT (iter, data);
break;
}
if (VISIT_DOCUMENT (iter, key, &b, data)) {
Expand All @@ -2079,6 +2082,7 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

if (!bson_init_static (&b, docbuf, doclen)) {
iter->err_off = iter->off;
VISIT_CORRUPT (iter, data);
break;
}
if (VISIT_ARRAY (iter, key, &b, data)) {
Expand Down Expand Up @@ -2136,8 +2140,10 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */
const char *options = NULL;
regex = bson_iter_regex (iter, &options);

if (!bson_utf8_validate (regex, strlen (regex), true)) {
if (!bson_utf8_validate (regex, strlen (regex), false) ||
!bson_utf8_validate (options, strlen (options), false)) {
iter->err_off = iter->off;
VISIT_CORRUPT (iter, data);
return true;
}

Expand All @@ -2154,6 +2160,7 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

if (!bson_utf8_validate (collection, collection_len, true)) {
iter->err_off = iter->off;
VISIT_CORRUPT (iter, data);
return true;
}

Expand All @@ -2169,7 +2176,8 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

if (!bson_utf8_validate (code, code_len, true)) {
iter->err_off = iter->off;
return true;
VISIT_CORRUPT (iter, data);
break;
}

if (VISIT_CODE (iter, key, code_len, code, data)) {
Expand All @@ -2184,7 +2192,8 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

if (!bson_utf8_validate (symbol, symbol_len, true)) {
iter->err_off = iter->off;
return true;
VISIT_CORRUPT (iter, data);
break;
}

if (VISIT_SYMBOL (iter, key, symbol_len, symbol, data)) {
Expand All @@ -2202,11 +2211,13 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

if (!bson_utf8_validate (code, length, true)) {
iter->err_off = iter->off;
return true;
VISIT_CORRUPT (iter, data);
break;
}

if (!bson_init_static (&b, docbuf, doclen)) {
iter->err_off = iter->off;
VISIT_CORRUPT (iter, data);
break;
}
if (VISIT_CODEWSCOPE (iter, key, length, code, &b, data)) {
Expand Down
47 changes: 38 additions & 9 deletions src/libbson/src/bson/bson-types.h
Original file line number Diff line number Diff line change
Expand Up @@ -185,25 +185,54 @@ typedef struct {


/**
* bson_validate_flags_t:
* @brief Flags an error code for BSON validation functions.
*
* This enumeration is used for validation of BSON documents. It allows
* selective control on what you wish to validate.
* Pass these flags bits to control the behavior of the `bson_validate` family
* of functions.
*
* %BSON_VALIDATE_NONE: No additional validation occurs.
* %BSON_VALIDATE_UTF8: Check that strings are valid UTF-8.
* %BSON_VALIDATE_DOLLAR_KEYS: Check that keys do not start with $.
* %BSON_VALIDATE_DOT_KEYS: Check that keys do not contain a period.
* %BSON_VALIDATE_UTF8_ALLOW_NULL: Allow NUL bytes in UTF-8 text.
* %BSON_VALIDATE_EMPTY_KEYS: Prohibit zero-length field names
* Additionally, if validation fails, then the error code set on a `bson_error_t`
* will have the value corresponding to the reason that validation failed.
*/
typedef enum {
/**
* @brief No special validation behavior specified.
*/
BSON_VALIDATE_NONE = 0,
/**
* @brief This flag has no effect
*
* @note Because invalid UTF-8 text is always invalid in BSON, the `bson_validate`
* function will always reject invalid UTF-8 data as corrupt BSON.
*/
BSON_VALIDATE_UTF8 = (1 << 0),
/**
* @brief Check that element keys do not begin with an ASCII dollar `$`
*/
BSON_VALIDATE_DOLLAR_KEYS = (1 << 1),
/**
* @brief Check that element keys do not contain an ASCII period `.`
*/
BSON_VALIDATE_DOT_KEYS = (1 << 2),
/**
* @brief If set then it is *not* an error for a UTF-8 string to contain
* embedded null characters.
*
* By default, `bson_validate` APIs will reject BSON UTF-8 elements that
* contain embedded null characters.
*/
BSON_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
/**
* @brief Check that no element key is a zero-length empty string.
*/
BSON_VALIDATE_EMPTY_KEYS = (1 << 4),
/**
* @brief This is not a flag that controls behavior, but is instead used to indicate
* that a BSON document is corrupted in some way. This is the value that will
* appear as an error code.
*
* Passing this as a flag has no effect.
*/
BSON_VALIDATE_CORRUPT = (1 << 30),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse BSON_VALIDATE_UTF8 instead? It already "has no effect" as a behavior control flag and already describes the very UTF-8 validity checks which BSON_VALIDATE_CORRUPT is meant to describe (appears redundant).

s/BSON_VALIDATE_CORRUPT/BSON_VALIDATE_UTF8/g

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but VISIT_CORRUPT is now (as of this PR) also used for when a subdocument is invalid (previously this was also just return true;, meaning that bson_validate was also missing this case).

I considered adding a .visit_invalid_utf8 callback, which is potentially useful in general, but there's ambiguity in the semantics of "corruption", because the spec technically says that BSON strings are UTF-8, so a non-utf-8 string could arguably be considered corruption.

} bson_validate_flags_t;


Expand Down
69 changes: 44 additions & 25 deletions src/libbson/src/bson/bson.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@

#include <string.h>
#include <math.h>
#include <assert.h>

#include <mlib/config.h>

Expand Down Expand Up @@ -2505,19 +2506,19 @@ bson_array_as_canonical_extended_json (const bson_t *bson, size_t *length)
#define VALIDATION_ERR(_flag, _msg, ...) bson_set_error (&state->error, BSON_ERROR_INVALID, _flag, _msg, __VA_ARGS__)

static bool
_bson_iter_validate_utf8 (const bson_iter_t *iter, const char *key, size_t v_utf8_len, const char *v_utf8, void *data)
_bson_iter_validate_utf8 (const bson_iter_t *iter, const char *key, size_t u8len, const char *u8data, void *data)
{
bson_validate_state_t *state = data;
bool allow_null;
bson_validate_state_t *const state = data;

if ((state->flags & BSON_VALIDATE_UTF8)) {
allow_null = !!(state->flags & BSON_VALIDATE_UTF8_ALLOW_NULL);
const bool allow_null = !!(state->flags & BSON_VALIDATE_UTF8_ALLOW_NULL);

if (!bson_utf8_validate (v_utf8, v_utf8_len, allow_null)) {
state->err_offset = iter->off;
VALIDATION_ERR (BSON_VALIDATE_UTF8, "invalid utf8 string for key \"%s\"", key);
return true;
}
// Assert: The visitor API already checks that all text is valid UTF-8
assert (bson_utf8_validate (u8data, u8len, true));

if (!allow_null && !bson_utf8_validate (u8data, u8len, false /* disallow null */)) {
state->err_offset = iter->off;
VALIDATION_ERR (BSON_VALIDATE_UTF8_ALLOW_NULL, "UTF-8 string for \"%s\" contains null characters", key);
return true;
}

if ((state->flags & BSON_VALIDATE_DOLLAR_KEYS)) {
Expand All @@ -2538,7 +2539,7 @@ _bson_iter_validate_corrupt (const bson_iter_t *iter, void *data)
bson_validate_state_t *state = data;

state->err_offset = iter->err_off;
VALIDATION_ERR (BSON_VALIDATE_NONE, "%s", "corrupt BSON");
VALIDATION_ERR (BSON_VALIDATE_CORRUPT, "%s", "corrupt BSON");
}


Expand Down Expand Up @@ -2601,13 +2602,31 @@ _bson_iter_validate_codewscope (
BSON_UNUSED (v_code_len);
BSON_UNUSED (v_code);

// Validate the code string
if (_bson_iter_validate_utf8 (iter, key, v_code_len, v_code, data)) {
return true;
}

if (!bson_validate (v_scope, state->flags, &offset)) {
state->err_offset = iter->off + offset;
VALIDATION_ERR (BSON_VALIDATE_NONE, "%s", "corrupt code-with-scope");
return false;
VALIDATION_ERR (BSON_VALIDATE_CORRUPT, "%s", "corrupt code-with-scope");
return true;
}

return true;
return false;
}

static bool
_bson_iter_validate_dbpointer (
const bson_iter_t *iter, const char *key, size_t coll_len, const char *coll, const bson_oid_t *oid, void *data)
{
BSON_UNUSED (key);
BSON_UNUSED (oid);
// Validate the collection name string
if (_bson_iter_validate_utf8 (iter, key, coll_len, coll, data)) {
return true;
}
return false;
}


Expand All @@ -2622,17 +2641,17 @@ static const bson_visitor_t bson_validate_funcs = {
NULL, /* visit_double */
_bson_iter_validate_utf8,
_bson_iter_validate_document,
_bson_iter_validate_document, /* visit_array */
NULL, /* visit_binary */
NULL, /* visit_undefined */
NULL, /* visit_oid */
NULL, /* visit_bool */
NULL, /* visit_date_time */
NULL, /* visit_null */
NULL, /* visit_regex */
NULL, /* visit_dbpoint */
NULL, /* visit_code */
NULL, /* visit_symbol */
_bson_iter_validate_document, /* visit_array */
NULL, /* visit_binary */
NULL, /* visit_undefined */
NULL, /* visit_oid */
NULL, /* visit_bool */
NULL, /* visit_date_time */
NULL, /* visit_null */
NULL, /* visit_regex */
_bson_iter_validate_dbpointer, /* visit_dbpointer */
_bson_iter_validate_utf8, /* visit_code */
_bson_iter_validate_utf8, /* visit_symbol */
_bson_iter_validate_codewscope,
};

Expand Down
Loading