Machine-readable dialect (not vocabulary) definition document #1423

Open
@gregsdennis

IMPORTANT: This changes how meta-schemas are organized but not really how they work.

Relevant to this discussion:

I've been thinking about all of these ☝️ things together to get a larger picture of where vocabularies could go. The discussions I've been a part of have all described a vocabulary definition file as serving several purposes:

  • enumerating the keywords the vocab defines
  • assigning each keyword an ID
  • syntactically defining them and providing assertion functionality (i.e. schemas that validate their values) ⭐
  • categorizing them into their function (e.g. assertion, annotation, applicator)
    • multiple categories may apply per keyword, e.g. properties functions as all of these

Impact to the Meta-Schema

The ⭐ in particular is where the meta-schema is changed. Currently the schema for a keyword's value is contained in the meta-schema body, generally under a properties keyword. However, if the vocabulary definition file carries and enforces the schema for a keyword's value, then the meta-schema's entry is redundant. This means that the entire properties keyword for a meta-schema could be removed as it's all in the vocab files.

I don't think this is a breaking change, however. A significant reorganization, sure, but the functionality is all still there. Moreover, we can make this change iteratively.

Suppose the only change we make to how the meta-schema is processed is that $vocabulary acquires some validation behavior, applying the keyword schemas from all of the vocabularies it lists (it becomes an in-place applicator similar to properties). Ideally, those keyword schemas would be the same as what's already in the meta-schema. However, even if they're not, the meta-schema is defining a dialect by virtue of declaring a set of vocabularies. In doing so, it's free to apply additional constraints to keywords.

For example, consider a modified Validation meta-schema where I've required that enum have unique values (which isn't a current requirement):

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "...",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/validation": true
  },
  // ...
  "properties": {
    // ...
    "enum": {
      "type": "array",
      "items": true,
      "uniqueItems": true
    },
    // ...
  },
  // ...
}

enum, as defined in the vocabulary, doesn't have the uniqueness constraint. This is actually possible now: the above meta-schema should be supported without any issues.

Now consider adding in-place-applicator / assertion functionality to $vocabulary which (for enum) enforces the type and items constraints but not uniqueItems. The functionality of this meta-schema is unchanged.
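
To make the layering concrete, here is a minimal Python sketch of that behavior. Everything in it is an assumption for illustration: the `VOCABS` registry stands in for vocabulary documents whose format this proposal does not define, `check` is a toy validator that only understands `type: "array"`, `items`, and `uniqueItems`, and `validate_as_meta_schema` is a hypothetical name. It is not how any existing implementation works.

```python
# Hypothetical registry: vocabulary URI -> keyword -> schema for that keyword's value.
# The real vocabulary document format is out of scope for this proposal.
VOCABS = {
    "https://json-schema.org/draft/2020-12/vocab/validation": {
        "enum": {"type": "array", "items": True},
    },
}


def check(schema, value):
    """Toy validator: boolean schemas, type "array", items, uniqueItems only."""
    if isinstance(schema, bool):
        return schema
    if schema.get("type") == "array" and not isinstance(value, list):
        return False
    if isinstance(value, list):
        if not all(check(schema.get("items", True), v) for v in value):
            return False
        if schema.get("uniqueItems"):
            # Pairwise comparison because JSON values may be unhashable.
            # Simplification: Python equality is used, so 1 == True here,
            # whereas JSON Schema treats them as distinct values.
            if any(a == b for i, a in enumerate(value) for b in value[i + 1:]):
                return False
    return True


def validate_as_meta_schema(meta_schema, schema):
    """Apply $vocabulary keyword schemas first, then the meta-schema's own properties."""
    if isinstance(schema, bool):
        return True
    # $vocabulary acting as an in-place applicator (assumed behavior).
    for vocab_uri in meta_schema.get("$vocabulary", {}):
        for keyword, keyword_schema in VOCABS.get(vocab_uri, {}).items():
            if keyword in schema and not check(keyword_schema, schema[keyword]):
                return False
    # Extra constraints the dialect layers on top of the vocabulary.
    for keyword, keyword_schema in meta_schema.get("properties", {}).items():
        if keyword in schema and not check(keyword_schema, schema[keyword]):
            return False
    return True


META = {
    "$vocabulary": {"https://json-schema.org/draft/2020-12/vocab/validation": True},
    "properties": {"enum": {"type": "array", "items": True, "uniqueItems": True}},
}

print(validate_as_meta_schema(META, {"enum": ["a", "b"]}))  # True
print(validate_as_meta_schema(META, {"enum": ["a", "a"]}))  # False (meta-schema's uniqueItems)
print(validate_as_meta_schema(META, {"enum": "a"}))         # False (vocabulary's type)
```

The vocabulary rejects a non-array `enum`, and the dialect's own `properties` entry adds the uniqueness check on top, so the combined result matches what the meta-schema alone produces today.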

Going further, we could change the original Validation meta-schema to this:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://json-schema.org/draft/2020-12/meta/validation",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/validation": true
  },
  "$dynamicAnchor": "meta",
  "title": "Validation vocabulary meta-schema",
  "type": [
    "object",
    "boolean"
  ]
}

We don't need properties because that's only defining the keywords, which are now defined in the vocabulary document identified by https://json-schema.org/draft/2020-12/vocab/validation, and we don't need $defs because that was only used to support the subschemas in properties.

In fact, we may not even need the vocab meta-schemas anymore. Because the top-level meta-schema lists all of the vocabularies, it would automatically perform all of the validation that the vocab meta-schemas currently provide. We could remove the allOf, making it just:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://json-schema.org/draft/2020-12/schema",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/unevaluated": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true,
    "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
    "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
    "https://json-schema.org/draft/2020-12/vocab/content": true
  },
  "$dynamicAnchor": "meta",

  "title": "Core and Validation specifications meta-schema",
  "type": ["object", "boolean"]
}

(I've also removed the deprecated keywords listing.)

Adoption

First of all, we've agreed that vocabularies and the $vocabulary keyword are (at best) unstable, so modifying them (even in a breaking way) isn't out of the question.

Adding in-place-applicator / assertion behavior to $vocabulary in the way described above isn't a breaking change as long as we copy the keyword schemas correctly.

Later, once $vocabulary is promoted to being a stable feature, we can update the meta-schemas to remove the redundancies.

Readability and Accessibility

There is an issue of readability and accessibility when all of the keywords are defined in vocab files. While most people would be used to just looking in the meta-schema to see what keywords are available and how they're defined, now they'd have to follow another file reference to get that same information.

I don't think this is a big issue, though, and people will eventually get used to it.

On the other hand, creating a new meta-schema is immensely easier: you just list the vocabularies you want, and everything else is taken care of.
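
As a rough illustration of how little a dialect author would have to write, the sketch below builds such a meta-schema from a list of vocabulary URIs. The helper name `make_meta_schema` and the `https://example.com/...` identifier are hypothetical, and the shape of the result simply mirrors the reduced meta-schemas shown earlier in this issue.

```python
def make_meta_schema(meta_schema_id, vocab_uris):
    """Build a dialect meta-schema that only lists its vocabularies."""
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": meta_schema_id,
        # Every listed vocabulary is marked as required (true).
        "$vocabulary": {uri: True for uri in vocab_uris},
        "$dynamicAnchor": "meta",
        "type": ["object", "boolean"],
    }


my_dialect = make_meta_schema(
    "https://example.com/meta/my-dialect",  # hypothetical identifier
    [
        "https://json-schema.org/draft/2020-12/vocab/core",
        "https://json-schema.org/draft/2020-12/vocab/validation",
    ],
)
```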

Automatic Support for Undefined Keyword Checking

With this in place, implementations will be able to look at the vocab files to see if and how a keyword is defined.

Further, the implementation would be able to detect attempts to circumvent the "keywords must be defined in vocabs" requirement by defining a new keyword directly in the meta-schema. Currently, detecting this is troublesome for implementations (annoying but not impossible).
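
Both checks could look something like the following Python sketch. It assumes a hypothetical `VOCAB_KEYWORDS` registry that an implementation would populate from the vocab files; the keyword sets here are deliberately partial, and the function names are made up for illustration.

```python
# Hypothetical registry: vocabulary URI -> keywords it defines.
# Partial lists, for illustration only.
VOCAB_KEYWORDS = {
    "https://json-schema.org/draft/2020-12/vocab/core": {"$id", "$schema", "$ref"},
    "https://json-schema.org/draft/2020-12/vocab/validation": {"enum", "const", "type"},
}


def declared_keywords(meta_schema):
    """Union of the keywords defined by every vocabulary the meta-schema lists."""
    known = set()
    for uri in meta_schema.get("$vocabulary", {}):
        known |= VOCAB_KEYWORDS.get(uri, set())
    return known


def unknown_keywords(meta_schema, schema):
    """Keywords used in `schema` that no listed vocabulary defines."""
    return sorted(set(schema) - declared_keywords(meta_schema))


def keywords_defined_outside_vocabs(meta_schema):
    """Keywords the meta-schema introduces under `properties` with no backing vocabulary."""
    return sorted(set(meta_schema.get("properties", {})) - declared_keywords(meta_schema))


META = {
    "$vocabulary": {"https://json-schema.org/draft/2020-12/vocab/validation": True},
    "properties": {
        "enum": {"type": "array"},
        "myKeyword": {"type": "string"},  # hypothetical keyword, not in any vocab
    },
}

print(unknown_keywords(META, {"enum": [1], "myKeyword": "x"}))  # ['myKeyword']
print(keywords_defined_outside_vocabs(META))                    # ['myKeyword']
```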

(There may be some intersection here with x- keywords, but I haven't thought about it too hard.)

$vocabulary Requires Special Treatment

Currently $vocabulary is only to be processed when the schema that contains it is being processed as a meta-schema. I don't think this should change as it only defines what keywords the instance (another schema) can use.

In this way, maybe it does break the nice symmetry we have around "a meta-schema validating a schema" is just "a schema validating an instance." But it could be argued that such symmetry was broken when $vocabulary was introduced.

It may have an impact on the Test Suite since we do have a number of tests that validate schemas based on the meta-schema, and they'd need to be updated to pass along the context of "this is a meta-schema evaluation" in order to get the validation result from $vocabulary.
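
One way to picture that context is an explicit flag on the evaluation call, as in the Python sketch below. The `as_meta_schema` parameter, the `KEYWORD_CHECKS` table, and the function name are all assumptions made for illustration; the Test Suite has no such mechanism today.

```python
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"

# Hypothetical per-vocabulary checks standing in for real keyword schemas.
KEYWORD_CHECKS = {
    VALIDATION: lambda s: (
        not isinstance(s, dict) or "enum" not in s or isinstance(s["enum"], list)
    ),
}


def vocabulary_result(schema, instance, as_meta_schema):
    """Return the validation result contributed by $vocabulary alone."""
    if not as_meta_schema or "$vocabulary" not in schema:
        # Outside a meta-schema evaluation the keyword is ignored.
        return True
    return all(
        KEYWORD_CHECKS[uri](instance)
        for uri in schema["$vocabulary"]
        if uri in KEYWORD_CHECKS
    )


META = {"$vocabulary": {VALIDATION: True}}
bad_schema = {"enum": "not-an-array"}

print(vocabulary_result(META, bad_schema, as_meta_schema=False))  # True
print(vocabulary_result(META, bad_schema, as_meta_schema=True))   # False
```

A test that validates a schema against its meta-schema would need to set the flag to see the second result; without it, the same test would pass an invalid schema.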

Out of scope

I haven't addressed:

  • what the file might look like, specifically, only that it should contain the things I listed above
  • how the value of $vocabulary might change (which depends on whether optional vocabs are still worth having, see link at top)
  • how the referencing of a vocab file works (would it be an implicit reference or do we need $ref in some capacity?)

I'd like to get the concept defined before we start considering mechanics.
