v6 validation: "contains" #63

Closed
@handrews

Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/contains-(v5-proposal)

Proposed keywords

  • contains

We might also want an equivalent for objects (e.g. containsProperty).

Purpose

Specifying that an array must contain at least one matching item is awkward. It can currently be done, but only using some inside-out syntax:

{
    "type": "array",
    "not": {
        "items": {
            "not": {... whatever ...}
        }
    }
}

This would replace it with the much neater:

{
    "type": "array",
    "contains": {... whatever ...}
}

It would also enable us to specify multiple schemas that must be matched by distinct items (which is currently not supported).

Values

The value of contains would be either a schema, or an array of schemas.

Validation

If the value of contains is a schema, then validation would only succeed if at least one of the items in the array matches the provided sub-schema.
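A minimal Python sketch of the single-schema case. It assumes sub-schemas are modelled as plain predicates; the `Schema` alias and the `is_string` helper are hypothetical stand-ins for a real validator, not part of any JSON Schema library:

```python
from typing import Any, Callable, Sequence

# Hypothetical stand-in: a sub-schema is modelled as a predicate that
# returns True when an instance would validate against it.
Schema = Callable[[Any], bool]


def contains_single(items: Sequence[Any], schema: Schema) -> bool:
    """Return True if at least one item matches the sub-schema."""
    # any() returns False for an empty sequence, so [] never passes.
    return any(schema(item) for item in items)


def is_string(value: Any) -> bool:
    """Rough stand-in for the sub-schema {"type": "string"}."""
    return isinstance(value, str)


print(contains_single(["foo"], is_string))            # True
print(contains_single([5, None, "foo"], is_string))   # True
print(contains_single([], is_string))                 # False
print(contains_single([5, None], is_string))          # False
```

The four calls mirror the plain-schema examples later in this proposal.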

If the value of contains is an array, then validation would only succeed if it is possible to map each sub-schema in contains to a distinct array item matching that sub-schema. Two sub-schemas in contains cannot be mapped to the same array index.
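The "distinct array item" rule can be read directly as a brute-force search over injective mappings. The sketch below assumes, again, that sub-schemas are predicates; `find_mapping` and `requires` are hypothetical names used only for illustration:

```python
from itertools import permutations
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical stand-in: each sub-schema is a predicate over one item.
Schema = Callable[[Any], bool]


def find_mapping(items: Sequence[Any],
                 schemas: Sequence[Schema]) -> Optional[Tuple[int, ...]]:
    """Brute-force search for an injective sub-schema -> item-index mapping.

    Returns a tuple whose k-th entry is the index of the item matched by
    schemas[k], or None if no such mapping exists.
    """
    # permutations(range(n), m) yields every ordered choice of m distinct
    # indices, so no two sub-schemas can share an array index.
    for indices in permutations(range(len(items)), len(schemas)):
        if all(schema(items[i]) for schema, i in zip(schemas, indices)):
            return indices
    return None


def requires(prop: str) -> Schema:
    """Rough stand-in for {"required": [prop]} on object items."""
    return lambda value: isinstance(value, dict) and prop in value


schemas = [requires("propA"), requires("propB")]
print(find_mapping([{"propA": True}, {"propB": True}], schemas))  # (0, 1)
print(find_mapping([{"propA": True}, {"propA": True, "propB": True}], schemas))  # (0, 1)
print(find_mapping([], schemas))                                  # None
print(find_mapping([{"propA": True}], schemas))                   # None
print(find_mapping([{"propA": True, "propB": True}], schemas))    # None
```

This tries up to n!/(n-m)! candidate mappings for n items and m sub-schemas, so it only suits small contains arrays.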

Example

Plain schema

{
    "type": "array",
    "contains": {
        "type": "string"
    }
}

Valid: ["foo"], [5, null, "foo"]
Invalid: [], [5, null]

Array of schemas

{
    "type": "array",
    "items": {"type": "object"},
    "contains": [
        {"required": ["propA"]},
        {"required": ["propB"]}
    ]
}

Valid:

  • [{"propA": true}, {"propB": true}]
  • [{"propA": true}, {"propA": true, "propB": true}]

Invalid:

  • []
  • [{"propA": true}] - no match for second entry
  • [{"propA": true, "propB": true}] - entries in contains must describe different items

Concerns

Implementation

The plain-schema case is simple.

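The array case is a bipartite matching problem (sub-schemas on one side, array items on the other), and Hall's Marriage Theorem gives the exact condition under which every sub-schema can be assigned its own matching item. There are relatively efficient algorithms for the general problem, but I suspect a brute-force search will be surprisingly effective and efficient (due to the relatively small number of entries in contains).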
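One of the efficient general solutions is augmenting-path bipartite matching (Kuhn's algorithm). This is a swapped-in standard technique, not something the proposal specifies; `contains_all` and `of_type` are hypothetical names, and sub-schemas are again modelled as predicates:

```python
from typing import Any, Callable, List, Sequence

# Hypothetical stand-in: each sub-schema is a predicate over one item.
Schema = Callable[[Any], bool]


def contains_all(items: Sequence[Any], schemas: Sequence[Schema]) -> bool:
    """Array form of "contains" via augmenting-path bipartite matching.

    Left side: sub-schemas. Right side: item indices. There is an edge
    k -> i when schemas[k] matches items[i]. Validation succeeds iff some
    matching covers every sub-schema.
    """
    if len(schemas) > len(items):
        return False  # not enough items to give each sub-schema its own
    # Validate each (sub-schema, item) pair exactly once.
    adj: List[List[int]] = [
        [i for i, item in enumerate(items) if schema(item)] for schema in schemas
    ]
    owner: List[int] = [-1] * len(items)  # owner[i]: sub-schema holding item i

    def augment(k: int, seen: List[bool]) -> bool:
        # Find an item for sub-schema k, re-routing earlier owners if needed.
        for i in adj[k]:
            if not seen[i]:
                seen[i] = True
                if owner[i] == -1 or augment(owner[i], seen):
                    owner[i] = k
                    return True
        return False

    return all(augment(k, [False] * len(items)) for k in range(len(schemas)))


def of_type(*types: type) -> Schema:
    """Rough stand-in for a "type" sub-schema (hypothetical helper)."""
    return lambda value: isinstance(value, types)


schemas = [of_type(str, int), of_type(str)]
print(contains_all(["x", 1, None], schemas))  # True: schemas[0] gives up "x" and takes 1
print(contains_all([1, None], schemas))       # False: nothing matches schemas[1]
print(contains_all(["x"], schemas))           # False: fewer items than sub-schemas
```

Unlike a search over all mappings, this runs in polynomial time and validates each item against each sub-schema only once. Hopcroft-Karp is asymptotically faster still, though for contains arrays of the size expected here the difference is unlikely to matter.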

It may or may not be worth warning schema authors about stuffing hundreds of entries into contains, because a naive implementation could easily end up having O(n3m) complexity.

Complexity of understanding (for humans)

Behaviour for the array form may be slightly complicated. For example:

{
    "type": "array",
    "contains": [
        {"enum": ["A", "B"]},
        {"enum": ["A", "B", "C"]},
        {"enum": ["A", "D"]},
    ]
}

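In this case, ["A", "B", "C"] is valid: the third sub-schema can only match "A", which leaves "B" for the first sub-schema and "C" for the second.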
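One way to reason about examples like this is to check Hall's condition directly: every group of k sub-schemas must, between them, match at least k distinct items. A minimal sketch, assuming the enum sub-schemas are written as plain value sets (`ENUMS` and `hall_condition` are hypothetical names):

```python
from itertools import combinations
from typing import List, Sequence, Set

# The three "enum" sub-schemas from the example, written as value sets.
ENUMS: List[Set[str]] = [{"A", "B"}, {"A", "B", "C"}, {"A", "D"}]


def hall_condition(items: Sequence[str], enums: Sequence[Set[str]]) -> bool:
    """Return True if every group of k sub-schemas matches >= k distinct items."""
    # Indices of the items each sub-schema matches.
    matches = [{i for i, v in enumerate(items) if v in e} for e in enums]
    for k in range(1, len(enums) + 1):
        for group in combinations(matches, k):
            # Hall's condition fails if k sub-schemas share fewer than k items.
            if len(set().union(*group)) < k:
                return False
    return True


print(hall_condition(["A", "B", "C"], ENUMS))  # True
print(hall_condition(["A", "C", "C"], ENUMS))  # False
```

For ["A", "C", "C"], the first and third sub-schemas can only match index 0, so a group of two sub-schemas shares just one item. By Hall's theorem this condition holds exactly when a valid mapping exists, but checking all 2^m - 1 groups is only practical for small examples, not as a validator.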

However, this complexity comes from the constraint itself, not from the syntax.
