Description
Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/contains-(v5-proposal)
Proposed keywords
contains
We also might want an equivalent for objects (like containsProperty
).
Purpose
Specifying that an array must contain at least one matching item is awkward. It can currently be done, but only using some inside-out syntax:
{
"type": "array",
"not": {
"items": {
"not": {... whatever ...}
}
}
}
This would replace it with the much neater:
{
"type": "array",
"contains": {... whatever ...}
}
It would also enable us to specify multiple schemas that must be matched by distinct items (which is currently not supported).
Values
The value of contains
would be either a schema, or an array of schemas.
Validation
If the value of contains
is a schema, then validation would only succeed if at least one of the items in the array matches the provided sub-schema.
If the value of contains
is an array, then validation would only succeed if it is possible to map each sub-schema in contains
to a distinct array item matching that sub-schema. Two sub-schemas in contains
cannot be mapped to the same array index.
Example
Plain schema
{
"type": "array",
"contains": {
"type": "string"
}
}
Valid: ["foo"]
, [5, null, "foo"]
Invalid: []
, [5, null]
Array of schemas
{
"type": "array",
"items": {"type": "object"},
"contains": [
{"required": ["propA"]},
{"required": ["propB"]}
]
}
Valid:
[{"propA": true}, {"propB": true}]
[{"propA": true}, {"propA": true, "propB": true}]
Invalid:
[]
[{"propA": true}]
- no match for second entry[{"propA": true, "propB": true}]
- entries incontains
must describe different items
Concerns
Implementation
The plain-schema case is simple.
The array case is equivalent to Hall's Marriage Theorem. There are relatively efficient solutions for the general problem - but, I suspect a brute-force search will be surprisingly effective and efficient (due to the relatively small number of entries in contains
).
It may or may not be worth warning schema authors about stuffing hundreds of entries into contains
, because a naive implementation could easily end up having O(n3m) complexity.
Complexity of understanding (for humans)
Behaviour for the array for may be slightly complicated. For example:
{
"type": "array",
"contains": [
{"enum": ["A", "B"]},
{"enum": ["A", "B", "C"]},
{"enum": ["A", "D"]},
]
}
In this case, ["A", "B", "C"]
is valid.
However, this is not due to the syntax - it's simply a complex constraint.