What is "$ref" and how does it work?

The question of whether `$ref` behaves like inclusion or delegation has come up several times.  

* Inclusion would mean that the `$ref` object can be replaced with its target (in a lazily evaluated process).
* Delegation means that the target of the `$ref` is processed against the current instance location, and the "results" (boolean assertion outcome and optionally the collected annotations) of `$ref` are simply the results of the target schema.

## Inclusion

In its original form, `$ref` in draft-03 and in the separate JSON Reference I-D is explicitly defined as inclusion.  Implementations MAY choose to replace the reference object with its target.

There are some subtleties involved in replacement.  You definitely need to adjust the `$id` when you do the replacement or else base URIs get messed up.  Tools such as [JSON Schema Ref Parser](https://github.com/BigstickCarpet/json-schema-ref-parser) that "dereference" `$ref`s do just that.

You also need to deal with `$schema`, which (when the target is in a different schema document) can be different in the source and target schemas.  The obvious solution is to set `$schema` in the copied-over replacement, however @epoberezkin has observed that this conflicts with how non-schema instances work with meta-schemas.

We cannot make any assumption about instance documents.  A key feature of JSON Schema is that it works with plain old `application/json` documents.  There is no way for a plain JSON document to change what schema is used to process it.  Changing `$schema` in the middle of validating a schema against its meta-schema introduces behavior that is not possible with other instance documents.

## Delegation

With delegation, these problems do not exist.  Each subschema is evaluated in the context of its containing schema document, regardless of whether processing reached it from elsewhere in that same document or from a `$ref` in a separate document.  Since the processing is done per-document, each document can use a different `$schema`.

Results are returned in the form of a single overall boolean assertion outcome (so it doesn't matter to the referencing document what the assertions were or how they were processed) and optionally a set of annotation data (which is a set of name-value pairs of some sort).

The only subtlety is in combining data for the same annotation that appeared in both a local document subschema and a remote `$ref`'d document subschema.  However, this is easily addressed:  once the annotation data is "returned" across the `$ref`, it is combined with other annotation data by the rules of the schema containing the `$ref`.  This keeps all processing consistent within each schema document, such that the rules can change independently on each side (for instance if they are upgraded to a new draft at different times).

Another nice property of delegation (also from @epoberezkin) is that `$ref` can become a "normal" keyword that has assertion and/or annotation results that are combined with adjacent schema keywords just like everything else.

## Alternate approaches

***NOTE:** This section is about showing that it is possible to handle the limitations in other ways.  *Neither of these approaches is a serious proposal for recommendation!**

### An alternate approach to inclusion

The one use case that is not well-handled by delegation is that of packing multiple schema documents into a single distribution unit (file, resource, whatever).  There's some debate as to how valid or important this use case is, but it does come up.  This is only done by replacing specific non-cyclic `$ref`s, and does not involve trying to "dereference" all `$ref`s in the system.

For those who really want to do this, it occurs to me that there is another way to handle it:  [`data:` URIs](https://tools.ietf.org/html/rfc2397):

Let's say that this is our reference target:

```JSON
{
    "$schema": "http://json-schema.org/draft-06/schema#",
    "propertyNames": {"pattern": "^foo"}
}
```

Here we see how it can be inlined into a draft-04 schema using a `data:` URI:

```JSON
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "allOf": [{
        "$ref": "data:application/schema+json,%7B%22%24schema%22%3A%20%22http%3A//json-schema.org/draft-06/schema%23%22%2C%20%22propertyNames%22%3A%20%7B%22pattern%22%3A%20%22%5Efoo%22%7D%7D"
    }]
}
```

Of course this looks horrible and would never be suitable for human consumption.  But it would work.

I'm not seriously advocating this as a recommendation, but it does illustrate that the problem is solvable in the delegation model if you are willing to make some tradeoffs.

### An alternate way to change processing rules

The limitation of replacement is that changing `$schema` in the middle of a file make validating schemas against meta-schemas different from validating other instances against schemas, as we cannot specify a schema change mechanism for plain `application/json` instance documents.

However, we could extend the instance-to-schema linking process to allow associating an instance JSON Pointer with each linked schema.  For schemas-as-instances, this would be equivalent to setting `$schema` at the location identified by the JSON Pointer.

This would have to be done as a link attribute/media type parameter thing of some sort.  Like the `data:` URI solution, this gets ugly pretty quickly, and makes it more likely to hit HTTP header length limitations.

It's also arguably a lot harder to hold in one's head than simply saying that `$schema` is scoped to a JSON document rather than an individual schema object.  While the data URI alternative given above is ugly, the ugliness comes from a separate standard, and is purely opt-in.  Schema authors have to choose to use those URIs, and implementations would choose whether to support data URIs or not.

This approach of targeting different schemas at different portions of the instance would place a significant burden on all implementations.

## Conclusions

Based on this write-up, I am inclined to formalize `$ref` as delegation.  It is a more flexible model, and the technique for working around its limitations correctly restricts the burden of implementation to only those who choose to use or support it.  Inclusion is less flexible, and the workaround (if we did anything about it at all) is burdensome to all implementations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

What is "$ref" and how does it work? #514

Inclusion

Delegation

Alternate approaches

An alternate approach to inclusion

An alternate way to change processing rules

Conclusions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

What is "$ref" and how does it work? #514

Description

Inclusion

Delegation

Alternate approaches

An alternate approach to inclusion

An alternate way to change processing rules

Conclusions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions