"$id": Eliminate base URI shadowing #726

Closed
@handrews


"URI shadowing" (h/t to @johandorland for the term) refers to the scenario where a schema contains multiple $ids, with at least one $id in a subschema of another object containing $id (such as the root schema object).

When did this become a thing?

Here is the schema and various resolved URI examples, copied and pasted directly from the current spec:

{
       "$id": "http://example.com/root.json",
       "definitions": {
           "A": { "$id": "#foo" },
           "B": {
               "$id": "other.json",
               "definitions": {
                   "X": { "$id": "#bar" },
                   "Y": { "$id": "t/inner.json" }
               }
           },
           "C": {
               "$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"
           }
       }
   }

   The schemas at the following URI-encoded JSON Pointers [RFC6901]
   (relative to the root schema) have the following base URIs, and are
   identifiable by any listed URI in accordance with Section 5 above:

   # (document root)

         http://example.com/root.json

         http://example.com/root.json#

   #/definitions/A

         http://example.com/root.json#foo

         http://example.com/root.json#/definitions/A

   #/definitions/B

         http://example.com/other.json

         http://example.com/other.json#

         http://example.com/root.json#/definitions/B

   #/definitions/B/definitions/X

         http://example.com/other.json#bar

         http://example.com/other.json#/definitions/X

         http://example.com/root.json#/definitions/B/definitions/X

   #/definitions/B/definitions/Y

         http://example.com/t/inner.json

         http://example.com/t/inner.json#

         http://example.com/other.json#/definitions/Y

         http://example.com/root.json#/definitions/B/definitions/Y

   #/definitions/C

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#

         http://example.com/root.json#/definitions/C

Notice that all of the locations (the list section headers) are done as JSON Pointer fragments, relative to the root of the entire example schema.
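To make the shadowing rule concrete, here is a minimal Python sketch that reproduces the list above. It is an illustration only, not spec text or anyone's actual implementation: the function name `shadowed_uris` is made up, and it assumes that subschemas nest only under "definitions" (as in this example) and that the root has an absolute $id.

```python
from urllib.parse import urljoin

SCHEMA = {
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": {"$id": "#foo"},
        "B": {
            "$id": "other.json",
            "definitions": {
                "X": {"$id": "#bar"},
                "Y": {"$id": "t/inner.json"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}


def shadowed_uris(schema, enclosing=(), out=None, location="#"):
    """Map each location to its URIs when every parent base is tracked.

    `enclosing` holds (base URI, JSON Pointer from that base) pairs,
    outermost base first. Hypothetical helper, for illustration only.
    """
    out = {} if out is None else out
    uris = []
    sid = schema.get("$id")
    if sid and not sid.startswith("#"):
        # A non-fragment $id establishes a new base for this subschema.
        innermost = enclosing[-1][0] if enclosing else ""
        base = urljoin(innermost, sid)
        enclosing = enclosing + ((base, ""),)
        uris += [base, base + "#"]
    elif sid:
        # A plain name fragment resolves against the innermost base only.
        uris.append(enclosing[-1][0] + sid)
    # Shadowing: each enclosing base also yields a JSON Pointer fragment URI.
    uris += [b + "#" + p for b, p in reversed(enclosing) if p]
    out[location] = uris
    for name, sub in schema.get("definitions", {}).items():
        step = "/definitions/" + name
        nested = tuple((b, p + step) for b, p in enclosing)
        shadowed_uris(sub, nested, out, location + step)
    return out


for loc, uris in shadowed_uris(SCHEMA).items():
    print(loc, uris)
```

The cost of shadowing shows up in `enclosing`: an implementation has to carry the whole stack of parent bases through every subschema, not just the innermost one.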

Prior to draft-handrews-json-schema-01, the analogous example section did not show any examples with JSON Pointer fragments. Using the exact same schema, it showed these resolved URIs only:

   # (document root)  http://example.com/root.json#

   #/definitions/A  http://example.com/root.json#foo

   #/definitions/B  http://example.com/other.json

   #/definitions/B/definitions/X  http://example.com/other.json#bar

   #/definitions/B/definitions/Y  http://example.com/t/inner.json

   #/definitions/C  urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f

But you can see that the locations were given the same way: as JSON Pointer fragments relative to the overall document root.

The key point here is that we added examples of URI shadowing in handrews-*-01, which was a clarification of draft-07 (draft-07 was originally published as handrews-*-00). So we didn't even introduce these with draft-07. There was some confusion over it (and other aspects of $id and $ref), and we added the examples when we clarified the text.

Why did we do this?

Good question. The change was made in PR #550, which was primarily about removing the terminology around "internal references" vs "external references". That terminology had itself been a great improvement by @awwright over the draft-04 language around "inline resolution" and "canonical resolution". None of these prior approaches directly explained how JSON Pointer fragments work across the presence of $id.

The issue mentioned by that PR only mentions JSON Pointer fragments once, at the end, after the PR was submitted: #545 (comment)

I know I had questions over how that should be handled when I tried implementing what were then the new draft-06 proposals. And I know that I ended up implementing URI shadowing. I don't remember why.

Quite a few people weighed in on and reviewed #545 and #550, so this isn't something that slipped in by accident.

Did anyone rely on this before handrews-*-01?

No clue.

Did anyone actually implement what's in handrews-*-01?

I think @johandorland did? He commented on the issue/PR. Not sure if anyone else did. @Julian?

Does the test suite test this?

No.

So, really, why did we do this?

I think the rationale was that since the locations in the existing example were given as JSON Pointer fragments, that must mean that those fragments were valid, which would definitely imply that every parent $id introduces a base URI that must be tracked throughout all subschemas, even if another $id appears.

Can we just not?

I think so. JSON Pointer fragment evaluation is just defined in terms of JSON document structure, without any thought given to changing base URIs or embedding one document in another. But media types get to specify fragment syntax and semantics, so I think we can reasonably say that it stops when the base URI is reset, and from that point on, the prior base URI no longer applies at all.

That is actually how we interpret plain name fragments. Note in this part of the example:

   #/definitions/B/definitions/X

         http://example.com/other.json#bar

         http://example.com/other.json#/definitions/X

         http://example.com/root.json#/definitions/B/definitions/X

the #bar plain name fragment can be used with the innermost base http://example.com/other.json, but not with the outer base http://example.com/root.json.

So restricting the use of JSON Pointer fragments to not cross $id boundaries would make their behavior more consistent with how we handle plain name fragments.

Similar to #724, this considers $id to establish a document boundary, with the new base URI it creates applying to the document within that boundary. Unlike #724, what is proposed here does not make any other changes to $id's behavior. In particular, the effect of an $id with a JSON Pointer fragment remains undefined, although we can address that separately later if we want to.

So what would this look like?

There are two options WITHIN THE SCOPE OF THIS ISSUE. If you want to talk about other options, file them yourself :-) Comments that go off-topic here will be deleted.

Here's the example schema again so you don't have to scroll so much:

{
       "$id": "http://example.com/root.json",
       "definitions": {
           "A": { "$id": "#foo" },
           "B": {
               "$id": "other.json",
               "definitions": {
                   "X": { "$id": "#bar" },
                   "Y": { "$id": "t/inner.json" }
               }
           },
           "C": {
               "$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"
           }
       }
   }

Option 1: Exactly one valid URI involving a JSON Pointer fragment for each location:

   # (document root)

         http://example.com/root.json

         http://example.com/root.json#

   #/definitions/A

         http://example.com/root.json#foo

         http://example.com/root.json#/definitions/A

   #/definitions/B

         http://example.com/other.json

         http://example.com/other.json#

   #/definitions/B/definitions/X

         http://example.com/other.json#bar

         http://example.com/other.json#/definitions/X

   #/definitions/B/definitions/Y

         http://example.com/t/inner.json

         http://example.com/t/inner.json#

   #/definitions/C

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#

This is a lot more straightforward, and all of these addresses work whether the schemas were loaded as one document (as in this example), or if each of root.json, other.json, and inner.json had been loaded separately, and $ref-ed each other.
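As a rough illustration of why Option 1 is simpler to implement, here is a minimal Python sketch that produces the list above. It is a sketch under assumptions, not a definitive implementation: the name `option1_uris` is made up, and subschemas are assumed to nest only under "definitions", as in this example.

```python
from urllib.parse import urljoin

SCHEMA = {
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": {"$id": "#foo"},
        "B": {
            "$id": "other.json",
            "definitions": {
                "X": {"$id": "#bar"},
                "Y": {"$id": "t/inner.json"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}


def option1_uris(schema, base="", pointer="", out=None, location="#"):
    """Map each location to its URIs when $id acts as a document boundary.

    Only the innermost base and the JSON Pointer relative to it are kept.
    Hypothetical helper, for illustration only.
    """
    out = {} if out is None else out
    sid = schema.get("$id")
    if sid and not sid.startswith("#"):
        # New base: the pointer restarts and the prior base no longer applies.
        base, pointer = urljoin(base, sid), ""
        uris = [base, base + "#"]
    else:
        # Optional plain name fragment, plus exactly one JSON Pointer URI.
        uris = [base + sid] if sid else []
        uris.append(base + "#" + pointer)
    out[location] = uris
    for name, sub in schema.get("definitions", {}).items():
        step = "/definitions/" + name
        option1_uris(sub, base, pointer + step, out, location + step)
    return out


for loc, uris in option1_uris(SCHEMA).items():
    print(loc, uris)
```

The only state carried down the tree is a single (base, pointer) pair, which is the same state needed to resolve plain name fragments.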

Option 2: But what about the URI used to fetch the document?

If it's possible to retrieve this example from http://example.com/alias.json (as opposed to or in addition to its declared $id of http://example.com/root.json), then should fragments also work from that base?

I can see it either way.

One option is to treat fetching this from alias.json and discovering that its declared $id is root.json as a redirect, in which case the fragment is applied to the redirect target (at least in HTTP). h/t to @jdesrosiers for this mental model. In this sense, anywhere we have root.json we would also be required to support alias.json.
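A tiny Python sketch of the redirect mental model, purely as an illustration: the function name `apply_redirect_model` and its parameters are hypothetical, and it only covers a reference whose non-fragment part exactly matches the retrieval URI.

```python
from urllib.parse import urldefrag


def apply_redirect_model(ref, retrieval_uri, declared_id):
    """Rewrite a reference made against the retrieval URI onto the declared $id.

    The fragment is carried over unchanged, as it would be when following
    a redirect. Hypothetical helper, for illustration only.
    """
    base, fragment = urldefrag(ref)
    if base == retrieval_uri:
        base = declared_id
    return base + "#" + fragment if fragment else base


print(apply_redirect_model(
    "http://example.com/alias.json#/definitions/A",
    "http://example.com/alias.json",
    "http://example.com/root.json",
))
```

Under this model the alias never needs to be tracked as a second base: references through it are normalized to the declared $id before fragment evaluation.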

On the other hand, JSON Schema repeatedly talks about URIs as identifiers rather than locators, so I think we could rationally say that the $id is how the schema needs to be referenced. It's fine if it's fetched from elsewhere outside of the reference process (like being pre-loaded from a local filesystem), but does that mean that we should support those loading URIs when an $id is provided? (Obviously, if there is no $id, then the loading URI is the only available base.)

Supporting the loading URI as a shadowed base, but nothing else, might be a bit odd, or it might be more intuitive. I really have no idea at this point 😆
