Make $id conform to RFC 3986 suggestion for base URI elements #729

Closed
@handrews

Per RFC 3986 section 4.3:

Some protocol elements allow only the absolute form of a URI without
a fragment identifier. For example, defining a base URI for later
use by relative references calls for an absolute-URI syntax rule that
does not allow a fragment.

Let's please follow this advice and require $id to resolve to an absolute URI, without a fragment. By this I mean that it cannot contain a fragment, but MAY be a relative URI reference (resolved in the same way that it is currently). If it makes things easier, rather than "MUST NOT contain a fragment", we can say that "any fragment MUST be ignored." Either option makes it absolutely clear that fragments in $id have no useful effect.
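As a rough sketch of the "any fragment MUST be ignored" option, here is how an implementation might compute the base URI from $id. The function name resolve_id is hypothetical, and the choice of Python's urllib.parse is just for illustration; nothing here is taken from the spec text.

```python
from urllib.parse import urljoin, urldefrag


def resolve_id(base_uri: str, id_value: str) -> str:
    """Resolve an "$id" value against base_uri and drop any fragment.

    Hypothetical helper: models the "fragments MUST be ignored" option.
    """
    # "$id" MAY be a relative reference, so resolve it first.
    resolved = urljoin(base_uri, id_value)
    # Then discard the fragment, leaving an absolute URI.
    absolute, _fragment = urldefrag(resolved)
    return absolute
```

Under the stricter "MUST NOT contain a fragment" option, the same function would raise an error instead of silently discarding the fragment.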

Currently, there are three behavioral cases of fragments in $id:

  • "$id": "#foo": Plain name fragment definitions. We can replace with "$anchor": "foo", analogous to fragment definition keywords in several other media types. Very easy to explain, clearly separate from any concerns over base URIs or embedded documents.
  • "$id": "https://example.com/root#": Empty fragment in root schema object $ids. This has absolutely no effect compared to the same thing without a fragment at all. It primarily exists as an explicitly supported case because older meta-schema ids were written this way, and we've never bothered to change that.
  • ALL other cases have undefined behavior!
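To make the three cases concrete, here is a minimal classifier. It is only a sketch: classify_id_fragment and its return labels are made up for this example, the plain-name pattern is my reading of the current draft's syntax (a letter followed by letters, digits, hyphens, underscores, colons, or periods), and treating every empty fragment as the "empty" case is an assumption.

```python
import re

# Assumed plain-name syntax: a letter, then letters, digits, "-", "_", ":" or ".".
PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def classify_id_fragment(id_value: str) -> str:
    """Return which behavioral case an "$id" value falls into."""
    before, hash_mark, fragment = id_value.partition("#")
    if not hash_mark:
        return "no-fragment"        # ordinary self/base URI
    if fragment == "":
        return "empty-fragment"     # same effect as having no fragment
    if before == "" and PLAIN_NAME.fullmatch(fragment):
        return "plain-name"         # the "$anchor" use case
    return "undefined"              # everything else


for value in ["#foo", "https://example.com/root#", "#/definitions/foo"]:
    print(value, "->", classify_id_fragment(value))
```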

Per section 8.2.3 of the current draft:

The effect of using a fragment in "$id" that isn't blank or doesn't
follow the plain name syntax is undefined. [[CREF3: How should an
"$id" URI reference containing a fragment with other components be
interpreted? There are two cases: when the other components match
the current base URI and when they change the base URI. ]]

This note explicitly makes the behavior of JSON Pointer-fragment-only $ids undefined, and notes that we have no idea how fragment-plus-other-component $ids behave, meaning that their behavior is also undefined in the current spec.


So there are several problems with the current behavior which this addresses:

  1. We are currently going against the best practice for base URI elements, which complicates explaining how base URIs work to those who don't already understand them.
  2. By having one form of fragment-only URI reference perform a useful function, we give the impression that fragments in $id as a whole should work. However, this is incorrect: every other non-trivial usage has undefined behavior, although some of it looks like it should work, at least to some people.
  3. Because the two well-defined behaviors (setting the self identifier, which is also the base URI, vs. defining a plain name fragment) are, due to the rules around base URIs and fragments, apparently completely separate from each other, schema authors and implementors find $id confusing. It requires a fairly deep understanding of RFC 3986 to understand why these behaviors can possibly work in the same keyword at all.

Removing fragments from $id (and using $anchor for plain name fragment definition instead) solves these problems.

  • We would be in line with the informal RFC 3986 suggestion regarding base URI elements
  • Saying that fragments in $id MUST be ignored matches how base URIs are computed in RFC 3986
  • We would (not coincidentally) now have a self/base-setting element that behaves analogously to such elements in other media types, where fragments are either forbidden or clearly have no purpose
  • To the relative layperson, there would no longer appear to be two disjoint use cases for $id
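For a sense of what the migration could look like in tooling, here is a sketch that rewrites plain name fragment $ids to $anchor and strips empty fragments. The name migrate_ids is hypothetical, the plain-name pattern is my assumption about the current syntax, and the walk naively treats every nested object as a schema (so it would also touch objects inside "enum" or "default" values).

```python
import re

# Assumed plain-name syntax: a letter, then letters, digits, "-", "_", ":" or ".".
PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def migrate_ids(node):
    """Return a copy of node with fragment-bearing "$id" values migrated."""
    if isinstance(node, list):
        return [migrate_ids(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {key: migrate_ids(value) for key, value in node.items()}
    id_value = node.get("$id")
    if isinstance(id_value, str) and "#" in id_value:
        before, _, fragment = id_value.partition("#")
        if before == "" and PLAIN_NAME.fullmatch(fragment):
            # "$id": "#foo"  ->  "$anchor": "foo"
            del result["$id"]
            result["$anchor"] = fragment
        elif fragment == "" and before:
            # "$id": "https://example.com/root#"  ->  drop the empty fragment
            result["$id"] = before
        # Any other fragment has undefined behavior today; leave it untouched.
    return result
```

Everything else about $id is left alone, which is the point: only the fragment part of the syntax changes.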

Some of the fragment-in-$id cases that technically have undefined behavior happen to be effective no-ops, as with the empty fragment case. (Recall that the empty fragment is technically a JSON Pointer for the entire document.)

So sometimes we see schemas like this generated by tools:

{
    "definitions": {
        "foo": {
            "$id": "#/definitions/foo"
        }
    }
}

which is basically a no-op (note that the $id fragment matches the actual position). So if we say that fragments in $id MUST be ignored, it's still a no-op and that's fine. I would prefer an implementation to raise an error in this case going forward, but I don't feel strongly about that.
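If an implementation or linter wanted to flag these, a check along the following lines might do. This is a sketch under assumptions: find_pointer_ids and escape_token are hypothetical names, and it assumes no intermediate $id changes the base URI, so pointers are compared against positions in the document root.

```python
from urllib.parse import unquote


def escape_token(token: str) -> str:
    """Escape a key as a JSON Pointer reference token ("~" -> "~0", "/" -> "~1")."""
    return token.replace("~", "~0").replace("/", "~1")


def find_pointer_ids(node, pointer=""):
    """Yield (location, id_value, is_noop) for each "$id" of the form "#/...".

    is_noop is True when the fragment points at the object's own location.
    """
    if isinstance(node, dict):
        id_value = node.get("$id")
        if isinstance(id_value, str) and id_value.startswith("#/"):
            # Percent-decode the fragment before comparing it to the location.
            yield pointer, id_value, unquote(id_value[1:]) == pointer
        for key, value in node.items():
            yield from find_pointer_ids(value, f"{pointer}/{escape_token(key)}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_pointer_ids(item, f"{pointer}/{index}")


schema = {"definitions": {"foo": {"$id": "#/definitions/foo"}}}
for location, id_value, is_noop in find_pointer_ids(schema):
    print(location, id_value, "no-op" if is_noop else "mismatch")
```

A tool could warn on the no-op case and raise an error on a mismatch, since a mismatched pointer fragment is where the undefined behavior actually bites.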

My point here is that the only other fragment usage I've seen in the wild doesn't actually do anything.


@awwright if I'm reading #719 (comment) correctly, you are more or less OK with this idea? I agree that the names could be better, but keeping $id for the self+base form is less disruptive.

@jdesrosiers you are encouraged to comment on the merits of this if you'd like, but please do not discuss your approach from #724 here, as we have already determined (in #179) that mixing the two discussions confuses everyone.


Everyone: fair warning, I feel very strongly about this. Unlike #726 (Eliminate base URI shadowing), which I think we should do but if we don't that's fine, I'm incredibly fed up with the problems with $id and feel like many of them come from having such a wide range of syntactically valid values that have undefined semantics.

If you want to keep the current situation of exactly one use of fragments in $id having defined semantics, and all other possible uses having undefined semantics, please make an active case for why such a thing is desirable.

Yes, changing "$id" to "$anchor" is a breaking change. That is unfortunate. But I feel like $id as it stands is so problematic that this is worthwhile. And importantly, this just forbids one part of $id's current syntax, and leaves the rest of $id's well-defined behavior alone.

Although perhaps @Julian can speak to how difficult this would be in his implementation, as I think he found the prior change of id to $id particularly disruptive.
