Description
Per RFC 3986 section 4.3:
Some protocol elements allow only the absolute form of a URI without
a fragment identifier. For example, defining a base URI for later
use by relative references calls for an absolute-URI syntax rule that
does not allow a fragment.
Let's please follow this advice and require $id to resolve to an absolute URI, without a fragment. By this I mean that it cannot contain a fragment, but MAY be a relative URI reference (resolved in the same way that it is currently). If it makes things easier, rather than "MUST NOT contain a fragment", we can say that "any fragment MUST be ignored." Either option makes it absolutely clear that fragments in $id have no useful effect.
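To make the two options concrete, here is a minimal Python sketch of how an implementation might resolve $id under this proposal. The function name resolve_id and the strict flag are hypothetical, chosen only for illustration; nothing here is taken from the spec or from any existing implementation.

```python
from urllib.parse import urldefrag, urljoin


def resolve_id(base_uri, id_value, strict=False):
    """Resolve an "$id" value against the current base URI.

    strict=True models "MUST NOT contain a fragment" (raise an error).
    strict=False models "any fragment MUST be ignored" (drop it silently).
    Both names are assumptions made for this sketch.
    """
    if strict and "#" in id_value:
        raise ValueError(f'"$id" must not contain a fragment: {id_value!r}')
    # Ordinary RFC 3986 reference resolution, then discard any fragment.
    resolved, _fragment = urldefrag(urljoin(base_uri, id_value))
    return resolved
```

Either way, a relative reference such as "item.json" still resolves against the base as it does today; only the fragment handling changes.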
Currently, there are three behavioral cases of fragments in $id:
- "$id": "#foo": Plain name fragment definitions. We can replace this with "$anchor": "foo", analogous to fragment definition keywords in several other media types. Very easy to explain, clearly separate from any concerns over base URIs or embedded documents.
- "$id": "https://example.com/root#": Empty fragment in root schema object $ids. This has absolutely no effect compared to the same thing without a fragment at all. It primarily exists as an explicitly supported case because older meta-schema ids were written this way, and we've never bothered to change that.
- ALL other cases have undefined behavior!
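The three cases can be told apart mechanically. The following Python sketch is one way to do it; the function name classify_id and the returned labels are made up for illustration, and the plain-name pattern (a letter followed by letters, digits, hyphens, underscores, colons, or periods) is my reading of the draft's plain-name syntax, so treat it as an assumption.

```python
import re
from urllib.parse import urlsplit

# Assumed plain-name syntax: a letter, then letters, digits, "-", "_", ":", ".".
PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def classify_id(value):
    """Classify an "$id" value by how its fragment currently behaves."""
    if "#" not in value:
        return "no-fragment"
    parts = urlsplit(value)
    if parts.fragment == "":
        # e.g. "https://example.com/root#": same as having no fragment.
        return "empty-fragment"
    fragment_only = not (parts.scheme or parts.netloc or parts.path or parts.query)
    if fragment_only and PLAIN_NAME.fullmatch(parts.fragment):
        # e.g. "#foo": plain name fragment definition.
        return "plain-name"
    # JSON Pointer fragments, and any fragment combined with other components.
    return "undefined"
```

For instance, "#foo" falls in the first case, "https://example.com/root#" in the second, and both "#/definitions/foo" and "https://example.com/root#foo" in the third.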
Per section 8.2.3 of the current draft:
The effect of using a fragment in "$id" that isn't blank or doesn't
follow the plain name syntax is undefined. [[CREF3: How should an
"$id" URI reference containing a fragment with other components be
interpreted? There are two cases: when the other components match
the current base URI and when they change the base URI. ]]
This note explicitly makes the behavior of JSON Pointer-fragment-only $ids undefined, and notes that we have no idea how fragment-plus-other-component $ids behave, meaning that their behavior is also undefined in the current spec.
So there are several problems with the current behavior which this addresses:
- We are currently going against the best practice for base URI elements, which complicates explaining how base URIs work to those who don't already understand them.
- By having one form of fragment-only URI reference perform a useful function, we give the impression that fragments in $id as a whole should work. However, this is incorrect: every other non-trivial usage has undefined behavior, although some of it looks like it should work, at least to some people.
- Because the two well-defined behaviors (setting the self identifier, which is also the base URI, vs. defining a plain name fragment) are, due to the rules around base URIs and fragments, apparently completely separate from each other, schema authors and implementors find $id confusing. It requires a fairly deep understanding of RFC 3986 to understand why these behaviors can possibly work in the same keyword at all.
Removing fragments from $id (and using $anchor for plain name fragment definition instead) solves these problems:
- We would be in line with the informal RFC 3986 suggestion regarding base URI elements.
- Saying that fragments in $id MUST be ignored matches how base URIs are computed in RFC 3986.
- We would (not coincidentally) now have a self/base-setting element that behaves analogously to such elements in other media types, where fragments are either forbidden or clearly have no purpose.
- To the relative layperson, there would no longer appear to be two disjoint use cases for $id.
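For existing schemas, the plain-name case could be migrated mechanically. The sketch below is a rough illustration in Python, not a complete tool: the function name migrate_plain_name_ids is hypothetical, the plain-name pattern is assumed from the draft's syntax rule, and the walk naively treats every nested object as a schema (so it would also rewrite a matching "$id" member inside, say, an enum or const value).

```python
import re

# Assumed plain-name syntax, preceded by "#" and nothing else.
PLAIN_NAME_ID = re.compile(r"#([A-Za-z][A-Za-z0-9\-_:.]*)")


def migrate_plain_name_ids(node):
    """Return a copy of node with '"$id": "#name"' rewritten to '"$anchor": "name"'."""
    if isinstance(node, list):
        return [migrate_plain_name_ids(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "$id" and isinstance(value, str):
            match = PLAIN_NAME_ID.fullmatch(value)
            if match:
                result["$anchor"] = match.group(1)
                continue
        # Anything else, including non-fragment "$id" values, is kept as-is.
        result[key] = migrate_plain_name_ids(value)
    return result
```

A root "$id" such as "https://example.com/root" is left untouched, which is the point: only the fragment-definition use moves to the new keyword.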
Some of the fragment-in-$id cases that technically have undefined behavior happen to be effective no-ops, as with the empty fragment case. (Recall that the empty fragment is technically a JSON Pointer for the entire document.)
So sometimes we see schemas like this generated by tools:
{
  "definitions": {
    "foo": {
      "$id": "#/definitions/foo"
    }
  }
}
which is basically a no-op (note that the $id fragment matches the actual position). So if we say that fragments in $id MUST be ignored, it's still a no-op and that's fine. I would prefer an implementation to raise an error in this case going forward, but I don't feel strongly about that.
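A tool that wanted to flag (or quietly accept) this pattern could detect it roughly as follows. This is a simplified Python sketch: the function name find_noop_pointer_ids is hypothetical, it assumes pointers are relative to the document root with no nested $id changing the base URI, and it compares the fragment literally without percent-decoding.

```python
def find_noop_pointer_ids(node, pointer=""):
    """Yield the JSON Pointer of each object whose "$id" is "#" plus its own location."""
    if isinstance(node, dict):
        if node.get("$id") == "#" + pointer:
            yield pointer
        for key, value in node.items():
            # JSON Pointer escaping: "~" becomes "~0", "/" becomes "~1".
            escaped = key.replace("~", "~0").replace("/", "~1")
            yield from find_noop_pointer_ids(value, pointer + "/" + escaped)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_noop_pointer_ids(item, f"{pointer}/{index}")
```

Run against the generated schema above, this reports the single location /definitions/foo; an implementation could then choose between ignoring the fragment and raising an error there.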
My point here is that the only other fragment usage I've seen in the wild doesn't actually do anything.
@awwright if I'm reading #719 (comment) correctly, you are more or less OK with this idea? I agree that the names could be better, but keeping $id for the self+base form is less disruptive.
@jdesrosiers you are encouraged to comment on the merits of this if you'd like, but please do not discuss your approach from #724 here, as we have already determined (in #179) that mixing the two discussions confuses everyone.
Everyone: fair warning, I feel very strongly about this. Unlike #726 (Eliminate base URI shadowing), which I think we should do but if we don't that's fine, I'm incredibly fed up with the problems with $id and feel like many of them come from having such a wide range of syntactically valid values that have undefined semantics.
If you want to keep the current situation of exactly one use of fragments in $id having defined semantics, and all other possible uses having undefined semantics, please make an active case for why such a thing is desirable.
Yes, changing "$id" to "$anchor" is a breaking change. That is unfortunate. But I feel like $id as it stands is so problematic that this is worthwhile. And importantly, this just forbids one part of $id's current syntax, and leaves the rest of $id's well-defined behavior alone.
Although perhaps @Julian can speak to how difficult this would be in his implementation, as I think he found the prior change of id to $id particularly disruptive.