Description
Everybody seems to agree that base URI change is one area of the standard that:
- lacks compatibility between implementations, even within JavaScript alone
- takes substantial effort to implement, even inconsistently
- is rarely used
Let's think for a second about what problems id and $ref solve. The only problem that really needs to be solved is schema re-use (and usage practice shows it). When you write code, a file traditionally defines a namespace, and all the symbols that can be accessed from outside the file have to be explicitly made public. Although @awwright cites the departure from the file model in some standards bodies, it doesn't seem to be relevant to software development and JSON. The fact that JSON-LD exists means little for JSON schema; there should be a JSON-LD schema for it. Software developers will not depart from the file model in a hurry, and unless JSON-schema acknowledges that, it will simply lose touch with them.
I suggest that, to both enable code re-use and ensure its stability, we require explicitly declaring pointers to objects that can be used from outside of the schema file. I.e. I suggest not only dropping base URI change from the spec but also dropping JSON pointer support in references. That would allow schema authors to have clearly defined symbols that can be used outside and at the same time have the freedom to refactor and restructure the rest of their schemas as they wish. One of the valid arguments against $merge was that schema authors may want to prevent modification. This argument applies equally to the desire of popular schema authors to prevent direct access to some areas of the schema and only publish some access points that they would maintain in a consistent way (like a schema public API).
The proposal is to:
- use one attribute as the schema URI that identifies the schema globally and (optionally) where it can be retrieved from, e.g. $uri. This attribute can be used only once per schema file, at the top level only, and it is also the base URI for other references inside the file.
- use another attribute to define public names pointing to parts of the schema that can be re-used in other schemas, e.g. $id. This keyword's value MUST be an identifier (`^[a-z]+[a-z0-9_]*$` for public names that can be accessed from outside of the schema file and `^_[a-z]+[a-z0-9_]*$` for private names that can only be used within the file) and its value MUST be unique within the file; redefining it would make the schema invalid. The $id keyword should be used at the root of the sub-schema that will be referenced by it.
- references should use the format `<uri>#<id>` for references to other schemas (and `<uri>` will be resolved based on $uri) and `#<id>` for references within the file (which is consistent with $uri providing the base URI for resolution).
Given that $ref implies using JSON pointer, and using it violates isolation (by providing direct access into private code that can be changed without notice), the proposal is to drop $ref and instead use some other keyword as part of the JSON-schema spec, e.g. $call and/or $include. The two keywords would have different meanings: $call would be validated in the context of the source schema and $include in the current context. $include is optional; $call is more or less what we have now.
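To make the rules above concrete, here is a minimal sketch in Python of how a validator might collect $id names and resolve `<uri>#<id>` and `#<id>` references. Everything beyond the keyword names and the two patterns is an assumption made for illustration: the `collect_ids` and `resolve` helpers, the in-memory `REGISTRY`, and the `example.com` schemas are hypothetical, and the sketch naively treats every `$id` key it meets as the keyword.

```python
import re
from urllib.parse import urljoin

# Patterns from the proposal: public names and underscore-prefixed private names.
PUBLIC_NAME = re.compile(r"^[a-z]+[a-z0-9_]*$")
PRIVATE_NAME = re.compile(r"^_[a-z]+[a-z0-9_]*$")


def collect_ids(schema):
    """Map every $id name in one schema file to the sub-schema that declares it."""
    names = {}

    def walk(node):
        if isinstance(node, dict):
            name = node.get("$id")
            if name is not None:
                valid = isinstance(name, str) and (
                    PUBLIC_NAME.fullmatch(name) or PRIVATE_NAME.fullmatch(name)
                )
                if not valid:
                    raise ValueError(f"invalid $id: {name!r}")
                if name in names:
                    # The proposal says redefining a name makes the schema invalid.
                    raise ValueError(f"duplicate $id: {name!r}")
                names[name] = node
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return names


def resolve(ref, current_uri, registry):
    """Resolve '<uri>#<id>' or '#<id>' against a {$uri: schema} registry (hypothetical)."""
    uri, _, name = ref.partition("#")
    is_private = bool(PRIVATE_NAME.fullmatch(name))
    if not (is_private or PUBLIC_NAME.fullmatch(name)):
        # Rejects JSON pointers such as '#/definitions/zip' and references with no id.
        raise ValueError(f"reference must end in a declared name: {ref!r}")
    # The <uri> part is resolved against the $uri of the referencing file.
    target_uri = urljoin(current_uri, uri)
    if is_private and target_uri != current_uri:
        raise ValueError(f"private name used from another file: {ref!r}")
    if target_uri not in registry:
        raise ValueError(f"unknown schema: {target_uri!r}")
    names = collect_ids(registry[target_uri])
    if name not in names:
        raise ValueError(f"no such name in {target_uri!r}: {name!r}")
    return names[name]


# Two hypothetical schema files written with the proposed keywords.
ADDRESS = {
    "$uri": "http://example.com/schemas/address.json",
    "definitions": {
        "address": {
            "$id": "address",
            "type": "object",
            "properties": {"zip": {"$call": "#_zip"}},
        },
        "zip": {"$id": "_zip", "type": "string"},
    },
}
PERSON = {
    "$uri": "http://example.com/schemas/person.json",
    "type": "object",
    "properties": {"home": {"$call": "address.json#address"}},
}
REGISTRY = {schema["$uri"]: schema for schema in (ADDRESS, PERSON)}
```

In this sketch, `resolve("address.json#address", PERSON["$uri"], REGISTRY)` returns the public address sub-schema, while `address.json#_zip` and `address.json#/definitions/zip` both raise `ValueError` when used from the person schema. The author of the address file stays free to move or rename anything that is not a public name.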
I appreciate that this is the most radical proposal for simplifying the schema re-use issue. Please consider it not in comparison with what we have now, because what we have is a mess leading to a lack of compatibility, but from the point of view of existing software development practices. Modularisation, isolation, etc. are normal things in writing code, but for some reason they are not available to schema authors, who craft thousand-line documents simply to avoid using $ref that is not consistently implemented, and who have no way to reliably expose anything less than the whole schema file (reliably = providing a guarantee to consumers that it won't change without notice).
I think many of the proposals previously submitted here are focussed on theoretical aspects rather than on the practical problems of users and implementations. E.g. @awwright repeatedly cites XML and HTML as inspiration, and I think that these arguments, although theoretically correct, are fundamentally flawed: they ignore the fact that JSON is on purpose a much simpler standard and that this is the main reason for its wide adoption. Given that this standard is JSON-Schema and not JSON-LD schema or XML-schema, I don't see why arguments referring to the practices existing there should be seriously considered here while arguments referring to the actual usage practice of JSON-schema are ignored. I suggest we ignore all references to XML/HTML practices as irrelevant for JSON Schema.
I would very much appreciate feedback from @awwright @handrews @Relequestual @fge @jdesrosiers and in particular from the people who implemented base URI change in existing JSON schema validators: @mafintosh @bugventure @AlexeyGrishin @atrniv @zaggino @automatthew @tdegrunt @Prestaul @natesilva @geraintluff @daveclayton @erosb @stevehu @Julian @hoxworth @hasbridge @justinrainbow @yuloh @JamesNK @RSuter @seagreen @sigu-399 (I see very few people from this list in these conversations, which is another sign of the standard's deterioration, and I don't think any decisions about changing the standard should be made without the wider involvement of people who create validators).