Skip to content

Fix $ref to $id #107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Nov 29, 2021
Merged

Fix $ref to $id #107

merged 15 commits into from
Nov 29, 2021

Conversation

mati-o
Copy link
Contributor

@mati-o mati-o commented Sep 29, 2021

Hi,
I am attempting to fix #55 and microsoft/vscode#92042. I have implemented a simple workaround to detect whether a $ref points to an $id and then I attempt to find the definition behind that $id.

I have implemented it inside findSection of resolveSchemaContent as I though it would be the simplest low-level solution. I have also added some unit tests. Everything works just as before - all tests pass!

I'll be more than happy to see this PR accepted, let me know if anything else is required.

@ghost
Copy link

ghost commented Sep 29, 2021

CLA assistant check
All CLA requirements met.

@mati-o mati-o changed the title Fix ref to $id Fix $ref to $id Sep 29, 2021
@KapitanOczywisty
Copy link
Contributor

KapitanOczywisty commented Sep 29, 2021

$id could be used in any place, not only in the definitions, even after referencing node. Allowing to use $id only in definitions is a step in the right direction, but still very limited upgrade.

I'd probably in resolveRefs function collect all $id and unresolved $refs (when $id occurs after $ref) to resolve them after walking all nodes.

{
  "type": "object",
  "properties": {
    "p1": {
      "$ref": "#hello"
    },
    "p2": {
      "$ref": "#world"
    },
    "p3": {
      "id": "#hello",
      "type": "string",
      "const": "hello"
    },
    "p4": {
      "type": "object",
      "properties": {
        "deep": {
          "$id": "#world",
          "type": "string",
          "const": "world"
        }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
}

Also there should be a check with node referencing their parent, to test recursion:

{
  "type": "object",
  "definitions": {
    "foo": {
      "id": "#foo",
      "type": "object",
      "properties": {
        "bar": {
          "type": "string",
          "const": "hello"
        },
        "foo": {
          "$ref": "#foo"
        }
      },
      "additionalProperties": false
    }
  },
  "properties": {
    "foo": {
      "$ref": "#foo"
    }
  },
  "additionalProperties": false
}
{
  "foo": {
    "foo": {
      "foo": {
        "bar": "hello"
      }
    }
  }
}

And lastly I'd add some test when $ref is referencing not existing $id to test correct error handling.

@mati-o
Copy link
Contributor Author

mati-o commented Sep 29, 2021

I'll take a look later, it won't hurt to add that test. As far as supporting $id everywhere, while true, I have supported definitions since afaik it's a common practice. I'll try to see how it is possible to implement with minimal performance penalty, probably like you said - collect them during resolution. Hope it won't be complicated 😅

Comment on lines +436 to 452
const merge = (target: JSONSchema, section: any): void => {
for (const key in section) {
if (section.hasOwnProperty(key) && !target.hasOwnProperty(key)) {
(<any>target)[key] = section[key];
}
}
};

const mergeByJsonPointer = (target: JSONSchema, sourceRoot: JSONSchema, sourceURI: string, refSegment: string | undefined): void => {
const path = refSegment ? decodeURIComponent(refSegment) : undefined;
const section = findSection(sourceRoot, path);
if (section) {
for (const key in section) {
if (section.hasOwnProperty(key) && !target.hasOwnProperty(key)) {
(<any>target)[key] = section[key];
}
}
merge(target, section);
} else {
resolveErrors.push(localize('json.schema.invalidref', '$ref \'{0}\' in \'{1}\' can not be resolved.', path, sourceURI));
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored merge to mergeByJsonPointer which calls findSection and the actual merge which just copies the properties over. I did it since when I resolve a subschema, I don't need to look for it since I have the object in hand, but merge is still required.

Comment on lines 466 to 483
const pendingSubSchemas: Record<string, JSONSchema[]> = {};
const resolvedSubSchemas: Record<string, JSONSchema> = {};

const tryMergeSubSchema = (uri: string, target: JSONSchema) : boolean => {
if (resolvedSubSchemas[uri]) { // subschema is resolved, merge it to the target
merge(target, resolvedSubSchemas[uri]);
return true; // return success
} else { // This subschema has not been resolved yet
if (!pendingSubSchemas[uri]) {
pendingSubSchemas[uri] = [];
}

// Remember the target to merge later once resolved
pendingSubSchemas[uri].push(target);
return false; // return failure - merge didn't occur
}
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the heart of the solution. It basically keeps two maps for subschema references. One for those subschemas that we may or may not encounter later and the latter for those subschemas we have resolved.

The tryMergeSubSchema merges already resolved subschemas into the targets that $ref them, or keeps a record of the target object against the full URI of the subschema for later use.

Comment on lines 588 to 617
const handleId = (next: JSONSchema) => {
// TODO figure out should loops be preventse
const id = next.$id || next.id;
if (typeof id === 'string' && id.charAt(0) === '#') {
delete next.$id;
delete next.id;
// Use a blank separator, as the $id already has the '#'
const fullId = reconstructRefURI(parentSchemaURL, id, '');

if (!resolvedSubSchemas[fullId]) {
// This is the place we fill the blanks in
resolvedSubSchemas[fullId] = next;
} else if (resolvedSubSchemas[fullId] !== next) {
// Duplicate may occur in recursive $refs, but as long as they are the same object
// it's ok, otherwise report and error
resolveErrors.push(localize('json.schema.duplicateid', 'Duplicate id declaration: \'{0}\'', id));
}

// Resolve all pending requests and cleanup the queue list
if (pendingSubSchemas[fullId]) {
while (pendingSubSchemas[fullId].length) {
const target = pendingSubSchemas[fullId].shift();
merge(target!, next);
}

delete pendingSubSchemas[fullId];
}
}
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function resolves the actual subschema, whenever it encounters an $id or id in an object. It keeps a record of the full URI and the object and resolves any pending requests for that subschema.

Comment on lines 496 to 512

// A placeholder promise that might execute later a ref resolution for the newly resolved schema
let externalLinkPromise: Thenable<any> = Promise.resolve(true);
if(!isSubSchemaRef(refSegment)) {
// This is not a sub schema, merge the regular way
mergeByJsonPointer(node, unresolvedSchema.schema, uri, refSegment);
} else {
// This is a reference to a subschema
const fullId = reconstructRefURI(uri, refSegment);

if (!tryMergeSubSchema(fullId, node)) {
// We weren't able to merge currently so we'll try to resolve this schema first to obtain subschemas
// that could be missed
externalLinkPromise = resolveRefs(unresolvedSchema.schema, unresolvedSchema.schema, uri, referencedHandle.dependencies);
}
}
return externalLinkPromise.then(() => resolveRefs(node, unresolvedSchema.schema, uri, referencedHandle.dependencies));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we have referenced a subschema in an external schema i.e schema1 -> $ref -> schema2#foo, we'll need to find schema2's #foo so we trigger schema2 resolution eagerly using that externalLinkPromise.

@mati-o
Copy link
Contributor Author

mati-o commented Oct 1, 2021

I think I have figured this out! See the comments in the code too, hope it is clear enough. Let me know if more tests are needed

@aeschli
Copy link
Collaborator

aeschli commented Nov 12, 2021

@mati-o Thanks a lot for the PR, it looks good!
I'm quite new to the feature but read https://json-schema.org/understanding-json-schema/structuring.html

  • is $anchor the same as $id? Do we need to support it for refs?
  • a guess is more work needed for bundling?

@mati-o
Copy link
Contributor Author

mati-o commented Nov 13, 2021

  • is $anchor the same as $id? Do we need to support it for refs?

Apparently it is, never seen one using this though in my practice. Guess it could be addressed at some point later.

  • a guess is more work needed for bundling?

I've never seen this form of schema but it indeed requires some effort. In our regular $ref case, we could split everything using the # character easily. However in this case it's trickier to know which / is part of the original URI and which isn't, without some relative URL code. I think it would be better to wait for someone to complain about it :)

@aeschli
Copy link
Collaborator

aeschli commented Nov 15, 2021

I pushed two tests that show a problem when resolving $id across schemas.

The problem is when resolving schemas the $id nodes are removed. They can't be found anymore when the same schema is used a second time.
A solution would be to build up a map of id to node that we remember along with the resolved (or unresolved schema).

@mati-o
Copy link
Contributor Author

mati-o commented Nov 15, 2021

I pushed two tests that show a problem when resolving $id across schemas.

The problem is when resolving schemas the $id nodes are removed. They can't be found anymore when the same schema is used a second time.

A solution would be to build up a map of id to node that we remember along with the resolved (or unresolved schema).

I'll give it a look towards the end of the week day :), sounds like an easy fix

while (toWalk.length) {
const next = toWalk.pop()!;
if (seen.indexOf(next) >= 0) {
continue;
}
seen.push(next);
handleId(next);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aeschli I was thinking about the two tests you have added and since I already had a map from the full URI to the full schema, it got me thinking. The root cause of the problem was that the handleRef function was called before handleId, thus all the schemas with $ref and $id together were missing that $ref. By simply changing the order between them I have managed to solve these two cases without compromising the rest (luckily).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to remember the full URI to schema mapping also with the resolved schema. Resolving happens on the unresolved schema nodes, so we don't have the information afterwards anymore when the schema is used again.
I added a test case for that: Resolving external $ref to already resolved schema

while (toWalk.length) {
const next = toWalk.pop()!;
if (seen.indexOf(next) >= 0) {
continue;
}
seen.push(next);
handleId(next);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to remember the full URI to schema mapping also with the resolved schema. Resolving happens on the unresolved schema nodes, so we don't have the information afterwards anymore when the schema is used again.
I added a test case for that: Resolving external $ref to already resolved schema

@aeschli aeschli force-pushed the fix-ref-to-id branch 2 times, most recently from 278c86f to a4cc925 Compare November 29, 2021 14:48
@aeschli
Copy link
Collaborator

aeschli commented Nov 29, 2021

I pushed a change to keep the map of resolved ids in the schema handle.

Sorry I had to 'force push' as merge always ended up with too many changes.

@aeschli aeschli added this to the November 2021 milestone Nov 29, 2021
@aeschli aeschli self-assigned this Nov 29, 2021
@aeschli
Copy link
Collaborator

aeschli commented Nov 29, 2021

I think the current state is ready for testing and useful. so I'm pushing the PR.
We can test the new feature in the upcoming test week.

@mati-o Thanks for all your help!

Of course, there's still some gaps to fill. What comes to my mind:

if you are interested in continuing to help, that would be fantastic! Let's continue in a new PR then.

@aeschli aeschli merged commit 98cf9e0 into microsoft:main Nov 29, 2021
@mati-o
Copy link
Contributor Author

mati-o commented Dec 1, 2021

Thanks for merging! Glad I could help. I've been busy in the past two weeks so sorry for not being able to assist. I hope to find time in the near future to assist more

tryMergeSubSchema(fullId, next);
const id = segments[1];
if (id !== undefined && isSubSchemaRef(id)) { // A $ref to a sub-schema with an $id (i.e #hello)
tryMergeSubSchema(next, id, handle);
Copy link

@michalchar michalchar Dec 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aeschli Pay attention you've replaced parentSchemaURL with handle.
Therefore local refs in external schemas wont work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, created #121

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

$ref should accept $id references
4 participants