Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow deferring reference resolution to support mutually referencing bundles #82

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

handrews
Copy link
Contributor

@handrews handrews commented Mar 21, 2023

EDIT: As a complement or alternative to this PR, PRs #83 and #85 provide near-parity with the status quo for complex mutual references including bundles — this PR adds the ability to handle sets of schemas where:

  • the "$id" contents are not known in advance, preventing configuration of a Source to retrieve them
  • mutual references are present, preventing direct instantiation

The sources added in PR #85 can be complex to configure, so despite this PR's own complexity, it offers an easier way to manage certain scenarios.

In particular, it enables building generic schema-oriented tools that will have to work with arbitrary sets of schemas provided by whatever code calls the tools (this is the use case that is of particular interest to me, as someone who mostly works on generic tools).


Original comment for this PR:'

Fixes #76, preserving backwards compatibility and the guarantee that schemas will not encounter a reference resolution error when evaluated.

  • Add resolve_references parameter to JSONSchema.__init__(), Catalog.__init__(), and create_catalog(), with the latter two passing through the value to the former whenever relevant. All default to True, preserving the current behavior
  • Add a JSONSchema.references_resolved data member to track the resolution state
  • Promote JSONSchema._resolve_references() to a public method JSONSchema.resolve_references, which checks self.references_resolved` to avoid a very expensive no-op if called twice
  • Add Catalog.resolve_references() to resolve all references within a cache, including resolving references to schemas added to the cache during reference resolution, iterating until all are resolved; it takes an optional cacheid parameter with the usual defaulting behavior
  • Check self.references_resolved in JSONSchema.evaluate() and raise a JSONSchemaError if references have not been resolved

For more details on usage, see the added tutorial page. This PR contains three commits:

  1. Adding the basic JSONSchema behavior, except for the change to evaluate() — this is the fundamental approach, and is sufficient for technical compliance. However, it is not very convenient in many ways
  2. Adding the Catalog behavior, allowing dynamic loading to use this behavior, and allowing a schema cache-level resolution to bring a cache into alignment with how it would be if the normal reference resolution had taken place — this provides all the tools necessary to replicate the current experience as a two-step rather than one-step process
  3. Add the JSONSchemaError if evaluate() is called with unresolved references

It would be trivial to replace commit 3 with a call to the catalog's schema cache-level resolve_references(), and I have that available and tested if that behavior would be preferred. It would mean that evaluate() could raise a CatalogError on the first invocation, which seemed like a bigger change. Plus the first evaluate() would be much slower. So I went with this more explicit approach.

At one point, in the 2nd commit I had the catalog tracking which schemas were and were not resolved, but this introduced substantial complexity to the code so I dropped it. I can post it as an alternative if desired. I suspect the performance was worse, except possibly in the case where a very large schema cache has a very small number of unresolved schemas, in which case the PR as written will spend extra time needlessly calling resolve_references() on a lot of schemas. But this is probably rare.

At an earlier point, I tried to make a call to JSONSchema.resolve_references() cascade through its keywords to resolve any schemas that it referenced, etc. This was even more complicated (and required #78, which this current PR does not), so I dropped it. Again, I can post it as an alternative if desired.

As it stands, this PR is the most minimal way I could think of to satisfy the specification requirements regarding bundled schemas.

@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.07% 🎉

Comparison is base (e6f71d0) 92.73% compared to head (be71a71) 92.80%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #82      +/-   ##
==========================================
+ Coverage   92.73%   92.80%   +0.07%     
==========================================
  Files          23       23              
  Lines        2052     2072      +20     
  Branches      435      440       +5     
==========================================
+ Hits         1903     1923      +20     
  Misses         97       97              
  Partials       52       52              
Files Changed Coverage Δ
jschon/__init__.py 90.00% <100.00%> (ø)
jschon/catalog/__init__.py 98.29% <100.00%> (+0.12%) ⬆️
jschon/jsonschema.py 92.01% <100.00%> (+0.22%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@handrews
Copy link
Contributor Author

Notably, making resolve_references() a public interface also helps with #99, as when a JSON Schema is embedded in another format, the check for root to determine whether to resolve references does not work, and it is necessary to call it externally.

@handrews
Copy link
Contributor Author

Looking this over for the first time in several months, with the benefit of working on oascomply, I'm still pretty satisfied with this. Making resolve_references() part of the public API is beneficial as I need to customize reference resolution to integrate with other parts of the package.

Currently I've essentially re-implemented this outside of the catalog, without having realized that that's what I was doing. There is also added complexity (that isn't and shouldn't be addressed in this PR) that the "metaschema" for OpenAPI documents is different depending on the context of the reference. I'll go deeper into that in a thread in #108 soon-ish.

But while I'm very much open to suggestion for anything better, this approach has held up well.

This adds a constructor option to defer reference resolution
until a later call to the now-public method resolve_references()
Constructor and create_catalog parameter gets passed through
to all JSONSchema constructor calls.  It defaults to True,
preserving backwards compatiblity.

The resolve_references() method calls resolve_references()
on all schemas in the given cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support instantiating mutually referencing schema bundles
2 participants