Significant performance regression in 1.0.80 #721
Comments
@felixkrull-neuland Thanks a lot for raising this issue. We will fix it and hope you can retest and verify the fix.
@stevehu @felixkrull-neuland I'm investigating.
@stevehu @felixkrull-neuland An update... The HashSets were definitely a problem, so I replaced them with ArrayLists, but that only reduced the time by about 33%. The other time sink was creating an EnumSet for each keyword every time a schema is created (about 15 million times in the performance tests). I reworked this to cache the results, which reduced the time by another 20-25%. Unfortunately, the overall time is still ~50% higher than what we get from 1.0.79. As best I can tell, this is due to the increased coverage obtained by not short-circuiting the anyOf keywords. I'm currently investigating ways to optimize PropertiesValidator and AdditionalPropertiesValidator, which consume the bulk of the remaining time.
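The per-keyword caching described above can be sketched as follows. This is a self-contained toy, not the library's actual code: the `SpecVersion` enum, the `versionsFor` method, and the keyword logic are all illustrative assumptions; only the idea of computing the EnumSet once per keyword (instead of once per schema instantiation) reflects the comment.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the validator's spec-version flags.
enum SpecVersion { V4, V6, V7, V2019_09, V2020_12 }

public class KeywordVersionCache {
    // Cache keyed by keyword name, so the EnumSet is built once per keyword
    // rather than on every schema creation.
    private static final Map<String, EnumSet<SpecVersion>> CACHE = new ConcurrentHashMap<>();

    static EnumSet<SpecVersion> versionsFor(String keyword) {
        return CACHE.computeIfAbsent(keyword, KeywordVersionCache::computeVersions);
    }

    // Placeholder for the (comparatively expensive) per-keyword computation.
    private static EnumSet<SpecVersion> computeVersions(String keyword) {
        return keyword.startsWith("unevaluated")
                ? EnumSet.of(SpecVersion.V2019_09, SpecVersion.V2020_12)
                : EnumSet.allOf(SpecVersion.class);
    }

    public static void main(String[] args) {
        EnumSet<SpecVersion> a = versionsFor("anyOf");
        EnumSet<SpecVersion> b = versionsFor("anyOf");
        // Repeated lookups return the same instance: computed once, reused after.
        System.out.println(a == b); // true
        System.out.println(versionsFor("unevaluatedProperties").size()); // 2
    }
}
```

One caveat with this pattern: the cached EnumSet is mutable, so callers must treat it as read-only (or the cache should hand out defensive copies).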
@felixkrull-neuland Please try 1.0.81 and let us know if this is still an issue.
1.0.81 is better than 1.0.80, but it's still much slower than 1.0.79 for us, maybe two to three times slower on average and much worse worst-case behaviour. My thinking right now is that we're gonna stick with 1.0.79 for now and find a way to not have to validate hundreds of megabytes of JSON all the time (which we were planning to do anyway). I appreciate the optimization efforts, but given the new version does more/is more correct AFAIU, I don't know if it's feasible to make validation as fast as we would like it to be.
Thanks for providing this library as open source software for the world to use. We've seen the same significant regression from 1.0.79 to 1.0.80 and 1.0.81 in an application which parses between ten and a few hundred megabytes of JSON at a time. This performance reduction is so significant that I think the description of this library in the README needs to be reworded significantly. I would be extremely surprised by the performance of version 1.0.81 if I chose this library based on the description in the README today. I appreciate the performance improvements that have been made, and 1.0.81 is much better than 1.0.80 in that regard. But it is still not possible for us to use these two versions any more. |
We want to resolve this issue, but it is very hard for us to replicate it as we don't have any example of a big schema or a big JSON for testing. Does anybody know where we can find a matching schema with a big JSON? Thanks. |
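For anyone who wants to help with a reproduction corpus: a large instance can be synthesized mechanically and paired with a trivially matching array-of-objects schema. A minimal sketch, where the field names, record shape, and sizes are arbitrary assumptions (not tied to any reporter's real data); scale `records` up to reach hundreds of megabytes:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: emit an N-element JSON array of small flat objects.
public class BigJsonGenerator {
    static String generate(int records) {
        StringBuilder sb = new StringBuilder();
        sb.append('[');
        for (int i = 0; i < records; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(i)
              .append(",\"name\":\"item-").append(i).append('"')
              .append(",\"active\":").append(i % 2 == 0)
              .append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) throws IOException {
        // ~1000 records here; raise this number for multi-hundred-MB inputs.
        String json = generate(1000);
        Files.write(Path.of("big.json"), json.getBytes(StandardCharsets.UTF_8));
        System.out.println(json.length());
    }
}
```

A matching schema would just constrain `items` to an object with `id` (integer), `name` (string), and `active` (boolean); real-world reports in this thread presumably involve much deeper schemas, so this only approximates the size dimension, not the nesting dimension.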
@stevehu @felixkrull-neuland @LemurP I'm reopening this issue so that I can submit additional changes to address it. What makes this difficult, in addition to our not having good samples of large JSON instances, is that there is a lot of variability in …

@felixkrull-neuland Can you provide an obfuscated version of a schema that is significantly slower? If so, I think I can generate JSON instances from random data. We have added a lot of additional coverage from the JSON Schema Test Suite (~4000 tests now pass that did not before). The biggest change is the support for …

It would also help to know if you are enabling …
* Simplifies how evaluated properties and array items are tracked in order to improve performance. Resolves #721
* Corrects issue with deserializing JSON Schema Test Suite tests. Resolves #804
* Adds configuration parameters to disable unevaluatedItems and unevaluatedProperties analysis

Co-authored-by: Faron Dutton <[email protected]>
We're seeing a significant performance regression after updating from 1.0.79 to 1.0.80 when validating large JSON documents ("large" = up to 300 MB non-pretty-printed). This regression is so severe that we had to revert to 1.0.79 because it makes 1.0.80 unusable for us.
The regression is also clearly visible with https://github.com/networknt/json-schema-validator-perftest, even if less extreme (both runs are on my dev laptop using the same JVM etc):
I profiled my test program and found that a lot of time is being spent on HashSet operations in `CollectorContext.copyEvaluatedProperties` and `RefValidator.validate`. I'm suspecting the problems were introduced by #714 since it touched exactly those files. I have some screenshots of IntelliJ's profiler results to illustrate this, but not much else unfortunately:

[profiler screenshots]

Unfortunately, I can't share the schema or the documents, and crafting a synthetic example would be a lot of work that I'd rather avoid.
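As a toy illustration of why a copy-per-step pattern like the one profiled above gets expensive: if the full evaluated-properties set is snapshotted on every validation step, total work grows quadratically with the number of properties seen. This is a model of the pattern only, with invented names; it is not the library's actual code.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: each step adds one property, then copies the whole set,
// so total elements copied is 1 + 2 + ... + n = n*(n+1)/2.
public class CopyCostDemo {
    static long copiesPerformed(int properties) {
        Set<String> evaluated = new HashSet<>();
        long elementsCopied = 0;
        for (int i = 0; i < properties; i++) {
            evaluated.add("prop" + i);
            Set<String> snapshot = new HashSet<>(evaluated); // per-step copy
            elementsCopied += snapshot.size();
        }
        return elementsCopied;
    }

    public static void main(String[] args) {
        System.out.println(copiesPerformed(10));   // 55
        System.out.println(copiesPerformed(1000)); // 500500
    }
}
```

On a 300 MB document with many properties per object, this quadratic factor (plus per-element hashing on each HashSet copy) is consistent with the hot spots the profiler screenshots reportedly showed.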
(Yes, I did see #564, but to be quite honest I didn't want to lump in such a significant regression from one release to the next with a year-old issue, even if they are related.)