v7: schema compilation is safer but significantly slower than v6 - using standalone validation code #1386

Open
medikoo opened this issue Jan 7, 2021 · 6 comments

@medikoo

medikoo commented Jan 7, 2021

In Serverless Framework we've just upgraded AJV from v6 to v7, and observed that the ajv.compile(schema) execution time for the very same schema jumped from about 0.15s to 0.8s.

Is this a known issue?

It can be observed by running any Serverless Framework command, and measuring execution time at this line:

https://github.com/serverless/serverless/blob/bcbbd47fa09b7d99d7f8da3f11150215d1203bba/lib/classes/ConfigSchemaHandler/index.js#L106

What version of Ajv are you using? Does the issue happen if you use the latest version?

v7.0.3

Ajv options object

{ allErrors: true, coerceTypes: 'array', verbose: true, strict: false }

JSON Schema

It's a large, dynamically generated schema. If needed, I can export it to JSON and provide a link.

Your code

const Ajv = require('ajv').default;
const ajv = new Ajv({ allErrors: true, coerceTypes: 'array', verbose: true, strict: false });
ajv.compile(schema);
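
To reproduce the measurement described above, a small timing wrapper is enough. This is only a sketch: `timeCall` is a hypothetical helper name, and the commented usage assumes the `ajv` and `schema` variables from the snippet above.

```javascript
const { performance } = require("perf_hooks");

// Runs fn once and returns its result together with the elapsed wall-clock time in ms.
function timeCall(fn) {
  const start = performance.now();
  const result = fn();
  const ms = performance.now() - start;
  return { result, ms };
}

// Usage with Ajv (assumes `ajv` and `schema` are defined as in the issue):
// const { ms } = timeCall(() => ajv.compile(schema));
// console.log(`ajv.compile took ${ms.toFixed(1)} ms`);
```

Running the same wrapper against v6 and v7 with an identical schema gives a like-for-like comparison of compilation time only.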
@epoberezkin
Member

epoberezkin commented Jan 7, 2021

v7 generates the same (or, in many cases, more efficient) validation code, but the code that generates it is definitely substantially slower. This was a deliberate decision: v7 optimises for the safety of code generation against injection via untrusted schemas, for maintainability of the code, and for a smaller bundle, at the cost of slower schema compilation.

In most cases, when you compile a schema once and then use it many times in a server-side environment, the time it takes to compile is not important, since it usually happens at application start-up. If the schema is compiled on demand or, much worse, on every API call, that has to be changed: the only reason to compile a schema to code is to compile once and execute many times.
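
The compile-once, execute-many pattern can be sketched as a small in-process cache. This is an illustration under assumptions, not Ajv's own mechanism: `createCompileCache` is a hypothetical helper, and the compile function is injected so the pattern is visible on its own.

```javascript
// Wraps any compile function so that each schema object is compiled at most once.
// `compileFn` is injected; with Ajv it would be (schema) => ajv.compile(schema).
function createCompileCache(compileFn) {
  const cache = new WeakMap(); // keyed by schema object identity
  return function getValidator(schema) {
    let validate = cache.get(schema);
    if (!validate) {
      validate = compileFn(schema); // the slow step, paid once per schema object
      cache.set(schema, validate);
    }
    return validate;
  };
}

// Usage with Ajv (assumption: `ajv` is an existing instance):
// const getValidator = createCompileCache((schema) => ajv.compile(schema));
// const ok = getValidator(schema)(data); // later calls reuse the compiled function
```

The cache is keyed by object identity, so it only helps when the same schema object is reused across calls rather than rebuilt each time.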

You can reduce the schema compilation time by turning off optimization (I think it would reduce the time by about 30%), but even 0.15 sec per validation (if that was the case with v6) is very slow; validation itself is much faster than compilation. One solution is to compile schemas once and save the standalone validation code, either during the build or in a database alongside the schema. That applies if you are running it in lambda/serverless: instantiating Ajv and compiling schemas inside short-lived environments is not a very good idea.
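
One way to save generated validation code between runs is a file cache keyed by a hash of the schema. The sketch below makes several assumptions: `ensureCachedValidator` and the file naming are hypothetical, and the function that produces the module source is injected. The commented usage follows the standalone documentation linked later in this thread.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Returns the path of a cached validator module for `schema`, generating it on a cache miss.
// `generateSource(schema)` is injected and must return CommonJS module source as a string.
function ensureCachedValidator(cacheDir, schema, generateSource) {
  // Note: JSON.stringify is key-order sensitive, so reordered schemas hash differently.
  const hash = crypto.createHash("sha256").update(JSON.stringify(schema)).digest("hex");
  const file = path.join(cacheDir, `validate-${hash}.js`);
  if (!fs.existsSync(file)) {
    fs.mkdirSync(cacheDir, { recursive: true });
    fs.writeFileSync(file, generateSource(schema));
  }
  return file;
}

// Usage with Ajv v7 (assumption: ajv is installed where this runs):
// const Ajv = require("ajv").default;
// const standaloneCode = require("ajv/dist/standalone").default;
// const ajv = new Ajv({ code: { source: true } });
// const file = ensureCachedValidator(".cache", schema, (s) => standaloneCode(ajv, ajv.compile(s)));
// const validate = require(path.resolve(file));
```

In a real setup the cache key would probably also need to include the Ajv version and options, since either can change the generated code.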

@epoberezkin epoberezkin changed the title v7 seems significantly slower than v6 v7: schema compilation is significantly slower than v6 Jan 7, 2021
@epoberezkin
Member

Some docs that should be helpful:

For the context, the change in code generation: https://github.com/ajv-validator/ajv/blob/master/docs/codegen.md
Standalone validation code: https://github.com/ajv-validator/ajv/blob/master/docs/standalone.md

While v6 had inconsistent support for standalone code via a separate package, ajv-pack, it is now included in this package as a separate file (that is, Ajv itself does not use it, but all schema tests are run both via the Ajv API and via standalone code).

@epoberezkin epoberezkin changed the title v7: schema compilation is significantly slower than v6 v7: schema compilation is significantly slower than v6 - using standalone validation code Jan 7, 2021
@epoberezkin epoberezkin changed the title v7: schema compilation is significantly slower than v6 - using standalone validation code v7: schema compilation is safer but significantly slower than v6 - using standalone validation code Jan 7, 2021
@medikoo
Author

medikoo commented Jan 8, 2021

@epoberezkin thanks a lot for the explanation, that information is very helpful. I think going with the standalone approach might be a solution for us. One question though: the document indicates that:

> Ajv package should still be a run-time dependency for most schemas

Does it mean that standalone is not really "standalone" and will still require ajv to be installed and accessible to the generated code?

I'm asking because I was thinking about storing standalone results in a cache folder that's outside of the package context (and without access to its dependencies, such as ajv). If the generated code requires ajv internally, requiring it will obviously fail.

An additional side note: I believe there's an error in the docs. I think the following doesn't work anymore:

const Ajv = require("ajv") // version >= v7.0.0

As AJV moved to ESM we've noticed we need to require it as:

const Ajv = require("ajv").default

@epoberezkin
Member

epoberezkin commented Jan 8, 2021

> Does it mean that standalone is not really "standalone" and will still require ajv to be installed and accessible to the generated code?

Please see the rationale behind it here: ajv-validator/ajv-cli#98 (comment)

It has indeed to be installed/available, but it should not be an issue, as Ajv would not be initialised / executed - only specific files.

> I'm asking as I was thinking about storing standalone results in a cache folder that's outside of package context (and without access to its dependencies, such as ajv), therefore if it requires ajv internally, requiring it will obviously fail

You can still do it with any other tool that would bundle the dependent files in, replacing any requires - it's just an extra compilation step, but it would give you completely isolated function(s).

> Following I think doesn't work anymore: const Ajv = require("ajv") // version >= v7.0.0

Yes - it has slipped through...

I don't like this .default compromise but I didn't figure out a better solution - see here: #1381 (comment)
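
For code that has to load Ajv regardless of whether the constructor is the module itself or its `default` property, a small interop helper is one option. This is a sketch only: `interopDefault` is a hypothetical name and not part of Ajv.

```javascript
// Returns the default export when the module has one, otherwise the module itself.
function interopDefault(mod) {
  return mod && mod.default !== undefined ? mod.default : mod;
}

// Usage (assumption: ajv is installed):
// const Ajv = interopDefault(require("ajv"));
// const ajv = new Ajv({ allErrors: true });
```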

@epoberezkin
Member

> It has indeed to be installed/available, but it should not be an issue, as Ajv would not be initialised / executed - only specific files.

Also, it depends on which keywords you are using - most keywords won't have any dependencies. Specifically, uniqueItems depends on equal, and minLength/maxLength depend on ucs2length; in addition, any formats you use from ajv-formats would be required in the "standalone" code.
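
To check whether a particular piece of generated code is fully self-contained, one can scan it for `require` calls. The sketch below is an assumption-laden illustration: `listRequires` is a hypothetical helper, and the sample string is hand-written to resemble generated code, with module paths chosen for illustration rather than taken from actual Ajv output.

```javascript
// Extracts the module specifiers of all require("...") calls in a source string.
function listRequires(source) {
  const pattern = /require\(\s*["']([^"']+)["']\s*\)/g;
  const found = new Set();
  for (const match of source.matchAll(pattern)) {
    found.add(match[1]);
  }
  return [...found];
}

// Hand-written sample; the paths are illustrative assumptions, not real output.
const sampleSource = `
const ucs2length = require("ajv/dist/runtime/ucs2length").default;
const equal = require("ajv/dist/runtime/equal").default;
module.exports = function validate(data) { return true; };
`;

console.log(listRequires(sampleSource));
```

An empty result means the module can be loaded from a folder with no access to ajv; otherwise the listed files need to be installed or bundled in.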

@JavaScriptBach

@epoberezkin would you consider adding an option to optimize codegen assuming all schemas are trusted?

For context, I work on a codebase with hundreds of JTD schemas. We chose to generate standalone validators because otherwise it would take tens of seconds to compile schemas on server startup, which is not really reasonable. But generating standalone validators at build time is also tricky because it takes >1s to import ajv and compile a single validator.

We write all our JTD schemas ourselves, so they are all trusted, and it's unfortunate to have to take a big hit in compile-time performance for no benefit. Let me say I appreciate all the work done to make this library performant when validating schemas; it would also be great to have compile-time performance considered of secondary (but not negligible) importance.
