Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump AJV to v8 #713

Merged
merged 44 commits into from
May 29, 2022
Merged

Bump AJV to v8 #713

merged 44 commits into from
May 29, 2022

Conversation

JacobLey
Copy link
Collaborator

@JacobLey JacobLey commented Mar 19, 2022

Partially resolves #573 + #582

Unblocks path to 3.1 support

Appears to be fully backwards compatible (see notes about SerDes + Validation errors).

Happy to discuss changes made/justifications. I would also like to append a prettier commit to the end of this, but omitting for now to prevent noise. Perhaps once approved or as a follow on PR

@gitguardian
Copy link

gitguardian bot commented Mar 19, 2022

⚠️ GitGuardian has uncovered 6 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id Secret Commit Filename
- Generic High Entropy Secret 95f635c test/resources/boba.yaml View secret
- Generic High Entropy Secret 95f635c test/resources/boba.yaml View secret
- Generic High Entropy Secret 95f635c test/resources/boba.yaml View secret
- Generic High Entropy Secret 7f13a14 test/resources/boba.yaml View secret
- Generic High Entropy Secret 7f13a14 test/resources/boba.yaml View secret
- Generic High Entropy Secret 7f13a14 test/resources/boba.yaml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

allErrors: true,
meta: draftSchema,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implied by ajv-draft-04

ajv.removeKeyword('propertyNames');
ajv.removeKeyword('contains');
ajv.removeKeyword('const');
ajv.addKeyword({
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any of these keywords without explicit validation logic will auto-resolve to true.

path and components are added because it is passed to AJV as a schema in places like this: https://github.com/cdimascio/express-openapi-validator/blob/master/src/middlewares/openapi.request.validator.ts#L274

if (sch) {
return function validate(data, path, obj, propName) {
if (sch.kind === 'res') {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See handleSerDes, this logic ensures that the deserializer (and equivelent serializer for res) will only run on proper schema.

Also considered splitting keyword into two forms x-eov-req-serdes and x-eov-res-serdes.

}, {}),
serDesMap: serDesMap,
validateFormats: !!validateFormats,
formats: <Exclude<typeof formats, unknown[]>>formats,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guaranteed to be {}, see normalizeOptions

Perhaps would benefit from a NormalizedOptions interface that extends Options?

return acc;
}, {}),
serDesMap: serDesMap,
validateFormats: !!validateFormats,
Copy link
Collaborator Author

@JacobLey JacobLey Mar 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fast + full will be truthy, then handled via addFormats (see ajvFormatsMode)

const v = new Ajv(options);
v.addMetaSchema(draftSchema);
const v = new AjvDraft4(options);
addFormats(v, ['email', 'regex', 'uri', 'uri-reference']);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the only formats referenced in spec. Can append all if desired but figured it wasn't necessary?

type?: 'number' | 'string';
validate: (v: any) => boolean;
};
export type Format =
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type is necessary for AJV.

Functions typed with any are still valid (because that is how typescript works) but otherwise type must match.

Not sure if this is considered breaking... if in doubt could default the value to string and log a warning?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually AJV allows passing options like { foo: x => true } (see formats alwaysTrue so could use that in the case of missing type

/**
* @deprecated
* Use `formats` + `validateFormats` to ignore specified formats
*/
unknownFormats?: true | string[] | 'ignore';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unknownFormats?: true | string[] | 'ignore';
serDes?: SerDes[];
formats?: Format[];
formats?: Format[] | Record<string, ajv.Format>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This updated type better matches AJV's expected input.

I recommend deprecating the array version

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agreed. array version can be deprecated. can you also add this to the wiki

(note i moved the doc from README to wiki)

fileUploader?: boolean | multer.Options;
multerOpts?: multer.Options;
$refParser?: {
mode: 'bundle' | 'dereference';
};
operationHandlers?: false | string | OperationHandlerOptions;
validateFormats?: false | 'fast' | 'full';
validateFormats?: boolean | 'fast' | 'full';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fast and full should be deprecated...
perhaps a new option instead addFormatsOptions?

@@ -191,7 +191,7 @@ export class RequestValidator {
} else {
throw new BadRequest({
path: req.path,
message: `'${property}' should be equal to one of the allowed values: ${options
message: `'${property}' must be equal to one of the allowed values: ${options
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't consider should -> must a breaking change.
Although multiple tests had to be updated to accommodate

The gist of the error is the same, and is "more correct" as MUST implies it is required, vs SHOULD implies recommendation (but not necessarily enforced)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@@ -348,16 +352,71 @@ export class SchemaPreprocessor {

private handleSerDes(
parent: SchemaObject,
schema: SchemaObject,
schema: SchemaObject & { _serDesInternal?: boolean },
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function got a lot bigger... but it was the only way I was able to maintain backwards compatibility.

https://ajv.js.org/guide/modifying-data.html

The gist of the problem is that the x-eov-serdes modifies data. And AJV does not make any guarantees about the order of validations, with the exception of composite types like allOf

So the previous version of this function might accidentally skip validations like format or pattern because the string -> object deserialization happens first. So this new schema ensures that those validations happen, THEN deserialisation happens for requests (and vice versa for responses)

The second issue is that while the handleSerDes gets called for both req+res, it gets passed the same schema object. So any manipulation affects both. So x-eov-serdes handling was updated to always reject when kind does not match.

The nullable handling was added because AJV requires a type keyword alongside (which was removed from base schema). OpenAPI doesn't allow type: 'null' but AJV does just fine.

Biggest issue with this is that validation errors are much noisier. As there are messages for each oneOf failure, as well as oneOf itself.

So apparently AJV _does_ have some ability to enforce keyword ordering via `before`/`post`! Using those options, serdes schema gets a lot simpler and has more trivial error redacting
@@ -356,8 +392,20 @@ export class SchemaPreprocessor {
!!schema.format &&
this.serDesMap[schema.format]
) {
(<any>schema).type = [this.serDesMap[schema.format].jsonType || 'object', 'string'];
schema['x-eov-serdes'] = this.serDesMap[schema.format];
const serDes = this.serDesMap[schema.format];
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another serdes refactor... pretty sure that is the last time!

So the original reasoning for my complicated schema was I thought AJV did not provide any way to enforce keyword ordering (as internal keywords do not have "guaranteed" order). But custom keywords have some capability via before and post!

So the final schema looks a lot more like the original, simpler version. Still some small handling had to be done for type + nullable but not nearly as gross as before.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great news, Nice find

Copy link
Owner

@cdimascio cdimascio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good. i'm considering merging shortly. is there anything else worth considering prior to doing so?

@JacobLey
Copy link
Collaborator Author

JacobLey commented Apr 24, 2022

I don't have anything additional to add to this PR, unless someone sees something in the PR.

npm test passes for me locally so I think it is functional... Happy to see it merged!

I know you mentioned possibly doing a major version bump just out of caution. I'll continue to defer to you to that as it isn't strictly necessary, but might be a good chance to call out "lots of internal stuff changed, use some caution". Might still be good to wait until 3.1 support to actually implement the major bump (if necessary)

@cdimascio cdimascio merged commit 2b27332 into cdimascio:master May 29, 2022
@cdimascio
Copy link
Owner

cdimascio commented May 29, 2022

thanks @JacobLey

the PR is now merged, you can try it out using:

npm install [email protected]

will make version this the currentrelease version after a bit of soak time.

@cdimascio
Copy link
Owner

@JacobLey i created a new branch - v4.14.0-beta to host this code. i'm seeing a potential performance issue. tests are running a few seconds slower.

@cdimascio
Copy link
Owner

i don't see any obvious change that would cause the slowdown. clearly the internals have changed extensively with ajv8 and the external schema. that said, my expectation is that something else is at play.

the test are approximately twice as slow with these changes (v4.14.0-beta) to compared to 4.13.8 (on branch mainline)

branches:

  • v4.14.0-beta (new ajv8 code)
  • master (new ajv8 code -- currently equivalent to the v4.14.0-beta branch)
  • mainline (ajv6 code, 4.13.8)

@cdimascio
Copy link
Owner

@JacobLey any thoughts on what might be the cause?

@JacobLey
Copy link
Collaborator Author

I don't think any of the code I wrote should have significant performance implications... perhaps some of the custom keyword/serdes logic but that ended up relatively unchanged

So I suspect AJV v8 is the culprit, and the internal compilation/validation logic is somehow slower.

How are you benchmarking these comparisons? Do you have a particular tool? Or just checking out the branch and seeing how long tests take to execute?

Are there particularly slow tests? Most of the example schemas are pretty simple... 2x is definitely unexpectedly poor performance from AJV. It might be good to know if the performace comes from schema "compilation" (happens once at first request, performance only important to something like Lambda users) and "validation" (happens every request, important to everyone)

Unfortunately I can't find a good existing benchmark between AJV versions, only AJV and other validators.

@cdimascio
Copy link
Owner

I haven’t dug in too deeply, thus far the it’s effectively human observation, that is I notice the mainline tests e.g. 4.13.8 complete in about 5 seconds (on my MacBook Pro) while the v4.14.0-beta.2 branch tests run in about 9 seconds.

@JacobLey
Copy link
Collaborator Author

An attempt to put some hard numbers here... Adds a "listener" to test suites to track time, then reports at end (very hacky, some tests fail because they expect callbacks but those are acceptable casualities)

https://github.com/JacobLey/express-openapi-validator/tree/compile-v-validate

Hoping to:

  1. Show time elapse differences
  2. Highlight compile vs validate performance

Executed on master (AJV v8)

TOTAL SETUP TIME 106
{ compiles: 6904, validates: 2237, diff: 4667 }

Executed on mainline (AJV v6)

TOTAL SETUP TIME 103
{ compiles: 4378, validates: 1701, diff: 2677 }

Every run comes in a bit different... but those seem like good ballpark numbers. Assuming that added tests between two branches is negligable... It appears compiling is ~75% slower on v8, and validation is ~20%

@cdimascio
Copy link
Owner

cdimascio commented Jun 4, 2022

Some info on ajv8 perf
ajv-validator/ajv#1386

and more info on migration to 8

@cdimascio
Copy link
Owner

Compiles are slower in ajv8. We could offer an option to disable optimization which should gain back 30% at compile time, express openapi validator runs the compile step once when the middleware is initialized. Hence this is not an issue for long lived apis, particularly since ajv8 provides improved safety

It may also be worth investigating stand-alone mode

@cdimascio
Copy link
Owner

Finally we should verify that the two validate ops are still just as performant https://github.com/cdimascio/express-openapi-validator/blob/master/src/middlewares/openapi.request.validator.ts#L174

this way we know that per req latency is just as good (or perhaps better)

@JacobLey
Copy link
Collaborator Author

JacobLey commented Jun 5, 2022

Standalone mode is very interesting... Would require some extra build step, but should hopefully solve most of that lambda startup time. Would probably have some sort of CLI/method that takes the OpenAPI document, and writes a file with path + method mapping to a validation function, which is then loaded by the live server.

Disabling optimization sounds like a decent compromise for lambda/cloud function users that really care about the time-to-first-validation (should be opt-in).

I tried to benchmark the per-request validation-only performance above. I was not incredibly clear, just trying to get it knocked out quickly. That was the validates property which did see something like a ~20% slowdown. Turning off optimization would probably be worse.

I think long term we do want to adopt Ajv V8 (and future major versions). Not upgrading due to performance concerns is probably not a viable solution. Ajv still generally boasts the fastest validation times, so not any great alternatives (to my knowledge).

So this might justify putting this in a major version (already considered due to changes) bump, so we can warn that performance might take a hit, which to some might be seen as a breaking change.

@nd-jharn
Copy link

Any progress here? Will you be releasing this code as an update?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenAPI 3.1 support
5 participants