Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[prism-cli] Memory leak while validating #1881

Closed
nulltoken opened this issue Sep 8, 2021 · 13 comments
Closed

[prism-cli] Memory leak while validating #1881

nulltoken opened this issue Sep 8, 2021 · 13 comments

Comments

@nulltoken
Copy link
Contributor

Describe the bug

Following #1860 by @lukasz-kuzynski-11sigma we've been able to bump from Prism 4.1.2 to Prism 4.3.4.

However, some of servers started to restart due to Out of Memory issues. We've tracked down the issue to validateInput and validateOutput functions leaking memory.

To Reproduce

A complete standalone repro case is available at https://github.com/nulltoken/prism_repro

It triggers a loop of 1000 request and response validations, capturing the amount of used Heap every 10 steps, and triggering a garbage collection between every step.

Running yarn repro shows the following output where one can clearly see the heap growing.

$ yarn repro
yarn run v1.22.5
$ yarn node -r ts-node/register --unhandled-rejections=strict --expose-gc ./src/repro.ts
unknown format "JSON Web Token RFC 7519" ignored in schema at path "#/properties/authorization"
unknown format "JSON Web Token RFC 7519" ignored in schema at path "#/properties/authorization"
[...snipped for brevity...]
unknown format "JSON Web Token RFC 7519" ignored in schema at path "#/properties/authorization"
unknown format "JSON Web Token RFC 7519" ignored in schema at path "#/properties/authorization"

i	HeapUsed
0	114059736
10	113304496
20	113369808
30	113711088
[...snipped for brevity...]
930	131688256
940	131704024
950	131884168
960	132243528
970	132365232
980	132556000
990	132876080

Running prism v4.3.4
Done in 15.77s.

Expected behavior

No leak ? ;-)

/cc @chohmann

@nulltoken
Copy link
Contributor Author

@chohmann @radzserg 👋 Any chance this could get triaged, please? 🙏

@nulltoken
Copy link
Contributor Author

@radzserg @chohmann The leak might come from AJV. I've created a trimmed down version of the repro in ajv-validator/ajv#1763

As I'm FAR from being a Prism, nor AJV, nor schema expert, I'd greatly appreciate if you could help me there from the Prism use case standpoint.

@nulltoken
Copy link
Contributor Author

Answer from AJV author @ ajv-validator/ajv#1763 (comment)

compiling the “same” schema in the loop would create a new function every time because starting from v7 Ajv uses the schema itself as a key for Map, rather than it’s serialization.

A solution is to either make sure you are compiling schema once (that is outside of the loop), or at least the schema object itself is defined outside of the loop.

My understanding is that Prism is actually incorrectly leveraging AJV.

@nulltoken
Copy link
Contributor Author

My understanding is that Prism is actually incorrectly leveraging AJV.

I've taken a look at the code and may have an (hypothetical) idea of a (potential) approach.

Unless anyone is already working on a fix, I'll start digging my way through tomorrow.

@nulltoken nulltoken mentioned this issue Sep 15, 2021
7 tasks
@nulltoken
Copy link
Contributor Author

@radzserg Ok. I've got something that works and plugs the leak.

The code is currently in an "ugly as hell" format but works without making the CI cringe.

I'm now going to try to un-uglify it. Stay tuned!

@radzserg
Copy link
Contributor

cool 👀 👍

@radzserg radzserg self-assigned this Sep 16, 2021
@ghost
Copy link

ghost commented Sep 16, 2021

How about using WeakRef for caching validation function for schema and bundled schema and that way the solution will be fairly simple?

@nulltoken
Copy link
Contributor Author

@lukasz-kuzynski-11sigma Thanks for the heads up. However, AJV already implements a caching layer and I'm not sure we should add one of ours on top of it.

What I'm after is slightly different. I'm working to avoid "dynamically" re-generating the schema with every call to validateInput / validateOutput.

The moment we're invoking getHttpOperationsFromSpec, we've actually got ALL the required information to build the final JSON schemas (path, query, headers, body in and bodies out) per operation.

The idea is to generate those as part of getHttpOperationsFromSpec execution, store them back in the array of IHttpOperation and flow them back to every call path leading to validateAgainstSchema so that this function will always reuse the same instance of every schemas. The reuse of the same instance being what actually plugs the leak.

I've got something that already works on my box and will finish fixing up the tests tomorrow. I believe I should be able to show case something tomorrow evening so that you and @radzserg have something to poke at.

@ghost
Copy link

ghost commented Sep 17, 2021

Makes sense my friend. Fingers crossed and happy to review your changes.

@radzserg
Copy link
Contributor

radzserg commented Sep 17, 2021

@lukasz-kuzynski-11sigma did you mean WeakMap. Can you add more details? How exactly do you propose to use it?

@nulltoken FYI
I tried to use simple caching and it works fine.

// packages/http/src/validator/validators/utils.ts

const validationFunctionsCache: { [key: string]: ValidateFunction } = {};
function getValidationFunction(
  schemaName: string,
  coerce: boolean,
  schema: JSONSchema,
  bundle?: unknown
): ValidateFunction {
  const validationSchema = {
    ...schema,
    __bundled__: bundle,
  };
  const key = `${schemaName}_${coerce}_` + JSON.stringify(validationSchema);
  if (!validationFunctionsCache[key]) {
    validationFunctionsCache[key] = assignAjvInstance(String(schema.$schema), coerce).compile(validationSchema);
  }
  return validationFunctionsCache[key];
}

// and then 

export const validateAgainstSchema = (
  value: unknown,
  schema: JSONSchema,
  coerce: boolean,
  prefix?: string,
  bundle?: unknown
): O.Option<NonEmptyArray<IPrismDiagnostic>> => {
  return pipe(
    O.tryCatch(() => getValidationFunction(String(schema.$schema), coerce, schema, bundle)),
    // O.tryCatch(() =>
    //   assignAjvInstance(String(schema.$schema), coerce).compile({
    //     ...schema,
    //     __bundled__: bundle,
    //   })
    // ),
    O.chainFirst(validateFn => O.tryCatch(() => validateFn(value))),

I've updated your repro test to 10K and it looks good

Screenshot 2021-09-17 at 12.55.55.png

Looking forward for your final solution

@radzserg
Copy link
Contributor

@nulltoken Although on bigger numbers I still see leaking with my quick fix 🤔 ... Your approach beats mine

@ghost
Copy link

ghost commented Sep 17, 2021

I've came with an idea to cache schema through WeakMap but @nulltoken is absolutely right here. We're recreating json schema object many time therefore the only kind of cache that might work is the one introduced by @radzserg.

Maybe I'll come with a little bit different solution soon...

@ghost
Copy link

ghost commented Sep 17, 2021

Ok I got this guys... #1892

Let me know what you think and I'll polish the solution.

This was referenced Apr 29, 2024
This was referenced Jul 2, 2024
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants