[core] Add model cache to speed up code generation #7250

sebastien-rosset · 2020-08-19T18:32:50Z

This is a performance enhancement. The code generation behavior does not change:

Create a cache for StringUtils.escape()
Create a cache from named model to CodegenModel. Used by fromModel()
Create a cache from named property to CodegenProperty. Used by fromProperty()
Create a cache from language-specific model name to Shema.
Pre-compile regex patterns instead of calling String.replaceAll().
Other minor optimizations

I've noticed for a large OpenAPI doc, the code generation was taking more than 3 minutes and 47% of all the CPU processing time was spent invoking the findFirst() method in PythonClientExperimentalCodegen.java (as shown below). This PR creates a map from toModelName() to schema such that toModelName() does not have to be called many time repeatedly and the Schema map does not have to be traversed repeatedly.

 Optional<Schema> referencedSchema = ModelUtils.getSchemas(openAPI).entrySet().stream()
                .filter(entry -> Objects.equals(varDataType, toModelName(entry.getKey())))
                .map(Map.Entry::getValue)
                .findFirst();

https://github.com/OpenAPITools/openapi-generator/blob/master/modules/openapi-generator/src/main/java/org/openapitools/codegen/languages/PythonClientExperimentalCodegen.java#L453

PR checklist

Read the contribution guidelines.
Pull Request title clearly describes the work in the pull request and Pull Request description provides details about how to validate the work. Missing information here may result in delayed response from the community.
If contributing template-only or documentation-only changes which will change sample output, build the project beforehand.
Run the shell script ./bin/generate-samples.shto update all Petstore samples related to your fix. This is important, as CI jobs will verify all generator outputs of your HEAD commit as it would merge with master. These must match the expectations made by your contribution. You may regenerate an individual generator by passing the relevant config(s) as an argument to the script, for example ./bin/generate-samples.sh bin/configs/java*. For Windows users, please run the script in Git BASH.
File the PR against the correct branch: master
Copy the technical committee to review the pull request if your PR is targeting a particular programming language.

Merge upstream fixes

…rator

…dd-model-cache

…rator

…dd-model-cache

sebastien-rosset · 2020-08-19T18:34:48Z

@spacether @wing328 @jimschubert
cc @OpenAPITools/generator-core-team

…dd-model-cache

sebastien-rosset · 2020-08-20T01:18:10Z

@spacether, interesting observation: this PR is not supposed to change the code generation, but it does. I took a look at the generated python-experimental code. The generated code for the fake API is changed as follows:

diff --git a/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py b/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py
index 40f7ef9a82..1fa17d289b 100644
--- a/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py
+++ b/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py
@@ -1549,7 +1549,7 @@ class FakeApi(object):
                 string (str): None. [optional]
                 binary (file_type): None. [optional]
                 date (date): None. [optional]
-                date_time (datetime): None. [optional]
+                date_time (datetime): None. [optional] if omitted the server will use the default value of dateutil_parser('2010-02-01T10:20:10.11111+01:00')
                 password (str): None. [optional]
                 param_callback (str): None. [optional]
                 _return_http_data_only (bool): response data without head status
Perform git status

This file and the specific change comes from the {{#defaultValue}} tag in the api.mustache template. When you look at the input YAML schema for this test, the date_time python argument is generated from the dateTime property under the testEndpointParameters:

                dateTime:
                  description: None
                  type: string
                  format: date-time
                  default: '2010-02-01T10:20:10.11111+01:00'
                  example: '2020-02-02T20:20:20.22222Z'

The dateTime property does have the default keyword. So I think the code generated prior to this PR was actually incorrect, the default value should have been generated. I don't know why yet, but would you agree the newly generated code is correct?

The other python diffs are markdown changes for the same default value.

spacether · 2020-08-20T01:28:10Z

Yes, the new default is correct. My work in the PR that is blocked by this one showed that some of my old bad code in toDefault is setting schema values :( that's probably what is happening here. My guess is that it is caused by some bad mutation that python-experimental is doing. I am hitting this same issue in my PR also for this default value.

sebastien-rosset · 2020-08-20T01:33:58Z

Yes, the new default is correct. My work in the PR that is blocked by this one showed that some of my old bad code in toDefault is setting schema values :( that's probably what is happening here. My guess is that it is caused by some bad mutation that python-experimental is doing. I am hitting this same issue in my PR also for this default value.

Thanks for the quick reply. One difference is the CodegenProperty fromProperty(String name, Schema p) method is repeatedly invoked with the same name and p, and now that method returns a cached value instead of recomputing and returning a different instance of the same data for every invocation. Maybe that has some side effect if previously some of these instances (but not all) were mutated.

sebastien-rosset · 2020-08-20T01:37:05Z

The other diffs are in Haskell, and I don't know anything about this code generator. It's strange that the changes only occur in Haskell and python-experimental. My hunch is the caching enhancement uncovered a bug in the Haskell and python code generators, but I'll keep looking.

There are also several weird changes in the generated FILES with ./ being removed in the middle of the path.

-model/./apiResponse.ts
+model/apiResponse.ts

…dd-model-cache

…rator into add-model-cache

jimschubert · 2020-08-23T04:17:01Z

modules/openapi-generator/src/main/java/org/openapitools/codegen/utils/StringUtils.java

+    private static Cache<String, String> underscoreWordsCache;
+
+    // A cache of escaped words, used to optimize the performance of the escape() method.
+    private static Cache<EscapedNameOptions, String> escapedWordsCache;


I don't think we can do this. Word escaping is going to be very generator-specific, and this static cache will break functionality for any application where openapi-generator is "long-lived" (IDE plugins, our online generator, and other long-running services holding oag in memory).

I think we need to improve our model and the logic around it instead.

thanks. What if the cache lifecycle is tied to the lifecycle of the generator, similar to the other caches? Would this work for the cases you mention?

I'm looking at the method again and I'm trying to think of scenarios where there would be conflicts across threads if all the parameters are the same, and I can't think of any. I think you're probably fine with your current implementation for memoizing the escape function. I thought I had noticed something when I commented yesterday that prevented this function from being pure, and I'm not seeing it today.

But because you asked about a tying to the generation, if we ever wanted to have per-run caching we could do something like the following (for something like a CodegenModel cache for $ref!):

We could create a context object with unique ID as a hash key (an entity in DDD-land), and use this as the Cache key or as a component of the type defining the cache key.

For instance:

Id

package org.openapitools.codegen.context; import java.util.Objects; import java.util.UUID; public final class Id { private final UUID value; public Id() { this.value = UUID.randomUUID(); } @Override public int hashCode() { return Objects.hash(getValue()); } @Override public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != o.getClass()) return false; Id id = (Id) o; return getValue().equals(id.getValue()); } public UUID getValue() { return value; } }

InvocationContext

package org.openapitools.codegen.context; import java.util.Objects; import java.util.UUID; public final class InvocationContext { private final Id id; public InvocationContext() { this.id = new Id(); } public UUID getIdentifier() { return this.id.getValue(); } @Override public int hashCode() { // InvocationContext is an entity and should only maintain id in hashcode. return Objects.hash(id); } @Override public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != o.getClass()) return false; InvocationContext that = (InvocationContext) o; // InvocationContext is an entity and should only maintain id in equals. return id.equals(that.id); } }

And then update GlobalSettings to provide this per-run specific identifiable object:

public class GlobalSettings { private static ThreadLocal<InvocationContext> context = new InheritableThreadLocal<InvocationContext>() { @Override protected InvocationContext initialValue() { return new InvocationContext(); } }; private static ThreadLocal<Properties> properties = new InheritableThreadLocal<Properties>() { @Override protected Properties initialValue() { return (Properties) System.getProperties().clone(); } }; public static String getProperty(String key, String defaultValue) { return properties.get().getProperty(key, defaultValue); } public static String getProperty(String key) { return properties.get().getProperty(key); } public static void setProperty(String key, String value) { properties.get().setProperty(key, value); } public static void clearProperty(String key) { properties.get().remove(key); } public static InvocationContext getContext() { return context.get(); } public static void reset() { properties.remove(); context.remove(); } }

Then, you'd just need to work out how you'd want to work with this in your cache. For your escape function, for example, you could change EscapedNameOptions to accept a UUID which is added to equals and hashcode, and then construct it with the invocation context's identifier. So,

EscapedNameOptions ns = new EscapedNameOptions( GlobalSettings.getContext().getIdentifier(), name, replacementMap.keySet(), charactersToAllow, appendToReplacement);

in Thread1 would be different than the same in Thread2.

I could see us using this for a really simple ref lookup like this as the key and CodegenModel as the value:

package org.openapitools.codegen.context; import java.util.Objects; import java.util.UUID; public final class ContextualRef { private final UUID invocationId; private final String ref; public ContextualRef(UUID invocationId, String ref) { if (ref == null) throw new IllegalArgumentException("ref can't be null"); this.invocationId = invocationId; // normalize/validate ref? this.ref = ref; } @Override public int hashCode() { return Objects.hash(invocationId, ref); } @Override public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != o.getClass()) return false; ContextualRef that = (ContextualRef) o; return invocationId.equals(that.invocationId) && ref.equals(that.ref); } }

jimschubert

I can't for the life of me come up with an example of the concern I expressed previously. Let's roll with this. As always, thanks for the great contribution!

spacether · 2020-09-21T05:49:48Z

Thanks @sebastien-rosset this is very helpful!

sebastien-rosset added 14 commits August 3, 2020 08:04

Merge pull request #24 from OpenAPITools/master

b59960e

Merge upstream fixes

Merge branch 'master' of https://github.com/OpenAPITools/openapi-gene…

df4e5ca

…rator

Merge branch 'master' of https://github.com/OpenAPITools/openapi-gene…

9a23d90

…rator

Merge branch 'master' of https://github.com/OpenAPITools/openapi-gene…

0974336

…rator

Merge branch 'master' of https://github.com/OpenAPITools/openapi-gene…

db1f359

…rator

Add modelName to schema cache.

4bfd89a

Add modelName to schema cache.

11f5235

add cache optimizations

6d8c804

Merge branch 'master' of https://github.com/OpenAPITools/openapi-gene…

6a4ec04

…rator

Merge branch 'master' of github.com:CiscoM31/openapi-generator into a…

d669621

…dd-model-cache

add cache optimizations

f5f451e

remove unused variable

9da4900

Merge branch 'master' of https://github.com/OpenAPITools/openapi-gene…

b817a48

…rator

Merge branch 'master' of github.com:CiscoM31/openapi-generator into a…

66a0aef

…dd-model-cache

sebastien-rosset changed the title ~~[codegen] Add model cache~~ [codegen] Add model cache to speed up code generation Aug 19, 2020

Merge branch 'master' of github.com:CiscoM31/openapi-generator into a…

863c04f

…dd-model-cache

sebastien-rosset added 3 commits August 20, 2020 07:17

Merge branch 'master' of github.com:CiscoM31/openapi-generator into a…

7ae460a

…dd-model-cache

improve documentation

62759c3

cache optimizations

a3ac138

sebastien-rosset marked this pull request as ready for review August 21, 2020 18:07

Merge branch 'master' of https://github.com/OpenAPITools/openapi-gene…

d9b47c1

…rator into add-model-cache

sebastien-rosset closed this Aug 22, 2020

sebastien-rosset reopened this Aug 22, 2020

jimschubert reviewed Aug 23, 2020

View reviewed changes

jimschubert added this to the 5.0.0 milestone Sep 21, 2020

jimschubert added the Enhancement: Performance label Sep 21, 2020

jimschubert approved these changes Sep 21, 2020

View reviewed changes

jimschubert changed the title ~~[codegen] Add model cache to speed up code generation~~ [core] Add model cache to speed up code generation Sep 21, 2020

jimschubert merged commit 4f27939 into OpenAPITools:master Sep 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core] Add model cache to speed up code generation #7250

[core] Add model cache to speed up code generation #7250

sebastien-rosset commented Aug 19, 2020 •

edited

Loading

sebastien-rosset commented Aug 19, 2020 •

edited

Loading

sebastien-rosset commented Aug 20, 2020 •

edited

Loading

spacether commented Aug 20, 2020 •

edited

Loading

sebastien-rosset commented Aug 20, 2020

sebastien-rosset commented Aug 20, 2020 •

edited

Loading

jimschubert Aug 23, 2020

sebastien-rosset Aug 23, 2020 •

edited

Loading

jimschubert Aug 24, 2020

jimschubert left a comment

spacether commented Sep 21, 2020

[core] Add model cache to speed up code generation #7250

[core] Add model cache to speed up code generation #7250

Conversation

sebastien-rosset commented Aug 19, 2020 • edited Loading

PR checklist

sebastien-rosset commented Aug 19, 2020 • edited Loading

sebastien-rosset commented Aug 20, 2020 • edited Loading

spacether commented Aug 20, 2020 • edited Loading

sebastien-rosset commented Aug 20, 2020

sebastien-rosset commented Aug 20, 2020 • edited Loading

jimschubert Aug 23, 2020

Choose a reason for hiding this comment

sebastien-rosset Aug 23, 2020 • edited Loading

Choose a reason for hiding this comment

jimschubert Aug 24, 2020

Choose a reason for hiding this comment

Id

InvocationContext

jimschubert left a comment

Choose a reason for hiding this comment

spacether commented Sep 21, 2020

sebastien-rosset commented Aug 19, 2020 •

edited

Loading

sebastien-rosset commented Aug 19, 2020 •

edited

Loading

sebastien-rosset commented Aug 20, 2020 •

edited

Loading

spacether commented Aug 20, 2020 •

edited

Loading

sebastien-rosset commented Aug 20, 2020 •

edited

Loading

sebastien-rosset Aug 23, 2020 •

edited

Loading