Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Add model cache to speed up code generation #7250

Merged
merged 19 commits into from
Sep 21, 2020

Conversation

sebastien-rosset
Copy link
Contributor

@sebastien-rosset sebastien-rosset commented Aug 19, 2020

This is a performance enhancement. The code generation behavior does not change:

  1. Create a cache for StringUtils.escape()
  2. Create a cache from named model to CodegenModel. Used by fromModel()
  3. Create a cache from named property to CodegenProperty. Used by fromProperty()
  4. Create a cache from language-specific model name to Shema.
  5. Pre-compile regex patterns instead of calling String.replaceAll().
  6. Other minor optimizations

I've noticed for a large OpenAPI doc, the code generation was taking more than 3 minutes and 47% of all the CPU processing time was spent invoking the findFirst() method in PythonClientExperimentalCodegen.java (as shown below). This PR creates a map from toModelName() to schema such that toModelName() does not have to be called many time repeatedly and the Schema map does not have to be traversed repeatedly.

 Optional<Schema> referencedSchema = ModelUtils.getSchemas(openAPI).entrySet().stream()
                .filter(entry -> Objects.equals(varDataType, toModelName(entry.getKey())))
                .map(Map.Entry::getValue)
                .findFirst();

https://github.com/OpenAPITools/openapi-generator/blob/master/modules/openapi-generator/src/main/java/org/openapitools/codegen/languages/PythonClientExperimentalCodegen.java#L453

PR checklist

  • Read the contribution guidelines.
  • Pull Request title clearly describes the work in the pull request and Pull Request description provides details about how to validate the work. Missing information here may result in delayed response from the community.
  • If contributing template-only or documentation-only changes which will change sample output, build the project beforehand.
  • Run the shell script ./bin/generate-samples.shto update all Petstore samples related to your fix. This is important, as CI jobs will verify all generator outputs of your HEAD commit as it would merge with master. These must match the expectations made by your contribution. You may regenerate an individual generator by passing the relevant config(s) as an argument to the script, for example ./bin/generate-samples.sh bin/configs/java*. For Windows users, please run the script in Git BASH.
  • File the PR against the correct branch: master
  • Copy the technical committee to review the pull request if your PR is targeting a particular programming language.

@sebastien-rosset sebastien-rosset changed the title [codegen] Add model cache [codegen] Add model cache to speed up code generation Aug 19, 2020
@sebastien-rosset
Copy link
Contributor Author

sebastien-rosset commented Aug 19, 2020

@spacether @wing328 @jimschubert
cc @OpenAPITools/generator-core-team

@sebastien-rosset
Copy link
Contributor Author

sebastien-rosset commented Aug 20, 2020

@spacether, interesting observation: this PR is not supposed to change the code generation, but it does. I took a look at the generated python-experimental code. The generated code for the fake API is changed as follows:

diff --git a/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py b/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py
index 40f7ef9a82..1fa17d289b 100644
--- a/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py
+++ b/samples/openapi3/client/petstore/python-experimental/petstore_api/api/fake_api.py
@@ -1549,7 +1549,7 @@ class FakeApi(object):
                 string (str): None. [optional]
                 binary (file_type): None. [optional]
                 date (date): None. [optional]
-                date_time (datetime): None. [optional]
+                date_time (datetime): None. [optional] if omitted the server will use the default value of dateutil_parser('2010-02-01T10:20:10.11111+01:00')
                 password (str): None. [optional]
                 param_callback (str): None. [optional]
                 _return_http_data_only (bool): response data without head status
Perform git status

This file and the specific change comes from the {{#defaultValue}} tag in the api.mustache template. When you look at the input YAML schema for this test, the date_time python argument is generated from the dateTime property under the testEndpointParameters:

                dateTime:
                  description: None
                  type: string
                  format: date-time
                  default: '2010-02-01T10:20:10.11111+01:00'
                  example: '2020-02-02T20:20:20.22222Z'

The dateTime property does have the default keyword. So I think the code generated prior to this PR was actually incorrect, the default value should have been generated. I don't know why yet, but would you agree the newly generated code is correct?

The other python diffs are markdown changes for the same default value.

@spacether
Copy link
Contributor

spacether commented Aug 20, 2020

Yes, the new default is correct. My work in the PR that is blocked by this one showed that some of my old bad code in toDefault is setting schema values :( that's probably what is happening here. My guess is that it is caused by some bad mutation that python-experimental is doing. I am hitting this same issue in my PR also for this default value.

@sebastien-rosset
Copy link
Contributor Author

Yes, the new default is correct. My work in the PR that is blocked by this one showed that some of my old bad code in toDefault is setting schema values :( that's probably what is happening here. My guess is that it is caused by some bad mutation that python-experimental is doing. I am hitting this same issue in my PR also for this default value.

Thanks for the quick reply. One difference is the CodegenProperty fromProperty(String name, Schema p) method is repeatedly invoked with the same name and p, and now that method returns a cached value instead of recomputing and returning a different instance of the same data for every invocation. Maybe that has some side effect if previously some of these instances (but not all) were mutated.

@sebastien-rosset
Copy link
Contributor Author

sebastien-rosset commented Aug 20, 2020

The other diffs are in Haskell, and I don't know anything about this code generator. It's strange that the changes only occur in Haskell and python-experimental. My hunch is the caching enhancement uncovered a bug in the Haskell and python code generators, but I'll keep looking.

There are also several weird changes in the generated FILES with ./ being removed in the middle of the path.

-model/./apiResponse.ts
+model/apiResponse.ts

@sebastien-rosset sebastien-rosset marked this pull request as ready for review August 21, 2020 18:07
private static Cache<String, String> underscoreWordsCache;

// A cache of escaped words, used to optimize the performance of the escape() method.
private static Cache<EscapedNameOptions, String> escapedWordsCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do this. Word escaping is going to be very generator-specific, and this static cache will break functionality for any application where openapi-generator is "long-lived" (IDE plugins, our online generator, and other long-running services holding oag in memory).

I think we need to improve our model and the logic around it instead.

Copy link
Contributor Author

@sebastien-rosset sebastien-rosset Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. What if the cache lifecycle is tied to the lifecycle of the generator, similar to the other caches? Would this work for the cases you mention?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the method again and I'm trying to think of scenarios where there would be conflicts across threads if all the parameters are the same, and I can't think of any. I think you're probably fine with your current implementation for memoizing the escape function. I thought I had noticed something when I commented yesterday that prevented this function from being pure, and I'm not seeing it today.

But because you asked about a tying to the generation, if we ever wanted to have per-run caching we could do something like the following (for something like a CodegenModel cache for $ref!):

We could create a context object with unique ID as a hash key (an entity in DDD-land), and use this as the Cache key or as a component of the type defining the cache key.

For instance:

Id

package org.openapitools.codegen.context;

import java.util.Objects;
import java.util.UUID;

public final class Id {
    private final UUID value;

    public Id() {
        this.value = UUID.randomUUID();
    }

    @Override
    public int hashCode() {
        return Objects.hash(getValue());
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Id id = (Id) o;
        return getValue().equals(id.getValue());
    }

    public UUID getValue() {
        return value;
    }
}

InvocationContext

package org.openapitools.codegen.context;

import java.util.Objects;
import java.util.UUID;

public final class InvocationContext {
    private final Id id;

    public InvocationContext() {
        this.id = new Id();
    }

    public UUID getIdentifier() {
        return this.id.getValue();
    }

    @Override
    public int hashCode() {
        // InvocationContext is an entity and should only maintain id in hashcode.
        return Objects.hash(id);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        InvocationContext that = (InvocationContext) o;

        // InvocationContext is an entity and should only maintain id in equals.
        return id.equals(that.id);
    }
}

And then update GlobalSettings to provide this per-run specific identifiable object:

public class GlobalSettings {
    private static ThreadLocal<InvocationContext> context = new InheritableThreadLocal<InvocationContext>() {
        @Override
        protected InvocationContext initialValue() {
            return new InvocationContext();
        }
    };
    private static ThreadLocal<Properties> properties = new InheritableThreadLocal<Properties>() {
        @Override
        protected Properties initialValue() {
            return (Properties) System.getProperties().clone();
        }
    };

    public static String getProperty(String key, String defaultValue) {
        return properties.get().getProperty(key, defaultValue);
    }

    public static String getProperty(String key) {
        return properties.get().getProperty(key);
    }

    public static void setProperty(String key, String value) {
        properties.get().setProperty(key, value);
    }

    public static void clearProperty(String key) {
        properties.get().remove(key);
    }

    public static InvocationContext getContext() {
        return context.get();
    }

    public static void reset() {
        properties.remove();
        context.remove();
    }
}

Then, you'd just need to work out how you'd want to work with this in your cache. For your escape function, for example, you could change EscapedNameOptions to accept a UUID which is added to equals and hashcode, and then construct it with the invocation context's identifier. So,

EscapedNameOptions ns = new EscapedNameOptions(
    GlobalSettings.getContext().getIdentifier(),
    name, replacementMap.keySet(), charactersToAllow, appendToReplacement);

in Thread1 would be different than the same in Thread2.

I could see us using this for a really simple ref lookup like this as the key and CodegenModel as the value:

package org.openapitools.codegen.context;

import java.util.Objects;
import java.util.UUID;

public final class ContextualRef {
    private final UUID invocationId;
    private final String ref;

    public ContextualRef(UUID invocationId, String ref) {
        if (ref == null) throw new IllegalArgumentException("ref can't be null");
        this.invocationId = invocationId;
        // normalize/validate ref?
        this.ref = ref;
    }

    @Override
    public int hashCode() {
        return Objects.hash(invocationId, ref);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        ContextualRef that = (ContextualRef) o;
        return invocationId.equals(that.invocationId) &&
                ref.equals(that.ref);
    }
}

@jimschubert jimschubert added this to the 5.0.0 milestone Sep 21, 2020
Copy link
Member

@jimschubert jimschubert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't for the life of me come up with an example of the concern I expressed previously. Let's roll with this. As always, thanks for the great contribution!

@jimschubert jimschubert changed the title [codegen] Add model cache to speed up code generation [core] Add model cache to speed up code generation Sep 21, 2020
@jimschubert jimschubert merged commit 4f27939 into OpenAPITools:master Sep 21, 2020
@spacether
Copy link
Contributor

Thanks @sebastien-rosset this is very helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants