
Bad CucumberExpression creation performance #200

@jkronegg

Description

👓 What did you see?

On my project with about 150 step definitions and about 400 test scenarios, the IntelliJ profiler reports that the CucumberExpression.<init> method takes 25.9% of the total CPU time. This is because the constructor is called for every step definition in every test scenario (roughly 150 × 400 = 60,000 calls). I think the performance could be better.

✅ What did you expect to see?

I expect CucumberExpression.<init> to avoid unnecessary processing (contributes to #2035).

I understand that cucumber-java8 can introduce dynamic behavior, which requires parsing the expressions for each test scenario. However, I think we can safely cache everything that is constant and does not depend on cucumber-java8. I identified the following performance improvement points in CucumberExpression:

  • TreeRegexp creation: in the CucumberExpression constructor, this object holds "metadata" about the regular expression itself (i.e. it does not depend on the context). Two identical regular expressions therefore lead to equivalent TreeRegexp objects, so the creation is cacheable.

    The original code:

    this.treeRegexp = new TreeRegexp(pattern);
    

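    could be replaced by the following (treeRegexps is a static Map<String, TreeRegexp>; since scenarios may run in parallel, it should be thread-safe, e.g. a ConcurrentHashMap):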

    this.treeRegexp = treeRegexps.computeIfAbsent(pattern, TreeRegexp::new);
    
  • escapeRegex calls: in the rewriteToRegex method, escapeRegex is called on the Node.text() content. Two identical Node.text() values lead to the same escaped result, regardless of the context, so the result of escapeRegex is cacheable.

    The original code:

    return escapeRegex(node.text());
    

    can be replaced by the following (escapedTexts is a static, thread-safe Map<String, String>, e.g. a ConcurrentHashMap):

    return escapedTexts.computeIfAbsent(node.text(), CucumberExpression::escapeRegex);
    
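Both caches follow the same memoization pattern, which can be sketched with plain JDK types. This is a minimal, self-contained sketch under stated assumptions, not the cucumber-expressions code: `java.util.regex.Pattern` stands in for `TreeRegexp`, and `escapeRegex` below is a hypothetical re-implementation of the private method; only the `ConcurrentHashMap.computeIfAbsent` idea carries over. A `ConcurrentHashMap` is used because a static cache is shared between threads when scenarios run in parallel, and `HashMap.computeIfAbsent` is not thread-safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

final class ExpressionCaches {
    // Static, thread-safe caches keyed by the input string.
    static final Map<String, Pattern> COMPILED = new ConcurrentHashMap<>();
    static final Map<String, String> ESCAPED = new ConcurrentHashMap<>();

    // Count the real (uncached) work so the effect of caching can be observed.
    static final AtomicInteger compilations = new AtomicInteger();
    static final AtomicInteger escapes = new AtomicInteger();

    // Regex metacharacters that get a backslash in front of them.
    private static final String META = "\\^$.|?*+()[]{}";

    // Hypothetical stand-in for CucumberExpression.escapeRegex: a pure function
    // of its input, which is what makes its result safe to cache.
    static String escapeRegex(String text) {
        escapes.incrementAndGet();
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Stand-in for "new TreeRegexp(pattern)": also a pure function of the pattern string.
    static Pattern compile(String regex) {
        compilations.incrementAndGet();
        return Pattern.compile(regex);
    }

    // Cached variants: the mapping function only runs the first time a key is seen.
    static String cachedEscape(String text) {
        return ESCAPED.computeIfAbsent(text, ExpressionCaches::escapeRegex);
    }

    static Pattern cachedCompile(String regex) {
        return COMPILED.computeIfAbsent(regex, ExpressionCaches::compile);
    }

    public static void main(String[] args) {
        // Simulate the same expression being rebuilt for 400 scenarios.
        for (int i = 0; i < 400; i++) {
            String regex = "^I have " + cachedEscape("3.5") + " cukes$";
            cachedCompile(regex);
        }
        System.out.println(escapes.get() + " escape(s), " + compilations.get() + " compilation(s)");
    }
}
```

Running `main` prints `1 escape(s), 1 compilation(s)`: all 400 iterations after the first reuse the cached results. Both caches are unbounded; that seems acceptable here on the assumption that the keys come from step definition expressions, whose number is fixed for a given project.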

These two optimizations give four combinations to benchmark (the original version is createExpression0). The benchmark creates five different expressions, 400 times each:

| Benchmark | Cached escapeRegex calls | Cached TreeRegexp creation | Throughput (ops/s) |
|---|---|---|---|
| CucumberExpressionBenchmark.createExpression0 | no | no | 153,024 ± 13,800 |
| CucumberExpressionBenchmark.createExpression1 | yes | no | 181,960 ± 12,133 |
| CucumberExpressionBenchmark.createExpression2 | no | yes | 186,236 ± 11,232 |
| CucumberExpressionBenchmark.createExpression3 | yes | yes | 219,890 ± 12,365 |

Caching the TreeRegexp creation alone led to a 22% throughput improvement, and combining both caches led to a 44% improvement.
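The percentages follow directly from the throughput column. A small Java check (the class name `BenchmarkGains` is only illustrative) reproduces them and also gives the gain of the escapeRegex cache on its own. The ratios are the same whether the commas in the table are thousands or decimal separators.

```java
// Relative throughput gains computed from the benchmark table (ops/s).
final class BenchmarkGains {
    // createExpression0: no caching (baseline).
    static final double BASELINE = 153_024;

    // Improvement of a variant over the baseline, rounded to a whole percent.
    static long gainPercent(double opsPerSecond) {
        return Math.round((opsPerSecond / BASELINE - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        System.out.println("escapeRegex cache only (createExpression1): +" + gainPercent(181_960) + "%");
        System.out.println("TreeRegexp cache only  (createExpression2): +" + gainPercent(186_236) + "%");
        System.out.println("both caches            (createExpression3): +" + gainPercent(219_890) + "%");
    }
}
```

Running `main` prints gains of 19%, 22% and 44% respectively.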

On a real project with about 150 step definitions and 400 test scenarios, the IntelliJ Profiler run takes about 7700 ms and reports that CucumberExpression.<init> accounts for:

  • 25.9% of the total CPU time with the original version (1994 ms)
  • 15.7% of the total CPU time with both optimizations enabled (1209 ms), i.e. a 785 ms saving, or about 10% of the total time
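As a quick consistency check, both figures appear to be taken relative to the same ~7700 ms profiled run:

```math
\begin{aligned}
0.259 \times 7700\,\text{ms} &\approx 1994\,\text{ms} && \text{(original)}\\
0.157 \times 7700\,\text{ms} &\approx 1209\,\text{ms} && \text{(both caches)}\\
1994\,\text{ms} - 1209\,\text{ms} &= 785\,\text{ms} \approx 10.2\,\%\ \text{of}\ 7700\,\text{ms}
\end{aligned}
```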

I suggest using the createExpression3 variant, and I would be happy to propose a PR.

📦 Which tool/library version are you using?

Cucumber 7.10.1

🔬 How could we reproduce it?

The benchmark with the four variants is in the attached cucumberexpressions.zip.

Steps to reproduce the behavior:

  1. Create a Maven project with the following dependencies (with the cucumber.version property set to the version under test, e.g. 7.10.1):

     <dependency>
         <groupId>io.cucumber</groupId>
         <artifactId>cucumber-java</artifactId>
         <version>${cucumber.version}</version>
         <scope>test</scope>
     </dependency>
     <dependency>
         <groupId>io.cucumber</groupId>
         <artifactId>cucumber-junit-platform-engine</artifactId>
         <version>${cucumber.version}</version>
         <scope>test</scope>
     </dependency>
     <dependency>
         <groupId>io.cucumber</groupId>
         <artifactId>cucumber-picocontainer</artifactId>
         <version>${cucumber.version}</version>
         <scope>test</scope>
     </dependency>
     <dependency>
         <groupId>org.openjdk.jmh</groupId>
         <artifactId>jmh-generator-annprocess</artifactId>
         <version>1.36</version>
         <scope>test</scope>
     </dependency>
    
  2. Run the benchmark
