Bad CucumberExpression creation performance

### 👓 What did you see?

On my project with about 150 stepdefs and about 400 test scenarios, the IntelliJ profiler says the `CucumberExpression.<init>` method takes 25.9% of the total CPU time. This is because the method is called for all step defs and for all test scenarios. I think the performance could be better.

### ✅ What did you expect to see?

I expect `CucumberExpression.<init>` to avoid unnecessary processing (contributes to [#2035](https://github.com/cucumber/cucumber-jvm/issues/2035)).


I understand that `cucumber-java8` can introduce dynamic behavior which requires parsing the expressions for each test scenario. However, I think we can safely cache everything that is constant and does not depend on `cucumber-java8`. I identified the following points in `CucumberExpression` where performance can be improved:

- `TreeRegexp` creation: in the `CucumberExpression` constructor, this object serves to get some "metadata" about the regular expression itself (i.e. it does not depend on context). Thus, two identical regular expressions will lead to the same `TreeRegexp`, so the creation is cacheable.

  The original code:

      this.treeRegexp = new TreeRegexp(pattern);

  could be replaced by (`treeRegexps` is a static `Map<String, TreeRegexp>`):

      this.treeRegexp = treeRegexps.computeIfAbsent(pattern, TreeRegexp::new);
 
- calls to `escapeRegex` in the `rewriteToRegex` method are done on the `Node.text()` content: two identical `Node.text()` will lead to the same escaped result, independently of the context. Thus, the result of `escapeRegex` is cacheable.

  The original code:

      return escapeRegex(node.text());

  can be replaced by (`escapedTexts` is a static `Map<String, String>`):

      return escapedTexts.computeIfAbsent(node.text(), CucumberExpression::escapeRegex);
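
As a rough illustration of the caching pattern proposed in both points above, here is a minimal, self-contained sketch. It does not use the real Cucumber classes: `ParsedPattern` is a hypothetical stand-in for `TreeRegexp`, and `ExpressionCacheSketch`, `PARSED` and `parsed` are names invented for this example. It assumes the cached object is immutable and depends only on the pattern string, and it uses a `ConcurrentHashMap` because a static cache may be reached from several threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

public class ExpressionCacheSketch {

    // Hypothetical stand-in for TreeRegexp: built only from the pattern string,
    // never mutated afterwards, so one instance can be shared.
    static final class ParsedPattern {
        static final AtomicInteger CREATED = new AtomicInteger();
        final Pattern pattern;

        ParsedPattern(String regex) {
            CREATED.incrementAndGet(); // counts how often the expensive work runs
            this.pattern = Pattern.compile(regex);
        }
    }

    // Static cache keyed by the pattern source (an assumption of this sketch:
    // the set of distinct patterns is small, so the map is left unbounded).
    private static final Map<String, ParsedPattern> PARSED = new ConcurrentHashMap<>();

    static ParsedPattern parsed(String regex) {
        // The constructor runs at most once per distinct key.
        return PARSED.computeIfAbsent(regex, ParsedPattern::new);
    }

    public static void main(String[] args) {
        // Simulate 400 scenarios that each re-create the same two expressions.
        for (int scenario = 0; scenario < 400; scenario++) {
            parsed("^I have (\\d+) cukes$");
            parsed("^I eat (\\d+) cukes$");
        }
        System.out.println("constructed: " + ParsedPattern.CREATED.get());
        // prints "constructed: 2"
    }
}
```

The same shape applies to the `escapeRegex` cache, with a `Map<String, String>` and the escaping function as the mapping function. Whether an unbounded static map is acceptable in the real library is a design question this sketch does not answer.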

These two optimization points lead to four combinations to be benchmarked (the original version is `createExpression0`). The benchmark consists of creating five different expressions 400 times:

| Benchmark                                     | cached calls to escapeRegex | cached TreeRegexp creation | ops/s            |
|-----------------------------------------------|-----------------------------|----------------------------|------------------|
| CucumberExpressionBenchmark.createExpression0 | no                          | no                         | 153,024 ± 13,800 |
| CucumberExpressionBenchmark.createExpression1 | yes                         | no                         | 181,960 ± 12,133 |
| CucumberExpressionBenchmark.createExpression2 | no                          | yes                        | 186,236 ± 11,232 |
| CucumberExpressionBenchmark.createExpression3 | yes                         | yes                        | 219,890 ± 12,365 |

Caching the `TreeRegexp` creation alone leads to a 22% throughput improvement, and using both caches together leads to a 44% improvement.
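
For reference, the percentages can be recomputed from the mean ops/s values in the table. The sketch below is only that arithmetic; the class and method names are invented for this example, and it ignores the ± error margins.

```java
public class SpeedupFromThroughput {

    // Relative throughput gain of a variant over the baseline, in percent.
    static double improvementPercent(double baselineOpsPerSec, double variantOpsPerSec) {
        return (variantOpsPerSec / baselineOpsPerSec - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 153_024; // createExpression0 (no caching)

        System.out.printf("createExpression1: %.1f%%%n", improvementPercent(baseline, 181_960));
        System.out.printf("createExpression2: %.1f%%%n", improvementPercent(baseline, 186_236));
        System.out.printf("createExpression3: %.1f%%%n", improvementPercent(baseline, 219_890));
    }
}
```

Note that the reported intervals for `createExpression1` and `createExpression2` overlap, so the table alone does not show which of the two single caches contributes more.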

On a real project with about 150 stepdefs and 400 test scenarios, an IntelliJ Profiler run takes about 7700 ms and reports that `CucumberExpression.<init>` accounts for:
- 25.9% of the total CPU time with the original version (1994 ms)
- 15.7% of the total CPU time with both optimizations enabled (1209 ms, i.e. a 785 ms improvement, or about 10% of the total time)

I suggest using the `createExpression3` variant, and I would be happy to propose a PR.

### 📦 Which tool/library version are you using?

Cucumber 7.10.1

### 🔬 How could we reproduce it?

The benchmark with the four variants is in 
[cucumberexpressions.zip](https://github.com/cucumber/cucumber-expressions/files/10313102/cucumberexpressions.zip)

Steps to reproduce the behavior:
1. Create a Maven project with the following dependencies:

        <dependency>
            <groupId>io.cucumber</groupId>
            <artifactId>cucumber-java</artifactId>
            <version>${cucumber.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>io.cucumber</groupId>
            <artifactId>cucumber-junit-platform-engine</artifactId>
            <version>${cucumber.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>io.cucumber</groupId>
            <artifactId>cucumber-picocontainer</artifactId>
            <version>${cucumber.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.openjdk.jmh</groupId>
            <artifactId>jmh-generator-annprocess</artifactId>
            <version>1.36</version>
            <scope>test</scope>
        </dependency>

3. Run the benchmark

