
Investigate string allocations and array generation across gherkin flavours #351

Open
2 of 22 tasks
luke-hill opened this issue Jan 7, 2025 · 1 comment
Labels
🏦 debt Tech debt

Comments

@luke-hill
Contributor

luke-hill commented Jan 7, 2025

🤔 What's the problem you've observed?

Some work was done in .NET here and here. This work should be mimicked (or, at a minimum, each language should be qualified and checked off after confirming that an equivalent change is not applicable).

✨ Do you have a proposal for making it better?

Check equivalent work to PR #336 -> Improved parsing time

  • Java
  • JavaScript
  • Ruby
  • Go
  • Python
  • C
  • Objective-C
  • Perl
  • PHP
  • Dart
  • C++

Check equivalent work to PR #344 -> Avoid allocation and improve parsing time

  • Java
  • JavaScript
  • Ruby
  • Go
  • Python
  • C
  • Objective-C
  • Perl
  • PHP
  • Dart
  • C++

📚 Any additional context?

This may not be relevant for some languages.

@luke-hill luke-hill added the 🏦 debt Tech debt label Jan 7, 2025
@jkronegg
Contributor

Java

Reading very_long.feature on OpenJDK 21 with a JMH micro-benchmark gives the following result (the parser receives the file content as a String):

Benchmark                  Mode  Cnt     Score     Error  Units
MyClassBenchmark.original  avgt   25  6601.674 ± 433.332  us/op

When reading very_long.feature 1000 times, IntelliJ's profiler gives the following flame graph:

[Image: flame graph of 1000 parses of very_long.feature]

Most of the time is spent on String trimming (about 50% of the total duration). I already worked on this in the past (#84), but there is still room for improvement (I have some ideas on how to improve it 😉). Otherwise there is no noticeable performance hot spot.

On one of my real-life projects, with about 100 rules and 1000 test scenarios, Parser.parse() takes 340 ms (of which about 100 ms is String trimming, i.e. only about 30% of the parsing duration).

I'll create an issue on that point.
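To illustrate the kind of improvement meant here, below is a minimal sketch of an allocation-avoiding trim (the class and method names are illustrative, not from the Gherkin codebase): when a line has no surrounding whitespace, which is common in feature files, the original String instance is returned and no new object is allocated.

```java
public class TrimSketch {
    // Trim leading/trailing whitespace without allocating when unnecessary.
    static String fastTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) start++;
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) end--;
        // Return the original instance when nothing needs trimming: no allocation.
        return (start == 0 && end == s.length()) ? s : s.substring(start, end);
    }

    public static void main(String[] args) {
        String already = "Given a step";
        // Same reference: no new String was created.
        System.out.println(TrimSketch.fastTrim(already) == already); // true
        System.out.println(TrimSketch.fastTrim("  padded  "));       // padded
    }
}
```

Whether this helps in practice depends on how often lines actually carry surrounding whitespace; the profiler output above suggests the trim path is hot enough to be worth measuring.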

JMH benchmark code:

import java.util.concurrent.TimeUnit;

import org.junit.jupiter.api.Test;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

public class MyClassBenchmark {
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public GherkinDocument original(MyClassPlan plan) {
        return plan.parser.parse(plan.featureContent, plan.matcher, "very_long.feature");
    }

    // Not a JMH benchmark: a plain JUnit entry point used to drive the profiler.
    @Test
    void test_for_profiler() {
        MyClassPlan plan = new MyClassPlan();
        for (int i = 0; i < 1000; i++) {
            plan.parser.parse(plan.featureContent, plan.matcher, "very_long.feature");
        }
    }
}

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Gherkin parser classes come from the io.cucumber:gherkin artifact;
// their imports are omitted for brevity.
@State(Scope.Benchmark)
public class MyClassPlan {
    TokenMatcher matcher = new TokenMatcher("en");
    IdGenerator idGenerator = new IncrementingIdGenerator();
    Path path = Paths.get("../testdata/good/very_long.feature");
    Parser<GherkinDocument> parser = new Parser<>(new GherkinDocumentBuilder(idGenerator, path.toString()));
    String featureContent;
    {
        try {
            featureContent = new String(Files.readAllBytes(path));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
