Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not reallocate whole input string when matching next token #119

Merged
merged 5 commits into from
Jun 11, 2024

Conversation

mvorisek
Copy link
Contributor

@mvorisek mvorisek commented May 31, 2024

Purely a performance refactoring.

The measured performance improvement is about -90% reduced runtime for the included test. With larger test/SQL, the speedup is even more dramatic as the original complexity was O(N^2).

Originally, the whole input string was reallocated after each token in:

such big temporary string was upper cased for each token matching in:

and regex patterns were constructed for each token matching freshly in: (extracted into another PR)

All such operations should be avoided for linear scalability and this PR addresses that.

@mvorisek mvorisek changed the title Do not reallocate sstring when matching next token Do not reallocate string when matching next token May 31, 2024
@mvorisek mvorisek changed the title Do not reallocate string when matching next token Do not reallocate whole input string when matching next token May 31, 2024
@mvorisek mvorisek force-pushed the fix_perf branch 2 times, most recently from ba29e15 to ce7ccec Compare May 31, 2024 20:59
@mvorisek mvorisek marked this pull request as ready for review May 31, 2024 21:00
@mvorisek mvorisek force-pushed the fix_perf branch 13 times, most recently from 52a4fb4 to de3d8d9 Compare June 1, 2024 11:15
@mvorisek mvorisek marked this pull request as draft June 1, 2024 12:35
@mvorisek mvorisek force-pushed the fix_perf branch 2 times, most recently from 6fbb4ca to 06abf17 Compare June 2, 2024 00:41
@mvorisek mvorisek marked this pull request as ready for review June 2, 2024 00:43
@mvorisek
Copy link
Contributor Author

mvorisek commented Jun 2, 2024

PR is finished and the performance improvement (of the "long concat" tests with new 20k elements limit) is the following:

before:

21.370 secs Doctrine\SqlFormatter\Tests\SqlFormatterTest::testFormatLongConcat
18.474 secs Doctrine\SqlFormatter\Tests\TokenizerTest::testTokenizeLongConcat

after

5.710 secs Doctrine\SqlFormatter\Tests\SqlFormatterTest::testFormatLongConcat
1.665 secs Doctrine\SqlFormatter\Tests\TokenizerTest::testTokenizeLongConcat

So the tokenization process is newly about 10x faster and with increased query length the speedup is even bigger.

This PR is faster also on the tests/performance.php script even it uses very small queries:

before:

$ tests\performance.php                        
    <p>Formatted 54 queries</p>
    <p>Average query length of 284.37037 characters</p>
    <p>
        Took 0.02518 seconds total,
        0.00047 seconds per query,
        0.00164 seconds per 1000 characters
    </p>
    <p>Used 0 bytes of memory</p>

(2nd run)
$ tests\performance.php
    <p>Formatted 54 queries</p>
    <p>Average query length of 284.37037 characters</p>
    <p>
        Took 0.02403 seconds total,
        0.00044 seconds per query,
        0.00156 seconds per 1000 characters
    </p>
    <p>Used 0 bytes of memory</p>

after:

$ tests\performance.php
    <p>Formatted 54 queries</p>
    <p>Average query length of 284.37037 characters</p>
    <p>
        Took 0.01587 seconds total,
        0.00029 seconds per query,
        0.00103 seconds per 1000 characters
    </p>
    <p>Used 0 bytes of memory</p>

(2nd run)
$ tests\performance.php
    <p>Formatted 54 queries</p>
    <p>Average query length of 284.37037 characters</p>
    <p>
        Took 0.01543 seconds total,
        0.00029 seconds per query,
        0.00101 seconds per 1000 characters
    </p>
    <p>Used 0 bytes of memory</p>

@mvorisek mvorisek marked this pull request as draft June 3, 2024 10:34
@mvorisek
Copy link
Contributor Author

Please split this; there seems to be changes that could be indenpendent, for instance "Remove historical comments" can be split out.

commit removed

@greg0ire
Copy link
Member

Are there ways you could split this in several smaller PRs?

@mvorisek
Copy link
Contributor Author

Are there ways you could split this in several smaller PRs?

Yes - done now, here we focus on "no reallocation", ie. using $offset strictly.

@mvorisek mvorisek requested a review from greg0ire June 11, 2024 12:01
@greg0ire
Copy link
Member

Is 3c2a6db about avoiding allocations? It seems to allocate more variables.

@mvorisek
Copy link
Contributor Author

mvorisek commented Jun 11, 2024

Is 3c2a6db about avoiding allocations? It seems to allocate more variables.

Such short allocations are fast and more readable over $str[$offset] and $str[$offset + 1]. Especially, single char string is always interned thus no real allocation is done. This is thanks to internal php optimization.

However I tested it - offset is slightly faster over substr($str, $offset, 1). I belive this is because specific opcode is used over regular function call/overhead.

@greg0ire
Copy link
Member

greg0ire commented Jun 11, 2024

Since it is not related (and even the opposite of what this PR is about, reducing allocations), then let's move the 2 commits in a separate PR please.

@mvorisek
Copy link
Contributor Author

Like really, do I have to use separate PR for using separate variable when this PR is refactoring this topic and the performance was tested?

@greg0ire
Copy link
Member

Do I have to deal with PRs with 7 commits? 7, really? What the hell?

Copy link
Member

@greg0ire greg0ire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR had better be strictly about removing allocations, not adding some.

@mvorisek
Copy link
Contributor Author

done

@greg0ire greg0ire added this to the 1.5.0 milestone Jun 11, 2024
@greg0ire greg0ire added the enhancement New feature or request label Jun 11, 2024
@greg0ire greg0ire merged commit 0bcd33e into doctrine:1.5.x Jun 11, 2024
10 checks passed
@greg0ire
Copy link
Member

Thanks @mvorisek !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants