Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up long-running migrations #2552

Merged

Conversation

stefankaufmann
Copy link
Contributor

Boost performance for migrations scripts

@stefankaufmann stefankaufmann changed the title fix #2545 Fix long running or never ending migrations #2545 Nov 7, 2024
@bilke
Copy link

bilke commented Nov 7, 2024

I applied your patch and it seems I got through 2024_07_09_025240_find_test_measurements_by_testid but now got another error in 2024_08_24_160326_label2test_relationship_refactor:

cdash  | Running migrations...
cdash  |
cdash  |    INFO  Running migrations.
cdash  |
cdash  |   2024_07_09_025240_find_test_measurements_by_testid    INFO  No scheduled commands are ready to run.
cdash  |
cdash  | 172.18.0.1 - - [07/Nov/2024:10:23:26 +0000] "GET / HTTP/1.1" 503 382
cdash  | ........... 66,765ms DONE
cdash  |   2024_08_24_160326_label2test_relationship_refactor ............ 7,436ms FAIL
cdash  |
cdash  | In Connection.php line 829:
cdash  |
cdash  |   SQLSTATE[22012]: Division by zero: 7 ERROR:  division by zero (Connection:
cdash  |   pgsql, SQL:
cdash  |                       UPDATE label2test
cdash  |                       SET testid = build2test.id
cdash  |                       FROM build2test
cdash  |                       WHERE
cdash  |                           build2test.buildid = label2test.buildid
cdash  |                           AND build2test.outputid = label2test.outputid
cdash  |                           AND label2test.testid IS NULL
cdash  |                           AND build2test.buildid % 0 = 0
cdash  |                   )
cdash  |
cdash  |
cdash  | In Connection.php line 612:
cdash  |
cdash  |   SQLSTATE[22012]: Division by zero: 7 ERROR:  division by zero

@bilke
Copy link

bilke commented Nov 7, 2024

I also applied the following change:

diff --git a/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php b/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php
index 67994ad43..64b7745db 100644
--- a/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php
+++ b/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php
@@ -22,7 +22,7 @@ return new class extends Migration {

         // Execute at most 10k batches of buildwise updates
         if (config('database.default') === 'pgsql') {
-            for ($i = 0; $i < ceil($count / 10000); $i++) {
+            for ($i = 1; $i < ceil($count / 10000); $i++) {
                 DB::update('
                     UPDATE label2test
                     SET testid = build2test.id

Then the migrations completed.

@stefankaufmann
Copy link
Contributor Author

stefankaufmann commented Nov 7, 2024

For me the label2test table was empty, so we did not run into this problem

@bilke : I am not sure about your fix, but if the count is less than 10000, then the loop will not execute at all, probably it needs to be changed to $i <= ceil($count / 10000) to run at least once

Feedback from Kitware would be nice here. But for me it looks like the current implementation will cause the division by zero problem and should be changed

@williamjallen
Copy link
Collaborator

williamjallen commented Nov 7, 2024

It looks like there are two issues here:

  1. The slowness reported in Upgrade from 3.5.1 to 3.6.0 never finishes migration steps #2545, with a fix proposed in this PR

    • I am not convinced that this is the right solution. Even though it seems to be faster on Postgres, this wasn't found to be substantially faster on MySQL. A better approach would probably to use batching, similar to 2024_08_24_160326_label2test_relationship_refactor.php.
  2. A separate divide-by-zero issue in 2024_08_24_160326_label2test_relationship_refactor.php

    • At first glance, this definitely seems like an issue. It's unclear to me how this succeeded on some production systems. I'll open a PR with a fix.

@stefankaufmann
Copy link
Contributor Author

stefankaufmann commented Nov 7, 2024

@williamjallen

I am not convinced that this is the right solution. Even though it seems to be faster on Postgres, this wasn't found to be substantially faster on MySQL. A better approach would probably to use batching, similar to 2024_08_24_160326_label2test_relationship_refactor.php.

I am not sure if batching would really help here. For us 1.5 million entries of build2test are compare to another 1.5 million entries in testoutput. Even in batches this might take very long using NOT IN

Here are just a few references suggesting to replace NOT IN by NOT EXISTS for postgres

If you want to keep the current implementation for MySQL, I could adjust the code thus that NOT EXISTS is only used for postgres

@williamjallen williamjallen changed the title Fix long running or never ending migrations #2545 Speed up long-running migrations Nov 8, 2024
@williamjallen williamjallen changed the base branch from master to releases/3.6 November 8, 2024 18:40
@williamjallen williamjallen force-pushed the fix/endless_migrations branch from 3872843 to c601505 Compare November 8, 2024 18:47
@williamjallen
Copy link
Collaborator

@stefankaufmann Thanks for this contribution! After playing around with this locally a bit, I didn't see any negative performance impacts of this change when using MySQL, and it clearly seems to help Postgres.

I also created #2555 which fixes the other divide-by-zero error reported above.

Both of these changes will be included in a patch release: CDash 3.6.1.

@stefankaufmann
Copy link
Contributor Author

@williamjallen Thanks a lot. I really appreciate how quick my pull requests have been processed

@zackgalbreath zackgalbreath added this pull request to the merge queue Nov 11, 2024
Merged via the queue into Kitware:releases/3.6 with commit f0424a2 Nov 11, 2024
6 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2024
As reported in
#2552 (comment),
#2496 introduced a divide-by-zero error in the migration
`2024_08_24_160326_label2test_relationship_refactor`. This PR fixes the
issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants