-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance in JdbcStepExecutionDao #3635
Conversation
Optimize SQL in GET_LAST_STEP_EXECUTION to work fast on large amount of records in %PREFIX%JOB_EXECUTION table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested (just by manually tweaking the query) this change and it leads to massive performance improvement having the same result.
before (16363.873 ms)
production=# EXPLAIN ANALYZE SELECT SE.STEP_EXECUTION_ID, SE.STEP_NAME, SE.START_TIME, SE.END_TIME, SE.STATUS, SE.COMMIT_COUNT, SE.READ_COUNT, SE.FILTER_COUNT, SE.WRITE_COUNT, SE.EXIT_CODE, SE.EXIT_MESSAGE, SE.READ_SKIP_COUNT, SE.WRITE_SKIP_COUNT, SE.PROCESS_SKIP_COUNT, SE.ROLLBACK_COUNT, SE.LAST_UPDATED, SE.VERSION, JE.JOB_EXECUTION_ID, JE.START_TIME, JE.END_TIME, JE.STATUS, JE.EXIT_CODE, JE.EXIT_MESSAGE, JE.CREATE_TIME, JE.LAST_UPDATED, JE.VERSION from BATCH_JOB_EXECUTION JE, BATCH_STEP_EXECUTION SE where SE.JOB_EXECUTION_ID in (SELECT JOB_EXECUTION_ID from BATCH_JOB_EXECUTION where JE.JOB_INSTANCE_ID = 6919782) and SE.JOB_EXECUTION_ID = JE.JOB_EXECUTION_ID and SE.STEP_NAME = 'webhooks.destinations.batch.job.step.disable' order by SE.START_TIME desc, SE.STEP_EXECUTION_ID desc;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1889629.39..1892730.97 rows=1240632 width=232) (actual time=16354.241..16354.242 rows=1 loops=1)
Sort Key: se.start_time DESC, se.step_execution_id DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=396812.35..1667136.03 rows=1240632 width=232) (actual time=14724.619..16354.230 rows=1 loops=1)
Hash Cond: (se.job_execution_id = je.job_execution_id)
Join Filter: (SubPlan 1)
Rows Removed by Join Filter: 2506751
-> Seq Scan on batch_step_execution se (cost=0.00..1054638.36 rows=2481263 width=173) (actual time=364.293..4884.475 rows=2506752 loops=1)
Filter: ((step_name)::text = 'webhooks.destinations.batch.job.step.disable'::text)
Rows Removed by Filter: 11481938
-> Hash (cost=222134.49..222134.49 rows=6932949 width=75) (actual time=3940.651..3940.651 rows=6921527 loops=1)
Buckets: 524288 Batches: 32 Memory Usage: 27805kB
-> Seq Scan on batch_job_execution je (cost=0.00..222134.49 rows=6932949 width=75) (actual time=0.016..1516.128 rows=6921527 loops=1)
SubPlan 1
-> Result (cost=0.00..222134.49 rows=6932949 width=8) (actual time=0.000..0.001 rows=3 loops=2506752)
One-Time Filter: (je.job_instance_id = 6919782)
-> Seq Scan on batch_job_execution (cost=0.00..222134.49 rows=6932949 width=8) (actual time=0.014..1044.411 rows=6919456 loops=1)
Planning Time: 0.553 ms
JIT:
Functions: 20
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 7.343 ms, Inlining 12.015 ms, Optimization 213.212 ms, Emission 138.594 ms, Total 371.165 ms
Execution Time: 16363.873 ms
(23 rows)
after (0.193 ms)
production=# EXPLAIN ANALYZE SELECT SE.STEP_EXECUTION_ID, SE.STEP_NAME, SE.START_TIME, SE.END_TIME, SE.STATUS, SE.COMMIT_COUNT, SE.READ_COUNT, SE.FILTER_COUNT, SE.WRITE_COUNT, SE.EXIT_CODE, SE.EXIT_MESSAGE, SE.READ_SKIP_COUNT, SE.WRITE_SKIP_COUNT, SE.PROCESS_SKIP_COUNT, SE.ROLLBACK_COUNT, SE.LAST_UPDATED, SE.VERSION, JE.JOB_EXECUTION_ID, JE.START_TIME, JE.END_TIME, JE.STATUS, JE.EXIT_CODE, JE.EXIT_MESSAGE, JE.CREATE_TIME, JE.LAST_UPDATED, JE.VERSION from BATCH_JOB_EXECUTION JE, BATCH_STEP_EXECUTION SE where SE.JOB_EXECUTION_ID in (SELECT JOB_EXECUTION_ID from BATCH_JOB_EXECUTION where JOB_INSTANCE_ID = 6919782) and SE.JOB_EXECUTION_ID = JE.JOB_EXECUTION_ID and SE.STEP_NAME = 'webhooks.destinations.batch.job.step.disable' order by SE.START_TIME desc, SE.STEP_EXECUTION_ID desc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=9.97..9.97 rows=1 width=232) (actual time=0.104..0.105 rows=1 loops=1)
Sort Key: se.start_time DESC, se.step_execution_id DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1.43..9.96 rows=1 width=232) (actual time=0.096..0.098 rows=1 loops=1)
-> Nested Loop (cost=0.86..8.90 rows=1 width=75) (actual time=0.028..0.029 rows=1 loops=1)
-> Index Scan using index_batch_job_execution_on_job_instance_id on batch_job_execution (cost=0.43..4.45 rows=1 width=8) (actual time=0.016..0.017 rows=1 loops=1)
Index Cond: (job_instance_id = 6919782)
-> Index Scan using batch_job_execution_pkey on batch_job_execution je (cost=0.43..4.45 rows=1 width=67) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (job_execution_id = batch_job_execution.job_execution_id)
-> Index Scan using josef_test_4 on batch_step_execution se (cost=0.56..0.99 rows=7 width=173) (actual time=0.066..0.066 rows=1 loops=1)
Index Cond: (((step_name)::text = 'webhooks.destinations.batch.job.step.disable'::text) AND (job_execution_id = je.job_execution_id))
Planning Time: 0.946 ms
Execution Time: 0.193 ms
(13 rows)
PS: Is there any easy way to use this change with latest release unless this will make it out? I can also test it in that way and report back.
@simi Thank you for taking time to test the fix and for sharing these numbers!
You can override |
LGTM, rebased and merged. This is a small change with a big improvement 😄 @mcheban Thank you very much for raising the issue and contributing a fix! |
Optimize SQL in GET_LAST_STEP_EXECUTION to work fast on large amount of records in %PREFIX%JOB_EXECUTION table
Fix for #3634
@pivotal-issuemaster This is an Obvious Fix