Performance issues with partition and split [BATCH-2523] #1079
Labels
related-to: performance
status: declined
Damien DALY opened BATCH-2523 and commented
Hi,
I am trying to run Spring Batch with split/partitioned steps to increase batch throughput.
This is a one-shot database migration (from Firebird to PostgreSQL). I don't need to store job/state data, so I use a MapJobRepository and a SimpleJobLauncher. Configuration is done by annotations on static class members. There is only one application, with no remote step execution, only local code.
I also created a ThreadPoolTaskExecutor.
I have a main flow that starts 3 other flows sequentially: -->[flow1]-->[flow2]-->[flow3]--.
Flow1 is a split flow, containing single "classic" steps (reader, processor, writer, chunked) and some "partitioned" steps. Each split takes a new SimpleAsyncTaskExecutor instance.
Each partitioner creates lists of entity IDs (Integer[]) to process.
The TaskExecutor is a singleton of ThreadPoolTaskExecutor.
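For illustration, the ID partitioning described above could look roughly like the sketch below. It is plain Java with no Spring dependency, and the class name `IdChunker`, the method `chunk`, and the `partitionN` key naming are all hypothetical, not taken from the actual project. In a real Spring Batch `Partitioner`, each array would be placed into an `ExecutionContext`, since `partition(int gridSize)` returns a `Map<String, ExecutionContext>`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Spring Batch): splits entity IDs
// into at most gridSize contiguous chunks of near-equal size.
public class IdChunker {

    public static Map<String, Integer[]> chunk(List<Integer> ids, int gridSize) {
        if (gridSize <= 0) {
            throw new IllegalArgumentException("gridSize must be positive");
        }
        Map<String, Integer[]> partitions = new LinkedHashMap<>();
        int total = ids.size();
        int base = total / gridSize;       // minimum number of IDs per partition
        int remainder = total % gridSize;  // the first 'remainder' partitions get one extra ID
        int start = 0;
        for (int i = 0; i < gridSize && start < total; i++) {
            int size = base + (i < remainder ? 1 : 0);
            Integer[] part = ids.subList(start, start + size).toArray(new Integer[0]);
            partitions.put("partition" + i, part);
            start += size;
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(i);
        }
        // Print the size of each partition for 10 IDs and a gridSize of 3.
        chunk(ids, 3).forEach((name, part) ->
                System.out.println(name + " -> " + part.length + " ids"));
    }
}
```

With this layout, no empty partitions are created when there are fewer IDs than gridSize, so the number of child step executions never exceeds the number of IDs.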
The performance issue seems to be that master steps spend a long time finalising child step executions. If I am right, it looks like a serialization/deserialization process is happening to retrieve the child steps' status/context.
How can I either change the serialization/deserialization process, or bypass serialization entirely?
What would be "good" values for gridSize, thread pool size...?
Thanks.
No further details from BATCH-2523