-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
throttle-limit unclear, javadoc wrong #3947
Comments
The import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
@Configuration
@EnableBatchProcessing
public class GH3947 {
@Bean
ThreadPoolTaskExecutor taskExecutor() {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(10);
taskExecutor.setMaxPoolSize(10);
taskExecutor.setQueueCapacity(10);
return taskExecutor;
}
@Bean
public Job job(JobBuilderFactory jobs, StepBuilderFactory steps, ThreadPoolTaskExecutor taskExecutor) {
Tasklet tasklet = (contribution, chunkContext) -> {
System.out.println(Thread.currentThread().getName() + " running tasklet");
return RepeatStatus.FINISHED;
};
return jobs.get("job")
.start(steps.get("step")
.tasklet(tasklet)
.taskExecutor(taskExecutor)
.throttleLimit(5)
.build())
.listener(new JobExecutionListenerSupport() {
@Override
public void afterJob(JobExecution jobExecution) {
taskExecutor.shutdown();
}
})
.build();
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(GH3947.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
} This prints:
In this example, the task executor pool size is set to 10, but the number of concurrent Tasklet executions is throttled at 5. Is that more clear to you now? Hopefully this example makes things a bit clearer.
I think this javadoc statement is about the case where the task executor is configured to limit the number of concurrent tasks to a value lower than the throttle limit, in which case it would prevent the throttle limit to be reached. That said, the fact that this throttle limit parameter may interfere with the throttle limit of the task executor itself (which is the confusion you are facing I think), and in addition to other issues like #1516, this parameter will be deprecated in v5 in favor of the throttling capabilities of the provided task executor, see #2218. With regard to the parallel sample improvement, we are actually planning to improve all samples in v5 as well, see #3663 . |
Your example doesn't terminate. The main thread ends but all the threads in the pool block waiting on a new task. You need to call You cap the concurrency at 5 with the throttle limit but permanently have 10 threads in the pool. That's a waste of resources, doesn't make any sense. And why cause a queue for waiting tasks to be created with |
That's not the real issue here, I updated the sample to shutdown the task executor with a listener after the job. That said, the task executor could be kept running if the JVM was used for something else after the job is finished (like for example to run subsequent scheduled jobs).
Of course it's a waste of resources in this particular case, that's why it is up to you to correctly configure the size of your thread pool and its queue capacity. The example is only for demonstration purpose to clarify things, and could not explain the purpose of throttle-limit parameter if its value was higher than the pool size. But this is not always a waste of resources, for example in the case where the same thread pool is used in different parts of batch application. Many users prefer creating a single thread pool (instead of several ones due to the cost of creating threads/pools) and (re)use it in different parts of the batch app, like for example to launch jobs (by setting the pool in the |
I was actually not asking for an explanation. I understand what the throttle limit does because I have looked at the code. My point is the throttle limit is unclearly documented and explained at best. The throttle limit is not a cap on the number of concurrent executions. It is the exact number of (dummy) runnables that are submitted to the Permits would be a more appropriate name. Why not use a bounded queue as a thread handoff mechanism instead? What exactly is |
With your example setting a queue capacity it is easy to get the batch to hang with an inappropriate throttle limit - because throttle limit is not just a cap but also the fewest number of tasks sent to the executor. I've opened a bug report for this problem here: #3951. |
ok let's back up, one issue at a time because you are addressing several aspects at once (documentation, implementation, samples, etc).
Well, based on your statement:
So here you say throttle limit is not a cap on the number of executions, but in your initial comment you said: There are at least two things reported here based on what you shared: Documentation issueYou claim the Javadoc is misleading or wrong, but I don't think so. The javadocs being discussed here are:
This is clear and correct IMO, as shown in the previous example. If it is not clear to you, please open a PR with what you think will clarify the docs.
This is also correct. If you set your thread pool size to 10 and the throttle limit to 2000, then your throttle limit won't be reached since the thread pool will prevent that from happening anyway (ie having 2000 concurrent executions with only 10 threads). That's why the javadoc suggests to make the pool size larger than the throttle limit if possible. Again, if you think this is not clear enough, please open a PR with what you think can make things clearer. That said, I'm closing this issue in favor of a PR from your part if you still think this is needed. Implementation issue
We are open to constructive feedback. If you think the current implementation can be improved, you are welcome to share your ideas or open a PR. Please note that the throttle-limit will be deprecated (#2218) and the concurrency model will be reviewed (#3950), so I suggest you wait for that to happen before attempting to improve the current implementation.
Thank you for opening that issue. We will take a look and get back to you. |
Sorry, you're not going to sell me on this code. Just quoting the javadoc and claiming everything is correct is not going to convince anybody. Please read the code. You don't seem to be familiar with it. I've shown it is quite easy to produce a deadlock in multithreaded step among other things. We'll just have to hope for an improvement in v5 (after no improvement in v2-4). I have nothing left to say on this topic, except that scalability is one of the big reasons we were considering adopting Spring Batch, and Michael Minella's book, after claiming "the features Spring Batch offers to scale ... are some of the strongest of any framework," spends only two pages on multithreaded step and doesn't even mention the crucial throttleLimit property, or says not to use SimpleAsyncTaskExecutor after using it in the examples, but doesn't describe any alternatives or their pitfalls. |
It is unclear what throttle-limit does, and the Javadoc comment in TaskExecutorRepeatTemplate#setThrottleLimit is wrong imho.
Yes, "the throttle limit is the largest number of concurrent tasks that can be executing at one time", but the following statement in the javadoc (from @dsyer possibly?) is misleading or wrong I think:
The semaphore waits in ResultHolderResultQueue prevents more than "throttle limit" tasks being active at once, so if the thread pool is larger, there will be idle threads.
From another angle, a ThreadPoolExecutor will normally never block when adding a task, so how could it prevent throttle limit from being reached? Tasks above and beyond the number of available threads will simply be queued.
Throttle limit is the number of tasks being processed by threads plus the number of tasks in the jobQueue when using a ThreadPoolTaskExecutor (and hence a ThreadPoolExecutor), so a better name for the property would be e.g. max-concurrent.
I find the example parallel-sample could be much improved. One could set the property throttle-limit, use a ThreadPoolTaskExecutor (with e.g. corePoolSize == maxPoolSize for a fixed-size task executor), and have more than five lines as input data! See 20070122.teststream.ImportTradeDataStep.txt.
On the same note, throttle-limit does not even get a mention in @mminella's latest book. The documentation and explanation for scaling Spring Batch in general could be much improved and expanded in my opinion.
The text was updated successfully, but these errors were encountered: