-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Commit interval support through StepExecution class. #4134
Comments
Thank you for opening this issue.
In the SO answer you shared I mentioned that the chunk size could be made dynamic through application/system properties or job parameters. I deliberately put application/system properties first because I would recommend this way to configure such "technical" properties (chunk-size, thread-pool size, database connection pool size, etc). I would use job parameters for "business" properties like input file name, input table name, etc. I previously shared this opinion here. However, one could argue that the chunk-size is a property of the job itself (unlike the thread pool size which can be used in another part of the app) and therefore could be made dynamic using a (preferably non-identifying) job parameter (note there is no "step parameter" concept in Spring Batch). Some folks use this approach like in #3948. This is also a correct point of view and I would not consider it as wrong. So as you can see, both are correct and it is a matter of choice/preference. Now regarding your feature request: the commit-interval is an implementation detail of the As mentioned in SO, making the commit interval configurable through application/system properties or job parameters allows you to change the value at startup time (without recompiling your app). The dynamism you are looking for is a runtime dynamism (ie "in-flight", while the job execution is running), and is too specific to the use case (the commit-interval could be changed based on the current resource utilization, another specific condition, etc), and I do not see an obvious way to implement this is a generic way as a framework feature. That's why I believe a specific custom step implementation is the way to go here. Hopefully I addressed all your concerns here and in SO, so I'm closing this issue for now. But do not hesitate to add a comment if you would like to discuss this further. |
I asked a question regarding if a dynamic commit interval is possible here currently supported, and it is, but I believe the solution to be wrong.
The issue core issue I have is the current solution relies on dynamic commit interval being controlled through JobParameters class which is incorrect.
To explain, the commit interval is a property of a step and not a job. This is represented by the implementation of the SimpleStepBuilder.chunk().
Expected Behavior
In the StepExecution class I would like for there to be the inclusion of 2 more methods.
This function would set the commit interval of only the step it is a part of. The access could easily be done through @BeforeStep, @afterstep.
There are three cases of behavior I would believe occur when calling these methods.
Consider the case of the step processing and it's reading a List of size 8(old commit interval) then when setCommitInterval(10) gets called I would expect that only on the next read() do we read in a list of size 10.
I think this solution works for both the sequential and parallel case, as it follows the mentality of "The change does not have to apply instantly, if I read in an old commit interval then it's ok".
Current Behavior
Currently if I want to have a dynamic commit interval then I have to put a reference to the JobParam referencing the commit interval to use.
If I wish to change the commit interval I then have to change it through the JobParams reference.
Why dynamic commit interval is important?
To explain my situation and why I believe this to be important to support.
My use-case of spring-batch is to read and write files between ftp/ftps servers - s3 buckets, these files can range in sizes of 10KB-100+GB.
The commit interval is essential when processing larger files, the throughput is directly affected by this property.
Now when I am detecting that the JVM has available to it 16GB+ I should utilize it.
Generally I am running on HPC Clusters for ex: Chameleon Cloud, CCR.
The text was updated successfully, but these errors were encountered: