Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Commit interval support through StepExecution class. #4134

Closed
jgoldverg opened this issue Jun 17, 2022 · 1 comment
Closed

Dynamic Commit interval support through StepExecution class. #4134

jgoldverg opened this issue Jun 17, 2022 · 1 comment
Labels
status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details

Comments

@jgoldverg
Copy link

I asked a question regarding if a dynamic commit interval is possible here currently supported, and it is, but I believe the solution to be wrong.
The issue core issue I have is the current solution relies on dynamic commit interval being controlled through JobParameters class which is incorrect.
To explain, the commit interval is a property of a step and not a job. This is represented by the implementation of the SimpleStepBuilder.chunk().

Expected Behavior
In the StepExecution class I would like for there to be the inclusion of 2 more methods.

  1. setCommitInterval()
    This function would set the commit interval of only the step it is a part of. The access could easily be done through @BeforeStep, @afterstep.
    There are three cases of behavior I would believe occur when calling these methods.
    1. Before the step executes: the setCommitInterval would override the interval provided during the creation of the step through StepBuilder.
    2. After the step executes: this should not throw null but simply do nothing or still set the property but this does not truly matter.
    3. During the step: This is the trickiest case, but I believe this is where "soft apply" should be used.
      Consider the case of the step processing and it's reading a List of size 8(old commit interval) then when setCommitInterval(10) gets called I would expect that only on the next read() do we read in a list of size 10.
      I think this solution works for both the sequential and parallel case, as it follows the mentality of "The change does not have to apply instantly, if I read in an old commit interval then it's ok".
  2. getCommitInterval(). I believe this to be self-explanatory but would just like to say I don't think there should be any locking/blocking property.

Current Behavior
Currently if I want to have a dynamic commit interval then I have to put a reference to the JobParam referencing the commit interval to use.
If I wish to change the commit interval I then have to change it through the JobParams reference.

Why dynamic commit interval is important?
To explain my situation and why I believe this to be important to support.
My use-case of spring-batch is to read and write files between ftp/ftps servers - s3 buckets, these files can range in sizes of 10KB-100+GB.
The commit interval is essential when processing larger files, the throughput is directly affected by this property.
Now when I am detecting that the JVM has available to it 16GB+ I should utilize it.
Generally I am running on HPC Clusters for ex: Chameleon Cloud, CCR.

@jgoldverg jgoldverg added status: waiting-for-triage Issues that we did not analyse yet type: feature labels Jun 17, 2022
@fmbenhassine
Copy link
Contributor

Thank you for opening this issue.

I asked a question regarding if a dynamic commit interval is possible here currently supported, and it is, but I believe the solution to be wrong. The issue core issue I have is the current solution relies on dynamic commit interval being controlled through JobParameters class which is incorrect. To explain, the commit interval is a property of a step and not a job. This is represented by the implementation of the SimpleStepBuilder.chunk().

In the SO answer you shared I mentioned that the chunk size could be made dynamic through application/system properties or job parameters. I deliberately put application/system properties first because I would recommend this way to configure such "technical" properties (chunk-size, thread-pool size, database connection pool size, etc). I would use job parameters for "business" properties like input file name, input table name, etc. I previously shared this opinion here.

However, one could argue that the chunk-size is a property of the job itself (unlike the thread pool size which can be used in another part of the app) and therefore could be made dynamic using a (preferably non-identifying) job parameter (note there is no "step parameter" concept in Spring Batch). Some folks use this approach like in #3948. This is also a correct point of view and I would not consider it as wrong.

So as you can see, both are correct and it is a matter of choice/preference.


Now regarding your feature request: the commit-interval is an implementation detail of the ChunkOrientedTasklet, which is a specific implementation of Tasklet that is a property of TaskletStep. So as you can see, we are here two levels down from Step and StepExecution. That's why we cannot add the commit-interval as a property in StepExecution. The StepExecution could represent an execution of a partitioned step, in which the commit-interval property (which is suggested to be added) does not make sense.

As mentioned in SO, making the commit interval configurable through application/system properties or job parameters allows you to change the value at startup time (without recompiling your app). The dynamism you are looking for is a runtime dynamism (ie "in-flight", while the job execution is running), and is too specific to the use case (the commit-interval could be changed based on the current resource utilization, another specific condition, etc), and I do not see an obvious way to implement this is a generic way as a framework feature. That's why I believe a specific custom step implementation is the way to go here.

Hopefully I addressed all your concerns here and in SO, so I'm closing this issue for now. But do not hesitate to add a comment if you would like to discuss this further.

@fmbenhassine fmbenhassine closed this as not planned Won't fix, can't repro, duplicate, stale Jun 23, 2022
@fmbenhassine fmbenhassine added status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details and removed status: waiting-for-triage Issues that we did not analyse yet type: feature labels Jun 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details
Projects
None yet
Development

No branches or pull requests

2 participants