[SPARK-48802][SS][FOLLOWUP] FileStreamSource maxCachedFiles set to 0 causes batch with no files to be processed #47195
What changes were proposed in this pull request?
This is a followup to a bug identified from #45362. When setting maxCachedFiles to 0 (to force a full relisting of files for each batch, see https://issues.apache.org/jira/browse/SPARK-44924), subsequent batches of files would be skipped due to a logic error: an empty array of unreadFiles was carried forward, and it was only being null checked. This update adds a check that unreadFiles is also non-empty, as a guard condition to prevent batches executing with no files, and ensures that unreadFiles is only set if a) there are files remaining in the listing and b) maxCachedFiles is greater than 0.
Why are the changes needed?
Setting the maxCachedFiles configuration to 0 would inadvertently cause every other batch to contain 0 files, which is unexpected behavior for users.
Does this PR introduce any user-facing change?
Fixes the case where users set maxCachedFiles to 0 in order to always perform a full listing of files for each batch.
How was this patch tested?
A new test was added to verify that setting maxCachedFiles to 0 performs a file listing for each batch.
Was this patch authored or co-authored using generative AI tooling?
No
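For illustration only, the logic error and guard condition described under "What changes were proposed in this pull request?" can be modeled with a small standalone sketch. This is not the FileStreamSource code: the `Source` class, its constructor parameters, the `guardEmpty` flag, and the `nextBatch` method are hypothetical names, and the model assumes a plain in-memory listing. Only the names unreadFiles and maxCachedFiles come from the description.

```scala
object CachedListingSketch {
  // Hypothetical, simplified stand-in for a file source; not Spark's actual class.
  final class Source(
      listing: () => Seq[String], // assumed: returns every file currently visible
      maxFilesPerBatch: Int,
      maxCachedFiles: Int,
      guardEmpty: Boolean) {      // false models the old checks, true the new ones

    private var unreadFiles: Seq[String] = null
    private var seen = Set.empty[String]

    def nextBatch(): Seq[String] = {
      // Old check: null only. New check: null and non-empty.
      val useCache =
        if (guardEmpty) unreadFiles != null && unreadFiles.nonEmpty
        else unreadFiles != null

      val candidates = if (useCache) unreadFiles else listing().filterNot(seen)
      val (batch, rest) = candidates.splitAt(maxFilesPerBatch)

      unreadFiles =
        if (guardEmpty) {
          // Cache only if files remain AND caching is enabled.
          if (rest.nonEmpty && maxCachedFiles > 0) rest.take(maxCachedFiles) else null
        } else {
          // With maxCachedFiles == 0, take(0) yields an empty, non-null Seq.
          if (rest.isEmpty) null else rest.take(maxCachedFiles)
        }

      seen ++= batch
      batch
    }
  }
}
```

In this model, with six files, two files per batch, and maxCachedFiles set to 0, the unguarded variant returns an empty batch after every non-empty one, because the empty cache passes the null check. The guarded variant relists on every call and returns two files each time.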