Fix disappearing outputs from long running tasks with external storage enabled

A task could lose some of its outputs on final completion when it was:
1. Long running
2. Producing big output (externalized)
3. Producing output that grows over time
4. Therefore going through multiple externalize / internalize executions
5. ... such as a join task collecting outputs of all forked tasks

The issue was caused by the following sequence:
1. On the Nth execution of a task (such as the one described above)
2. The task internalized its intermediate output from external storage
3. The task was executed and updated its output to the current value in memory
4. The task tried to externalize the new version of its output
5. ... but while doing so, the outputPayload (last externalized value)
   was combined with outputData (current, in-memory value) in such a way
   that the output payload overwrote the latest values
6. Thus, the newly calculated outputs were lost
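
The sequence above can be reproduced with two plain maps. This is a minimal sketch, not the project's code: the class name `PutAllOverwriteDemo`, the helper `mergeWithPutAll` and the keys `fork1` and `fork2` are hypothetical. Only the names `outputData` and `outputPayload` and the `putAll` call are taken from the change itself.

```java
import java.util.HashMap;
import java.util.Map;

public class PutAllOverwriteDemo {

    // Mirrors the pre-fix merge: every payload entry overwrites the
    // in-memory entry stored under the same key.
    static Map<String, Object> mergeWithPutAll(
            Map<String, Object> outputData, Map<String, Object> outputPayload) {
        outputData.putAll(outputPayload);
        return outputData;
    }

    public static void main(String[] args) {
        // Last externalized value; stale by the time of the Nth execution.
        Map<String, Object> outputPayload = new HashMap<>();
        outputPayload.put("fork1", "partial");

        // Current in-memory value, updated by the latest execution.
        Map<String, Object> outputData = new HashMap<>();
        outputData.put("fork1", "complete");
        outputData.put("fork2", "complete");

        Map<String, Object> merged = mergeWithPutAll(outputData, outputPayload);
        System.out.println(merged.get("fork1")); // prints "partial": the newer value is gone
        System.out.println(merged.get("fork2")); // prints "complete": the payload had no entry for it
    }
}
```

Only keys present in both maps are affected, which would explain why the task lost some of its outputs rather than all of them.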

Signed-off-by: Maros Marsalek <[email protected]>
marosmars committed Apr 11, 2023
1 parent 92ad530 commit 2285531
Showing 1 changed file with 6 additions and 1 deletion.
@@ -398,7 +398,12 @@ public void setWorkerId(String workerId) {
     @JsonIgnore
     public Map<String, Object> getOutputData() {
         if (!outputPayload.isEmpty() && !outputData.isEmpty()) {
-            outputData.putAll(outputPayload);
+            // Combine payload + data
+            // data has precedence over payload because:
+            // with external storage enabled, payload contains the old values
+            // while data contains the latest and if payload took precedence, it
+            // would remove latest outputs
+            outputPayload.forEach(outputData::putIfAbsent);
             outputPayload = new HashMap<>();
             return outputData;
         } else if (outputPayload.isEmpty()) {
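
To see the effect of the new merge in isolation, here is a minimal sketch. `TaskOutputSketch` is a hypothetical stand-in that holds only the two fields the hunk touches; it is not the class from the changed file. The method `combineOutput` covers only the branch visible in the hunk and does not reproduce the rest of the real getter. The keys `fork0`, `fork1` and `fork2` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TaskOutputSketch {
    private Map<String, Object> outputData = new HashMap<>();
    private Map<String, Object> outputPayload = new HashMap<>();

    TaskOutputSketch(Map<String, Object> data, Map<String, Object> payload) {
        outputData.putAll(data);
        outputPayload.putAll(payload);
    }

    // Only the "both maps non-empty" branch from the hunk is modelled here.
    Map<String, Object> combineOutput() {
        if (!outputPayload.isEmpty() && !outputData.isEmpty()) {
            // putIfAbsent copies a payload entry only when outputData has
            // no non-null value for that key, so in-memory values win.
            outputPayload.forEach(outputData::putIfAbsent);
            outputPayload = new HashMap<>();
        }
        return outputData;
    }

    boolean payloadIsEmpty() {
        return outputPayload.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("fork1", "complete");
        data.put("fork2", "complete");

        Map<String, Object> payload = new HashMap<>();
        payload.put("fork0", "old-only");
        payload.put("fork1", "partial");

        TaskOutputSketch task = new TaskOutputSketch(data, payload);
        Map<String, Object> out = task.combineOutput();

        System.out.println(out.get("fork0")); // prints "old-only": carried over from the payload
        System.out.println(out.get("fork1")); // prints "complete": the in-memory value is kept
        System.out.println(out.get("fork2")); // prints "complete"
        System.out.println(task.payloadIsEmpty()); // prints "true": the payload was reset
    }
}
```

One consequence of this merge order is that a key holding a stale value in the payload can no longer shadow a newer in-memory value, while keys that exist only in the payload are still preserved.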