Skip to content

Preserve sourcePartition to Splits mapping in task descriptor #19478

Merged
losipiuk merged 3 commits intotrinodb:masterfrom
losipiuk:lo/source-partition-info
Oct 26, 2023
Merged

Preserve sourcePartition to Splits mapping in task descriptor #19478
losipiuk merged 3 commits intotrinodb:masterfrom
losipiuk:lo/source-partition-info

Conversation

@losipiuk
Copy link
Copy Markdown
Member

Description

We need to preserve information which splits map to which source
partition id in task descriptor to be able to split task descriptor into
multiple ones. Splitting is important in case we made a scheduling
mistake and packed to many source partitions into single task descriptor
and execution of such task is not possible due to lack of resources.

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

If we observe on task failure that partition the task was handling
was already processed by the task ignore the failure.
Explicitly use taskPartition in place of partition in
SplitAssignerTester. This is a preparatory commit to avoid confusion as
we will be dealing with source data partitions in same context with
upcoming changes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REPLICATED_SOURCE_PARTITION is also 0

document why 0 is ok here or give the constant a name

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USed:

// marker source partition id for data which is not hash distributed
    int SINGLE_SOURCE_PARTITION_ID = 0;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is sometimes used for iteration. can this return a Stream<Entry?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the production code the iteration usecase is only for generating debug string when we run out of memory in task descriptor storage. I think it is not worth optimizing.
Let's discuss tomorrow - maybe I do not understand what benefits we can get here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code can be made much more simpler and hopefully not much less performant:

public SplitsMapping build()
{
    ImmutableMap.Builder<PlanNodeId, Map<Integer, List<Split>>> result = ImmutableMap.builder();
    for (PlanNodeId planNodeId : Sets.union(originalMapping.splits.keySet(), updates.keySet())) {
        Map<Integer, List<Split>> planNodeOriginalMapping = originalMapping.splits.getOrDefault(planNodeId, ImmutableMap.of());
        Map<Integer, ImmutableList.Builder<Split>> planNodeUpdates = updates.getOrDefault(planNodeId, ImmutableMap.of());
        if (planNodeUpdates.isEmpty()) {
            // just use original splits for planNodeId
            result.put(planNodeId, planNodeOriginalMapping);
            continue;
        }
        // create new mapping for planNodeId reusing as much of source as possible
        ImmutableMap.Builder<Integer, List<Split>> targetSplitsMapBuilder = ImmutableMap.builder();
        for (Integer sourcePartitionId : Sets.union(planNodeOriginalMapping.keySet(), planNodeUpdates.keySet())) {
            @Nullable List<Split> originalSplits = planNodeOriginalMapping.get(sourcePartitionId);
            @Nullable ImmutableList.Builder<Split> splitUpdates = planNodeUpdates.get(sourcePartitionId);
            targetSplitsMapBuilder.put(sourcePartitionId, mergeIfPresent(originalSplits, splitUpdates));
        }
        result.put(planNodeId, targetSplitsMapBuilder.buildOrThrow());
    }
    return new SplitsMapping(result.buildOrThrow());
}

private static <T> List<T> mergeIfPresent(@Nullable List<T> list, @Nullable ImmutableList.Builder<T> additionalElements)
{
    if (additionalElements == null) {
        // reuse source immutable split list
        return requireNonNull(list, "list is null");
    }
    if (list == null) {
        return additionalElements.build();
    }
    return ImmutableList.<T>builder()
            .addAll(list)
            .addAll(additionalElements.build())
            .build();
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler indeed.
Thanks for reminding my that I cannot code :P

Copy link
Copy Markdown
Member Author

@losipiuk losipiuk Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you earned the Co-authored-By: ...

Comment on lines 280 to 275
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider Table?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like

        Table<PlanNodeId, Integer, ImmutableList.Builder<Split>> ....;

It would work - but Table interface is less nice to work with. E.g you do not have computeIfAbsent

@losipiuk losipiuk force-pushed the lo/source-partition-info branch 2 times, most recently from 0c78d49 to 5b43fa1 Compare October 25, 2023 08:28
@losipiuk losipiuk marked this pull request as ready for review October 25, 2023 08:29
We need to preserve information which splits map to which source
partition id in task descriptor to be able to split task descriptor into
multiple ones. Splitting is important in case we made a scheduling
mistake and packed to many source partitions into single task descriptor
and execution of such task is not possible due to lack of resources.
@losipiuk losipiuk force-pushed the lo/source-partition-info branch from 5b43fa1 to 49116d1 Compare October 25, 2023 20:51
@losipiuk
Copy link
Copy Markdown
Member Author

@findepi I added one more tiny commit - PTAL when you have a chance.

@losipiuk losipiuk force-pushed the lo/source-partition-info branch from 49116d1 to 565a173 Compare October 26, 2023 08:49
@losipiuk
Copy link
Copy Markdown
Member Author

@findepi I added one more tiny commit - PTAL when you have a chance.

Actually it does not work correctly due to write skew mitigation logic - I will send separate - slightly more complex PR to address that.

@losipiuk losipiuk merged commit c2807e7 into trinodb:master Oct 26, 2023
@github-actions github-actions bot added this to the 431 milestone Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants