
Guard against multiple consuming segments for same partition #6483

Merged
merged 2 commits into master on Jan 26, 2021

Conversation

mcvsubbu
Contributor

Fixes being considered for issues #5559 and #5263 may end up
resurfacing some race conditions that we have strived to avoid
via automated mechanisms. The race conditions may lead to multiple
consuming segments in the same partition.

This commit adds a catch-all safeguard in case some automatic operation
interferes with the other manual primitives provided, so that the table
never reaches such a state.
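
As an illustration of the kind of safeguard this adds (a minimal sketch, not the PR's actual code), the check below scans a proposed instance-states map and refuses any update that would leave two CONSUMING segments in one partition. It assumes the LLC segment-name convention tableName__partitionId__sequenceNumber__creationTime; the class and method names (ConsumingSegmentGuard, assertNoTwoConsumingSegments, partitionIdOf) are hypothetical, while the real check sits in the updateInstanceStatesForNewConsumingSegment path shown in the diff further down.

// Hedged sketch: reject an instance-states map that has two CONSUMING segments
// in the same partition. All identifiers here are illustrative, not Pinot's.
import java.util.HashMap;
import java.util.Map;

public final class ConsumingSegmentGuard {
  private static final String CONSUMING = "CONSUMING";

  // Assumes LLC segment names of the form tableName__partitionId__sequenceNumber__creationTime
  private static int partitionIdOf(String segmentName) {
    return Integer.parseInt(segmentName.split("__")[1]);
  }

  // Throws if the proposed instance-states map has more than one CONSUMING segment per partition.
  public static void assertNoTwoConsumingSegments(Map<String, Map<String, String>> instanceStatesMap) {
    Map<Integer, String> consumingSegmentPerPartition = new HashMap<>();
    for (Map.Entry<String, Map<String, String>> entry : instanceStatesMap.entrySet()) {
      String segmentName = entry.getKey();
      if (!entry.getValue().containsValue(CONSUMING)) {
        continue;
      }
      String existing = consumingSegmentPerPartition.putIfAbsent(partitionIdOf(segmentName), segmentName);
      if (existing != null) {
        throw new IllegalStateException("Multiple CONSUMING segments in partition "
            + partitionIdOf(segmentName) + ": " + existing + ", " + segmentName);
      }
    }
  }
}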


@mcvsubbu mcvsubbu requested review from npawar and snleee January 23, 2021 03:30
codecov-io commented Jan 23, 2021

Codecov Report

Merging #6483 (0469e82) into master (1beaab5) will increase coverage by 6.65%.
The diff coverage is 78.97%.


@@            Coverage Diff             @@
##           master    #6483      +/-   ##
==========================================
+ Coverage   66.44%   73.10%   +6.65%     
==========================================
  Files        1075     1334     +259     
  Lines       54773    65595   +10822     
  Branches     8168     9567    +1399     
==========================================
+ Hits        36396    47952   +11556     
+ Misses      15700    14499    -1201     
- Partials     2677     3144     +467     
Flag          Coverage Δ
integration   43.82% <42.05%> (?)
unittests     64.86% <59.96%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...ot/broker/broker/AllowAllAccessControlFactory.java 100.00% <ø> (ø)
.../helix/BrokerUserDefinedMessageHandlerFactory.java 52.83% <0.00%> (-13.84%) ⬇️
...ker/routing/instanceselector/InstanceSelector.java 100.00% <ø> (ø)
...ava/org/apache/pinot/client/AbstractResultSet.java 73.33% <ø> (+16.19%) ⬆️
...n/java/org/apache/pinot/client/BrokerResponse.java 100.00% <ø> (ø)
.../main/java/org/apache/pinot/client/Connection.java 44.44% <ø> (-4.40%) ⬇️
...n/java/org/apache/pinot/client/ExecutionStats.java 15.55% <ø> (ø)
...inot/client/JsonAsyncHttpPinotClientTransport.java 56.36% <ø> (-5.64%) ⬇️
...n/java/org/apache/pinot/client/ResultSetGroup.java 65.38% <ø> (+0.16%) ⬆️
.../org/apache/pinot/client/ResultTableResultSet.java 28.00% <ø> (-6.29%) ⬇️
... and 1139 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

// These conditions can happen again due to manual operations considered as fixes in Issues #5559 and #5263
// The following check prevents the table from going into such a state (but does not prevent the root cause
// of attempting such a zk update).
if (instanceStatesMap != null) {
Contributor

instanceStatesMap won't be null here

Contributor Author

Removed null check

Member

instanceStatesMap could be null if the table is newly created, i.e. when this method gets invoked from the setUpNewTable method.

Contributor Author

Helix returns an empty TreeMap() when we create a new table.

@@ -793,6 +793,27 @@ void updateInstanceStatesForNewConsumingSegment(Map<String, Map<String, String>>
LOGGER.info("Updating segment: {} to ONLINE state", committingSegmentName);
}

// There used to be a race condition in pinot (caused by heavy GC on the controller during segment commit)
Contributor

Is this caused by the controller running into GC while leadership changes, and two controllers trying to create the segment with the same sequence number? If so, this change won't really work, because the instanceStatesMap won't be up to date.

Contributor Author

Last time this happened, it was caused by GC on the controller, and the same controller waking up to complete the operation it was scheduled to do.
I could not dig up the PR since it happened (I think) when the source code was under the com.linkedin domain, but it was fixed by @jackjlli by adding a Stat check when we update the metadata for the committing segment.

Contributor

I think it can only happen in the following order:

  • The current leader controller starts committing a segment
  • The current leader controller runs into GC and stops responding; leadership changes to another controller
  • The new leader controller commits the segment
  • The previous leader controller wakes up and commits the segment again

If this is the order, the fix won't work, because the instanceStatesMap is not up to date if it was already calculated before the controller ran into GC. The right fix might be checking the leadership again before actually committing the segment.

@jackjlli What was the fix?

Member

The fix is the usage of the Stat class. This is the PR I made (#2361). Now it's in the persistSegmentZKMetadata method.

If the former lead controller tries to persist its change, it fails because the stat version has already been changed.
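
For readers unfamiliar with the mechanism, below is a minimal sketch of a Stat/version-guarded write, assuming the raw ZooKeeper client; the znode path and payload are illustrative, and this is not the actual persistSegmentZKMetadata code. A stale leader that read the znode at an older version gets a BadVersionException once the new leader has already written, so its commit attempt fails instead of clobbering the newer metadata.

// Hedged sketch of a Stat/version-guarded ZooKeeper write; names and path are illustrative only.
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public final class VersionedMetadataWriter {

  // Returns true if the write went through, false if another controller updated the znode first.
  public static boolean writeIfUnchanged(ZooKeeper zk, String path, byte[] newData)
      throws KeeperException, InterruptedException {
    Stat stat = new Stat();
    zk.getData(path, false, stat);  // read current data and record its version in stat
    try {
      // Conditional write: succeeds only if the znode is still at the version we read.
      zk.setData(path, newData, stat.getVersion());
      return true;
    } catch (KeeperException.BadVersionException e) {
      // The new leader already persisted its change; the stale leader must abort.
      return false;
    }
  }
}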

Contributor Author

In any case, it picks up the latest IdealState and checks on that znode. If a different controller added a segment, then even if the old controller gets back control, it will not be able to add the new segment.
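
Put differently, because the guard runs against a freshly read ideal state inside a version-checked update loop, a stale in-memory instanceStatesMap cannot sneak a second consuming segment in. A simplified, hypothetical sketch of such a loop (reusing the earlier illustrative helpers, not Pinot's actual ideal-state update code) might look like this:

// Hedged sketch of a read-check-write retry loop; identifiers reuse the earlier
// illustrative sketches and do not match Pinot's real code.
import java.util.Map;
import java.util.function.UnaryOperator;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public final class GuardedIdealStateUpdater {

  // Illustrative codec between the znode payload and the segment -> (instance -> state) map.
  interface StateCodec {
    Map<String, Map<String, String>> decode(byte[] raw);
    byte[] encode(Map<String, Map<String, String>> instanceStatesMap);
  }

  // Re-reads the znode on every attempt so the guard always sees the latest state.
  public static void updateWithGuard(ZooKeeper zk, String idealStatePath, StateCodec codec,
      UnaryOperator<Map<String, Map<String, String>>> updater, int maxAttempts)
      throws KeeperException, InterruptedException {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      Stat stat = new Stat();
      Map<String, Map<String, String>> latest = codec.decode(zk.getData(idealStatePath, false, stat));
      Map<String, Map<String, String>> updated = updater.apply(latest);
      // The guard checks the freshly read state, not a possibly stale in-memory copy.
      ConsumingSegmentGuard.assertNoTwoConsumingSegments(updated);
      try {
        zk.setData(idealStatePath, codec.encode(updated), stat.getVersion());
        return;
      } catch (KeeperException.BadVersionException e) {
        // Another controller won the race; loop and retry against the new state.
      }
    }
    throw new IllegalStateException("Gave up updating " + idealStatePath + " after " + maxAttempts + " attempts");
  }
}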

@mcvsubbu mcvsubbu force-pushed the idealstate-update-fix branch from 4e8644f to 03f5924 Compare January 25, 2021 21:53
@mcvsubbu
Contributor Author

@Jackie-Jiang a couple of files did not have license headers, so my commit was failing. I added them now. Can you review please?

@kbastani
Contributor

kbastani commented Jan 26, 2021

@mcvsubbu @kkrugler volunteered to put this into a separate pull request. It's ready for review: #6489

@kkrugler kkrugler mentioned this pull request Jan 26, 2021
@mcvsubbu mcvsubbu force-pushed the idealstate-update-fix branch from 03f5924 to 0469e82 Compare January 26, 2021 02:04
@Jackie-Jiang Jackie-Jiang (Contributor) left a comment


Discussed offline. The purpose of this PR is protection rather than fixing the race condition. LGTM
