Skip to content

Commit

Permalink
Optimize calculation of close time to avoid impasse and minimize grat…
Browse files Browse the repository at this point in the history
…uitous proposal changes (XRPLF#4760)

* Optimize the calculation of close time to avoid
impasse and minimize gratuitous proposal changes.

* git apply clang-format.patch

* Review (Howard) fixes.

* Review fix for impasse discovered by John.

* Review fixes (comments) from John.

* Scott S review fixes. Also clang-format.
  • Loading branch information
mtrippled authored Nov 17, 2023
1 parent d593972 commit 8ce85a9
Showing 1 changed file with 154 additions and 43 deletions.
197 changes: 154 additions & 43 deletions src/ripple/consensus/Consensus.h
Original file line number Diff line number Diff line change
Expand Up @@ -1759,15 +1759,27 @@ Consensus<Adaptor>::updateOurPositions(bool const share)
else
{
int neededWeight;
// It's possible to be at a close time impasse (described below), so
// keep track of whether this round has taken a long time.
bool stuck = false;

if (convergePercent_ < parms.avMID_CONSENSUS_TIME)
{
neededWeight = parms.avINIT_CONSENSUS_PCT;
}
else if (convergePercent_ < parms.avLATE_CONSENSUS_TIME)
{
neededWeight = parms.avMID_CONSENSUS_PCT;
}
else if (convergePercent_ < parms.avSTUCK_CONSENSUS_TIME)
{
neededWeight = parms.avLATE_CONSENSUS_PCT;
}
else
{
neededWeight = parms.avSTUCK_CONSENSUS_PCT;
stuck = true;
}

int participants = currPeerPositions_.size();
if (mode_.get() == ConsensusMode::proposing)
Expand All @@ -1787,57 +1799,156 @@ Consensus<Adaptor>::updateOurPositions(bool const share)
<< " nw:" << neededWeight << " thrV:" << threshVote
<< " thrC:" << threshConsensus;

// An impasse is possible unless a validator pretends to change
// its close time vote. Imagine 5 validators. 3 have positions
// for close time t1, and 2 with t2. That's an impasse because
// 75% will never be met. However, if one of the validators voting
// for t2 switches to t1, then that will be 80% and sufficient
// to break the impasse. It's also OK for those agreeing
// with the 3 to pretend to vote for the one with 2, because
// that will never exceed the threshold of 75%, even with as
// few as 3 validators. The most it can achieve is 2/3.
for (auto& [t, v] : closeTimeVotes)
// Choose a close time and decide whether there is consensus for it.
//
// Close time is chosen based on the threshVote threshold
// calculated above. If a close time has votes equal to or greater than
// that threshold, then that is the best close time. If multiple
// close times have an equal number of votes, then choose the greatest
// of them. Ensure that our close time then matches that which meets
// the criteria. But if no close time meet the criteria, make no
// changes.
//
// This is implemented slightly differently for validators vs
// non-validators. For non-validators, it is sufficient to merely
// count the close times from all peer positions to determine
// the best. Validators, however, risk putting the network into an
// impasse unless they are able to change their own position without
// first having counted it towards the close time totals.
//
// Here's how the impasse could occur:
// Imagine 5 validators. 3 have close time t1, and 2 have t2.
// As consensus time increases, the threshVote threshold also increases.
// Once threshVote exceeds 60%, no members of either set of validators
// will change their close times.
//
// Avoiding the impasse means that validators should identify
// whether they currently have the best close time. First, choose
// the close time with the most votes. However, if multiple close times
// have the same number of votes, pick the latest of them.
// If the validator does not currently have the best close time,
// switch to it and increase the local vote tally for that better
// close time. This will result in consensus in the next iteration
// assuming that the peer messages propagate successfully.
// In this case the validators in set t1 will remain the same but
// those in t2 switch to t1.
//
// Another wrinkle, however, is that too many position changes
// from validators also has a destabilizing affect. Consensus can
// take longer if peers have to keep re-calculating positions,
// and mistakes can be made if peers settle on the wrong position.
// Therefore, avoiding an impasse should also minimize the likelihood
// of gratuitous change of position.
//
// The solution for validators is to first track whether it's
// possible that the network is at an impasse based on how much
// time this current consensus round has taken. This is reflected
// in the "stuck" boolean. When stuck, validators perform the
// above-described position change based solely on whether or not
// they agree with the best position, and ignore the threshVote
// criteria used for the earlier part of the phase.
//
// Determining whether there is close time consensus is very simple
// in comparison: if votes for the best close time meet or exceed
// threshConsensus, then we have close time consensus. Otherwise, not.

// First, find the best close time with which to agree: first criteria
// is the close time with the most votes. If a tie, the latest
// close time of those tied for the maximum number of votes.
std::multimap<int, NetClock::time_point> votesByCloseTime;
std::stringstream ss;
ss << "Close time calculation for ledger sequence "
<< static_cast<std::uint32_t>(previousLedger_.seq()) + 1
<< " Close times and vote count are as follows: ";
bool first = true;
for (auto const& [closeTime, voteCount] : closeTimeVotes)
{
if (adaptor_.validating() &&
t != asCloseTime(result_->position.closeTime()))
{
JLOG(j_.debug()) << "Others have voted for a close time "
"different than ours. Adding our vote "
"to this one in case it is necessary "
"to break an impasse.";
++v;
}
JLOG(j_.debug())
<< "CCTime: seq "
<< static_cast<std::uint32_t>(previousLedger_.seq()) + 1 << ": "
<< t.time_since_epoch().count() << " has " << v << ", "
<< threshVote << " required";

if (v >= threshVote)
if (first)
first = false;
else
ss << ',';
votesByCloseTime.insert({voteCount, closeTime});
ss << closeTime.time_since_epoch().count() << ':' << voteCount;
}
// These always gets populated because currPeerPositions_ is not
// empty to end up here, so at least 1 close time has at least 1 vote.
assert(!currPeerPositions_.empty());
std::optional<int> maxVote;
std::set<NetClock::time_point> maxCloseTimes;
// Highest vote getter is last. Track each close time that is tied
// with the highest.
for (auto rit = votesByCloseTime.crbegin();
rit != votesByCloseTime.crend();
++rit)
{
int const voteCount = rit->first;
if (!maxVote.has_value())
maxVote = voteCount;
else if (voteCount < *maxVote)
break;
maxCloseTimes.insert(rit->second);
}
// The best close time is the latest close time of those that have
// the maximum number of votes.
NetClock::time_point const bestCloseTime = *maxCloseTimes.crbegin();
ss << ". The best close time has the most votes. If there is a tie, "
"choose the latest. This is "
<< bestCloseTime.time_since_epoch().count() << "with " << *maxVote
<< " votes. ";

// If we are a validator potentially at an impasse and our own close
// time is not the best, change our close time to match it and
// tally another vote for it.
if (adaptor_.validating() && stuck &&
consensusCloseTime != bestCloseTime)
{
consensusCloseTime = bestCloseTime;
++*maxVote;
ss << " We are a validator. Consensus has taken "
<< result_->roundTime.read().count()
<< "ms. Previous round "
"took "
<< prevRoundTime_.count()
<< "ms. Now changing our "
"close time to "
<< bestCloseTime.time_since_epoch().count()
<< " that "
"now has "
<< *maxVote << " votes.";
}
// If the close time with the most votes also meets or exceeds the
// threshold to change our position, then change our position.
// Then check if we have met or exceeded the threshold for close
// time consensus.
//
// The 2nd check has been nested within the first historically.
// It's possible that this can be optimized by doing the
// 2nd check independently--this may make consensus happen faster in
// some cases. Then again, the trade-offs have not been modelled.
if (*maxVote >= threshVote)
{
consensusCloseTime = bestCloseTime;
ss << "Close time " << bestCloseTime.time_since_epoch().count()
<< " has " << *maxVote << " votes, which is >= the threshold ("
<< threshVote
<< " to make that our position if it isn't already.";
if (*maxVote >= threshConsensus)
{
// A close time has enough votes for us to try to agree
consensusCloseTime = t;
threshVote = v;

if (threshVote >= threshConsensus)
{
haveCloseTimeConsensus_ = true;
// Make sure that the winning close time is the one
// that propagates to the rest of the function.
break;
}
haveCloseTimeConsensus_ = true;
ss << " The maximum votes also >= the threshold ("
<< threshConsensus << ") for consensus.";
}
}

if (!haveCloseTimeConsensus_)
{
JLOG(j_.debug())
<< "No CT consensus:"
<< " Proposers:" << currPeerPositions_.size()
<< " Mode:" << to_string(mode_.get())
<< " Thresh:" << threshConsensus
<< " Pos:" << consensusCloseTime.time_since_epoch().count();
ss << " No CT consensus:"
<< " Proposers:" << currPeerPositions_.size()
<< " Mode:" << to_string(mode_.get())
<< " Thresh:" << threshConsensus
<< " Pos:" << consensusCloseTime.time_since_epoch().count();
}
JLOG(j_.debug()) << ss.str();
}

if (!ourNewSet &&
Expand Down

0 comments on commit 8ce85a9

Please sign in to comment.