Skip to content

Commit 9761ced

Browse files
mtrippledximinez
authored andcommitted
Optimize calculation of close time to avoid impasse and minimize gratuitous proposal changes (XRPLF#4760)
* Optimize the calculation of close time to avoid impasse and minimize gratuitous proposal changes. * git apply clang-format.patch * Review (Howard) fixes. * Review fix for impasse discovered by John. * Review fixes (comments) from John. * Scott S review fixes. Also clang-format.
1 parent 99d0872 commit 9761ced

File tree

1 file changed

+154
-43
lines changed

1 file changed

+154
-43
lines changed

src/ripple/consensus/Consensus.h

+154-43
Original file line numberDiff line numberDiff line change
@@ -1759,15 +1759,27 @@ Consensus<Adaptor>::updateOurPositions(bool const share)
17591759
else
17601760
{
17611761
int neededWeight;
1762+
// It's possible to be at a close time impasse (described below), so
1763+
// keep track of whether this round has taken a long time.
1764+
bool stuck = false;
17621765

17631766
if (convergePercent_ < parms.avMID_CONSENSUS_TIME)
1767+
{
17641768
neededWeight = parms.avINIT_CONSENSUS_PCT;
1769+
}
17651770
else if (convergePercent_ < parms.avLATE_CONSENSUS_TIME)
1771+
{
17661772
neededWeight = parms.avMID_CONSENSUS_PCT;
1773+
}
17671774
else if (convergePercent_ < parms.avSTUCK_CONSENSUS_TIME)
1775+
{
17681776
neededWeight = parms.avLATE_CONSENSUS_PCT;
1777+
}
17691778
else
1779+
{
17701780
neededWeight = parms.avSTUCK_CONSENSUS_PCT;
1781+
stuck = true;
1782+
}
17711783

17721784
int participants = currPeerPositions_.size();
17731785
if (mode_.get() == ConsensusMode::proposing)
@@ -1787,57 +1799,156 @@ Consensus<Adaptor>::updateOurPositions(bool const share)
17871799
<< " nw:" << neededWeight << " thrV:" << threshVote
17881800
<< " thrC:" << threshConsensus;
17891801

1790-
// An impasse is possible unless a validator pretends to change
1791-
// its close time vote. Imagine 5 validators. 3 have positions
1792-
// for close time t1, and 2 with t2. That's an impasse because
1793-
// 75% will never be met. However, if one of the validators voting
1794-
// for t2 switches to t1, then that will be 80% and sufficient
1795-
// to break the impasse. It's also OK for those agreeing
1796-
// with the 3 to pretend to vote for the one with 2, because
1797-
// that will never exceed the threshold of 75%, even with as
1798-
// few as 3 validators. The most it can achieve is 2/3.
1799-
for (auto& [t, v] : closeTimeVotes)
1802+
// Choose a close time and decide whether there is consensus for it.
1803+
//
1804+
// Close time is chosen based on the threshVote threshold
1805+
// calculated above. If a close time has votes equal to or greater than
1806+
// that threshold, then that is the best close time. If multiple
1807+
// close times have an equal number of votes, then choose the greatest
1808+
// of them. Ensure that our close time then matches that which meets
1809+
// the criteria. But if no close time meet the criteria, make no
1810+
// changes.
1811+
//
1812+
// This is implemented slightly differently for validators vs
1813+
// non-validators. For non-validators, it is sufficient to merely
1814+
// count the close times from all peer positions to determine
1815+
// the best. Validators, however, risk putting the network into an
1816+
// impasse unless they are able to change their own position without
1817+
// first having counted it towards the close time totals.
1818+
//
1819+
// Here's how the impasse could occur:
1820+
// Imagine 5 validators. 3 have close time t1, and 2 have t2.
1821+
// As consensus time increases, the threshVote threshold also increases.
1822+
// Once threshVote exceeds 60%, no members of either set of validators
1823+
// will change their close times.
1824+
//
1825+
// Avoiding the impasse means that validators should identify
1826+
// whether they currently have the best close time. First, choose
1827+
// the close time with the most votes. However, if multiple close times
1828+
// have the same number of votes, pick the latest of them.
1829+
// If the validator does not currently have the best close time,
1830+
// switch to it and increase the local vote tally for that better
1831+
// close time. This will result in consensus in the next iteration
1832+
// assuming that the peer messages propagate successfully.
1833+
// In this case the validators in set t1 will remain the same but
1834+
// those in t2 switch to t1.
1835+
//
1836+
// Another wrinkle, however, is that too many position changes
1837+
// from validators also has a destabilizing affect. Consensus can
1838+
// take longer if peers have to keep re-calculating positions,
1839+
// and mistakes can be made if peers settle on the wrong position.
1840+
// Therefore, avoiding an impasse should also minimize the likelihood
1841+
// of gratuitous change of position.
1842+
//
1843+
// The solution for validators is to first track whether it's
1844+
// possible that the network is at an impasse based on how much
1845+
// time this current consensus round has taken. This is reflected
1846+
// in the "stuck" boolean. When stuck, validators perform the
1847+
// above-described position change based solely on whether or not
1848+
// they agree with the best position, and ignore the threshVote
1849+
// criteria used for the earlier part of the phase.
1850+
//
1851+
// Determining whether there is close time consensus is very simple
1852+
// in comparison: if votes for the best close time meet or exceed
1853+
// threshConsensus, then we have close time consensus. Otherwise, not.
1854+
1855+
// First, find the best close time with which to agree: first criteria
1856+
// is the close time with the most votes. If a tie, the latest
1857+
// close time of those tied for the maximum number of votes.
1858+
std::multimap<int, NetClock::time_point> votesByCloseTime;
1859+
std::stringstream ss;
1860+
ss << "Close time calculation for ledger sequence "
1861+
<< static_cast<std::uint32_t>(previousLedger_.seq()) + 1
1862+
<< " Close times and vote count are as follows: ";
1863+
bool first = true;
1864+
for (auto const& [closeTime, voteCount] : closeTimeVotes)
18001865
{
1801-
if (adaptor_.validating() &&
1802-
t != asCloseTime(result_->position.closeTime()))
1803-
{
1804-
JLOG(j_.debug()) << "Others have voted for a close time "
1805-
"different than ours. Adding our vote "
1806-
"to this one in case it is necessary "
1807-
"to break an impasse.";
1808-
++v;
1809-
}
1810-
JLOG(j_.debug())
1811-
<< "CCTime: seq "
1812-
<< static_cast<std::uint32_t>(previousLedger_.seq()) + 1 << ": "
1813-
<< t.time_since_epoch().count() << " has " << v << ", "
1814-
<< threshVote << " required";
1815-
1816-
if (v >= threshVote)
1866+
if (first)
1867+
first = false;
1868+
else
1869+
ss << ',';
1870+
votesByCloseTime.insert({voteCount, closeTime});
1871+
ss << closeTime.time_since_epoch().count() << ':' << voteCount;
1872+
}
1873+
// These always gets populated because currPeerPositions_ is not
1874+
// empty to end up here, so at least 1 close time has at least 1 vote.
1875+
assert(!currPeerPositions_.empty());
1876+
std::optional<int> maxVote;
1877+
std::set<NetClock::time_point> maxCloseTimes;
1878+
// Highest vote getter is last. Track each close time that is tied
1879+
// with the highest.
1880+
for (auto rit = votesByCloseTime.crbegin();
1881+
rit != votesByCloseTime.crend();
1882+
++rit)
1883+
{
1884+
int const voteCount = rit->first;
1885+
if (!maxVote.has_value())
1886+
maxVote = voteCount;
1887+
else if (voteCount < *maxVote)
1888+
break;
1889+
maxCloseTimes.insert(rit->second);
1890+
}
1891+
// The best close time is the latest close time of those that have
1892+
// the maximum number of votes.
1893+
NetClock::time_point const bestCloseTime = *maxCloseTimes.crbegin();
1894+
ss << ". The best close time has the most votes. If there is a tie, "
1895+
"choose the latest. This is "
1896+
<< bestCloseTime.time_since_epoch().count() << "with " << *maxVote
1897+
<< " votes. ";
1898+
1899+
// If we are a validator potentially at an impasse and our own close
1900+
// time is not the best, change our close time to match it and
1901+
// tally another vote for it.
1902+
if (adaptor_.validating() && stuck &&
1903+
consensusCloseTime != bestCloseTime)
1904+
{
1905+
consensusCloseTime = bestCloseTime;
1906+
++*maxVote;
1907+
ss << " We are a validator. Consensus has taken "
1908+
<< result_->roundTime.read().count()
1909+
<< "ms. Previous round "
1910+
"took "
1911+
<< prevRoundTime_.count()
1912+
<< "ms. Now changing our "
1913+
"close time to "
1914+
<< bestCloseTime.time_since_epoch().count()
1915+
<< " that "
1916+
"now has "
1917+
<< *maxVote << " votes.";
1918+
}
1919+
// If the close time with the most votes also meets or exceeds the
1920+
// threshold to change our position, then change our position.
1921+
// Then check if we have met or exceeded the threshold for close
1922+
// time consensus.
1923+
//
1924+
// The 2nd check has been nested within the first historically.
1925+
// It's possible that this can be optimized by doing the
1926+
// 2nd check independently--this may make consensus happen faster in
1927+
// some cases. Then again, the trade-offs have not been modelled.
1928+
if (*maxVote >= threshVote)
1929+
{
1930+
consensusCloseTime = bestCloseTime;
1931+
ss << "Close time " << bestCloseTime.time_since_epoch().count()
1932+
<< " has " << *maxVote << " votes, which is >= the threshold ("
1933+
<< threshVote
1934+
<< " to make that our position if it isn't already.";
1935+
if (*maxVote >= threshConsensus)
18171936
{
1818-
// A close time has enough votes for us to try to agree
1819-
consensusCloseTime = t;
1820-
threshVote = v;
1821-
1822-
if (threshVote >= threshConsensus)
1823-
{
1824-
haveCloseTimeConsensus_ = true;
1825-
// Make sure that the winning close time is the one
1826-
// that propagates to the rest of the function.
1827-
break;
1828-
}
1937+
haveCloseTimeConsensus_ = true;
1938+
ss << " The maximum votes also >= the threshold ("
1939+
<< threshConsensus << ") for consensus.";
18291940
}
18301941
}
18311942

18321943
if (!haveCloseTimeConsensus_)
18331944
{
1834-
JLOG(j_.debug())
1835-
<< "No CT consensus:"
1836-
<< " Proposers:" << currPeerPositions_.size()
1837-
<< " Mode:" << to_string(mode_.get())
1838-
<< " Thresh:" << threshConsensus
1839-
<< " Pos:" << consensusCloseTime.time_since_epoch().count();
1945+
ss << " No CT consensus:"
1946+
<< " Proposers:" << currPeerPositions_.size()
1947+
<< " Mode:" << to_string(mode_.get())
1948+
<< " Thresh:" << threshConsensus
1949+
<< " Pos:" << consensusCloseTime.time_since_epoch().count();
18401950
}
1951+
JLOG(j_.debug()) << ss.str();
18411952
}
18421953

18431954
if (!ourNewSet &&

0 commit comments

Comments
 (0)