eth/downloader: refactor downloader queue by holiman · Pull Request #20236 · ethereum/go-ethereum

holiman · 2019-11-04T10:22:38Z

This PR contains a massive refactoring in the downloader + queue area. It's not quite ready to be merged yet; I'd like to see how the tests perform first.

Todo: add some more unit-tests regarding the resultstore implementation, and the queue.

Throttling

Previously, we had a doneQueue, a map where we kept track of all downloaded items (receipts, block bodies). This map was updated when deliveries came in, and cleaned when results were pulled from the resultCache. It was quite finicky, and modifying how the download functioned was dangerous: if these updates were not kept in check, the doneQueue could blow up.

It was also quite resource intensive, where a lot of counting and cross-checking was going on between the various pools and queues.

This has now been reworked, so that

the resultCache maintains (like previously) a slice of *fetchResults, with a length of blockCacheLimit * 2.
the resultCache also knows that it should only consider the first 75% of available slots to be up for filling. Thus, when a reserve request comes in (we want to give a task to a peer), the resultCache checks if the proposed download-task is in that priority segment. Otherwise, it flags for throttling.
Once the results are fetched for processing, and removed from the internal slice, the priority segment moves organically, and new data become eligible for fetching.
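
A minimal sketch of the priority-segment idea described above, not the code from this PR. The type `resultStore` and the methods `reserve`, `complete` and `popCompleted` are hypothetical names, and a slot is reduced to a single boolean; only the 75% fraction and the sliding slice come from the description.

```go
package main

import "fmt"

// resultStore is a hypothetical, simplified stand-in for the resultCache.
// Slot i holds the result for block number offset+i.
type resultStore struct {
	items  []bool // true once the result in that slot is complete
	offset uint64 // block number that items[0] corresponds to
}

func newResultStore(size int, offset uint64) *resultStore {
	return &resultStore{items: make([]bool, size), offset: offset}
}

// reserve reports whether a download task for the given block number may be
// handed to a peer. Tasks that fall outside the first 75% of the slots are
// refused and flagged for throttling.
func (r *resultStore) reserve(number uint64) (ok, throttle bool) {
	if number < r.offset {
		return false, false // stale: already pulled out for processing
	}
	limit := uint64(len(r.items)) * 3 / 4
	if number-r.offset >= limit {
		return false, true
	}
	return true, false
}

// complete marks the slot for the given block number as fully downloaded.
// The caller is assumed to have reserved the slot first.
func (r *resultStore) complete(number uint64) {
	r.items[number-r.offset] = true
}

// popCompleted removes the leading run of completed results and returns how
// many were removed. Shifting the slice moves the priority segment forward.
func (r *resultStore) popCompleted() int {
	n := 0
	for n < len(r.items) && r.items[n] {
		n++
	}
	copy(r.items, r.items[n:])
	for i := len(r.items) - n; i < len(r.items); i++ {
		r.items[i] = false
	}
	r.offset += uint64(n)
	return n
}

func main() {
	store := newResultStore(8, 100) // slots for blocks 100..107, priority segment 100..105
	_, throttle := store.reserve(106)
	fmt.Println("throttle for 106 before processing:", throttle)

	store.complete(100)
	store.complete(101)
	store.popCompleted()

	ok, _ := store.reserve(106)
	fmt.Println("106 reservable after processing:", ok)
}
```

In this sketch the throttle decision needs no separate bookkeeping: it falls out of the slot index alone, which is the simplification the description is after.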

This means I could drop all donePool thingies, which simplified things a bit.

Concurrency

Previously, the queue maintained one lock to rule them all. Now, the resultCache has its own lock, and can handle concurrency internally. This means that body and receipt fetch/delivery can happen simultaneously, and also that verification (hashing) of the bodies/receipts doesn't block other threads waiting for the lock.

Previously, I think it was kind of racy when setting the Pending on the fetchResult. This has been fixed.
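
As a rough illustration of the locking pattern, here is a sketch in which the expensive verification runs before the lock is taken, so the critical section stays short. The `store` type, `deliver` and `hashOf` are hypothetical, and SHA-256 from the standard library stands in for the hashing the downloader actually uses.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// store is a hypothetical result container guarded by its own mutex,
// independent of any queue-wide lock.
type store struct {
	mu     sync.Mutex
	bodies map[uint64][]byte
}

// hashOf is the stand-in verification hash for this sketch.
func hashOf(payload []byte) [32]byte {
	return sha256.Sum256(payload)
}

// deliver verifies the payload against the expected hash without holding the
// lock, and only locks for the brief map write.
func (s *store) deliver(number uint64, payload []byte, want [32]byte) bool {
	if hashOf(payload) != want { // expensive part, done lock-free
		return false
	}
	s.mu.Lock()
	s.bodies[number] = payload
	s.mu.Unlock()
	return true
}

func main() {
	s := &store{bodies: make(map[uint64][]byte)}
	var wg sync.WaitGroup
	for i := uint64(0); i < 4; i++ {
		wg.Add(1)
		go func(n uint64) {
			defer wg.Done()
			payload := []byte(fmt.Sprintf("body-%d", n))
			s.deliver(n, payload, hashOf(payload))
		}(i)
	}
	wg.Wait()
	fmt.Println("stored:", len(s.bodies))
}
```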

Tests

The downloader tests failed quite often; when receipts are added in the backend, the headers (ownHeaders) were deleted and moved into ancientHeaders. If this happened quickly enough, the next batch of headers errored with unknown parent. This has been fixed so the backend also queries ancientHeaders for header existence.
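
The fix can be pictured with a small mock, assuming a test backend that keeps two maps named after the ones mentioned above. Everything else here (`testBackend`, `hasHeader`, `freeze`, `insertHeader`, the string hashes) is made up for illustration.

```go
package main

import "fmt"

// header is a pared-down, hypothetical header type for this sketch.
type header struct {
	hash   string
	parent string
}

// testBackend mimics a test chain that moves old headers into an
// "ancient" store once receipts have been added.
type testBackend struct {
	ownHeaders     map[string]*header
	ancientHeaders map[string]*header
}

func newTestBackend() *testBackend {
	return &testBackend{
		ownHeaders:     make(map[string]*header),
		ancientHeaders: make(map[string]*header),
	}
}

// hasHeader consults both maps, so a header that was already moved to the
// ancient store is still found.
func (b *testBackend) hasHeader(hash string) bool {
	if _, ok := b.ancientHeaders[hash]; ok {
		return true
	}
	_, ok := b.ownHeaders[hash]
	return ok
}

// freeze moves a header from ownHeaders into ancientHeaders.
func (b *testBackend) freeze(hash string) {
	if h, ok := b.ownHeaders[hash]; ok {
		b.ancientHeaders[hash] = h
		delete(b.ownHeaders, hash)
	}
}

// insertHeader rejects headers whose parent is unknown to the backend.
func (b *testBackend) insertHeader(h *header) error {
	if !b.hasHeader(h.parent) {
		return fmt.Errorf("unknown parent %q", h.parent)
	}
	b.ownHeaders[h.hash] = h
	return nil
}

func main() {
	b := newTestBackend()
	b.ownHeaders["a"] = &header{hash: "a"}
	b.freeze("a") // parent is moved before the next batch arrives
	err := b.insertHeader(&header{hash: "b", parent: "a"})
	fmt.Println("insert after freeze, err:", err)
}
```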

Minor changes

Set the incoming response time earlier in the flow, so it doesn't have to wait for locks to be obtained before being set. This should bring the RTT measurements a bit closer to the truth.
The idle check used some kind of bubble sort; this has been replaced.
The fetcher did a lot of useless work, iterating in the block filter (every block) and calculating hashes over and over again. This was simplified.
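
For the idle-check item, one way to replace a hand-rolled bubble sort is the standard library's sort package. This is only a sketch: the `peer` type, its fields and the descending-by-throughput order are assumptions, not necessarily what the PR does.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical idle peer with a measured throughput.
type peer struct {
	id         string
	throughput float64
}

// sortByThroughput orders peers from fastest to slowest in O(n log n),
// instead of the O(n^2) of a bubble sort.
func sortByThroughput(peers []peer) {
	sort.Slice(peers, func(i, j int) bool {
		return peers[i].throughput > peers[j].throughput
	})
}

func main() {
	idle := []peer{{"a", 1.5}, {"b", 9.0}, {"c", 4.2}}
	sortByThroughput(idle)
	for _, p := range idle {
		fmt.Println(p.id, p.throughput)
	}
}
```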

holiman · 2019-11-04T14:11:40Z

This is now doing a fast-sync on the benchmarkers: https://geth-bench.ethdevops.io/d/Jpk-Be5Wk/dual-geth?orgId=1&var-exp=mon08&var-master=mon09&var-percentile=50&from=1572876506497&to=now

holiman · 2019-11-13T14:10:17Z

Finally got greenlighted by travis. Will do one more fastsync-benchmark and post results

holiman · 2019-11-14T17:31:22Z

Fast-sync done (https://geth-bench.ethdevops.io/d/Jpk-Be5Wk/dual-geth?orgId=1&from=1573721066023&to=1573752540000&var-exp=mon06&var-master=mon07&var-percentile=50), some graphs (this PR in yellow)

Also, totally unrelated, it's interesting to see that there's a 10x write amplification on leveldb (750 GB written, 75 GB stored), and a perfect 1x on ancients.

holiman · 2019-12-02T12:29:38Z

+	throttleThreshold := uint64((common.StorageSize(blockCacheMemory) + q.resultSize - 1) / q.resultSize)
+	q.resultCache.SetThrottleThreshold(throttleThreshold)
+	// log some info at certain times
+	if time.Now().Second()&0xa == 0 {


fjl · 2019-12-02T12:29:41Z

-			delete(q.receiptDonePool, hash)
+		closed = q.closed
+		q.lock.Unlock()
+		results = q.resultCache.GetCompleted(maxResultsProcess)


I think the condition variable should be in resultStore. This means closed needs to move into the resultStore as well.

Well, that totally makes sense, but it means an even larger refactor. Then closed would have to move in there, and the things that call Signal need to somehow trigger that via the resultStore.
Let's leave that for a future refactor (I'd be happy to continue iterating on the downloader)
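
For reference, a rough sketch of what moving the condition variable and the closed flag into the store could look like. This is an assumption about the shape of such a refactor, not code from this PR; `add`, `close` and `wait` are hypothetical method names and the results are plain ints.

```go
package main

import (
	"fmt"
	"sync"
)

// resultStore owns its lock, its condition variable and the closed flag,
// so callers never signal directly.
type resultStore struct {
	mu     sync.Mutex
	cond   *sync.Cond
	items  []int
	closed bool
}

func newResultStore() *resultStore {
	r := &resultStore{}
	r.cond = sync.NewCond(&r.mu)
	return r
}

// add stores a result and wakes one waiter.
func (r *resultStore) add(v int) {
	r.mu.Lock()
	r.items = append(r.items, v)
	r.mu.Unlock()
	r.cond.Signal()
}

// close marks the store as closed and wakes all waiters.
func (r *resultStore) close() {
	r.mu.Lock()
	r.closed = true
	r.mu.Unlock()
	r.cond.Broadcast()
}

// wait blocks until results are available or the store is closed, then
// returns the pending results together with the closed flag.
func (r *resultStore) wait() ([]int, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for len(r.items) == 0 && !r.closed {
		r.cond.Wait()
	}
	out := r.items
	r.items = nil
	return out, r.closed
}

func main() {
	r := newResultStore()
	done := make(chan []int)
	go func() {
		res, _ := r.wait()
		done <- res
	}()
	r.add(42)
	fmt.Println(<-done)
}
```

With this shape, whatever currently calls Signal would go through `add` or `close` instead, which is the extra plumbing mentioned in the reply above.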

karalabe · 2020-01-24T09:28:12Z

 }

+// EmptyBody returns true if there is no additional 'body' to complete the header
+// that is: no transactions and no uncles


karalabe · 2020-01-24T09:28:17Z

+	return h.TxHash == EmptyRootHash && h.UncleHash == EmptyUncleHash
+}
+
+// EmptyReceipts returns true if there are no receipts for this header/block


karalabe · 2020-01-24T09:33:26Z

 			headers := packet.(*headerPack).headers
 			if len(headers) != 1 {
-				p.log.Debug("Multiple headers for single request", "headers", len(headers))
+				p.log.Info("Multiple headers for single request", "headers", len(headers))


I think we should possibly raise this to Warn

karalabe · 2020-01-24T09:33:29Z

 				headers := packer.(*headerPack).headers
 				if len(headers) != 1 {
-					p.log.Debug("Multiple headers for single request", "headers", len(headers))
+					p.log.Info("Multiple headers for single request", "headers", len(headers))


I think we should possibly raise this to Warn

karalabe · 2020-01-24T09:33:34Z

 				header := d.lightchain.GetHeaderByHash(h) // Independent of sync mode, header surely exists
 				if header.Number.Uint64() != check {
-					p.log.Debug("Received non requested header", "number", header.Number, "hash", header.Hash(), "request", check)
+					p.log.Info("Received non requested header", "number", header.Number, "hash", header.Hash(), "request", check)


I think we should possibly raise this to Warn

karalabe · 2020-01-24T09:36:29Z

 							delay = n
 						}
 						headers = headers[:n-delay]
+						ignoredHeaders = delay


Probably simpler if you replace delay altogether with ignoredHeaders, instead of defining a new delay instance and then just assigning it at the end.
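
A sketch of the suggested simplification, assuming the surrounding code caps a delay at the number of headers and trims that many from the tail. The function `trimHeaders` and the `reorgDelay` parameter are hypothetical, since the hunk does not show where the delay comes from.

```go
package main

import "fmt"

// trimHeaders drops up to reorgDelay headers from the tail and returns the
// kept headers plus the number ignored, using a single variable throughout.
func trimHeaders(headers []uint64, reorgDelay int) ([]uint64, int) {
	ignoredHeaders := reorgDelay
	if ignoredHeaders > len(headers) {
		ignoredHeaders = len(headers)
	}
	return headers[:len(headers)-ignoredHeaders], ignoredHeaders
}

func main() {
	kept, ignored := trimHeaders([]uint64{1, 2, 3, 4, 5}, 2)
	fmt.Println(kept, ignored)
}
```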

holiman · 2020-04-07T13:15:19Z

Rebased

holiman · 2020-06-26T09:06:04Z

Closing in favour of #21263

holiman requested review from karalabe and rjl493456442 as code owners November 4, 2019 10:22

holiman commented Nov 4, 2019

Comment thread eth/downloader/downloader_test.go Outdated

holiman force-pushed the refactor_downloader branch 4 times, most recently from 7d58a97 to eed7fe6 Compare November 13, 2019 11:53

fjl changed the title from "Refactor downloader" to "eth/downloader: refactor downloader queue" Nov 14, 2019

holiman commented Dec 2, 2019

Comment thread eth/downloader/resultcache.go Outdated

holiman commented Dec 2, 2019

Comment thread eth/downloader/resultcache.go

holiman commented Dec 2, 2019

Comment thread eth/downloader/queue.go Outdated

holiman commented Dec 2, 2019

fjl reviewed Dec 2, 2019

holiman force-pushed the refactor_downloader branch 3 times, most recently from cfd3f18 to a0f30ba Compare December 4, 2019 08:48

adamschmideg added the status:triage label Dec 17, 2019

karalabe self-assigned this Jan 7, 2020

adamschmideg removed the status:triage label Jan 7, 2020

holiman mentioned this pull request Jan 14, 2020

Suggest #20552

Closed

karalabe reviewed Jan 24, 2020

adamschmideg added this to the 1.9.14 milestone Apr 7, 2020

adamschmideg removed the status:triage label Apr 7, 2020

holiman added 18 commits April 7, 2020 15:03

eth/downloader: improved locking

4e5b380

downloader, fetcher: throttle-metrics, fetcher filter improvements, standalone resultcache

1fe952e

downloader: more accurate deliverytime calculation, less mem overhead in state requests

e121488

downloader/queue: increase underlying buffer of results, new throttle mechanism

40114c9

eth/downloader: updates to tests

43946c6

eth/downloader: fix up some review concerns

6804284

eth/downloader/queue: minor fixes

78c5d8d

eth/downloader: minor fixes after review call

1d5b586

eth/downloader: testcases for queue.go

88ebb3c

eth/downloader: minor change, don't set progress unless progress...

e3d4a09

eth/downloader: fix flaw which prevented useless peers from being dropped

525f499

eth/downloader: try to fix tests

c290f11

eth/downloader: verify non-deliveries against advertised remote head

dd188af

eth/downloader: fix flaw with checking closed-status causing hang

554269e

eth/downloader: hashing avoidance

757ce48

eth/downloader: review concerns + simplify resultcache and queue

86a3716

eth/downloader: add back some locks, address review concerns

2b5da62

downloader/queue: fix remaining lock flaw

060e2c0

holiman force-pushed the refactor_downloader branch from 99d1503 to 060e2c0 Compare April 7, 2020 13:15

karalabe modified the milestones: 1.9.14, 1.9.15 May 13, 2020

adamschmideg added the status:triage label May 26, 2020

karalabe modified the milestones: 1.9.15, 1.9.16 Jun 8, 2020

holiman mentioned this pull request Jun 26, 2020

eth/downloader: refactor downloader + queue #21263

Merged

holiman closed this Jun 26, 2020
