client/eth: Check provider header times. #2074
Conversation
client/asset/eth/eth.go
Outdated
```diff
@@ -295,7 +295,7 @@ type ethFetcher interface {
 	sendSignedTransaction(ctx context.Context, tx *types.Transaction) error
 	sendTransaction(ctx context.Context, txOpts *bind.TransactOpts, to common.Address, data []byte) (*types.Transaction, error)
 	signData(data []byte) (sig, pubKey []byte, err error)
-	syncProgress(context.Context) (*ethereum.SyncProgress, error)
+	syncProgress(context.Context) (progress *ethereum.SyncProgress, bestHeaderUNIXTime uint64, err error)
```
The providers fetch the block header anyway, so this is an optimization that cuts off one header fetch for providers.
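For illustration, a minimal sketch of that optimization (the `nodeClient` wrapper and its `ec` field are illustrative stand-ins, not the actual dcrdex types): the one `HeaderByNumber` call serves both the sync check and the caller's need for the best header's time.

```go
// syncProgress fetches sync status and, since judging sync health needs
// the best header's timestamp anyway, returns that timestamp too, saving
// callers a separate bestHeader round trip.
func (n *nodeClient) syncProgress(ctx context.Context) (progress *ethereum.SyncProgress, bestHeaderUNIXTime uint64, err error) {
	progress, err = n.ec.SyncProgress(ctx) // n.ec is an *ethclient.Client
	if err != nil {
		return nil, 0, err
	}
	hdr, err := n.ec.HeaderByNumber(ctx, nil) // nil requests the latest header
	if err != nil {
		return nil, 0, err
	}
	return progress, hdr.Time, nil
}
```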
Force-pushed from 4c28922 to f6087f9.
```go
// Time in the header is in seconds.
timeDiff := time.Now().Unix() - int64(bh.Time)
if timeDiff > dexeth.MaxBlockInterval && eth.net != dex.Simnet {
```
Removing the simnet check and mining some blocks in the harness tests. It seems reasonable to just mine some blocks when testing. May be a preference thing so I can revert.
Prefer reverting.
Well, ok. Also passing the net to the providers' `bestHeader` so it can also not care.
Force-pushed from f6087f9 to 4236355.
I guess it's possible for a provider to get out of sync and still not be found to have an old header, if that provider happens not to serve a best-header call but is later asked to do something else. Need to look at this a little bit more.
Heck, I think we'll have to be able to work without even …

EDIT: the above is server! But still...
Wait a sec, we don't use … In fact, I think … Similarly, I think we can add …
For server, is there an issue with this?

```diff
diff --git a/server/asset/eth/eth.go b/server/asset/eth/eth.go
index e541304ef..8e11349a7 100644
--- a/server/asset/eth/eth.go
+++ b/server/asset/eth/eth.go
@@ -565,10 +565,7 @@ func (eth *baseBackend) Synced() (bool, error) {
 	// node.SyncProgress will return nil both before syncing has begun and
 	// after it has finished. In order to discern when syncing has begun,
 	// check that the best header came in under MaxBlockInterval.
-	sp, err := eth.node.syncProgress(eth.ctx)
-	if err != nil {
-		return false, err
-	}
+	sp, _ := eth.node.syncProgress(eth.ctx)
 	if sp != nil {
 		return false, nil
 	}
```

If there's no …
Force-pushed from 4236355 to 5fc6eaf.
It seems pretty useless and confusing from the very beginning. We can just do the header time everywhere. That's what it will always come down to currently, anyway. Server only recently (this week) became usable with providers, so some things we weren't thinking about will probably pop up.
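A rough sketch of what "just do the header time everywhere" could look like for the server's `Synced`, assuming the three-value `syncProgress` from the interface diff at the top; this is an illustration of the idea, not the merged code.

```go
func (eth *baseBackend) Synced() (bool, error) {
	sp, bestHeaderUNIXTime, err := eth.node.syncProgress(eth.ctx)
	if err != nil {
		return false, err
	}
	if sp != nil {
		return false, nil // a sync is actively in progress
	}
	// SyncProgress is nil both before and after syncing, so use the best
	// header's age to tell the two states apart.
	timeDiff := time.Now().Unix() - int64(bestHeaderUNIXTime)
	return timeDiff <= dexeth.MaxBlockInterval, nil
}
```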
Force-pushed from 5fc6eaf to 1d78728.
```go
// succeeds or all have failed. The context used to run functions has a time
// limit equal to defaultRequestTimeout for all requests to return. If
// operations are expected to run longer than that the calling function should
// not use the altered context.
func (m *multiRPCClient) withOne(ctx context.Context, providers []*provider, f func(context.Context, *provider) error, acceptabilityFilters ...acceptabilityFilter) (superError error) {
```
So, I needed a ctx to check the header time... and so I thought it might be better to set the context deadline in this function, to ensure all the functions that use providers have a deadline. I think that's easier to keep up with than setting them all separately. You can see a couple of cases where this timeout is not used, but in the vast majority we are just running one RPC request with these, so I think this is cleaner. This isn't strictly needed to solve the issue, however.
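Roughly the shape being described, as a hedged sketch (acceptability filters omitted for brevity): one timed context derived here, so every provider call inherits the same deadline instead of each call site creating its own.

```go
func (m *multiRPCClient) withOne(ctx context.Context, providers []*provider,
	f func(context.Context, *provider) error) (superError error) {

	// One deadline for the whole attempt, per defaultRequestTimeout.
	ctx, cancel := context.WithTimeout(ctx, defaultRequestTimeout)
	defer cancel()

	for _, p := range providers {
		if err := f(ctx, p); err != nil {
			superError = err // remember the failure and try the next provider
			continue
		}
		return nil // one provider succeeded
	}
	return superError
}
```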
Can remove the timed contexts in `currentFees`, `nextNonce`, and `transactionConfirmations`. They all use `withOne` underneath.
Ok. The extra ctx in `currentFees` and `transactionConfirmations` was just because two requests were being done under the same limit. But they should probably only take milliseconds, so probably fine. I was thinking `nextNonce` was using the same context the whole time, but now that I look again it is using a new `withPreferred` every time, so a new timeout. I was just confused, it seems.

Moved these two to the compliant providers.
client/asset/eth/multirpc.go
Outdated
```go
time.Since(p.tip.headerStamp) > stale ||
	time.Now().Unix()-int64(p.tip.header.Time) > dexeth.MaxBlockInterval {
```
Can this final condition hit if the `stale` one does not? 180 sec is `dexeth.MaxBlockInterval`.

I suppose `p.tip.headerStamp` is when we received that tip, so I guess that could be much newer. OK.
I got the impression on Matrix that @buck54321 is going to do some things around here.
We could probably get rid of the `stale` check now.
Removing the `stale` check.
Or, actually, with no stale check it only fetches a new header after `dexeth.MaxBlockInterval`... So maybe one block interval after that it should fetch a new one.
After seeing it now, I think maybe we did need the `stale` check. It's the `MaxBlockInterval` check that could have been deleted, since `stale` was always substantially more restrictive. But we definitely don't want to check after 13 seconds when we have a websocket connection.
Put it back. The header can still be 180 seconds old though. That's a few blocks.
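For reference, the combined test boils down to something like this standalone sketch, where the parameter names stand in for `p.tip.headerStamp`, `p.tip.header.Time`, `stale`, and `dexeth.MaxBlockInterval` from the snippet above.

```go
// headerFresh reports whether a cached header should still be trusted:
// the cached copy must have been fetched recently (the local "stale"
// check), AND its on-chain timestamp must be within maxBlockInterval
// seconds of now (the check that catches a provider that stopped syncing).
func headerFresh(receivedAt time.Time, headerTime uint64, stale time.Duration, maxBlockInterval int64) bool {
	return time.Since(receivedAt) <= stale &&
		time.Now().Unix()-int64(headerTime) <= maxBlockInterval
}
```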
client/asset/eth/multirpc.go
Outdated
```go
tip, err := p.bestHeader(ctx, m.log)
if err != nil {
	return err
}
bestHeaderUNIXTime = tip.Time
```
Good call returning this instead of doing `bestHeader` again in the caller.
```go
// Non-compliant providers
providerCloudflareETH = "cloudflare-eth.com" // "SuggestGasTipCap" error: Method not found
providerAnkr          = "ankr.com"          // "SyncProgress" error: the method eth_syncing does not exist/is not available
```
I keep meaning to come back to this, but even before this diff, I had been using ankr and it was never treated as non-compliant, nor did it have to run a compliance check (I searched the logs, and I don't have a compliant-providers.json anywhere on my drive). Not sure how it was working. The host was https://rpc.ankr.com/eth_goerli
Force-pushed from 1d78728 to 589fcfe.
Just rebased.
Force-pushed from 589fcfe to 6f3d98e.
I broke the multirpc live test by not calling … But, tests are not passing. Looking into that and the …
Force-pushed from 6f3d98e to c701510.
Force-pushed from c701510 to 8b30026.
The test failures in the multirpc live test are solved by #2091, so there's nothing to do here about that.
It looks like nothing happens if a provider is on the non-compliant list atm; it only checks unknown providers. Should it just not use providers on the bad list, or maybe check them again? If anything needs to change with those, I'd like to make a new PR.

Made #2102
I don't think there's really a point in the non-compliant providers list. Why not have it check all unknown providers? Ankr worked for a long time for me and it was on the non-compliant list. Providers can change and the list can become stale... or the RPCs we use can change (no `eth_syncing` anymore) and the list can become stale.
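A sketch of the "check all unknown providers" idea: probe the RPC methods the wallet actually needs and accept the provider only if all succeed, rather than trusting a hardcoded list. The probed set here is an assumption based on the errors noted in the constants above, not the actual dcrdex compliance check.

```go
func checkCompliance(ctx context.Context, ec *ethclient.Client) error {
	if _, err := ec.HeaderByNumber(ctx, nil); err != nil {
		return fmt.Errorf("HeaderByNumber: %w", err)
	}
	if _, err := ec.SuggestGasTipCap(ctx); err != nil {
		return fmt.Errorf("SuggestGasTipCap: %w", err)
	}
	if _, err := ec.SyncProgress(ctx); err != nil {
		return fmt.Errorf("SyncProgress: %w", err)
	}
	return nil
}
```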
client/asset/eth/multirpc.go
Outdated
```go
// newly fetched header is not too old. If it is too
// old that indicates the provider is not in sync and
// should not be used.
if _, err := p.bestHeader(ctx, m.log); err != nil {
```
So in `(*multiRPCClient).bestHeader`, it's going to do:

dcrdex/client/asset/eth/multirpc.go, lines 960 to 963 in e598ecd:

```go
return hdr, m.withAny(ctx, func(ctx context.Context, p *provider) error {
	hdr, err = p.bestHeader(ctx, m.log)
	return err
}, allRPCErrorsAreFails)
```
which means that through `withAny` -> `withOne`, we're going to get here and do `(*provider).bestHeader` twice in a row?

I kinda wish something other than `withOne` were in charge of flagging providers as stale. `(*ETHWallet).checkForNewBlocks` is going to be calling `bestHeader` regularly anyway, so it seems like either `(*provider).bestHeader` or `(*multiRPCClient).bestHeader` would be a good place to set some flag, which `withOne` could then use in a similar manner to how it gets `readyProviders` with `!p.failed()`.
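In other words, something like this filtering sketch, where `staleHeader` is a hypothetical flag that `bestHeader` would set when the provider's tip falls behind, and `failed` mirrors the existing check:

```go
func readyProviders(providers []*provider) []*provider {
	ready := make([]*provider, 0, len(providers))
	for _, p := range providers {
		// Skip providers that have recently errored or whose best
		// header is known to be stale.
		if !p.failed() && !p.staleHeader() {
			ready = append(ready, p)
		}
	}
	return ready
}
```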
Can I stick the check in a goroutine? I'm afraid that if the check lives in one particular call, it might randomly never run while other calls keep going through and the stale provider keeps being used.
While adding the goroutine, I noticed that the other goroutine, `subscribeHeaders`, will continue to run on old providers if you reconfigure. Although it does not do anything (it's just waiting), it should maybe be stopped when the RPC client is closed.
Is this ok? Just check every 50 seconds. https://github.com/decred/dcrdex/compare/e598ecda67878df8bc302fd6e88ba60803bfed76..f981b0e14dea6c63eee50728992d19cf321694cd
> While adding the goroutine, it seems like the other goroutine `subscribeHeaders` will continue to run on old providers if you reconfigure, although it does not do anything, it's just waiting. Maybe should be stopped when the rpc client is closed.

The docs for `ethereum.Subscription.Err` say:

```go
// Err returns the subscription error channel. The error channel receives
// a value if there is an issue with the subscription (e.g. the network connection
// delivering the events has been closed). Only one value will ever be sent.
// The error channel is closed by Unsubscribe.
Err() <-chan error
```

So it seems to me that closing the `*ethclient.Client` should give us something to work with in `subscribeHeaders`. Maybe not checking the right thing.
Oh, maybe I was mistaken.
Does it make sense to wait for it to finish during shutdown?
I tested some more and you are right: closing the client causes the `ok` part of `case err, ok := <-sub.Err():` to be true.

I still think the extra monitoring is sane. Can revert though.
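Putting the pieces of that discussion together, a hedged sketch of a subscription loop that reacts to both cases the quoted docs describe (names like `watchHeaders` and `handle` are illustrative, not the dcrdex code):

```go
// watchHeaders drains a header subscription until the context is
// canceled or the subscription's error channel fires or closes. Per the
// docs quoted above, a closed client delivers an error value (ok ==
// true), while Unsubscribe closes the channel (ok == false).
func watchHeaders(ctx context.Context, sub ethereum.Subscription,
	headerCh <-chan *types.Header, handle func(*types.Header), log dex.Logger) {

	defer sub.Unsubscribe()
	for {
		select {
		case hdr := <-headerCh:
			handle(hdr)
		case err, ok := <-sub.Err():
			if !ok {
				return // channel closed by Unsubscribe
			}
			log.Errorf("header subscription error: %v", err)
			return
		case <-ctx.Done():
			return
		}
	}
}
```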
Force-pushed from f981b0e to 843c7ae.
Added a timeout to the periodic header fetches: https://github.com/decred/dcrdex/compare/f981b0e14dea6c63eee50728992d19cf321694cd..843c7aeeeb3c860741be39f7d629ad35c08929fa
Force-pushed from 843c7ae to 28d85e4.
Just rebased.
Force-pushed from 28d85e4 to b738ccd.
client/asset/eth/multirpc.go
Outdated
```go
// should not be used.
innerCtx, cancel := context.WithTimeout(ctx, defaultRequestTimeout)
if _, err := p.bestHeader(innerCtx, log); err != nil {
	log.Warnf("Problem getting best header: %s.", err)
```
Could say the provider in the message, otherwise it's not too helpful.
Yeah not helpful. Adding the host.
client/asset/eth/multirpc.go
Outdated
```go
log.Tracef("handling header refreshes for %q", p.host)
for {
	select {
	case <-time.After(headerCheckInterval):
```
A ticker would avoid the goroutine churn, but if you think it's important to start the timer after `bestHeader` completes, that's alright I guess.
With a time limit on the context, there should be no way to overlap. Will change to a ticker.
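The ticker version would look roughly like this (a sketch restating the snippets above; `monitorHeaders` is an illustrative name): each fetch is bounded by `defaultRequestTimeout`, which is shorter than `headerCheckInterval`, so iterations can't overlap even though the ticker doesn't wait for the previous fetch to finish.

```go
func monitorHeaders(ctx context.Context, p *provider, log dex.Logger) {
	ticker := time.NewTicker(headerCheckInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// Bound each fetch so a slow provider can't stall the loop.
			innerCtx, cancel := context.WithTimeout(ctx, defaultRequestTimeout)
			if _, err := p.bestHeader(innerCtx, log); err != nil {
				log.Warnf("Problem getting best header from %q: %v", p.host, err)
			}
			cancel()
		case <-ctx.Done():
			return
		}
	}
}
```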
Just worked as intended for me right now as blastio fell behind, got flagged as bad, and then caught up really fast.

But when and where did it "unfail" the provider?
Force-pushed from b738ccd to 37ea7c0.
Just rebased.
If it is working how I expect, it is not unfailed until a minute passes. Unusable until a minute after …

Applying the …

Or wait, …
Force-pushed from 37ea7c0 to 813fc24.
Adding a log for when a provider was unusable, but is now usable: https://github.com/decred/dcrdex/compare/37ea7c0e8ef4c5de4cdca9be80dbd8843177ad4e..813fc242776c1ab3338fd6f598dcfc3bc5bfbfe8
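For illustration only, the quarantine/recovery behavior being discussed might be sketched like this; the constant, fields, and method here are assumptions rather than the merged implementation.

```go
// failQuarantine is how long a provider stays unusable after an error.
const failQuarantine = time.Minute

func (p *provider) usable(log dex.Logger) bool {
	if time.Since(p.lastFail) < failQuarantine {
		p.wasQuarantined = true
		return false // still quarantined after the last failure
	}
	if p.wasQuarantined {
		p.wasQuarantined = false
		log.Infof("Provider %s was unusable, but is now usable again.", p.host)
	}
	return true
}
```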
Seems to be working.
```diff
@@ -47,7 +49,8 @@ func mine(ctx context.Context) error {
 }
 
 func testEndpoint(endpoints []string, syncBlocks uint64, tFunc func(context.Context, *multiRPCClient)) error {
-	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
+	var cancel context.CancelFunc
```
Can delete this declaration.
```go
func TestMain(m *testing.M) {
	var cancel context.CancelFunc
	ctx, cancel = context.WithCancel(context.Background())
	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt)
	go func() {
		select {
		case <-c:
			cancel()
		case <-ctx.Done():
		}
	}()
	exit := func(exitCode int) {
		signal.Stop(c)
		cancel()
		os.Exit(exitCode)
	}
	time.Sleep(time.Second)
	err := mine(ctx)
	if err != nil {
		fmt.Println(err)
		exit(1)
	}
	time.Sleep(time.Second)
	exit(m.Run())
}
```
Why need all this?
I was mining two blocks before the tests when I took out the Simnet check for old headers, but then put the Simnet check back. So... can revert.
Hmm. Looks like a CI error. Seems unrelated though.
Retriggered CI. Can merge with final iteration for nits in #2074 (review).
Force-pushed from 813fc24 to 324705a.
Removed some needless mining in tests since we have a special case for Simnet: https://github.com/decred/dcrdex/compare/324705a2d1192fa5f3d491a8df2274e3bb496244..0f21cb47fbd246ded08278fb8ecba435399675ba
Force-pushed from 324705a to 0f21cb4.
I thought that listening for Ctrl-C was necessary in tests, but it appears not. The tests stop and fail on Ctrl-C. Maybe I was thinking about test frameworks like the loadbot.
When fetching a new or cached header with a provider, do a basic check on the header's time to determine if the header, and so the provider, are up to date.
closes #2064