add ignore_rpc_timeout option to allow suppressing rpc timeout errors #5137
gluckzhang wants to merge 1 commit into anza-xyz:master
Hello again, as there are changes merged to the |
Signed-off-by: Long Zhang <gluckzhang@gmail.com>
Hi @gluckzhang. I'm going to reject this change as I don't think ignoring timeouts is a solution to the flaky endpoint problem. As mentioned by Jon, #4748 is meant to add redundancy, which reduces false positives while maintaining coverage and allowing for detection of bad RPC endpoints. Additionally, there is the I would be much more willing to consider changes to |
Hi @mircea-c, thanks for the feedback. Fully understand it and agree with the rejection. Cheers :) |
Problem
Currently we run the watchtower to monitor validators on both mainnet and testnet. Though we have configured the instance to use a higher unhealthy threshold, ignore bad gateway errors, tolerate a longer connection time, and check the status less frequently (e.g., --unhealthy-threshold 2 --ignore-http-bad-gateway --rpc-timeout 60 --interval 65), we still receive a lot of "operation timed out" alerts. Such errors are more related to the availability of RPC endpoints and, for now, we would like to suppress them.
Summary of Changes
This PR adds a new optional CLI option --ignore-rpc-timeout to allow users to suppress RPC timeout errors. The default value of --ignore-rpc-timeout is false, so merging this PR does not change the default behavior of watchtower. It is up to users to decide whether they would like to ignore RPC timeouts.
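To illustrate the intended behavior, here is a minimal sketch of how such a flag could gate alerting. It is not the code from this PR: the `Config` struct and the `should_alert` function are hypothetical names, and the sketch assumes a timeout surfaces as a `std::io::Error` with kind `TimedOut`, which may differ from how watchtower actually classifies RPC errors.

```rust
use std::io;

/// Hypothetical settings struct; only the flag discussed in this PR is modeled.
struct Config {
    /// Mirrors --ignore-rpc-timeout; false by default, so behavior is unchanged.
    ignore_rpc_timeout: bool,
}

/// Returns true when a failed RPC check should raise an alert.
/// Assumption: timeouts are reported as io::ErrorKind::TimedOut.
fn should_alert(config: &Config, err: &io::Error) -> bool {
    let is_timeout = err.kind() == io::ErrorKind::TimedOut;
    // Suppress the alert only when the flag is set AND the error is a timeout.
    !(config.ignore_rpc_timeout && is_timeout)
}

fn main() {
    let timeout = io::Error::new(io::ErrorKind::TimedOut, "operation timed out");
    let refused = io::Error::new(io::ErrorKind::ConnectionRefused, "connection refused");

    let default_cfg = Config { ignore_rpc_timeout: false };
    let lenient_cfg = Config { ignore_rpc_timeout: true };

    // With the default, every error still alerts.
    println!("default / timeout: {}", should_alert(&default_cfg, &timeout));
    // With the flag set, only timeouts are suppressed.
    println!("lenient / timeout: {}", should_alert(&lenient_cfg, &timeout));
    println!("lenient / refused: {}", should_alert(&lenient_cfg, &refused));
}
```

Keeping the check in one small predicate means non-timeout failures, such as a refused connection, continue to alert even when the flag is enabled.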