Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the groundwork for failover RPC support #110

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

c0gent
Copy link

@c0gent c0gent commented Jun 19, 2018

Changes

  • Remove the rpc_host and rpc_port fields from the configuration file
    format and add primary_rpc_host, primary_rpc_port, failover_rpc_host,
    and failover_rpc_port.
  • Add the RpcUrl and RpcUrlKind types.
    • RpcUrl contains a host name and a port.
    • RpcUrlKind has variants for primary and failover urls and contains a
      RpcUrl.
  • Remove rpc_host and rpc_port and add primary and failover RPC url fields
    to config::Config et. al.
  • Update app::Connections::new_http to automatically use the failover url in
    the event of an error connecting to the primary.
  • Add the home_url and foreign_url to Connections containing the
    url/kind (a RpcUrlKind) currently in use.
  • Update logging.
  • Update tests.

Closes #107 (maybe)

I don't have a very sophisticated test setup so this will definitely need more comprehensive testing in a more realistic environment.

@akolotov: If I understand your original issue correctly (please correct me if not), the main purpose of this is for the failover not simply to be available upon startup, but to be used in the event of a disconnection. I'm not familiar enough with the whole project yet to know exactly how and where that happens. This patch only makes use of the failover url on startup (creation of an App/Connection) but does nothing special in the event of a disconnect at a later point. Therefore it may not completely address your issue and should be considered only a partial implementation.

I could use your help showing me where and how disconnection is currently is handled so that I can add some more logic to that process. I'm available on voice or Zoom if that's easier. Thanks!

* Remove the `rpc_host` and `rpc_port` fields from the configuration file
  format and add `primary_rpc_host`, `primary_rpc_port`, `failover_rpc_host`,
  and `failover_rpc_port`.
* Add the `RpcUrl` and `RpcUrlKind` types.
  * `RpcUrl` contains a host name and a port.
  * `RpcUrlKind` has variants for primary and failover urls and contains a
    `RpcUrl`.
* Remove `rpc_host` and `rpc_port` and add primary and failover RPC url fields
  to `config::Config` et. al.
* Update `app::Connections::new_http` to automatically use the failover url in
  the event of an error connecting to the primary.
* Add the `home_url` and `foreign_url` to `Connections` containing the
  url/kind (a `RpcUrlKind`) currently in use.
* Update logging.
* Update tests.
@akolotov
Copy link
Contributor

Thanks! I will look at this PR tomorrow (I am in MSK timezone) and provide my feedback.

@akolotov akolotov requested review from akolotov and yrashk June 19, 2018 23:43
@yrashk
Copy link
Contributor

yrashk commented Jul 2, 2018

After some further thinking about this problem, here's my take on how it should be implemented to keep it simple and yet flexible so that even future scenarios can be accommodated.

Instead of building very specific features (such as failover RPC), we can further utilize the approach that has been used in the bridge in the past couple of months: delegating failure handling to a supervisor process. Currently, an exit code for failed RPC connection doesn't make distinction between home and foreign. However, if we track the side on which the failure has occurred and use different exit codes for different sides, the supervisor can simply restart the bridge with a different config, according to its own logic. Could be more than two failovers, for example, or a check on what's available using curl and then using the appropriate config.

This will work particularly well combined with persistent transaction queues.

TransactionWithConfirmation(self.app.connections.foreign.clone(),
self.app.config.foreign.poll_interval,
self.app.config.foreign.required_confirmations)
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are formatting changes unrelated to the actual change. They obscure the substance of the change, making it harder to understand what's important and what's not.

api::send_transaction_with_nonce(self.app.connections.foreign.clone(),
self.app.connections.foreign_url.clone(), self.app.clone(),
self.app.config.foreign.clone(), tx, self.foreign_chain_id,
SendRawTransaction(self.app.connections.foreign.clone()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are formatting changes unrelated to the actual change. They make sense, but they would have been a lot easier to process as a part of separate "formatting" patch, both for review and later reading.

api::send_transaction_with_nonce(self.app.connections.foreign.clone(),
self.app.connections.foreign_url.clone(), self.app.clone(),
self.app.config.foreign.clone(), tx, self.foreign_chain_id,
SendRawTransaction(self.app.connections.foreign.clone()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are formatting changes unrelated to the actual change.

@@ -232,7 +233,8 @@ impl<T: Transport> Stream for WithdrawRelay<T> {
nonce: U256::zero(),
action: Action::Call(contract),
};
api::send_transaction_with_nonce(t.clone(), app.clone(), home.clone(), tx, chain_id, SendRawTransaction(t.clone()))
api::send_transaction_with_nonce(t.clone(), t_url.clone(), app.clone(),
home.clone(), tx, chain_id, SendRawTransaction(t.clone()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is a formatting change unrelated to the actual change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I made these formatting changes to make the code more readable (and to adhere more closely to the Rust style guidelines. I'll put them all in a separate PR.

// the transport and the url upon success.
fn connect(handle: &Handle, url_primary: &RpcUrl, url_failover: Option<&RpcUrl>,
concurrent_connections: usize) -> Result<(Http, RpcUrlKind), Web3Error> {
match Http::with_event_loop(&url_primary.to_string(), handle, concurrent_connections) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most important part of my review. My reading of https://github.com/tomusdrw/rust-web3/blob/master/src/transports/http.rs#L78 suggests that this call will never fail if primary is not available as it doesn't attempt the connection, meaning bridge will always try primary, then fail, restart and try it again to no avail.

Would love to be proven wrong, though -- if I misread any part or something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right which is why I wanted to make clear that this PR doesn't properly address the issue. It looks like the connection errors are passed up through the spawned future. I'll need to dig around and look into how those are currently handled (see below).

@c0gent
Copy link
Author

c0gent commented Jul 2, 2018

Instead of building very specific features (such as failover RPC), we can further utilize the approach that has been used in the bridge in the past couple of months: delegating failure handling to a supervisor process.

I agree, this is a superior approach in this situation. And yes, this will need some re-architecting. I'll have a closer look at this later in the week and put together some ideas.

@c0gent
Copy link
Author

c0gent commented Jul 3, 2018

It looks like I'm going to have to put this PR on the back-burner for now. I should be able to get back to it sometime in the next few weeks. Let me know if you come up with any other plans or suggestions, or if there are some simple things I can do to this PR to make it mergeable for now and deal with the larger issues in a separate PR later.

@yrashk
Copy link
Contributor

yrashk commented Jul 6, 2018

I think this PR is premature insofar as indicated by the comment in which I mentioned that it is unlikely to actually ever switch over (leaving aside the suggestion for simplifying this whole solution), so let's return to this later or if @akolotov needs it sooner, I can take care of it in the coming days.

noot pushed a commit to noot/poa-bridge that referenced this pull request Jul 18, 2018
Fixed some typos and made a section more coherent
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants