Simplified SSH Forwarding Configuration #563

willdonnelly · 2022-06-24T19:24:22Z

Description:

This is an attempt to implement the mechanism I described in #513, which allows a top-level endpoint config property named address to serve as the forwardHost/forwardPort values for SSH tunneling (and replaces the address property with 127.0.0.1:$RANDOM_LOCAL_PORT before invoking the connector).

It includes some additional logic which does a similar job when asked for the endpoint config spec, and omits the forwardHost, forwardPort, and localPort properties from the SSH config portion of the resulting schema IFF the main endpoint config schema contains a top-level address property.

It should be noted that the new logic is backwards-compatible in two ways:

First, any existing captures/materializations which explicitly specify forwardHost / forwardPort / localPort should continue to operate as normal. This means that we can choose to migrate existing uses of source-postgres and other connectors whenever it's convenient.
Second, any connectors which do not have a top-level address property will continue to provide the "full" version of the SSH config spec, which includes forwardHost and friends. This means that we can choose to modify additional connectors (such as materialize-postgres) to use the specific address configuration whenever it's convenient.

This whole PR still feels really hacky because of the reliance on a specific endpoint property address, but I believe it should work well in practice and I couldn't justify using a more complicated mechanism at this time. The main issues are:

To opt in to this behavior, connectors will need to be modified. For instance materialize-postgres will need to replace its current host and port properties with address, which is itself a potentially-breaking change that needs some care (either keeping things backwards-compatible or pinning-then-upgrading the live materializations) to not break existing uses.
There could theoretically be some connectors out there with a top-level address property which does not have the expected semantics, and we would do the wrong thing with those. However I have checked and I do not believe any of our current officially supported connectors have this issue, and it's a bit unlikely anyway.

Workflow steps:

Configuration for some connectors (currently source-postgres and source-mysql) will no longer require forwardHost and related properties to be specified. Instead the primary address field should always describe the target database, and the fact that it's not directly reachable but goes through an SSH bastion host is only represented by the presence of sshEndpoint and privateKey configuration.

Documentation links affected:

Probably https://docs.estuary.dev/guides/connect-network/, and it looks like https://docs.estuary.dev/reference/Connectors/capture-connectors/PostgreSQL/#amazon-rds directly mentions forwardHost so that should be changed too.

This change is

psFried

I think this looks good.

One question I have is whether it would also make sense to apply this behavior in the presence of separate host and port properties. There certainly doesn't seem to be any reason why we couldn't, though it would add complexity to a feature that we may not be entirely certain about. WDYT?

crates/connector_proxy/src/libs/network_tunnel.rs

willdonnelly · 2022-06-27T23:52:36Z

whether it would also make sense to apply this behavior in the presence of separate host and port properties

Yeah, that's absolutely an option. There were basically two reasons I was hesitant to go that route:

I think it's good for user experience to have basic details like host:port configuration look as similar as possible between connectors. I never really wrote up my argument for why I think that a "host:port"-valued /address property is the optimal one we should standardize on, but I may as well sketch that out now so it's written down somewhere. Basically:
- Separate /host and /port properties don't work out very well, in the web UI that exists today. Observe the "Create Materialization" form for materialize-postgres, where the rendered sequence of form fields is "Host", "Password", "Port".
- Even if the field ordering issue was fixed, using separate properties for host and port makes the configuration process look marginally more complex (5 fields vs 4, in that case) for no real benefit. Like, my ideal configuration flow looks like "Address, Username, Password, done" so that extra field really matters. Concatenating host:port as a single string is intuitive to pretty much everybody who's worked much with computers and networking.
- Naming the combined field /address causes it to occur before just about any other property, in the web UI as it exists today.
It's been said that the only principled numbers in computer science are 0, 1, and infinity. Once we've added support for two different mechanisms it becomes a lot harder to argue against adding a third, whereas if there's just one it feels a lot more natural to say "okay, guess we'd better go modify the connector" to make the SSH forwarding UX work nicely.

Also to get more concrete, I believe that materialize-postgres is the only connector on our approved list which still takes host + port properties at present (there are a few more which take some form of "endpoint URI", but those are a more tricky subject to handle here), and given that's the case then I think we should absolutely make a breaking change to bring that connector into symmetry with source-{mysql,postgres} one way or another.

In high-level terms, this commit allows an endpoint's `address` property to always describe the target database regardless of whether SSH forwarding is in use. In order to make that happen, this commit introduces a new method `extend_endpoint_schema` of the `NetworkTunnel` trait, which is called prior to network tunnel startup. This gives a specific network tunnel implementation (such as SSH Forwarding) an opportunity to inspect and modify endpoint configuration. In the case of SSH forwarding, this method extracts values for `forward_host` and `forward_port` from the endpoint's `address` property, randomly selects a `local_port` between 10000-20000, and modifies the endpoint's `address` to `127.0.0.1:<localPort>`.

This commit modifies the config schema generation logic so that when the underlying endpoint config has an 'address' property the SSH forwarding config *schema* will no longer mention the trio of unnecessary `forwardHost/forwardPort/localPort` values. This conditional behavior turned out to be surprisingly tricky, and the best way I could find to implement this was to leave the relevant fields in place in the SSH Forwarding config, make them optional, and then write a small `Visitor` which could be used to postprocess the network tunnel spec and remove those three properties. By only running the "delete that stuff" visitor when the `address` property exists the generated config spec is able to vary as intended. The upshot of this is that almost nothing should change for the majority of connectors at the moment, and we can fix them up one at a time to take advantage of this UX improvement while remaining backwards-compatible with existing specs the whole time (since the actual struct fields still exist, we're just not mentioning them and have a reasonable default behavior when they're all unset).

As I understand it the `maximum` keyword in a JSON schema means an inclusive maximum (there's a separate `exclusiveMaximum`), so the maximum value of the port number fields should be 65535. This is an incredibly dumb nitpick given that I'm actively working on getting rid of these fields, but it was bothering me.

The new test runs NetworkTunnel::extend_endpoint_schema on a couple of hypothetical endpoint spec schemas and verifies the results against JSON snapshot files. There are two schema variants in the test at present: one with an 'address' property and one without, so we can observe the difference in what SSH Forwarding properties are described.

psFried · 2022-06-28T00:21:14Z

Thanks for the explanation @willdonnelly , I'm convinced 👍

mdibaiee · 2022-06-28T09:40:49Z

crates/network-tunnel/src/sshforwarding.rs

@@ -80,6 +87,13 @@ pub struct SshForwarding {
    process: Option<Child>,
 }

+fn split_host_port(hostport: String) -> Option<(String, u16)> {
+    let mut splits = hostport.as_str().splitn(2, ':');


Should this be rsplitn, since we might have username and password in the host? e.g.:

mahdi:[email protected]:8080

Currently none of the connectors for which this logic will trigger support username/password in the address field. I'd be open to adding that, but I think it would require some additional code to chop off anything up to an @ character and then reattach that to the rewritten localhost address at the end.

mdibaiee

LGTM, thank you Will! I just have a small note regarding how we split the address into host and port

willdonnelly requested review from mdibaiee and psFried June 24, 2022 19:24

willdonnelly mentioned this pull request Jun 24, 2022

Improve Error Messaging in Network Proxy #555

Closed

willdonnelly force-pushed the wgd/ssh-forwarding-config branch from f9f86cb to c4c9635 Compare June 24, 2022 21:02

psFried approved these changes Jun 27, 2022

View reviewed changes

crates/connector_proxy/src/libs/network_tunnel.rs Outdated Show resolved Hide resolved

willdonnelly added 4 commits June 27, 2022 18:57

willdonnelly force-pushed the wgd/ssh-forwarding-config branch from c4c9635 to cdf566d Compare June 27, 2022 23:57

mdibaiee reviewed Jun 28, 2022

View reviewed changes

mdibaiee approved these changes Jun 28, 2022

View reviewed changes

willdonnelly merged commit 5934611 into master Jun 29, 2022

willdonnelly deleted the wgd/ssh-forwarding-config branch June 29, 2022 17:16

oliviamiannone added the docs pending Improvements or additions to documentation noted or in progress label Jul 6, 2022

mdibaiee mentioned this pull request Jul 18, 2022

Potential UX improvement for SSH tunnel configuration #513

Open

oliviamiannone mentioned this pull request Jul 19, 2022

Docs: updated network tunnel config #580

Merged

oliviamiannone added docs complete / NA No (more) doc work related to this PR and removed docs pending Improvements or additions to documentation noted or in progress labels Jul 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplified SSH Forwarding Configuration #563

Simplified SSH Forwarding Configuration #563

willdonnelly commented Jun 24, 2022 •

edited by jgraettinger

Loading

psFried left a comment

willdonnelly commented Jun 27, 2022

psFried commented Jun 28, 2022

mdibaiee Jun 28, 2022 •

edited

Loading

willdonnelly Jun 28, 2022 •

edited

Loading

mdibaiee left a comment

Simplified SSH Forwarding Configuration #563

Simplified SSH Forwarding Configuration #563

Conversation

willdonnelly commented Jun 24, 2022 • edited by jgraettinger Loading

psFried left a comment

Choose a reason for hiding this comment

willdonnelly commented Jun 27, 2022

psFried commented Jun 28, 2022

mdibaiee Jun 28, 2022 • edited Loading

Choose a reason for hiding this comment

willdonnelly Jun 28, 2022 • edited Loading

Choose a reason for hiding this comment

mdibaiee left a comment

Choose a reason for hiding this comment

willdonnelly commented Jun 24, 2022 •

edited by jgraettinger

Loading

mdibaiee Jun 28, 2022 •

edited

Loading

willdonnelly Jun 28, 2022 •

edited

Loading