Simplified SSH Forwarding Configuration #563
I think this looks good.
One question I have is whether it would also make sense to apply this behavior in the presence of separate `host` and `port` properties. There certainly doesn't seem to be any reason why we couldn't, though it would add complexity to a feature that we may not be entirely certain about. WDYT?
Yeah, that's absolutely an option. There were basically two reasons I was hesitant to go that route:
Also to get more concrete, I believe that |
In high-level terms, this commit allows an endpoint's `address` property to always describe the target database regardless of whether SSH forwarding is in use. To make that happen, this commit introduces a new method `extend_endpoint_schema` on the `NetworkTunnel` trait, which is called prior to network tunnel startup. This gives a specific network tunnel implementation (such as SSH Forwarding) an opportunity to inspect and modify endpoint configuration. In the case of SSH forwarding, this method extracts values for `forward_host` and `forward_port` from the endpoint's `address` property, randomly selects a `local_port` between 10000 and 20000, and rewrites the endpoint's `address` to `127.0.0.1:<local_port>`.
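The address rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Forwarded` struct and `rewrite_address` function are hypothetical names, and the local port is passed in as a parameter rather than chosen at random so that the example stays deterministic.

```rust
/// Hypothetical result of rewriting an endpoint address for SSH forwarding.
#[derive(Debug, PartialEq)]
struct Forwarded {
    forward_host: String,
    forward_port: u16,
    local_port: u16,
    /// The address handed to the connector once the tunnel is in place.
    address: String,
}

/// Split `host:port` at the first colon and parse the port.
fn split_host_port(hostport: &str) -> Option<(String, u16)> {
    let (host, port) = hostport.split_once(':')?;
    Some((host.to_string(), port.parse().ok()?))
}

/// Derive the forwarding settings from `address`, then point the endpoint
/// at the local end of the tunnel instead of the real database.
fn rewrite_address(address: &str, local_port: u16) -> Option<Forwarded> {
    let (forward_host, forward_port) = split_host_port(address)?;
    Some(Forwarded {
        forward_host,
        forward_port,
        local_port,
        address: format!("127.0.0.1:{}", local_port),
    })
}

fn main() {
    // Assumption: a fixed port stands in for the random 10000-20000 choice.
    let fwd = rewrite_address("db.example.com:5432", 12345).unwrap();
    println!(
        "forward {}:{} via local port {}, connector sees {}",
        fwd.forward_host, fwd.forward_port, fwd.local_port, fwd.address
    );
}
```

Returning `None` for an unparseable address leaves it to the caller to decide whether to fall back to explicitly configured forwarding values.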
This commit modifies the config schema generation logic so that when the underlying endpoint config has an 'address' property the SSH forwarding config *schema* will no longer mention the trio of unnecessary `forwardHost/forwardPort/localPort` values. This conditional behavior turned out to be surprisingly tricky, and the best way I could find to implement this was to leave the relevant fields in place in the SSH Forwarding config, make them optional, and then write a small `Visitor` which could be used to postprocess the network tunnel spec and remove those three properties. By only running the "delete that stuff" visitor when the `address` property exists the generated config spec is able to vary as intended. The upshot of this is that almost nothing should change for the majority of connectors at the moment, and we can fix them up one at a time to take advantage of this UX improvement while remaining backwards-compatible with existing specs the whole time (since the actual struct fields still exist, we're just not mentioning them and have a reasonable default behavior when they're all unset).
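The conditional pruning can be illustrated with a simplified model. The sketch below is an assumption-laden stand-in: it represents a schema's properties as a plain `BTreeMap` instead of a real JSON schema type, and `prune_ssh_schema` and `props` are hypothetical helpers, not the `Visitor` implemented in this commit.

```rust
use std::collections::BTreeMap;

/// Property name -> type description; a stand-in for a JSON schema object.
type Properties = BTreeMap<String, String>;

/// The three fields that become redundant when `address` is available.
const FORWARDING_FIELDS: [&str; 3] = ["forwardHost", "forwardPort", "localPort"];

/// Drop the forwarding fields from the SSH schema, but only when the
/// endpoint schema has a top-level `address` property.
fn prune_ssh_schema(endpoint: &Properties, mut ssh: Properties) -> Properties {
    if endpoint.contains_key("address") {
        for field in FORWARDING_FIELDS {
            ssh.remove(field);
        }
    }
    ssh
}

/// Build a property map in which every property is a string.
fn props(names: &[&str]) -> Properties {
    names
        .iter()
        .map(|n| (n.to_string(), "string".to_string()))
        .collect()
}

fn main() {
    let ssh = props(&[
        "sshEndpoint",
        "privateKey",
        "forwardHost",
        "forwardPort",
        "localPort",
    ]);
    let with_address = prune_ssh_schema(&props(&["address", "user"]), ssh.clone());
    let without_address = prune_ssh_schema(&props(&["host", "port"]), ssh);
    println!("with address:    {:?}", with_address.keys().collect::<Vec<_>>());
    println!("without address: {:?}", without_address.keys().collect::<Vec<_>>());
}
```

Because the pruning only touches the generated spec, a config that still sets the three fields would remain valid against the underlying struct.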
As I understand it the `maximum` keyword in a JSON schema means an inclusive maximum (there's a separate `exclusiveMaximum`), so the maximum value of the port number fields should be 65535. This is an incredibly dumb nitpick given that I'm actively working on getting rid of these fields, but it was bothering me.
The new test runs NetworkTunnel::extend_endpoint_schema on a couple of hypothetical endpoint spec schemas and verifies the results against JSON snapshot files. There are two schema variants in the test at present: one with an 'address' property and one without, so we can observe the difference in what SSH Forwarding properties are described.
Thanks for the explanation @willdonnelly, I'm convinced 👍
@@ -80,6 +87,13 @@ pub struct SshForwarding {
    process: Option<Child>,
}

fn split_host_port(hostport: String) -> Option<(String, u16)> {
    let mut splits = hostport.as_str().splitn(2, ':');
Should this be `rsplitn`, since we might have a username and password in the host? e.g.:
mahdi:[email protected]:8080
Currently none of the connectors for which this logic will trigger support username/password in the `address` field. I'd be open to adding that, but I think it would require some additional code to chop off anything up to an `@` character and then reattach that to the rewritten localhost address at the end.
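To make the difference concrete, here is a small sketch comparing a first-colon split with a last-colon split on an address that carries credentials, plus one possible shape for the "chop off and reattach" logic. All three functions are hypothetical and use `split_once`/`rsplit_once` from the standard library; the sample address is made up for illustration.

```rust
/// Split at the first colon, which is where `splitn(2, ':')` splits.
fn split_first(hostport: &str) -> Option<(String, u16)> {
    let (host, port) = hostport.split_once(':')?;
    Some((host.to_string(), port.parse().ok()?))
}

/// Split at the last colon, which is where `rsplitn(2, ':')` splits.
fn split_last(hostport: &str) -> Option<(String, u16)> {
    let (host, port) = hostport.rsplit_once(':')?;
    Some((host.to_string(), port.parse().ok()?))
}

/// Hypothetical helper: peel off everything up to the last `@`, extract the
/// forwarding host and port, and reattach the credentials to the rewritten
/// localhost address. Returns (forward_host, forward_port, new_address).
fn rewrite_with_userinfo(address: &str, local_port: u16) -> Option<(String, u16, String)> {
    let (userinfo, hostport) = match address.rsplit_once('@') {
        Some((user, rest)) => (Some(user), rest),
        None => (None, address),
    };
    let (host, port) = split_last(hostport)?;
    let local = match userinfo {
        Some(user) => format!("{}@127.0.0.1:{}", user, local_port),
        None => format!("127.0.0.1:{}", local_port),
    };
    Some((host, port, local))
}

fn main() {
    let addr = "user:secret@db.example.com:5432";
    // The first-colon split treats "secret@db.example.com:5432" as the port.
    println!("first colon: {:?}", split_first(addr));
    // The last-colon split finds the port but keeps credentials in the host.
    println!("last colon:  {:?}", split_last(addr));
    println!("userinfo:    {:?}", rewrite_with_userinfo(addr, 12345));
}
```

Splitting at the last colon alone finds the port but leaves the credentials glued to the host, which is why the extra `@` handling would be needed before the host could be used as a forwarding target.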
LGTM, thank you Will! I just have a small note regarding how we split the address into host and port
Description:
This is an attempt to implement the mechanism I described in #513, which allows a top-level endpoint config property named `address` to serve as the `forwardHost`/`forwardPort` values for SSH tunneling (and replaces the `address` property with `127.0.0.1:$RANDOM_LOCAL_PORT` before invoking the connector).
It includes some additional logic which does a similar job when asked for the endpoint config spec, and omits the `forwardHost`, `forwardPort`, and `localPort` properties from the SSH config portion of the resulting schema IFF the main endpoint config schema contains a top-level `address` property.
It should be noted that the new logic is backwards-compatible in two ways:
- Existing configs which specify `forwardHost`/`forwardPort`/`localPort` should continue to operate as normal. This means that we can choose to migrate existing uses of `source-postgres` and other connectors whenever it's convenient.
- Connectors without an `address` property will continue to provide the "full" version of the SSH config spec, which includes `forwardHost` and friends. This means that we can choose to modify additional connectors (such as `materialize-postgres`) to use the specific `address` configuration whenever it's convenient.
This whole PR still feels really hacky because of the reliance on a specific endpoint property `address`, but I believe it should work well in practice and I couldn't justify using a more complicated mechanism at this time. The main issues are:
- `materialize-postgres` will need to replace its current `host` and `port` properties with `address`, which is itself a potentially-breaking change that needs some care (either keeping things backwards-compatible or pinning-then-upgrading the live materializations) to not break existing uses.
- Some connectors might have an `address` property which does not have the expected semantics, and we would do the wrong thing with those. However I have checked and I do not believe any of our current officially supported connectors have this issue, and it's a bit unlikely anyway.
Workflow steps:
Configuration for some connectors (currently `source-postgres` and `source-mysql`) will no longer require `forwardHost` and related properties to be specified. Instead the primary `address` field should always describe the target database, and the fact that it's not directly reachable but goes through an SSH bastion host is only represented by the presence of `sshEndpoint` and `privateKey` configuration.
Documentation links affected:
Probably https://docs.estuary.dev/guides/connect-network/, and it looks like https://docs.estuary.dev/reference/Connectors/capture-connectors/PostgreSQL/#amazon-rds directly mentions `forwardHost` so that should be changed too.