Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli/server: Use AdvertiseAddr instead of Addr where appropriate #17145

Closed
wants to merge 2 commits into from

Conversation

a-robinson
Copy link
Contributor

cli: Fix internal client connections to use AdvertiseAddr, not Addr

As far as I can tell, this wasn't actually breaking anything, since
getClientGRPCConn, the only caller of addrWithDefaultHost, isn't called
from within the start command, which is the only command where
AdvertiseAddr can differ from Addr.

Fixes #17143

server: Use AdvertiseAddr instead of Addr HTTP<->gRPC connections

Fixes #10374

As far as I can tell, this wasn't actually breaking anything, since
getClientGRPCConn, the only caller of addrWithDefaultHost, isn't called
from within the start command, which is the only command where
AdvertiseAddr can differ from Addr.

Fixes cockroachdb#17143
@a-robinson a-robinson requested a review from tamird July 20, 2017 17:25
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@@ -910,7 +910,7 @@ func (s *Server) Start(ctx context.Context) error {
s.stopper.AddCloser(stop.CloserFn(gwCancel))

// Setup HTTP<->gRPC handlers.
conn, err := s.rpcContext.GRPCDial(s.cfg.Addr)
conn, err := s.rpcContext.GRPCDial(s.cfg.AdvertiseAddr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure that's right? Depending on Addr vs AdvertiseAddr and the network topology, this may force local connections to go through an external interface, even a firewall.

@tamird
Copy link
Contributor

tamird commented Jul 20, 2017

Reviewed 1 of 1 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/cli/start.go, line 626 at r1 (raw file):

}

func addrWithDefaultHost(addr string) (string, error) {

this function has just one caller - can we inline it? I don't think it does anything other than obscure things.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, mberhault (marc) wrote…

are you sure that's right? Depending on Addr vs AdvertiseAddr and the network topology, this may force local connections to go through an external interface, even a firewall.

+1

Continues to be a bummer that we can't directly forward to the local server.


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/cli/start.go, line 626 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this function has just one caller - can we inline it? I don't think it does anything other than obscure things.

It seems like a reasonably well-contained function to me. Are you fine with leaving it?


pkg/server/server.go, line 913 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

+1

Continues to be a bummer that we can't directly forward to the local server.

Depending on the network configuration, it certainly could make local connections less efficient, but I can't think of a case in which it flat out wouldn't work. Can you? Because a node's AdvertiseAddr will always have to be included in that node's cert, and it will always have to be routable as well.

Always using the hostname can fail if the hostname isn't routable (as with the hostname on same Macs) or wasn't included in the cert (as can easily be the case in container-based deployments if the admin isn't careful about setting each container's hostname).


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Depending on the network configuration, it certainly could make local connections less efficient, but I can't think of a case in which it flat out wouldn't work. Can you? Because a node's AdvertiseAddr will always have to be included in that node's cert, and it will always have to be routable as well.

Always using the hostname can fail if the hostname isn't routable (as with the hostname on same Macs) or wasn't included in the cert (as can easily be the case in container-based deployments if the admin isn't careful about setting each container's hostname).

Of course, falling back from Addr to AdvertiseAddr upon error would presumably be best, but in my testing this never returned an error, so we'd have to rework what we're doing here to fail fast and detect the problem.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 22, 2017

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/cli/start.go, line 626 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

It seems like a reasonably well-contained function to me. Are you fine with leaving it?

I guess; it seems to me that it does more to obscure than to abstract.


pkg/server/server.go, line 913 at r2 (raw file):

Always using the hostname can fail if the hostname isn't routable (as with the hostname on same Macs) or wasn't included in the cert (as can easily be the case in container-based deployments if the admin isn't careful about setting each container's hostname).

Isn't this what #16177 was about?


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Always using the hostname can fail if the hostname isn't routable (as with the hostname on same Macs) or wasn't included in the cert (as can easily be the case in container-based deployments if the admin isn't careful about setting each container's hostname).

Isn't this what #16177 was about?

Yes, that ensures the hostname is always routable, but not that it's always included in the cert. And always including it in the cert isn't practical in all environments.

Perhaps we could fall back to insecure connections when node is connecting to itself, although if we're investing the effort it'd probably be better placed in trying to forward directly to the local server. Do you have a pointer to discussion on why that wasn't feasible?


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 26, 2017

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Yes, that ensures the hostname is always routable, but not that it's always included in the cert. And always including it in the cert isn't practical in all environments.

Perhaps we could fall back to insecure connections when node is connecting to itself, although if we're investing the effort it'd probably be better placed in trying to forward directly to the local server. Do you have a pointer to discussion on why that wasn't feasible?

There's no discussion - there's just no API to do it in grpc-gateway; see e.g. https://godoc.org/github.com/cockroachdb/cockroach/pkg/server/serverpb#RegisterAdminHandler which wants a grpc.Conn, which is a concrete type, not an interface.


Comments from Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented Aug 9, 2017

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Continues to be a bummer that we can't directly forward to the local server.

Can't we use ln.Addr() for this? (If ln.Addr() comes back as something like 0.0.0.0:26257, we can split the host/port and attach the port to 127.0.0.1 or [::1])


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Aug 9, 2017

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Continues to be a bummer that we can't directly forward to the local server.

Can't we use ln.Addr() for this? (If ln.Addr() comes back as something like 0.0.0.0:26257, we can split the host/port and attach the port to 127.0.0.1 or [::1])

I'm pretty sure ln.Addr never comes back as having host 0.0.0.0, and yeah, I think that should work.


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I'm pretty sure ln.Addr never comes back as having host 0.0.0.0, and yeah, I think that should work.

I'm not sure I understand. Isn't that mostly what we're already doing? https://github.com/cockroachdb/cockroach/blob/master/pkg/server/server.go#L583

It's fine if we don't want to do this due to the extra networking complexity involved. It would certainly be better to only fall back to s.cfg.AdvertiseAddr if we know that s.cfg.Addr isn't a valid name listed in the cert.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

I'm pretty sure ln.Addr never comes back as having host 0.0.0.0, and yeah, I think that should work.

I did a quick test and ln.Addr() will return either 0.0.0.0 or its ipv6 equivalent when listening on all interfaces.

I'm not sure I understand. Isn't that mostly what we're already doing?

officialAddr attaches our hostname to our requested port, to produce our best guess at an address that will resolve from other nodes. What I'm proposing is to do something similar, but using 127.0.0.1 (or ::1 if the listener is ipv6?) instead of the hostname. This address will be guaranteed to work from the local node, so it can make the admin UI work on single-node clusters even if the a multi-node cluster on the same machine would require explicit --host flags.


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm pretty sure ln.Addr never comes back as having host 0.0.0.0, and yeah, I think that should work.

I did a quick test and ln.Addr() will return either 0.0.0.0 or its ipv6 equivalent when listening on all interfaces.

I'm not sure I understand. Isn't that mostly what we're already doing?

officialAddr attaches our hostname to our requested port, to produce our best guess at an address that will resolve from other nodes. What I'm proposing is to do something similar, but using 127.0.0.1 (or ::1 if the listener is ipv6?) instead of the hostname. This address will be guaranteed to work from the local node, so it can make the admin UI work on single-node clusters even if the a multi-node cluster on the same machine would require explicit --host flags.

It will be guaranteed to route, but won't work in the case where the cluster is secured and the node's certificate doesn't include 127.0.0.1 or ::1 or localhost or whatever we want to use, which was the original motivation for the second commit in this PR.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 913 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

It will be guaranteed to route, but won't work in the case where the cluster is secured and the node's certificate doesn't include 127.0.0.1 or ::1 or localhost or whatever we want to use, which was the original motivation for the second commit in this PR.

It should be safe to disable certificate hostname verification in this case.


Comments from Reviewable

@couchand
Copy link
Contributor

@bdarnell, can you confirm how your proposed change would affect multiple-node clusters? I'm not clear based on your comment.

It seems like we have two goals here: not having to go out to network hardware to support requests from the admin UI, and making sure that admin UI requests can always be served, regardless of the particular host/advertise-host/certificate hosts.

@bdarnell
Copy link
Contributor

My proposal should work for multi-node clusters. If no --host is given, ln.Addr() will return 0.0.0.0:$PORT or [::]:$PORT. We would parse the result of ln.Addr() and replace 0.0.0.0 with 127.0.0.1 and [::] with [::1]. This will connect via the loopback interface, which we know must be included in the wildcard bind address. If a --host is given, ln.Addr() will return that host's IP, and since we bound to it we should be able to connect to it (not on the loopback interface, but still locally). I believe this proposal solves both our problems.

@dianasaur323
Copy link
Contributor

I think this issue may also be related? #19233 (comment)

@a-robinson
Copy link
Contributor Author

@couchand you're more than welcome to take this over, I don't plan to touch it soon.

@tamird
Copy link
Contributor

tamird commented Oct 19, 2017

I just merged #17490 which includes grpc-ecosystem/grpc-gateway#454 - that change allows constructing the grpc gateway entirely without network connections; I believe that can solve this problem. Unfortunately, I'm not sure how to make the various grpc internals such as metadata work, but I'm sure it's doable.

@tamird
Copy link
Contributor

tamird commented Oct 22, 2017

Actually, I think we can do this using net.Pipe. I'm going to see if it's workable and send an alternate PR -- not sure why we didn't consider this earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants