consul: Configure gRPC health check for SA #6908

Merged
beautifulentropy merged 5 commits into main from consul-grpc-health-sa on May 23, 2023

beautifulentropy commented May 19, 2023

Enable SA gRPC health checks in Consul ahead of further changes for #6878. The Check method of the SA's grpc.health.v1.Health service must respond SERVING before the sa service is advertised in Consul DNS. Consul continues to poll this service every 5 seconds.
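As a rough illustration of the gating described above, here is a minimal Go sketch using only the standard library. It does not call gRPC or Consul; `CheckResult`, `Passing`, `FirstPassing`, and `ErrRefused` are hypothetical names invented for this example, and the status values only mirror the names used by grpc.health.v1.Health. The assumption modeled here is that a probe counts as passing only when it returns SERVING without a connection error.

```go
package main

import (
	"errors"
	"fmt"
)

// ServingStatus mirrors the status names of grpc.health.v1.Health.
type ServingStatus int

const (
	Unknown ServingStatus = iota
	Serving
	NotServing
)

// ErrRefused stands in for a dial failure such as "connection refused".
var ErrRefused = errors.New("connection refused")

// CheckResult is the outcome of one hypothetical probe: either an error
// from dialing, or the status returned by the Check method.
type CheckResult struct {
	Status ServingStatus
	Err    error
}

// Passing reports whether a probe result counts as healthy.
// Only an error-free SERVING response passes (assumption of this sketch).
func Passing(r CheckResult) bool {
	return r.Err == nil && r.Status == Serving
}

// FirstPassing returns the index of the first passing probe, or -1 if
// none passes. Before that index the instance would not be advertised.
func FirstPassing(results []CheckResult) int {
	for i, r := range results {
		if Passing(r) {
			return i
		}
	}
	return -1
}

func main() {
	// Two refused connections while the SA is still starting, one
	// NOT_SERVING response, then SERVING.
	probes := []CheckResult{
		{Err: ErrRefused},
		{Err: ErrRefused},
		{Status: NotServing},
		{Status: Serving},
	}
	fmt.Println(FirstPassing(probes)) // prints 3
}
```

With a 5 second poll interval, index 3 would correspond to roughly 15 seconds after the first probe, which is an illustration only and not a measured figure.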

  • Add bconsul docker service to boulder bluenet and rednet
  • Add TLS credentials for consul.boulder:
    $ openssl x509 -in consul.boulder/cert.pem -text | grep DNS
                  DNS:consul.boulder
  • Update test/grpc-creds/generate.sh to add consul.boulder
  • Update test SA configs to allow consul.boulder to access grpc.health.v1.Health

Part of #6878

beautifulentropy marked this pull request as ready for review May 19, 2023 18:59
beautifulentropy requested a review from a team as a code owner May 19, 2023 18:59
beautifulentropy requested a review from pgporada May 19, 2023 18:59

beautifulentropy commented May 19, 2023

You can watch the check fail in the logs from the time Consul comes up until the SA actually starts (Synced == Passing):

2023-05-19 15:40:52 2023-05-19T19:40:52.988Z [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {10.77.77.77:9095 sa.boulder <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.77.77.77:9095: connect: connection refused". Reconnecting...
2023-05-19 15:40:52 2023-05-19T19:40:52.988Z [WARN]  agent: Check is now critical: check=sa-a-grpc
2023-05-19 15:40:54 2023-05-19T19:40:54.964Z [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {10.88.88.88:9095 sa.boulder <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.88.88.88:9095: connect: connection refused". Reconnecting...
2023-05-19 15:40:54 2023-05-19T19:40:54.964Z [WARN]  agent: Check is now critical: check=sa-b-grpc
2023-05-19 15:40:57 2023-05-19T19:40:57.988Z [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {10.77.77.77:9095 sa.boulder <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.77.77.77:9095: connect: connection refused". Reconnecting...
2023-05-19 15:40:57 2023-05-19T19:40:57.988Z [WARN]  agent: Check is now critical: check=sa-a-grpc
2023-05-19 15:40:59 2023-05-19T19:40:59.965Z [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {10.88.88.88:9095 sa.boulder <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.88.88.88:9095: connect: connection refused". Reconnecting...
2023-05-19 15:40:59 2023-05-19T19:40:59.965Z [WARN]  agent: Check is now critical: check=sa-b-grpc
2023-05-19 15:41:02 2023-05-19T19:41:02.998Z [INFO]  agent: Synced check: check=sa-a-grpc
2023-05-19 15:41:04 2023-05-19T19:41:04.983Z [INFO]  agent: Synced check: check=sa-b-grpc
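To make the reading of these lines concrete, here is a small Go sketch that records the last state seen for each check ID. `CheckStates` is a hypothetical helper written for this comment and is not part of Consul or Boulder. It treats "Synced check" as passing purely because of the shorthand above; that is an assumption about this log excerpt, not a statement about Consul's log semantics in general.

```go
package main

import (
	"fmt"
	"strings"
)

// CheckStates scans agent log lines and returns the last state seen for
// each check ID. Assumption: "Synced check" is read as "passing", per the
// "Synced == Passing" shorthand in the comment above.
func CheckStates(lines []string) map[string]string {
	const (
		criticalMarker = "Check is now critical: check="
		syncedMarker   = "Synced check: check="
	)
	states := make(map[string]string)
	for _, line := range lines {
		if _, id, ok := strings.Cut(line, criticalMarker); ok {
			states[strings.TrimSpace(id)] = "critical"
		} else if _, id, ok := strings.Cut(line, syncedMarker); ok {
			states[strings.TrimSpace(id)] = "passing"
		}
	}
	return states
}

func main() {
	// Lines abbreviated from the excerpt above.
	logs := []string{
		"2023-05-19T19:40:52.988Z [WARN]  agent: Check is now critical: check=sa-a-grpc",
		"2023-05-19T19:40:54.964Z [WARN]  agent: Check is now critical: check=sa-b-grpc",
		"2023-05-19T19:41:02.998Z [INFO]  agent: Synced check: check=sa-a-grpc",
	}
	states := CheckStates(logs)
	fmt.Println(states["sa-a-grpc"], states["sa-b-grpc"]) // prints: passing critical
}
```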

Additionally, the Consul UI shows the sa-a and sa-b nodes of the sa service failing until the SA starts, after which you get the following check output:

gRPC check 10.77.77.77:9095: success
gRPC check 10.88.88.88:9095: success

This output originates from Consul's gRPC health prober:

https://github.com/hashicorp/consul/blob/d20e3df63c0a2dbb13b67e0d7b4023f8c8b91da5/agent/checks/check.go#L1040

beautifulentropy marked this pull request as draft May 19, 2023 21:20
beautifulentropy marked this pull request as ready for review May 19, 2023 22:52
pgporada commented

I filed IN-9146 for SRE.

pgporada commented

[Image: Screenshot from 2023-05-22 15-30-44]

aarongable left a comment

LGTM with one question
