Manual Backport of Fix resolution of service resolvers with subsets for external upstreams into release/1.14.x #16560

Merged
andrewstucki merged 4 commits into release/1.14.x from release-1.14.x-backport-resolver-fix-2
Mar 7, 2023

@andrewstucki
Contributor

Backport

Manual backport from #16499 to release/1.14.x.

The below text is copied from the body of the original PR.


Description

While fixing #16498 I noticed that applying a ServiceResolver with subsets wasn't functional when referencing an external service proxied through a TerminatingGateway as an upstream. I'm not too familiar with the way we return service health for external services, but the problem appears to be that in our health check materializer we:

  1. Grab the CheckServiceNode values from our subscription, and then
  2. Apply any filters that were in our initial subscription request

Since we return the gateway associated with the service when we're using external services in conjunction with a TerminatingGateway:

	// Look up gateway nodes associated with the service
	// TODO(peering): we'll have to do something here
	gwIdx, nodes, err := serviceGatewayNodes(tx, ws, serviceName, structs.ServiceKindTerminatingGateway, entMeta, structs.DefaultPeerKeyword)
	if err != nil {
		return 0, nil, fmt.Errorf("failed gateway nodes lookup: %v", err)
	}
	idx = lib.MaxUint64(idx, gwIdx)
	for i := 0; i < len(nodes); i++ {
		results = append(results, nodes[i])
		name := structs.NewServiceName(nodes[i].ServiceName, &nodes[i].EnterpriseMeta)
		serviceNames[name] = struct{}{}
	}

The filter never passes and we are never able to resolve the upstream endpoint properly.
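A minimal sketch of the failure mode described above, using simplified stand-in types rather than Consul's actual `structs.CheckServiceNode` or its filter evaluation. The type names, the `version` metadata key, and the helper functions are all hypothetical; the point is only that a subset filter written against the external service's metadata has nothing to match when the returned node is the terminating gateway.

```go
package main

import "fmt"

// service and checkServiceNode are hypothetical, simplified stand-ins
// for the data returned by a health subscription.
type service struct {
	Kind string
	Name string
	Meta map[string]string
}

type checkServiceNode struct {
	Address string
	Service service
}

// subsetFilter mimics a resolver subset filter along the lines of
// `Service.Meta.version == v1` (an assumed example expression).
func subsetFilter(version string) func(checkServiceNode) bool {
	return func(n checkServiceNode) bool {
		// Reading from a nil map is safe in Go and yields "".
		return n.Service.Meta["version"] == version
	}
}

// applyFilter keeps only the nodes accepted by keep.
func applyFilter(nodes []checkServiceNode, keep func(checkServiceNode) bool) []checkServiceNode {
	var out []checkServiceNode
	for _, n := range nodes {
		if keep(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// For an external service, the lookup returns the gateway instance,
	// which does not carry the external service's metadata.
	gatewayNodes := []checkServiceNode{{
		Address: "127.0.0.1",
		Service: service{Kind: "terminating-gateway", Name: "terminating-gateway"},
	}}

	endpoints := applyFilter(gatewayNodes, subsetFilter("v1"))
	fmt.Println("endpoints after filter:", len(endpoints))
}
```

In this model the filtered list is empty, which corresponds to the cluster having no endpoint address in the reproduction below.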

I'm not entirely sure whether this is the only change needed, but from what I could tell all of the health checks initiated via proxycfg go through this code path since they leverage either the gRPC endpoints or a direct subscription to the in-memory store.

Testing & Reproduction steps

Create a set of external services that has a ServiceResolver with subsets as in #16498 and a local service that leverages those services as an upstream. Hit the local proxy's admin cluster listing endpoint.

Without the fix (no IP address is ever associated with the endpoint):

~ curl -s localhost:9092/clusters | grep v1.external | sort | head -n 1
v1.external.default.dc1.internal.cba29ba8-8796-2c26-cacd-0ee5dee70b82.consul::added_via_api::true

With the fix (contains the terminating gateway IP for its endpoint):

~ curl -s localhost:9092/clusters | grep v1.external | sort | head -n 1
v1.external.default.dc1.internal.ea87fe29-2a6c-bd80-e248-5ebdbfed0a7a.consul::127.0.0.1:8443::canary::false
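The same check can be scripted. The sketch below assumes the `::`-separated text layout visible in the two sample lines above, where a host-level line carries a `host:port` in its second field, and it only handles IPv4 addresses (an IPv6 address would collide with the `::` separator). The function name `endpointAddress` is made up for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// endpointAddress returns the host:port found in the second "::"-separated
// field of a clusters line, or "" when that field is not an IP:port pair
// (for example a cluster-level line such as "added_via_api::true").
func endpointAddress(line string) string {
	fields := strings.Split(line, "::")
	if len(fields) < 2 {
		return ""
	}
	host, _, err := net.SplitHostPort(fields[1])
	if err != nil || net.ParseIP(host) == nil {
		return ""
	}
	return fields[1]
}

func main() {
	lines := []string{
		"v1.external.default.dc1.internal.cba29ba8-8796-2c26-cacd-0ee5dee70b82.consul::added_via_api::true",
		"v1.external.default.dc1.internal.ea87fe29-2a6c-bd80-e248-5ebdbfed0a7a.consul::127.0.0.1:8443::canary::false",
	}
	for _, l := range lines {
		fmt.Printf("%q\n", endpointAddress(l))
	}
}
```

Run against the two sample lines, the first yields an empty string and the second yields the gateway address.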

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

Overview of commits

Member

@nathancoleman left a comment

Note: Diff is slightly different due to missing helper functions in the base branch

@andrewstucki andrewstucki merged commit f38cefb into release/1.14.x Mar 7, 2023
@andrewstucki andrewstucki deleted the release-1.14.x-backport-resolver-fix-2 branch March 7, 2023 20:34