
Manual Backport of Fix issue where terminating gateway service resolvers weren't properly cleaned up into release/1.14.x #16558

Merged
andrewstucki merged 4 commits into release/1.14.x from release-1.14.x-backport-resolver-fix-1
Mar 7, 2023


@andrewstucki

Backport

Manual backport from #16498 to release/1.14.x.

The below text is copied from the body of the original PR.


Description

This fixes an issue where terminating gateways weren't properly cleaning up service resolvers attached to their external services. When a resolver was deleted, the cast of the resulting nil entry failed, so the casting guard skipped the update and kept the old ServiceResolver value around.

However, I do have some questions about whether ServiceResolver subsets are ever supposed to work with an external service accessed through a terminating gateway, as they currently do not. When a local proxy attempts to resolve the endpoint for an upstream that is an external service with a ServiceResolver that defines subsets, it winds up watching the subset upstream directly, which returns no address (versus, without a subset, returning the terminating gateway address). Either way, that would be a different bug.

Testing & Reproduction steps

Run the following bash script:

#!/bin/bash

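# Runs once the background processes have been stopped.
cleanup() {
  echo "shutting down upstreams"
}

# On Ctrl-C or SIGTERM, signal every process in this process group (including
# the backgrounded gateway), wait for them to exit, then run cleanup. The inner
# trap keeps the script itself from being killed by its own `kill 0`.
trap 'trap " " SIGTERM; kill 0; wait; cleanup' SIGINT SIGTERM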

cat << EOF | ./consul config write -
Kind      = "proxy-defaults"
Name      = "global"
Config {
  protocol = "http"
}
EOF

echo "Writing terminating gateway config entry"
cat << EOF | ./consul config write -
Kind = "terminating-gateway"
Name = "gateway"

Services = [
  {
    Name = "external"
  }
]
EOF

cat << EOF > /tmp/external.json
{
  "Node": "hashicorp",
  "Address": "127.0.0.1",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "external-v1",
    "Service": "external",
    "Port": 9877,
    "Meta": {
      "version": "v1"
    }
  }
}
EOF
curl --request PUT --data @/tmp/external.json localhost:8500/v1/catalog/register

cat << EOF > /tmp/external.json
{
  "Node": "hashicorp",
  "Address": "127.0.0.1",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "external-v2",
    "Service": "external",
    "Port": 9878,
    "Meta": {
      "version": "v2"
    }
  }
}
EOF
curl --request PUT --data @/tmp/external.json localhost:8500/v1/catalog/register

echo "Writing resolver config entry"
cat << EOF | ./consul config write -
Kind = "service-resolver"
Name = "external"
DefaultSubset = "v1"
Subsets = {
  v1 = {
    Filter = "Service.Meta.version == v1"
  }
  v2 = {
    Filter = "Service.Meta.version == v2"
  }
}
EOF

echo "Running terminating gateway"
./consul connect envoy -gateway terminating -register -service gateway -proxy-id gateway -- -l trace &

wait
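
The two catalog registrations in the script differ only in the service ID, port, and version metadata. As a minimal sketch, they could be generated from one helper instead of two near-identical heredocs. The function name `external_payload` and its argument order are hypothetical and not part of the original PR.

```shell
# external_payload ID PORT VERSION   (hypothetical helper, not from the PR)
# Prints a catalog registration body for one instance of the "external"
# service, using the same node and metadata values as the script above.
external_payload() {
  cat << EOF
{
  "Node": "hashicorp",
  "Address": "127.0.0.1",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "$1",
    "Service": "external",
    "Port": $2,
    "Meta": {
      "version": "$3"
    }
  }
}
EOF
}
```

Assuming a local agent on port 8500 as in the script, the output can be piped straight to the catalog endpoint, for example `external_payload external-v1 9877 v1 | curl --request PUT --data @- localhost:8500/v1/catalog/register`, which avoids the temporary file.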

Before the fix:

➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v1.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v2.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
➜  consul git:(terminating-gateway-resolvers) ✗ ./consul config delete -kind service-resolver -name external
Config entry deleted: service-resolver/external
➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v1.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v2.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul

After the fix:

➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
v1.external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
v2.external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
➜  consul git:(terminating-gateway-resolvers) ✗ ./consul config delete -kind service-resolver -name external
Config entry deleted: service-resolver/external
➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
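
The before/after check above can be wrapped in a small helper so the result is easy to assert on. This is only a sketch: the function names are hypothetical, and the subset names `v1` and `v2` are assumed from the resolver in the reproduction script. It sorts before de-duplicating (`sort -u`), since `uniq` on its own only collapses adjacent duplicate lines.

```shell
# list_external_clusters (hypothetical helper): reads Envoy /clusters output
# on stdin and prints the unique cluster names that mention "external".
list_external_clusters() {
  grep external | cut -d':' -f1 | sort -u
}

# has_subset_clusters (hypothetical helper): reads the same input and succeeds
# if a v1/v2 subset cluster for the "external" service is still present.
has_subset_clusters() {
  list_external_clusters | grep -Eq '^(v1|v2)\.external\.'
}
```

With the gateway from the script running and Envoy's admin interface on port 19000, `curl -s http://localhost:19000/clusters | has_subset_clusters` would be expected to succeed before the resolver is deleted and, with the fix applied, to fail afterwards.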

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern


@nathancoleman left a comment


Note: The diff is slightly different because the assert_upstream_missing_once helper didn't exist in the base branch of this PR.

@andrewstucki andrewstucki merged commit 89cee20 into release/1.14.x Mar 7, 2023
@andrewstucki andrewstucki deleted the release-1.14.x-backport-resolver-fix-1 branch March 7, 2023 20:29