
Static/Unmanaged Gateway Mode (vs Provisioning) #1687

Open
shaneutt opened this issue Jan 31, 2023 · 30 comments
Labels
area/conformance · area/mesh · documentation · kind/feature · lifecycle/rotten

Comments

@shaneutt
Member

shaneutt commented Jan 31, 2023

(follow-up from GAMMA meeting 2023-01-31)

In the past we've had discussions about a mode of operation for Gateway API which we'll call "static mode" (previously called "unmanaged mode" in some old issues) wherein a Gateway resource refers to a gateway that has been provisioned by some other means (e.g. via Helm) but the implementation may or may not actually manage the lifecycle of that gateway.
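To make the idea concrete, here is a rough sketch only — the GatewayClass name, controllerName, and annotation below are hypothetical, not a prescribed API (real implementations are listed next). In "static" mode the Gateway resource simply describes listeners for a proxy that something else (e.g. a Helm chart) already deployed, and the controller adopts that infrastructure instead of creating it:

```yaml
# Hypothetical sketch: names, controllerName, and the annotation are illustrative only.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-static
spec:
  controllerName: example.com/static-gateway-controller   # hypothetical controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: gateway-system
  annotations:
    # Hypothetical pointer to the pre-provisioned (e.g. Helm-installed) proxy;
    # real implementations use different mechanisms (addresses, annotations, GatewayClass config).
    example.com/existing-proxy: gateway-system/helm-installed-proxy
spec:
  gatewayClassName: example-static
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```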

Multiple implementations now have options for such a mode, which can be particularly helpful for allowing legacy/pre-operator deployments to have Gateway API functionality. Implementations that do this include (but are not limited to):

  • Contour
  • Istio
  • Kong
  • NGINX

Today there are prospective implementations of GAMMA (such as Linkerd) which need such a mode, but currently that mode is only a colloquialism: it is not properly documented or accounted for here in upstream Gateway API.

TODO:

  • add documentation that prescribes implementing "static" or "unmanaged" Gateway mode
  • make any provisions necessary in the conformance tests to accommodate such an implementation, and/or potentially add this as a separate conformance category

Prior art: #892, #925

@shaneutt added the kind/feature, area/conformance, and area/mesh labels on Jan 31, 2023
@shaneutt added this to the v0.7.0 milestone on Jan 31, 2023
@howardjohn
Contributor

Here is how Istio does it: https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#manual-deployment

The idea was to match how cloud LBs do it, where you point to an existing LB by address.
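For illustration, a minimal sketch of that pattern from the linked docs — the Gateway points at the hostname of an ingress Service that was deployed out of band (namespace and Service name here are illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
spec:
  gatewayClassName: istio
  addresses:
  - type: Hostname
    # Pre-provisioned ingress Service (illustrative name); the implementation treats this Gateway
    # as "manually deployed" and does not create its own Deployment/Service for it.
    value: my-ingressgateway.istio-ingress.svc.cluster.local
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```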

@shaneutt
Member Author

Thanks @howardjohn!

cc @kflynn

@sunjayBhatia
Member

Similarly, here's how Contour does it: https://projectcontour.io/docs/v1.23.2/guides/gateway-api/#option-1-statically-provisioned

@mlavacca
Member

mlavacca commented Jan 31, 2023

This is how Kong deals with it: https://docs.konghq.com/kubernetes-ingress-controller/latest/concepts/gateway-api/#binding-kong-gateway-to-a-gateway-resource

Instead of relying on static/unmanaged Gateways, we use static/unmanaged GatewayClasses, i.e., a Gateway is considered unmanaged if it uses an unmanaged GatewayClass.
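A sketch of that pattern, assuming the konghq.com/gatewayclass-unmanaged annotation mentioned later in this thread (the exact controllerName value should be taken from the linked Kong docs):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: kong
  annotations:
    konghq.com/gatewayclass-unmanaged: "true"   # the class, and thus any Gateway using it, is unmanaged
spec:
  controllerName: konghq.com/kic-gateway-controller   # check the linked docs for the exact value
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kong
spec:
  gatewayClassName: kong   # considered unmanaged because its GatewayClass is
  listeners:
  - name: proxy
    port: 80
    protocol: HTTP
```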

@kate-osborn
Contributor

NGINX only supports "static" mode. We support one Gateway resource that is provisioned and managed by the user. We'd love to be able to run the conformance tests against our implementation via a static mode option or something similar.

@shaneutt added the documentation and help wanted labels on Feb 2, 2023
@kflynn
Contributor

kflynn commented Feb 17, 2023

I've run into this with both Emissary-ingress and Linkerd. The Emissary-ingress case is simplest to describe, so I'll do that first: like NGINX as described by @kate-osborn, Emissary only supports "static" mode. It would be lovely to be able to have Emissary pass conformance, and considerably less lovely to have to completely change how Emissary works to do so.

The Linkerd case is one I saw firsthand when getting Envoy Gateway and Linkerd to play nicely together, so let me use that experience as an illustration. Please note that I'm not trying to argue that either Envoy Gateway or Linkerd is "wrong" here -- rather, I'm illustrating a point of friction.

The important things to know about Linkerd here are that:

  • From Linkerd's perspective, the ingress is just another meshed workload.

  • Linkerd requires workloads to have the linkerd.io/inject annotation to be included in the mesh. Typically this gets included in the Pod template for e.g. a Deployment. It can also be applied to a namespace, in which case it will be applied to every Pod created in that namespace.

Using an ingress installed using "static mode" with Linkerd is really easy: you install the ingress using your favorite installation method, make sure its Pods get the linkerd.io/inject annotation, and off you go. How exactly you apply the annotation depends on the installation mechanism, but they're all basically the same in the end.

If your ingress doesn't support "static mode", it's harder. In Envoy Gateway's case, it has Envoy proxy Pods that need the annotation, but that get recreated any time you touch the Gateway resource. Since you don't have control over the life cycle of those Pods, and you also don't have access to the Envoy Gateway Deployment, you have to annotate the envoy-gateway-system namespace to get Envoy Gateway into the mesh. Functionally, this is fine. Operationally, it's extra friction for this particular ingress: it would be simpler for many end users to give them the ability to just install Envoy Gateway statically.
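To make the two cases concrete, here is a sketch (workload names and image are illustrative): annotate the pod template directly when you control the ingress Deployment ("static mode"), or annotate the namespace when a controller owns the pods, as with Envoy Gateway:

```yaml
# Case 1: statically installed ingress -- annotate its pod template directly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ingress              # illustrative
  namespace: ingress
spec:
  selector:
    matchLabels:
      app: my-ingress
  template:
    metadata:
      labels:
        app: my-ingress
      annotations:
        linkerd.io/inject: enabled    # puts the ingress pods in the mesh
    spec:
      containers:
      - name: proxy
        image: example.com/ingress:latest   # illustrative
---
# Case 2: pods are created/recreated by a controller you don't manage -- annotate the namespace,
# so every Pod created in it (e.g. Envoy Gateway's proxies) gets injected.
apiVersion: v1
kind: Namespace
metadata:
  name: envoy-gateway-system
  annotations:
    linkerd.io/inject: enabled
```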

@howardjohn
Contributor

FWIW, Istio forwards annotations from the Gateway to the generated resources for similar reasons.
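i.e. (a sketch based on the comment above), putting the mesh-injection annotation on the Gateway itself would be enough, since it gets copied onto the generated Deployment/Pods:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
  annotations:
    linkerd.io/inject: enabled   # forwarded by Istio onto the resources it generates for this Gateway
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```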

@shaneutt added the priority/important-longterm and priority/backlog labels and removed the priority/important-longterm label on Feb 20, 2023
@shaneutt modified the milestones: v0.7.0, v1.0.0 on Feb 21, 2023
@kate-osborn
Contributor

@shaneutt based on the discussions, it seems like there's a fair amount of interest in this work. How can we move this issue forward? What are the next steps?

NGINX is eager to be able to run the conformance tests against its implementation and is willing to volunteer for this work.

@shaneutt
Member Author

@shaneutt based on the discussions, it seems like there's a fair amount of interest in this work. How can we move this issue forward? What are the next steps?

NGINX is eager to be able to run the conformance tests against its implementation and is willing to volunteer for this work.

At this point we consider this accepted and have marked it on our Road to GA project board as something we'd like done for GA. Currently it needs a champion: someone to drive it forward and get the content (mostly, it would seem, documentation) submitted. Whoever wants to do that should check out #1757 and probably touch base with @howardjohn as well, as the GEP he started there is related.

@shaneutt removed the help wanted label on Mar 13, 2023
@shaneutt
Member Author

/help

@k8s-ci-robot
Contributor

@shaneutt:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


@k8s-ci-robot added the help wanted label on Mar 13, 2023
@shaneutt added the release-blocker label on Mar 14, 2023
@pleshakov
Contributor

pleshakov commented May 1, 2023

What do people think about the following approach to support the static Gateway mode in the conformance tests?

In static mode, a Gateway implementation can support only one Gateway, so the following proposal updates the conformance tests to support that mode while preserving support for the multiple-Gateway mode:

  1. Reduce the number of Gateways deployed by default to one:

    1. Do not deploy all-namespaces and backend-namespaces.
    2. Deploy the existing same-namespace.

    Note: Most of the existing tests either deploy their own Gateways or use the same-namespace Gateway.

  2. In all tests that deploy dedicated Gateways, where possible, collapse their listeners into a single Gateway (see the sketch after this list). Those tests will need to remove the same-namespace Gateway, deploy their test-specific Gateway, and then restore the same-namespace Gateway upon completion.

  3. For tests that cannot be updated to use a single Gateway, group them under a DynamicGateways feature so that they can be easily disabled.

  4. All tests that use only same-namespace Gateway will continue to use it and will not need any modification.

  5. If necessary, add a new test (or tests) that exercises support for multiple Gateways.
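As a sketch of what step 2 could look like (listener names and hostnames are illustrative, not the actual conformance fixtures): listeners that today live on dedicated per-test Gateways would be folded into the single base Gateway that a one-Gateway implementation can serve:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: same-namespace                  # the single base Gateway
  namespace: gateway-conformance-infra  # namespace used by the conformance manifests
spec:
  gatewayClassName: "{GATEWAY_CLASS_NAME}"
  listeners:
  - name: http                          # existing base listener
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same
  - name: hostname-a                    # illustrative: previously a listener on a dedicated test Gateway
    hostname: "a.example.com"
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same
```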

Below is the list of existing tests (based on commit 6efcfdf) along with the Gateways they use:

Test — # of Gateways
GatewayInvalidRouteKind 2 (dedicated)
GatewayInvalidTLSConfiguration 4 (dedicated)
GatewayModifyListeners 2 (dedicated)
GatewayObservedGenerationBump 1 (dedicated)
GatewaySecretInvalidReferenceGrant 1 (dedicated)
GatewaySecretMissingReferenceGrant 1 (dedicated)
GatewaySecretReferenceGrantAllInNamespace 1 (dedicated)
GatewaySecretReferenceGrantSpecific 1 (dedicated)
GatewayWithAttachedRoutes 2 (dedicated)
GatewayClassObservedGenerationBump 0
HTTPRouteCrossNamespace 1 (Relies on the base Gateway backend-namespaces)
HTTPExactPathMatching, HTTPRouteHeaderMatching, HTTPRouteInvalidNonExistentBackendRef, HTTPRouteInvalidBackendRefUnknownKind, HTTPRouteInvalidCrossNamespaceBackendRef, HTTPRouteInvalidCrossNamespaceParentRef, HTTPRouteInvalidParentRefNotMatchingListenerPort, HTTPRouteInvalidParentRefNotMatchingSectionName, HTTPRouteMatching, HTTPRouteMatchingAcrossRoutes, HTTPRouteMethodMatching, HTTPRouteObservedGenerationBump, HTTPRoutePartiallyInvalidViaInvalidReferenceGrant, HTTPRoutePathMatchOrder, HTTPRouteQueryParamMatching, HTTPRouteRedirectHostAndStatus, HTTPRouteRedirectPath, HTTPRouteRedirectPort, HTTPRouteRedirectScheme, HTTPRouteReferenceGrant, HTTPRouteRequestHeaderModifier, HTTPRouteResponseHeaderModifier, HTTPRouteRewriteHost, HTTPRouteRewritePath, HTTPRouteSimpleSameNamespace 1 (Relies on the base Gateway same-namespace)
HTTPRouteDisallowedKind 1 (dedicated)
HTTPRouteHostnameIntersection 1 (dedicated)
HTTPRouteListenerHostnameMatching 1 (dedicated)
TLSRouteSimpleSameNamespace 1 (dedicated)

@shaneutt
Member Author

shaneutt commented May 1, 2023

@mlavacca curious as to your thoughts on this, considering our current static implementation?

@pleshakov
Contributor

The proposed approach above was discussed during the community meeting on May 8, and the group suggested not going with it.

The suggested way forward (hope I got it right):

  • Make it explicit in the Gateway API spec that implementations must support multiple Gateways.
  • If an implementation needs to support attaching to existing, already-provisioned infrastructure, the right way to do it will be GEP-1867: Per-Gateway Infrastructure (#1868).

cc @youngnick

@shaneutt added the priority/important-longterm label and removed the priority/backlog label on May 18, 2023
@shaneutt
Member Author

We still want this, but it doesn't need to be done for v1.0.0; we can do this after the GA release.

@shaneutt removed the help wanted, priority/important-longterm, and release-blocker labels on Jul 10, 2023
@shaneutt removed this from the v1.0.0 milestone on Jul 10, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 24, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 23, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


@k8s-ci-robot closed this as not planned on Mar 24, 2024
@mikemorris
Contributor

mikemorris commented May 11, 2024

/reopen

I'd like to reopen this, as it seems several implementations have a need for this functionality. We all have slightly different ways of enabling it currently (and different intersecting features), but it feels like a common solution should be possible, and this feature would be complementary to (but not dependent on) the "gateway merging" standardization happening in #1863.

We'd like to solve this within the spec in the interest of being able to report full conformance for Azure's Application Gateway for Containers implementation, and can commit some time to putting together a proposal and implementation - we'd be interested in collaborating with anyone similarly motivated to standardize this functionality!

Two approaches we're currently considering are:

I'd prefer to stay away from using spec.addresses as the sole means of attaching to the static infrastructure because of its utility in specifying static or dynamic IP addresses.

I've tried to pull together an overview of the different implementations discussed in this thread so far - please feel free to correct any mistakes or omissions!

AKS

Cluster Locality: External
Gateway Merging: 🗺️ (on roadmap)
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing AGC load balancer directly by specifying in an annotation the namespace and name of an ApplicationLoadBalancer custom resource (when managed from the same cluster), or by specifying in an annotation the resource ID of an AGC load balancer (if unmanaged, or managed by a controller in a different cluster). Can either bind to an existing reserved static IP in spec.addresses, or dynamically provision a static IP on the static infrastructure by leaving spec.addresses empty.

GKE

Cluster Locality: External
Gateway Merging:
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing GKE gateway indirectly by specifying in spec.addresses an external static IP attached to the GKE gateway. Can dynamically provision infrastructure by leaving spec.addresses empty.

Istio

Cluster Locality: Internal
Gateway Merging:
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing Istio ingress gateway directly by specifying the ingress hostname in spec.addresses. Can dynamically provision infrastructure by leaving spec.addresses empty.

Linkerd

Cluster Locality: Internal
Gateway Merging: Ingress-dependent?
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing ingress gateway by adding an annotation to pods to add them into the mesh. More difficult if pods are created dynamically.

Contour

Cluster Locality: Internal
Gateway Merging: ?
Dynamic Infrastructure Provisioning:
Mechanism: Binds by specifying spec.controllerName in GatewayClass - target name can be customized in static infrastructure config.

Kong

Cluster Locality: Internal
Gateway Merging:
Dynamic Infrastructure Provisioning:
Mechanism: Binds by specifying spec.controllerName in GatewayClass - target name can be customized in static infrastructure config. Requires setting a konghq.com/gatewayclass-unmanaged=true annotation to attach to static infrastructure. Dynamic provisioning requires deploying an operator and creating a GatewayClass with the konghq.com/gateway-operator controllerName.

NGINX

Cluster Locality: Internal?
Gateway Merging:
Dynamic Infrastructure Provisioning:
Mechanism: ?

Emissary-ingress

Cluster Locality: Internal
Gateway Merging: ?
Dynamic Infrastructure Provisioning:
Mechanism: ?
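For reference, the spec.addresses-based binding that the AKS and GKE entries above describe generally takes a shape like the following (class name and IP are illustrative; provider-specific annotations omitted):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
  namespace: infra
spec:
  gatewayClassName: example-cloud-lb   # illustrative
  addresses:
  - type: IPAddress
    value: 203.0.113.10                # pre-reserved static IP on the existing load balancer (illustrative)
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```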

/cc @snehachhabria @robscott @howardjohn @kflynn @sunjayBhatia @mlavacca @shaneutt @kate-osborn

@k8s-ci-robot reopened this on May 11, 2024
@k8s-ci-robot
Contributor

@mikemorris: Reopened this issue.

In response to this:

/reopen

I'd like to reopen this, as it seems several implementations have a need for this functionality - we all have slightly different ways of enabling this currently (and different intersecting features), but it feels like a common solution should be possible, and this feature would be complementary to (but not dependent on) the "gateway merging" standardization happening in #1863.

We'd like to solve this within the spec in the interest of being able to report full conformance for Azure's Application Gateway for Containers implementation, and can commit some time to putting together a proposal and implementation - we'd be interested in collaborating with anyone similarly motivated to standardize this functionality!

I've tried to pull together an overview of the different implementations discussed in this thread so far:

AKS

Cluster Locality: External
Gateway Merging: 🗺️ (on roadmap)
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing AGC load balancer directly by specifying in an annotation the namespace and name of an ApplicationLoadBalancer custom resource (when managed from the same cluster), or by specifying in an annotation the resource ID of an AGC load balancer (if unmanaged, or managed by a controller in a different cluster). Can either bind to an existing reserved static IP in spec.addresses, or dynamically provision a static IP on the static infrastructure by leaving spec.addresses empty.

GKE

Cluster Locality: External
Gateway Merging:
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing GKE gateway indirectly by specifying in spec.addresses an external static IP attached to the GKE gateway. Can dynamically provision infrastructure by leaving spec.addresses empty.

Istio

Cluster Locality: Internal
Gateway Merging:
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing Istio ingress gateway directly by specifying the ingress hostname in spec.addresses. Can dynamically provision infrastructure by leaving spec.addresses empty.

Linkerd

Cluster Locality: Internal
Gateway Merging: Ingress-dependent?
Dynamic Infrastructure Provisioning:
Mechanism: Binds to an existing ingress gateway by adding an annotation to pods to add them into the mesh. More difficult if pods are created dynamically.

Contour

Cluster Locality: Internal
Gateway Merging: ?
Dynamic Infrastructure Provisioning:
Mechanism: Binds by specifying spec.controllerName in GatewayClass - target name can be customized in static infrastructure config.

Kong

Cluster Locality: Internal
Gateway Merging:
Dynamic Infrastructure Provisioning: ?
Mechanism: Binds by specifying spec.controllerName in GatewayClass - target name can be customized in static infrastructure config. Requires setting a konghq.com/gatewayclass-unmanaged=true annotation to attach to static infrastructure.

NGINX

Cluster Locality: Internal?
Gateway Merging:
Dynamic Infrastructure Provisioning:
Mechanism: ?

Emissary-ingress

Cluster Locality: Internal
Gateway Merging: Ingress-dependent?
Dynamic Infrastructure Provisioning:
Mechanism: ?

/cc @snehachhabria @robscott @howardjohn @kflynn @sunjayBhatia @mlavacca @shaneutt @kate-osborn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@shaneutt
Member Author

shaneutt commented May 12, 2024

@mikemorris you can mark Kong as ✔️ for Dynamic Infrastructure Provisioning. We have "unmanaged" mode if you're using our historical ingress controller directly (see https://github.com/kong/kubernetes-ingress-controller), or provisioned/managed mode if you use our operator (see https://github.com/kong/gateway-operator).

@shaneutt
Member Author

Appreciate your thoughtful update and your desire to see this one move forward as well. Let's un-rotten it:

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label on May 12, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Aug 26, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Sep 25, 2024