Skip to content

Fix kube discovery service poll_interval value not set correctly#60982

Merged
kshi36 merged 5 commits intomasterfrom
kevin/discovery-poll-interval-fix
Nov 25, 2025
Merged

Fix kube discovery service poll_interval value not set correctly#60982
kshi36 merged 5 commits intomasterfrom
kevin/discovery-poll-interval-fix

Conversation

@kshi36
Copy link
Copy Markdown
Contributor

@kshi36 kshi36 commented Nov 4, 2025

Fixes #52581

Manual Tests

Test: Specify a poll interval not equal to the default value of 5 minutes

  • Verify that before the change, discovery service polls every 5 minutes regardless of poll_interval field
  • Verify that after the change, discovery service polls based on value of poll_interval field

Examples:

config file

app_service:
      enabled: true
      resources:
      - labels:
          teleport.dev/kubernetes-cluster: main-cluster
          teleport.dev/origin: discovery-kubernetes
    auth_service:
      enabled: false
    db_service:
      enabled: false
    discovery_service:
      discovery_group: main-cluster
      enabled: true
      kubernetes:
      - labels:
          '*': '*'
        namespaces:
        - '*'
        types:
        - app
      poll_interval: 1m
    jamf_service:
      enabled: false
    kubernetes_service:
      enabled: true
      kube_cluster_name: main-cluster
    proxy_service:
      enabled: false
    ssh_service:
      enabled: false
    teleport:
      join_params:
        method: token
        token_name: /etc/teleport-secrets/auth-token
      log:
        format:
          extra_fields:
          - timestamp
          - level
          - component
          - caller
          output: text
        output: stderr
        severity: INFO
      proxy_server: kevin.dev:443
    version: v3

result

2025-11-14T17:09:18.103Z INFO [DISCOVERY] New resource matches, creating pid:11.1 kind:app kind:app name:dummy-service-no-pods-5-default-main-cluster services/reconciler.go:264
2025-11-14T17:09:18.925Z INFO [APP:SERVI] New resource matches, creating kind:app kind:app name:dummy-service-no-pods-5-default-main-cluster services/reconciler.go:264
2025-11-14T17:10:18.112Z INFO [DISCOVERY] Resource was removed, deleting pid:11.1 kind:app kind:app name:dummy-service-no-pods-5-default-main-cluster services/reconciler.go:219
2025-11-14T17:10:18.925Z INFO [APP:SERVI] Resource was removed, deleting kind:app kind:app name:dummy-service-no-pods-5-default-main-cluster services/reconciler.go:219

changelog: Fixed bug where Kubernetes App Discovery poll_interval is not set correctly

@kshi36 kshi36 requested review from bernardjkim and r0mant November 12, 2025 01:18
@kshi36 kshi36 marked this pull request as ready for review November 12, 2025 01:18
@rosstimothy
Copy link
Copy Markdown
Contributor

@kshi36 could you update your PR description with the manual testing that you did to validate this change? See #59850 for an example.

Comment on lines +375 to +390
func (f *mockAuthServer) CreateApp(ctx context.Context, _ types.Application) error {
return nil
}

func (m *mockAuthServer) GetApps(ctx context.Context) ([]types.Application, error) {
return nil, nil
}

func (m *mockAuthServer) ListApps(ctx context.Context, limit int, startKey string) ([]types.Application, string, error) {
return nil, "", nil
}

func (m *mockAuthServer) RangeApps(ctx context.Context, start, end string) iter.Seq2[types.Application, error] {
return func(yield func(types.Application, error) bool) {}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I have extended the mockAuthServer implementation here for use in my test in discovery_test.go, since it contains similar logic for my usage with synctest. I felt this was the best way to write the test without interfering with existing test logic in discovery_test.go. I specifically needed the functionality of a mock AccessPoint to eliminate non-durably blocking gRPC calls that impact synctest.

In a future PR, I want to migrate everything in discovery_eks_test.go back to discovery_test.go, and modify existing tests to utilize synctest (and use the same mockAuthServer implementation), but this would be extra scope not intended for this PR.

@kshi36 kshi36 requested a review from rosstimothy November 17, 2025 20:04
Copy link
Copy Markdown
Contributor

@rosstimothy rosstimothy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to convert the new test to use synctest @kshi36! The changes look good and I've validated them locally as well, just a few further comments I'd like to see addressed before approving.

Copy link
Copy Markdown
Contributor

@rosstimothy rosstimothy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @kshi36!

@kshi36
Copy link
Copy Markdown
Contributor Author

kshi36 commented Nov 17, 2025

It seems that converting the deprecated fake.NewSimpleClientset() to fake.NewClientset() introduced flakiness, unless there's something else I'm missing. I will consider reverting previous tests to the old call

@rosstimothy
Copy link
Copy Markdown
Contributor

It seems that converting the deprecated fake.NewSimpleClientset() to fake.NewClientset() introduced flakiness, unless there's something else I'm missing. I will consider reverting previous tests to the old call

Using NewClientset is not the source of the flakiness, the number of locations that have been modified to use it is. The Flaky Test Detector runs all modified tests N times and expects them to complete in M seconds. You've modified so many tests that they no cannot complete fast enough.

We can either change these one test at a time, or we can bypass the flaky test detector on this PR as is.

@kshi36
Copy link
Copy Markdown
Contributor Author

kshi36 commented Nov 17, 2025

We can either change these one test at a time, or we can bypass the flaky test detector on this PR as is.

Thank you for the explanation. I was wondering if it was something particular with the flaky test detector, or that I couldn't track flakiness from my local tests passing

@kshi36
Copy link
Copy Markdown
Contributor Author

kshi36 commented Nov 20, 2025

Friendly ping @r0mant @bernardjkim @kopiczko

Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 line fix + 80 lines of tests. Love it!

@rosstimothy
Copy link
Copy Markdown
Contributor

/excludeflake *

@kshi36 kshi36 enabled auto-merge November 20, 2025 22:27
@kshi36
Copy link
Copy Markdown
Contributor Author

kshi36 commented Nov 25, 2025

@rosstimothy Was the bypassing flaky test detector successful?

@rosstimothy
Copy link
Copy Markdown
Contributor

@kshi36 you have to rerun the action for it to see the exclusion comment.

@kshi36 kshi36 added this pull request to the merge queue Nov 25, 2025
Merged via the queue into master with commit de20a3d Nov 25, 2025
43 of 44 checks passed
@kshi36 kshi36 deleted the kevin/discovery-poll-interval-fix branch November 25, 2025 20:35
@backport-bot-workflows
Copy link
Copy Markdown
Contributor

@kshi36 See the table below for backport results.

Branch Result
branch/v17 Failed
branch/v18 Failed

kshi36 added a commit that referenced this pull request Nov 25, 2025
…60982)

* Fix kube discovery service poll interval

* Add unit test, pass down clocks in discovery configs

* Leverage synctest, pass down clocks properly

* Refactor
kshi36 added a commit that referenced this pull request Nov 25, 2025
…60982)

* Fix kube discovery service poll interval

* Add unit test, pass down clocks in discovery configs

* Leverage synctest, pass down clocks properly

* Refactor
github-merge-queue bot pushed a commit that referenced this pull request Dec 1, 2025
…tly (#61792)

* Fix kube discovery service `poll_interval` value not set correctly (#60982)

* Fix kube discovery service poll interval

* Add unit test, pass down clocks in discovery configs

* Leverage synctest, pass down clocks properly

* Refactor

* Readd discovery_eks_test.go and new test

* Refactor logger

* Use NewClientset over deprecated NewSimpleClientset

* Update license
github-merge-queue bot pushed a commit that referenced this pull request Dec 1, 2025
…tly (#61791)

* Fix kube discovery service `poll_interval` value not set correctly (#60982)

* Fix kube discovery service poll interval

* Add unit test, pass down clocks in discovery configs

* Leverage synctest, pass down clocks properly

* Refactor

* Fix test

* Fix test

* Readd test coverage
cthach pushed a commit that referenced this pull request Dec 1, 2025
…60982)

* Fix kube discovery service poll interval

* Add unit test, pass down clocks in discovery configs

* Leverage synctest, pass down clocks properly

* Refactor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

poll_interval for the discovery service does not seem to be honored

4 participants