Skip to content

Reuse tunnel resolvers instead of creating one per connection attempt#37566

Merged
rosstimothy merged 1 commit intomasterfrom
tross/process_resolver
Feb 2, 2024
Merged

Reuse tunnel resolvers instead of creating one per connection attempt#37566
rosstimothy merged 1 commit intomasterfrom
tross/process_resolver

Conversation

@rosstimothy
Copy link
Copy Markdown
Contributor

@rosstimothy rosstimothy commented Jan 30, 2024

The CachingResolver is backed by a FnCache but does not expose a way to close the underlying cache. This leads to memory leaks as captured in #37025. Instead of modifying the resolver to allow explicit cleanup to occur, the resolvers were refactored to be created once per process instead of per connection attempt to the cluster. Since the cluster address is read from the config file, it won't be changed for the duration of the process which allows us to safely use a single resolver. The one potential downside to this approach is the cache may return possibly stale errors during an outage until the entry is TTLed.

Fixes #37025

changelog: Fix memory leak in tbot caused by never closing reverse tunnel address resolvers

@rosstimothy rosstimothy force-pushed the tross/process_resolver branch from 7089543 to 8749ef3 Compare January 31, 2024 15:28
@rosstimothy rosstimothy changed the title Refactor TeleportProcess to reuse a single tunnel resolver Reuse tunnel resolvers instead of creating one per connection attempt Jan 31, 2024
@rosstimothy rosstimothy force-pushed the tross/process_resolver branch 2 times, most recently from 46d6684 to b121c99 Compare January 31, 2024 15:55
The `CachingResolver` is backed by a `FnCache` but does not expose
a way to close the underlying cache. This leads to memory leaks
as captured in #37025. Instead of modifying the resolver to allow
explicit cleanup to occur, the resolvers were refactored to be
created once per process instead of per connenction attempt to the
cluster. Since the cluster address is read from the config file, it
won't be changed for the duration of the process which allows us to
safely use a single resolver. The one potential downside to this
approach is the cache may return possibly stale errors during an
outage until the entry is TTLed.

Fixes #37025
@rosstimothy rosstimothy force-pushed the tross/process_resolver branch from b121c99 to eed9af4 Compare January 31, 2024 16:03
@rosstimothy rosstimothy marked this pull request as ready for review January 31, 2024 16:16
@github-actions github-actions Bot requested a review from rudream January 31, 2024 16:17
@github-actions github-actions Bot added machine-id size/sm tctl tctl - Teleport admin tool labels Jan 31, 2024
@public-teleport-github-review-bot public-teleport-github-review-bot Bot removed the request for review from rudream February 2, 2024 16:59
@rosstimothy rosstimothy added this pull request to the merge queue Feb 2, 2024
Merged via the queue into master with commit 327c877 Feb 2, 2024
@rosstimothy rosstimothy deleted the tross/process_resolver branch February 2, 2024 17:28
@public-teleport-github-review-bot
Copy link
Copy Markdown

@rosstimothy See the table below for backport results.

Branch Result
branch/v12 Failed
branch/v13 Failed
branch/v14 Failed
branch/v15 Create PR

rosstimothy added a commit that referenced this pull request Feb 2, 2024
…#37566)

The `CachingResolver` is backed by a `FnCache` but does not expose
a way to close the underlying cache. This leads to memory leaks
as captured in #37025. Instead of modifying the resolver to allow
explicit cleanup to occur, the resolvers were refactored to be
created once per process instead of per connenction attempt to the
cluster. Since the cluster address is read from the config file, it
won't be changed for the duration of the process which allows us to
safely use a single resolver. The one potential downside to this
approach is the cache may return possibly stale errors during an
outage until the entry is TTLed.

Fixes #37025
rosstimothy added a commit that referenced this pull request Feb 2, 2024
…#37566)

The `CachingResolver` is backed by a `FnCache` but does not expose
a way to close the underlying cache. This leads to memory leaks
as captured in #37025. Instead of modifying the resolver to allow
explicit cleanup to occur, the resolvers were refactored to be
created once per process instead of per connenction attempt to the
cluster. Since the cluster address is read from the config file, it
won't be changed for the duration of the process which allows us to
safely use a single resolver. The one potential downside to this
approach is the cache may return possibly stale errors during an
outage until the entry is TTLed.

Fixes #37025
github-merge-queue Bot pushed a commit that referenced this pull request Feb 5, 2024
…#37566) (#37723)

The `CachingResolver` is backed by a `FnCache` but does not expose
a way to close the underlying cache. This leads to memory leaks
as captured in #37025. Instead of modifying the resolver to allow
explicit cleanup to occur, the resolvers were refactored to be
created once per process instead of per connenction attempt to the
cluster. Since the cluster address is read from the config file, it
won't be changed for the duration of the process which allows us to
safely use a single resolver. The one potential downside to this
approach is the cache may return possibly stale errors during an
outage until the entry is TTLed.

Fixes #37025
github-merge-queue Bot pushed a commit that referenced this pull request Feb 5, 2024
…#37566) (#37719)

The `CachingResolver` is backed by a `FnCache` but does not expose
a way to close the underlying cache. This leads to memory leaks
as captured in #37025. Instead of modifying the resolver to allow
explicit cleanup to occur, the resolvers were refactored to be
created once per process instead of per connenction attempt to the
cluster. Since the cluster address is read from the config file, it
won't be changed for the duration of the process which allows us to
safely use a single resolver. The one potential downside to this
approach is the cache may return possibly stale errors during an
outage until the entry is TTLed.

Fixes #37025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

machine-id size/sm tctl tctl - Teleport admin tool

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory leak caused by FnCache which are not shut down

3 participants