[9.1](backport #44932) Adding the option to disable the DNS processor failure or success cache #45078

Merged
andrewkroh merged 2 commits into 9.1 from mergify/bp/9.1/pr-44932
Jun 27, 2025
@mergify mergify Bot commented Jun 26, 2025

Proposed commit message

Adds the option to disable the success and failure cache.

Motivation

This enables use cases that require capturing the point-in-time DNS record regardless of any cached value or the record's TTL, such as monitoring a DNS server or recording events that must capture the current state of the environment. The TTL defines the time frame over which a stale value may be used instead of the current DNS record; in other words, the window in which the agent might observe either the old or the new record, depending on when the previous request was made. This unpredictability can be undesirable when optimizing time-to-intervention.

Disabling the cache has throughput implications: the time to process an event serially will be at least the DNS round-trip time. For example, if the round-trip time of a DNS request is 1 ms, the maximum throughput is limited to 1000 events/sec. Known use cases have low throughput requirements. Parallelization, for example by deploying multiple agents, can be used to stretch this number. Beyond that point, we would urge reevaluating the use case and whether the cache should be disabled.

NOTE: Setting the TTL on the failure cache to 1ns achieves a similar, but imperfect, effect.
NOTE: Setting a TTL on the success cache is accepted by the code, but it is ignored, as is also documented in the code; the documentation omits it as an option. Honoring the setting together with the record TTL (min(ttl, dns_record_ttl)) would be a different route, similar to the behavior of other DNS clients.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

None known. The default values leave the old behavior intact, and the setting that triggers the new behavior is added in this PR.

How to test this PR locally

Define the DNS processor, observe cache stats / resolver requests.
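
To make the expected observation concrete, here is a minimal Go sketch of what "observe resolver requests" should show with and without a cache. It is not the libbeat implementation: the `lookuper` type, its fields, and `countRequests` are hypothetical stand-ins, and the resolver is a stub that returns a fixed answer.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a cached answer with its expiry time.
type entry struct {
	value   string
	expires time.Time
}

// lookuper is a simplified, hypothetical stand-in for a DNS lookup path
// with an optional success cache. It counts resolver requests.
type lookuper struct {
	resolve      func(name string) (string, time.Duration) // stub DNS query
	cacheEnabled bool
	cache        map[string]entry
	now          func() time.Time
	requests     int
}

// lookup returns the answer for name, consulting the cache only when enabled.
func (l *lookuper) lookup(name string) string {
	if l.cacheEnabled {
		if e, ok := l.cache[name]; ok && l.now().Before(e.expires) {
			return e.value
		}
	}
	l.requests++
	value, ttl := l.resolve(name)
	if l.cacheEnabled {
		l.cache[name] = entry{value: value, expires: l.now().Add(ttl)}
	}
	return value
}

// countRequests performs n lookups of the same name and reports how many
// reached the (stub) resolver.
func countRequests(cacheEnabled bool, n int) int {
	fixed := time.Date(2025, time.June, 26, 0, 0, 0, 0, time.UTC)
	l := &lookuper{
		resolve:      func(string) (string, time.Duration) { return "192.0.2.1", time.Minute },
		cacheEnabled: cacheEnabled,
		cache:        map[string]entry{},
		now:          func() time.Time { return fixed }, // fixed clock for determinism
	}
	for i := 0; i < n; i++ {
		l.lookup("example.test")
	}
	return l.requests
}

func main() {
	fmt.Println("requests with cache enabled: ", countRequests(true, 5))
	fmt.Println("requests with cache disabled:", countRequests(false, 5))
}
```

With the cache enabled, repeated lookups of one name inside the TTL should reach the resolver once; with it disabled, every lookup should produce a resolver request.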

Related issues


This is an automatic backport of pull request #44932 done by [Mergify](https://mergify.com).

This enables use cases that require resolving the current DNS record,
regardless of the record's TTL or any previously cached values. It is
useful, for example, when monitoring a DNS server or when recorded
events must capture the environment's state at a specific moment.

When a cache is used, the TTL determines the time frame in which an
agent might observe a stale record instead of the current one. This
unpredictability can be undesirable when optimizing for rapid
time-to-intervention.

Disabling the cache has significant throughput implications. The
processing time for a single event will be at least the DNS round-trip
time. For example, if a DNS request takes 1 ms, the maximum serial
throughput is limited to 1000 events/sec. Known use cases for this
feature have low throughput requirements. Throughput can be increased
by deploying multiple, parallel agents.

NOTE: Setting the failure cache TTL to a very low value (e.g., 1ns)
achieves a similar, but imperfect, effect.

NOTE: While the config allows setting a TTL on the success cache, this
option is currently ignored. A future enhancement could honor this
setting (e.g., by using min(configured_ttl, record_ttl)), which would
align with the behavior of other DNS clients.

(cherry picked from commit eee15e7)
@mergify mergify Bot added the backport label Jun 26, 2025
@mergify mergify Bot requested review from a team as code owners June 26, 2025 19:00
@botelastic botelastic Bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Jun 26, 2025
botelastic Bot commented Jun 26, 2025

This pull request doesn't have a Team:<team> label.

@github-actions
🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@andrewkroh andrewkroh merged commit a58635c into 9.1 Jun 27, 2025
203 checks passed
@andrewkroh andrewkroh deleted the mergify/bp/9.1/pr-44932 branch June 27, 2025 17:17
Labels

backport, enhancement, libbeat, needs_team (Indicates that the issue/PR needs a Team:* label), :Processors

3 participants