Skip to content

Conversation

@mergify
Copy link
Contributor

@mergify mergify bot commented Mar 20, 2025

Fix flaky timeout tests.

These tests confirmed that their timeouts worked by scanning the error text returned from the http request. The error text that they expected was:

error in http fetch: error making http request: Get "http://127.0.0.1:57060/stats": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

However there are two other semi-common forms:

error in http fetch: error making http request: Get "http://127.0.0.1:57052/stats": context deadline exceeded

error in http fetch: error making http request: Get "http://127.0.0.1:32839/stats": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

This result is still a successful check of the timeout feature, even though the propagated error string varies slightly. I've switched to checking substrings that are stable between all error strings.

I also reverted some of the changes in #43283 that were unrelated to the test flakiness, and that made the test potentially less stable. These tests previously had near-identical code -- the original version used a channel to terminate the server response deterministically when the test was over, but #43283 changed this to a fixed Sleep call that requires 5ms precision on timer triggers (which is usually true but not always guaranteed on CI machines. Tests shouldn't use magic timeout values when a deterministic alternative is available, and since the channel code turned out to be unrelated to the flakiness I think we should keep it.)


This is an automatic backport of pull request #43317 done by Mergify.

Fix flaky timeout tests.

These tests confirmed that their timeouts worked by scanning the error text returned from the http request. The error text that they expected was:

`error in http fetch: error making http request: Get "http://127.0.0.1:57060/stats": context deadline exceeded (Client.Timeout exceeded while awaiting headers)`

However there are two other semi-common forms:

`error in http fetch: error making http request: Get "http://127.0.0.1:57052/stats": context deadline exceeded`

`error in http fetch: error making http request: Get "http://127.0.0.1:32839/stats": net/http: request canceled (Client.Timeout exceeded while awaiting headers)`

This result is still a successful check of the timeout feature, even though the propagated error string varies slightly. I've switched to checking substrings that are stable between all error strings.

I also reverted some of the changes in #43283 that were unrelated to the test flakiness, and that made the test potentially less stable. These tests previously had near-identical code -- the original version used a channel to terminate the server response deterministically when the test was over, but #43283 changed this to a fixed Sleep call that requires 5ms precision on timer triggers (which is _usually_ true but not always guaranteed on CI machines. Tests shouldn't use magic timeout values when a deterministic alternative is available, and since the channel code turned out to be unrelated to the flakiness I think we should keep it.)

(cherry picked from commit cb097b4)
@mergify mergify bot added the backport label Mar 20, 2025
@mergify mergify bot requested review from a team as code owners March 20, 2025 18:20
@mergify mergify bot assigned faec Mar 20, 2025
@mergify mergify bot requested review from faec and mauri870 and removed request for a team March 20, 2025 18:20
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 20, 2025
@github-actions github-actions bot added flaky-test Unstable or unreliable test cases. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team labels Mar 20, 2025
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 20, 2025
@mergify
Copy link
Contributor Author

mergify bot commented Mar 24, 2025

This pull request has not been merged yet. Could you please review and merge it @faec? 🙏

@faec faec merged commit c6e5735 into 8.x Mar 24, 2025
32 checks passed
@faec faec deleted the mergify/bp/8.x/pr-43317 branch March 24, 2025 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport flaky-test Unstable or unreliable test cases. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants