
otel: fix flakiness and enable logs_dynamic_id in TestFBOtelRestartE2E#6819

Merged
mauri870 merged 17 commits into elastic:main from
mauri870:otel-restart-test-duplicates
Apr 28, 2025

Conversation

@mauri870
Member

@mauri870 mauri870 commented Feb 11, 2025

What does this PR do?

This test starts the collector with a timeout, but the error returned is not
always a context cancelled error; sometimes it returns err == nil, which is also
fine, just not handled properly.

While at it, fix some other issues I found while testing:

  • Using require inside a goroutine calls runtime.Goexit on failure, meaning
    the test exits immediately without doing any cleanup, causing resource leaks. Use assert in those
    cases.
  • Now that the beats dependency is up to date, deduplication works as intended (see elastic/beats#42412, "otelconsumer: set document id attribute for elasticsearchexporter"). Update the test to use logs_dynamic_id in the elasticsearchexporter options and assert that data is deduplicated in Elasticsearch.
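
To illustrate the first bullet, here is a standard-library sketch of what happens when a goroutine calls runtime.Goexit, which is what testing.T.FailNow (and therefore a failing require assertion) does. The function runWorker and its step names are hypothetical and only model the behavior; this is not the code from the test.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runWorker is a hypothetical model of a test goroutine. When failFast is
// true it calls runtime.Goexit after a failed check, as a failing require
// would; otherwise it records the failure and carries on, as assert does.
func runWorker(failFast bool) []string {
	var (
		steps []string
		wg    sync.WaitGroup
	)
	wg.Add(1)
	go func() {
		defer wg.Done() // deferred calls still run when Goexit is called
		steps = append(steps, "check failed")
		if failFast {
			runtime.Goexit() // terminates this goroutine here
		}
		steps = append(steps, "cleanup") // skipped on the failFast path
	}()
	wg.Wait()
	return steps
}

func main() {
	fmt.Println(runWorker(true))  // [check failed]
	fmt.Println(runWorker(false)) // [check failed cleanup]
}
```

Only deferred functions run after Goexit, so any cleanup written as ordinary statements after the failing check is skipped, which matches the leak described above.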

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Related issues

@mauri870 mauri870 added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-8.x Automated backport to the 8.x branch with mergify backport-9.0 Automated backport to the 9.0 branch labels Feb 11, 2025
@mauri870 mauri870 self-assigned this Feb 11, 2025
@mauri870 mauri870 requested a review from a team as a code owner February 11, 2025 16:56
@mauri870 mauri870 requested review from pchila and swiatekm February 11, 2025 16:56
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@mauri870 mauri870 changed the title from "otel: adjust TestFBOtelRestartE2E to validate deduplication works" to "otel: adjust TestFBOtelRestartE2E to validate that deduplication works" Feb 11, 2025
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Feb 11, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@mauri870 mauri870 marked this pull request as draft February 12, 2025 11:31
@mauri870
Member Author

mauri870 commented Feb 12, 2025

Moving this to draft since it requires work done in beats via elastic/beats#42412. I need to bump the beats dependency in go.mod (#6837).

@mauri870 mauri870 force-pushed the otel-restart-test-duplicates branch from dc95b0a to d04606e Compare February 19, 2025 20:34
@mauri870 mauri870 changed the title from "otel: adjust TestFBOtelRestartE2E to validate that deduplication works" to "otel: fix flaky behavior in TestFBOtelRestartE2E" Feb 19, 2025
@mauri870 mauri870 changed the title from "otel: fix flaky behavior in TestFBOtelRestartE2E" to "otel: fix flakiness and various issues in TestFBOtelRestartE2E" Feb 19, 2025
@mauri870 mauri870 marked this pull request as ready for review February 19, 2025 20:41
@mauri870
Member Author

mauri870 commented Feb 19, 2025

I'm repurposing this PR to include a series of fixes for the otel tests. Having the fixes as a batch, as opposed to separate PRs, speeds up the continuous integration builds.

@mauri870 mauri870 requested a review from swiatekm February 19, 2025 20:43
@mauri870 mauri870 force-pushed the otel-restart-test-duplicates branch from d04606e to 95c25c9 Compare February 20, 2025 11:29
swiatekm previously approved these changes Feb 20, 2025
@mauri870 mauri870 enabled auto-merge (squash) February 20, 2025 12:51
@mauri870 mauri870 marked this pull request as draft February 20, 2025 18:51
auto-merge was automatically disabled February 26, 2025 14:27

Pull request was converted to draft

@mauri870
Member Author

Leaving this as draft since we had to remove all the tests as part of a last-ditch fix for EDOT in v9.0. I will revisit this once #7023 is reverted.

mauri870 and others added 3 commits February 27, 2025 08:25
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
@mauri870 mauri870 dismissed stale reviews from pchila, leehinman, and swiatekm via 4a6beef March 5, 2025 17:03
@ycombinator ycombinator added backport-8.19 Automated backport to the 8.19 branch and removed backport-8.x Automated backport to the 8.x branch with mergify labels Apr 22, 2025
@mauri870 mauri870 marked this pull request as ready for review April 28, 2025 11:48
@mauri870 mauri870 changed the title from "otel: fix flakiness and various issues in TestFBOtelRestartE2E" to "otel: fix flakiness and enable logs_dynamic_index in TestFBOtelRestartE2E" Apr 28, 2025
@elasticmachine
Contributor

💚 Build Succeeded

History

cc @mauri870

@mauri870 mauri870 merged commit 82460a2 into elastic:main Apr 28, 2025
12 checks passed
@mauri870 mauri870 changed the title from "otel: fix flakiness and enable logs_dynamic_index in TestFBOtelRestartE2E" to "otel: fix flakiness and enable logs_dynamic_id in TestFBOtelRestartE2E" Apr 28, 2025
mergify bot pushed a commit that referenced this pull request Apr 28, 2025
…tE2E (#6819)

* otel: fix flaky behavior on TestFBOtelRestartE2E

This test starts the collector with a timeout, but the error returned is not
always a context cancelled error; sometimes it returns err == nil, which is also
fine, just not handled properly.

While at it, fix some other issues I found while testing:

- Using require inside a goroutine calls runtime.Goexit on failure, meaning
  the test exits immediately without doing any cleanup. Use assert in those
  cases.

* don't fail if ignored field is equal

* use a different index name to avoid conflicts

* Update testing/integration/otel_test.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* Update testing/integration/otel_test.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* use assert.Conditionf for error check

---------

Co-authored-by: Khushi Jain <khushi.jain@elastic.co>
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
(cherry picked from commit 82460a2)
mauri870 added a commit that referenced this pull request Apr 28, 2025
…tE2E (#6819) (#8007)
mauri870 added a commit that referenced this pull request Apr 28, 2025
…tE2E (#6819) (#8006)

Labels

  • backport-8.19: Automated backport to the 8.19 branch
  • backport-9.0: Automated backport to the 9.0 branch
  • skip-changelog
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team
  • Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team

Development

Successfully merging this pull request may close these issues.

[Flaky Test]: TestFBOtelRestartE2E – expected the collector to have stopped

10 participants