Skip to content

[receiver/awscloudwatch]: fix issue with polling of deleted log group while polling#40571

Merged
andrzej-stencel merged 6 commits into
open-telemetry:mainfrom
schmikei:fix/log-group-deletion
Jul 14, 2025
Merged

[receiver/awscloudwatch]: fix issue with polling of deleted log group while polling#40571
andrzej-stencel merged 6 commits into
open-telemetry:mainfrom
schmikei:fix/log-group-deletion

Conversation

@schmikei

@schmikei schmikei commented Jun 9, 2025

Copy link
Copy Markdown
Contributor

Description

Resolves #38940

If a log group was deleted while a poll was being executed we were silently failing after the first log message. While examining the code I don't believe we were failing on future polls but can imagine that this change is an improvement as it now radiates a tad bit more on the pollForLogs function as it now actually returns up errors.

Link to tracking issue

Fixes #38940

Testing

Unit testing the situation described in the issue

Documentation

changelog entry

@schmikei schmikei marked this pull request as ready for review June 10, 2025 15:07
@schmikei schmikei requested a review from a team as a code owner June 10, 2025 15:07
@schmikei

Copy link
Copy Markdown
Contributor Author

Hi @sousa-andre, I reviewed your PR in #40537 and wanted to propose a targeted change that I believe addresses the issue. Would you mind taking a look here and letting me know if this resolves the problem on your end?

You're also welcome to incorporate these changes into your PR if you prefer. I've added some tests that I believe reflect the scenario you were encountering—specifically, deleted log groups during polling—so I'm hopeful this will help resolve it.

@sousa-andre

Copy link
Copy Markdown
Contributor

Hi @sousa-andre, I reviewed your PR in #40537 and wanted to propose a targeted change that I believe addresses the issue. Would you mind taking a look here and letting me know if this resolves the problem on your end?

You're also welcome to incorporate these changes into your PR if you prefer. I've added some tests that I believe reflect the scenario you were encountering—specifically, deleted log groups during polling—so I'm hopeful this will help resolve it.

Hi @schmikei, thank you for keeping me in the loop! Your changes are similar to mine and fix the error where the receiver gets suck. I think it would be nice for the receiver to print every time the groups are updated (which is what my updateGroupRequests does) and not only when the receiver would crash/is currently reading from the log group.

Having this said, I'm happy to get either PR closed, merged or incorporated as you see fit!
(I've fixed the linting errors from my PR)

@schmikei

Copy link
Copy Markdown
Contributor Author

I think it would be nice for the receiver to print every time the groups are updated (which is what my updateGroupRequests does) and not only when the receiver would crash/is currently reading from the log group.

Sweet thank you! I think I'd prefer if we were to be logging that information for it to be within discoverGroups and perhaps change the scraper's stored groupRequests to be a map[string]*groupRequest.

But I'd argue thats a larger refactor than what I'd like to take on at the moment (at least specifically in this PR) but feel free to pursue a separate PR and I'd be happy to lend a review :)

@sousa-andre

sousa-andre commented Jun 10, 2025

Copy link
Copy Markdown
Contributor

Sweet thank you! I think I'd prefer if we were to be logging that information for it to be within discoverGroups and perhaps change the scraper's stored groupRequests to be a map[string]*groupRequest.

For my point of view, is better for the discoverGroups to contain only the AWS requests logic, while the assignment should live elsewhere (perhaps this snippet could be extracted into a new function).
I do agree that moving groupRequests into a map would be beneficial, but as you mentioned, it would require a larger refactor that is probably not worth the lift for such a "small" gain such as improved visibility.

You know what’s best here, so I’ll leave the decision up to you!

@schmikei

Copy link
Copy Markdown
Contributor Author

As a codeowner I think its not worth the extra CPU cycles on logging a diff if an error is going to propogate upward for this case and the receiver will recover on the next autodiscover.

@dmitryax or @fatsheep9146 (just pinging off auto-assign) if either of you have a moment to look over the changes in this PR, I know a couple of people who'd like to use this fix :)

@github-actions

github-actions Bot commented Jul 1, 2025

Copy link
Copy Markdown
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Jul 1, 2025
@mat-rumian

Copy link
Copy Markdown
Contributor

Hello :)
Any updates on this PR? This fix would make many people life easier :)

@andrzej-stencel andrzej-stencel merged commit 71f36fd into open-telemetry:main Jul 14, 2025
177 checks passed
@github-actions github-actions Bot added this to the next release milestone Jul 14, 2025
@schmikei schmikei deleted the fix/log-group-deletion branch July 14, 2025 13:25
Dylan-M pushed a commit to Dylan-M/opentelemetry-collector-contrib that referenced this pull request Aug 5, 2025
… while polling (open-telemetry#40571)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

Resolves open-telemetry#38940

If a log group was deleted while a poll was being executed we were
silently failing after the first log message. While examining the code I
don't believe we were failing on future polls but can imagine that this
change is an improvement as it now radiates a tad bit more on the
`pollForLogs` function as it now actually returns up errors.

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes open-telemetry#38940


<!--Describe what testing was performed and which tests were added.-->
#### Testing

Unit testing the situation described in the issue

#### Documentation

changelog entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[receiver/awscloudwatch] Inerrupted logs polling in case of deleted log group

5 participants