Retry saving Agent information file (fleet.enc) on Windows#9224
Retry saving Agent information file (fleet.enc) on Windows#9224ycombinator merged 6 commits intoelastic:mainfrom
fleet.enc) on Windows#9224Conversation
|
This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
|
|
@pchila @swiatekm Following the feedback on elastic/elastic-agent-libs#333, I'm replacing that PR with this one. At the moment, I've sketched out two alternate implementations in this PR — one that specifically retries saving the Let me know which of these approaches you prefer and I'll get rid of the other one + add tests. Or if you prefer a totally different approach to solving this bug, that's fine too — just let me know. Thanks! |
pchila
left a comment
There was a problem hiding this comment.
I would start by touching only the handler_action_policy_change with a retry and see how that goes (ideally we would need a test that reproduces the issue but as I understand we don't have one yet)
Please have a look at my comment below
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go
Outdated
Show resolved
Hide resolved
fleet.enc file rotation on Windowsfleet.enc) on Windows
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
d835648 to
fd1c7dc
Compare
pchila
left a comment
There was a problem hiding this comment.
I think we can start with retrying only at the handler_policy_change level (without the additional retry loop at EncryptedStore.
There are also a couple of comments about the usage of the backoff library
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go
Outdated
Show resolved
Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go
Outdated
Show resolved
Hide resolved
Can you clarify what you mean by this? In other words, what would you like to see done differently than the current implementation in this PR? |
I would do without the encrypted store retry (I am referring to this part that duplicate the original implementation of the handler retry). The bug we are trying to fix is relative to not being able to write a new |
0d1c73f to
866d457
Compare
@pchila My bad, I just forgot to remove the previous implementation (the one you linked to) which is why I was unsure as to what I was missing 🤦. Thanks for the clarification. Removed the previous implementation in 866d457. |
|
The Moving PR to draft while I investigate this issue. |
e52e8c3 to
7d79815
Compare
pchila
left a comment
There was a problem hiding this comment.
Just a couple of nitpicks, nothing blocking. Please have a look.
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go
Outdated
Show resolved
Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_other.go
Outdated
Show resolved
Hide resolved
pchila
left a comment
There was a problem hiding this comment.
A small change in the debug logs, not blocking
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go
Outdated
Show resolved
Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go
Outdated
Show resolved
Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go
Outdated
Show resolved
Hide resolved
c2e79a5 to
553678b
Compare
|
|
@Mergifyio backport 8.17 8.18 8.19 9.0 9.1 |
💚 Build Succeeded
History
cc @ycombinator |
✅ Backports have been createdDetails
|
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
…#9224) * Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes




What does this PR do?
This PR retries saving the Agent information file,
fleet.enc, on Windows. The retries are done every 50ms for up to 2 seconds.Why is it important?
Saving the Agent information file involves renaming/moving a file to its final destination. However, on Windows, it is sometimes not possible to rename/move a file to its destination file because the destination file is locked by another process (e.g. antivirus software). For such situations, we now retry the save operation on Windows.
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration files./changelog/fragmentsusing the changelog toolI have added an integration test or an E2E testDisruptive User Impact
None.
Related issues