
Refactor uninstall on Windows #13106

Closed
swiatekm wants to merge 25 commits into main from fix/bump-golang-1.25.1

Conversation

@swiatekm
Member

@swiatekm swiatekm commented Mar 10, 2026

What does this PR do?

Refactors how we uninstall Elastic Agent on Windows. The essential problem is that we want to delete the executable file that is running the uninstall command itself, which is not allowed on Windows. We used to have a workaround involving NTFS Alternate Data Streams, which unfortunately stopped working in Go 1.25 due to changes to os.RemoveAll. See the linked issue for details.

The workaround for the Go 1.25 upgrade was to fall back to the Go 1.24 implementation of os.RemoveAll. In this PR, I'd like to propose an alternative way of carrying out the executable deletion that doesn't depend so much on Windows filesystem particulars.

My proposal to fix this is to delete everything we can, rename and move the executable to the root path, and mark it for deletion on reboot. We also make sure to delete this renamed executable path on install to avoid leaving more than one after repeated install/uninstall chains without reboots. Throughout, we emit warnings for the user in case they prefer to manually delete the remaining data.

Potential problems:

  • We're leaving a large executable behind after uninstall, which users may not be happy about. We emit a warning about it and clean it up on install, so the blast radius should be limited.

Why is it important?

The uninstall process shouldn't depend on implementation details of the Go standard library and should ideally be easy to understand without detailed knowledge of Windows syscalls and filesystem semantics.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

After uninstalling, the agent binary now continues to exist on disk until the next reboot, though it's safe to manually delete.

How to test this PR locally

Build agent, install it on Windows, then uninstall.

Related issues

@mergify
Contributor

mergify Bot commented Mar 10, 2026

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@mergify
Contributor

mergify Bot commented Mar 12, 2026

⚠️ The sha of the head commit of this PR conflicts with #10156. Mergify cannot evaluate rules on this PR. Once #10156 is merged or closed, Mergify will resume processing this PR. ⚠️

@VihasMakwana
Contributor

/test

@mergify
Contributor

mergify Bot commented Mar 13, 2026

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix/bump-golang-1.25.1 upstream/fix/bump-golang-1.25.1
git merge upstream/main
git push upstream fix/bump-golang-1.25.1

@swiatekm swiatekm force-pushed the fix/bump-golang-1.25.1 branch from f33de71 to c7dc618 Compare March 13, 2026 11:56
@swiatekm swiatekm changed the title Fix/bump golang 1.25.1 Fix uninstall on Windows Mar 13, 2026
@swiatekm swiatekm force-pushed the fix/bump-golang-1.25.1 branch 3 times, most recently from e1aed19 to 77ab2c7 Compare March 16, 2026 16:04
@swiatekm swiatekm changed the title Fix uninstall on Windows Refactor uninstall on Windows Mar 17, 2026
@swiatekm swiatekm force-pushed the fix/bump-golang-1.25.1 branch from 77ab2c7 to dcc8a29 Compare March 17, 2026 11:01
@swiatekm swiatekm force-pushed the fix/bump-golang-1.25.1 branch from dcc8a29 to 7ecd612 Compare March 17, 2026 13:54
@swiatekm swiatekm added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Mar 17, 2026
@swiatekm swiatekm marked this pull request as ready for review March 17, 2026 15:05
@swiatekm swiatekm requested a review from a team as a code owner March 17, 2026 15:05
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ebeahan
Member

ebeahan commented Mar 17, 2026

@blakerouse can you take a look at these changes?

michalpristas previously approved these changes Mar 18, 2026
Contributor

@michalpristas michalpristas left a comment


nice PR @swiatekm
seems it works ok

_ = windows.CloseHandle(h)
if err != nil {
return fmt.Errorf("failed to dispose handle for %q: %w", path, err)
if err := windows.MoveFileEx(tmpPathPtr, nil, windows.MOVEFILE_DELAY_UNTIL_REBOOT); err != nil {
Contributor


maybe worth a comment that passing nil as the destination schedules the file for deletion.
it is documented on the syscall page, but it would be better to duplicate it here so that someone unfamiliar with the code understands it on first read

@swiatekm swiatekm added the enhancement New feature or request label Apr 2, 2026
leehinman previously approved these changes Apr 2, 2026
Contributor

@leehinman leehinman left a comment


LGTM

@swiatekm
Member Author

swiatekm commented Apr 3, 2026

@blakerouse if we want to ensure the versioned path is deleted on reboot, we could rename the whole path instead of just the executable. Then there's no problem if the same version is subsequently reinstalled, as the scheduled delete is for the renamed path. Does that sound better?

@blakerouse
Contributor

@swiatekm Yeah I prefer that.

@github-actions
Contributor

github-actions Bot commented Apr 8, 2026

TL;DR

golangci-lint failed in lint (windows-latest) because scheduleDeleteOnReboot returns nil from a filepath.WalkDir error callback, which triggers nilerr (internal/pkg/agent/install/uninstall_windows.go:78). Replace that return nil with a non-nil error path.

Remediation

  • In internal/pkg/agent/install/uninstall_windows.go (scheduleDeleteOnReboot), change the callback branch:
    • from: if err != nil { return nil // skip inaccessible entries }
    • to: return a non-nil value (for example return err), or explicitly handle/log and return a wrapped non-nil error if you want to fail uninstall scheduling clearly.
  • Re-run the golangci-lint workflow for this PR after the change.
Investigation details

Root Cause

The Windows-only code in scheduleDeleteOnReboot uses filepath.WalkDir. In the callback, when err != nil, it currently returns nil, which satisfies the nilerr lint rule condition (“error is not nil but returns nil”).

Evidence

##[error]internal\pkg\agent\install\uninstall_windows.go:78:4: error is not nil (line 76) but it returns nil (nilerr)
		return nil // skip inaccessible entries
##[error]issues found
  • Affected code at PR head (2ce5852185daaf9e535348a6f40e49012701f7c6): internal/pkg/agent/install/uninstall_windows.go around lines 76-79.

Validation

  • Not run locally in this workflow context (read-only investigation).

Follow-up

  • I could not reliably read prior PR comments in this environment due to integrity restrictions, so I could not compare this diagnosis against the latest prior detective report.

Note

🔒 Integrity filtering filtered 3 items

Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

  • pr:Refactor uninstall on Windows #13106 (pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
  • resource:get_job_logs (get_job_logs: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
  • issue:elastic/elastic-agent#unknown (search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)


@swiatekm swiatekm requested a review from leehinman April 10, 2026 14:48
@swiatekm
Member Author

Renaming the whole versioned directory ran into permission problems in some circumstances. I've settled on moving the executable to the root install path and cleaning it up on install. As a result, we only leave the root path and the executable after an uninstall on Windows.

@cmacknz
Member

cmacknz commented Apr 10, 2026

> My proposal to fix this is to delete everything we can, rename and move the executable to the root path, and mark it for deletion on reboot. We also make sure to delete this renamed executable path on install to avoid leaving more than one after repeated install/uninstall chains without reboots. Throughout, we emit warnings for the user in case they prefer to manually delete the remaining data.

What are the cons of continuing to use the Go 1.24 and prior approach? As far as we have seen, it does not carry the risk that the user has to clean up manually.

This new solution in its latest iteration is slightly less complex but has a chance of failure that I don't think existed before. I am not totally sold it's worth the risk to make this change but could be convinced otherwise.

@elasticmachine
Contributor

elasticmachine commented Apr 10, 2026

⏳ Build in-progress, with failures

cc @swiatekm

@swiatekm
Member Author

> > My proposal to fix this is to delete everything we can, rename and move the executable to the root path, and mark it for deletion on reboot. We also make sure to delete this renamed executable path on install to avoid leaving more than one after repeated install/uninstall chains without reboots. Throughout, we emit warnings for the user in case they prefer to manually delete the remaining data.
>
> What are the cons of continuing to use the Go 1.24 and prior approach? As far as we have seen, it does not carry the risk that the user has to clean up manually.
>
> This new solution in its latest iteration is slightly less complex but has a chance of failure that I don't think existed before. I am not totally sold it's worth the risk to make this change but could be convinced otherwise.

The con is what caused us to take so much time to diagnose the Go 1.25 breakage: The current implementation is effectively coupled to implementation details of os.Remove and similar library functions. The ADS rename trick we use requires that the files and directories are deleted using particular Windows syscalls.

I'd also accept changing the current implementation to explicitly use those syscalls to avoid future breakage. Then we'd be on the hook for maintaining it, but at least it'd be clear what is necessary for the logic to work.

Separately, I find the approach in this PR easier to understand in general. I can also try the alternative and we can see how ugly it is.

@cmacknz
Member

cmacknz commented Apr 13, 2026

> The con is what caused us to take so much time to diagnose the Go 1.25 breakage: The current implementation is effectively coupled to implementation details of os.Remove and similar library functions. The ADS rename trick we use requires that the files and directories are deleted using particular Windows syscalls.
>
> I'd also accept changing the current implementation to explicitly use those syscalls to avoid future breakage. Then we'd be on the hook for maintaining it, but at least it'd be clear what is necessary for the logic to work.
>
> Separately, I find the approach in this PR easier to understand in general. I can also try the alternative and we can see how ugly it is.

I am much more concerned about making sure it works correctly than about how ugly it looks. We have broken the Windows uninstall in the past, so I am more conservative here than in other places.

Replicating what os.Remove does with the system calls directly and clearly documenting it sounds fine to me. I do like the delete on reboot approach as a way to completely avoid dealing with the "Access is denied" failure mode trying to remove a running binary, but I don't love any change that would regularly leave files behind if there's a way to avoid that.

@swiatekm swiatekm marked this pull request as draft April 24, 2026 11:52
@mergify
Contributor

mergify Bot commented Apr 24, 2026

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix/bump-golang-1.25.1 upstream/fix/bump-golang-1.25.1
git merge upstream/main
git push upstream fix/bump-golang-1.25.1

@swiatekm
Member Author

Closing in favor of #13710.

@swiatekm swiatekm closed this Apr 29, 2026
@swiatekm swiatekm deleted the fix/bump-golang-1.25.1 branch April 29, 2026 10:33

Labels

backport-skip enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Troubleshoot and refactor uninstall on Windows with Go 1.25+

9 participants