Refactor uninstall on Windows#13106
Conversation
|
This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
|
a26c20a to
196ecd2
Compare
8d6d62e to
743a105
Compare
|
/test |
|
This pull request is now in conflicts. Could you fix it? 🙏 |
f33de71 to
c7dc618
Compare
e1aed19 to
77ab2c7
Compare
77ab2c7 to
dcc8a29
Compare
dcc8a29 to
7ecd612
Compare
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
|
@blakerouse can you take a look at these changes? |
michalpristas
left a comment
There was a problem hiding this comment.
nice PR @swiatekm
seems it works ok
| _ = windows.CloseHandle(h) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to dispose handle for %q: %w", path, err) | ||
| if err := windows.MoveFileEx(tmpPathPtr, nil, windows.MOVEFILE_DELAY_UNTIL_REBOOT); err != nil { |
There was a problem hiding this comment.
maybe worth a comment that passing a nil as destination will schedule it for deletion.
it is documented in a syscall page, but it would be better to have it duplicated here so not involved person passing by understands on first read
|
@blakerouse if we want to ensure the versioned path is deleted on reboot, we could rename the whole path instead of just the executable. Then there's no problem if the same version is subsequently reinstalled, as the scheduled delete is for the renamed path. Does that sound better? |
|
@swiatekm Yeah I prefer that. |
TL;DR
Remediation
Investigation detailsRoot CauseThe Windows-only code in Evidence
Validation
Follow-up
Note 🔒 Integrity filtering filtered 3 itemsIntegrity filtering activated and filtered the following items during workflow execution.
What is this? | From workflow: PR Actions Detective Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not. |
|
Renaming the whole versioned directory ran into permission problems in some circumstances. I've settled on moving the executable to the root install path and cleaning it up on install. As a result, we only leave the root path and the executable after an uninstall on Windows. |
What are the cons of continuing to use the Go 1.24 and prior approach? Which as far as we have seen does not have the chance that user has to manually clean up. This new solution in it's latest iteration is slightly less complex but has a chance of failure that I don't think existed before. I am not totally sold it's worth the risk to make this change but could be convinced otherwise. |
The con is what caused us to take so much time to diagnose the Go 1.25 breakage: The current implementation is effectively coupled to implementation details of I'd also accept changing the current implementation to explicitly use those syscalls to avoid future breakage. Then we'd be on the hook for maintaining it, but at least it'd be clear what is necessary for the logic to work. Separately, I find the approach in this PR easier to understand in general. I can also try the alternative and we can see how ugly it is. |
I am much more concerned about making sure it works correctly than about how ugly it looks, we have broken the windows uninstall in the past so I more conservative here than in other places. Replicating what |
|
This pull request is now in conflicts. Could you fix it? 🙏 |
|
Closing in favor of #13710. |
What does this PR do?
Refactors how we uninstall Elastic Agent on Windows. The essential problem is that we want to delete an executable file that is running the uninstall command itself, which is not allowed on Windows. We used to have a workaround involving NTFS Alternative Data Streams, which unfortunately stopped working in Go 1.25 due to changes to os.RemoveAll. See the linked issue for details.
The workaround for the Go 1.25 upgrade was to fall back to the Go 1.24 implementation of
os.RemoveAll. In this PR, I'd like to propose an alternative way of carrying out the executable deletion that doesn't depend so much on Windows filesystem particulars.My proposal to fix this is to delete everything we can, rename and move the executable to the root path, and mark it for deletion on reboot. We also make sure to delete this renamed executable path on
installto avoid leaving more than one after repeated install/uninstall chains without reboots. Throughout, we emit warnings for the user in case they prefer to manually delete the remaining data.Potential problems:
install, so blast radius should be limited.Why is it important?
The uninstall process shouldn't depend on implementation details of the Go standard library and should ideally be easy to understand without detailed knowledge of Windows syscalls and filesystem semantics.
Checklist
./changelog/fragmentsusing the changelog tool- [ ] I have added an integration test or an E2E testDisruptive User Impact
After uninstalling, the agent binary now continues to exist on disk until the next reboot, though it's safe to manually delete.
How to test this PR locally
Build agent, install it on Windows, then uninstall.
Related issues