-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Agent failed to upgrade from 8.4.2 to 8.5.0 BC1 for MAC 12 agent using agent binary. #1392
Conversation
🌐 Coverage report
|
linter fails cause I touched some files to refactor and extract the app bundle binary path building.
not going to introduce more changes into this PR in order to keep it's scope limited to only relevant changes. can fix the linter when backporting to main. |
/package |
can you verify
|
errors.M("destination", paths.ShellWrapperPath)) | ||
} | ||
} | ||
err = os.Symlink("/Library/Elastic/Agent/elastic-agent", paths.ShellWrapperPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with:
filepath.Join(paths.InstallPath, paths.BinaryName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will leave as is for now, even though the change is trivial, because:
- previously this path was hardcode in the shell wrapper script as is
// ShellWrapper is the wrapper that is installed.
ShellWrapper = `#!/bin/sh
exec /Library/Elastic/Agent/elastic-agent $@
`
- this is a platform specific for Darwin so path separators and the path are fixed
can change (and would have to retest) when merging to the main branch, not for the urgent fix
/packaging |
The laundchd had no issues executing the unsigned shell script in all of the previous releases.
The content of that file was
That was one of the issues before for FDA, so the FDA could not even be given by the agent signature for example because the root process for launch was The commit that was reverted and then reverted back here also fixed that, the Also in upgrade testing that I did so far the newly introduced script that fixes up the symlink was working fine as well.
The signature of the app bundle was already checked when the change for app bundle was introduced back in August, was able to download the snapshot build from artifactory, check the signature, and use that signature to give FDA to the agent with MDM policy. |
run end-to-end tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested
- upgrade from 8.4 to 8.5
- from 8.5 to future release
- rollback from 8.5 to 8.4
seems ok
let's see how e2e test ends up
lint errors do not appear difficult to fix as well. please address them
@@ -151,6 +177,9 @@ func findDirectory() (string, error) { | |||
// executable path is being reported as being down inside of data path | |||
// move up to directories to perform the copy | |||
sourceDir = filepath.Dir(filepath.Dir(sourceDir)) | |||
if runtime.GOOS == darwin { | |||
sourceDir = filepath.Dir(filepath.Dir(filepath.Dir(sourceDir))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check we're inside *.app
dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? the agent binary will always be inside of the app bundle with this change, it's not going to be installed outside of the app bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah right if it's inside data it cannot be root executable.
I was just trying to prevent some unintentional mixup of executables execution as this is taken from os.Executable path. being defensive on the change so it won't surprise us
yeah, they are not difficult to fix, but it widens the scope of the changes. Even what seems to be the trivial changes need full regression testing. I'm trying to keep the scope of the changes minimal here for 8.5. |
e2e tests |
Got in touch on the details of JAMF install, looks like it's always a new install with JAMF (at least with out IT) and then upgrade through the fleet path. So this approach should work ok without a need of any updates for JAMF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, my primary concerns are mostly comments/documentation and testing. We can handle those as follow up changes if you want to get this merged to unblock further manual testing given where we are in the release cycle.
@@ -172,10 +177,15 @@ func SetInstall(path string) { | |||
// initialTop returns the initial top-level path for the binary | |||
// | |||
// When nested in top-level/data/elastic-agent-${hash}/ the result is top-level/. | |||
// The agent fexecutable for MacOS is wrappend in the bundle, so the path to the binary is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The agent fexecutable for MacOS is wrappend in the bundle, so the path to the binary is | |
// The agent executable for MacOS is wrapped in the bundle, so the path to the binary is |
@@ -28,13 +37,21 @@ func RunningInstalled() bool { | |||
// executable path is being reported as being down inside of data path | |||
// move up to directories to perform the comparison | |||
execDir = filepath.Dir(filepath.Dir(execDir)) | |||
if runtime.GOOS == darwin { | |||
execDir = filepath.Dir(filepath.Dir(filepath.Dir(execDir))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here documenting that you are stripping off the elastic-agent.app/Contents/MacOS
paths? Either that or make a small helper function for this, I see the same logic in paths/common.go.
Those who haven't memorized the application bundle structure on Mac might not recognize what this is doing.
) | ||
|
||
func TestIsInsideData(t *testing.T) { | ||
validExePath := paths.BinaryDir(filepath.Join("data", fmt.Sprintf("elastic-agent-%s", release.ShortCommit()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you explicitly include a case that contains the MacOS bundle path?
@@ -23,7 +23,8 @@ func ChangeSymlink(ctx context.Context, log *logger.Logger, targetHash string) e | |||
hashedDir := fmt.Sprintf("%s-%s", agentName, targetHash) | |||
|
|||
symlinkPath := filepath.Join(paths.Top(), agentName) | |||
newPath := filepath.Join(paths.Top(), "data", hashedDir, agentName) | |||
|
|||
newPath := paths.BinaryPath(filepath.Join(paths.Top(), "data", hashedDir), agentName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we forget paths.BinaryPath
here (or elsewhere) to wrap this is there a test that fails? If not, is there a unit test we could write?
I am concerned the need for this won't be obvious to the next engineer to touch this code, and it could be forgotten or removed in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the same thing in service_darwin.go, so this seems to be necessary in lots of places now but I am not sure if anything enforces the need for it.
func invokeCmd(topPath string) *exec.Cmd {
homeExePath := paths.BinaryPath(topPath, agentName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah currently the logic is scattered in different places, so we call these wrapper func in few different places. Relying on e2e tests to catch any issues more or less.
It's the same issue with the code before this change actually, nothing is enforcing the right path to be used.
That’s how I found all of the places, changing in on place and finding the next places where it breaks while running e2e upgrades.
I'll address the comments when backporting to main branch.
err, | ||
fmt.Sprintf("failed to write shell wrapper (%s)", paths.ShellWrapperPath), | ||
errors.M("destination", paths.ShellWrapperPath)) | ||
// Install symlink for darwin instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explain why we need to do this? Let's document the reason we need this symlink in the code or in the templates/darwin/elastic-agent.tmpl
script itself for future engineers to understand why this was necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this change the agent used the shell script wrapper to start the agent, this posed an issue with FDA cause the root process of the launchd service was sh
instead of the elastic-agent, so in order to grant FDA the FDA needed to be granted to /bin/sh
. Here we replace that wrapper shell script with symlink, so the root process of the service is the elastic-agent
. Will add more comments here for main branch backports.
looks like e2e failed on setup
|
/test |
What does this PR do?
This workaround is ONLY activated when upgrading from pre 8.5 version to 8.5+ on for Mac OS distribution.
The workaround is not used or affects the clean install, nor the it's not going used for any upgrades after 8.5.
For upgrades after 8.5 the symlink will be correctly established from the upgrade handle:
Here is how is how the workaround works for let say the upgrade from 8.4 to 8.5.
elastic-agent
binary location. The sole purpose of the shell script is to fix the symlink created by pre 8.5 versions of upgrade handle.elastic-agent
shell script.elastic-agent
shell script that fixes up the symlink to point correctly to the binary inside of the elastic agent app bundle and exits.elastic-agent
comes up and the upgrade completes.I tested the upgrades on all 3 platforms macOS and as a regression test on linux and windows:
This would need to be backported to main branch as well.
Why is it important?
Addresses the issue: "Agent failed to upgrade from 8.4.2 to 8.5.0 BC1 for MAC 12 agent using agent binary."
#1298
Checklist
How to test this PR locally
I tested these changes with the local builds that had the artifactory download/verify code disabled so I could drop the newer locally built elastic-agent binaries into the downloads folder.
Related issues
Screenshots
MacOS upgrade from 8.4 to 8.5
MacOS upgrade from 8.5 to 8.6
Linux (Ubuntu) upgrade from 8.4 to 8.5, regression test
Linux (Ubuntu) upgrade from 8.5 to 8.6, regression test
Windows (10) upgrade from 8.4 to 8.5, regression test
Windows (10) upgrade from 8.5 to 8.6, regression test