-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows service #1070
Windows service #1070
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome. Thanks!
Few comments... mostly text nits.
|
||
const ( | ||
eventLogName = "AmazonECSAgent" | ||
eventLogID = 999 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment here explaining that this is a default value so others know its overridden later on a case-by-case basis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it's a constant without being overridden. I'm going to leave it as-is for now and we can come back to making it play more nicely with Windows later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help to list the eventid in the Readme or in a comment so that it is easy to search. Even though windows event log id's are specific to the application and not always well documented, administrators like to be able to find the id quickly in a search.
@@ -86,7 +87,7 @@ func SetLevel(logLevel string) { | |||
// GetLevel gets the log level | |||
func GetLevel() string { | |||
levelLock.RLock() | |||
defer levelLock.RLock() | |||
defer levelLock.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: RUnlock -> Runlock or RunLock??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This look like it's a https://godoc.org/sync#RWMutex, so RUnlock is a reader taking a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUnlock is unlocking the read-lock. This is unrelated to Windows; I just saw gometalinter complain about it while I was working in the package.
README.md
Outdated
```powershell | ||
PS C:\> # Set up directories the agent uses | ||
PS C:\> New-Item -Type directory -Path $ProgramFiles\Amazon\ECS | ||
PS C:\> New-Item -Type directory -Path $ProgramData\Amazon\ECS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New-Item -Type directory -Path **$ENV:**ProgramData\Amazon\ECS -Force
README.md
Outdated
|
||
```powershell | ||
PS C:\> # Set up directories the agent uses | ||
PS C:\> New-Item -Type directory -Path $ProgramFiles\Amazon\ECS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New-Item -Type directory -Path **$ENV:**ProgramFiles\Amazon\ECS -Force
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is New-Item -Type dir -Path ... -Force
like mkdir -p
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you meant -d, (dir) yes.
README.md
Outdated
PS C:\> Invoke-RestMethod -OutFile $zipFile -Uri $agentZipUri | ||
PS C:\> # Put the executables in the executable directory. | ||
PS C:\> Expand-Archive -Path $zipFile -DestinationPath $ecsExeDir -Force | ||
PS C:\> # Uncomment the following line to enable task IAM roles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isnt super clear cause the following line is a comment. Maybe add a bool here for users to change so its more clear? Also... I think you want to cd into the exec dir in either case??
Example:
Set-Location $ecsExeDir
/# Set '$EnableTaskIAMRoles' to '$true' if you wish to use Task IAM Roles.
[bool]$EnableTaskIAMRoles = $false
if ($EnableTaskIAMRoles) {
.\hostsetup.ps1
}
acceptInsecureCertUsage = "Disable SSL certificate verification. We do not recommend setting this option" | ||
licenseUsage = "Print the LICENSE and NOTICE files and exit" | ||
blacholeEC2MetadataUsage = "Blackhole the EC2 Metadata requests. Setting this option can cause the ECS Agent to fail to work properly. We do not recommend setting this option" | ||
ecsAttributesUsage = "Print the Agent's ECS Attributes based on its environment" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This got reordered.
// ReceiveMessage receives a log line from seelog and emits it to the Windows event log | ||
func (r *eventLogReceiver) ReceiveMessage(message string, level seelog.LogLevel, context seelog.LogContextInterface) error { | ||
switch level { | ||
case seelog.DebugLvl, seelog.InfoLvl: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this be indented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, this is Go's spec :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gofmt
does this. I recommend running gofmt
on save every time you modify a Go file.
@@ -61,11 +66,11 @@ func FinalSave(saver statemanager.Saver, taskEngine engine.TaskEngine) error { | |||
engineDisabled := make(chan error) | |||
|
|||
disableTimer := time.AfterFunc(engineDisableTimeout, func() { | |||
engineDisabled <- errors.New("Timed out waiting for TaskEngine to settle") | |||
engineDisabled <- errors.New("final save: timed out waiting for TaskEngine to settle") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Final save:...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is conventional idiomatic go casing for the error message to be lower cased. Here's an example:
agent error: this package errored fetching data: I couldn't reach the server.
This can come from 3 different sources and would look weird if it was capitalized after each :
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention is described here.
@@ -75,10 +80,10 @@ func FinalSave(saver statemanager.Saver, taskEngine engine.TaskEngine) error { | |||
|
|||
stateSaved := make(chan error) | |||
saveTimer := time.AfterFunc(finalSaveTimeout, func() { | |||
stateSaved <- errors.New("Timed out trying to save to disk") | |||
stateSaved <- errors.New("final save: timed out trying to save to disk") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Final...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above: idiomatic error messages.
}) | ||
go func() { | ||
log.Debug("Saving state before shutting down") | ||
seelog.Debug("Saving state before shutting down") | ||
stateSaved <- saver.ForceSave() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excuse my not-Go-savviness... Should this have a trap/catch with a log so in the event that the state was not saved correctly... its at least logged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The saver.ForceSave
sends its results on down the stateSaved
channel and its up to the other side to determine what to do about that. However.. we could be better on how we log failures and its context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log line is above, on line 83. That function will get triggered on a timeout so we have a record that the save failed.
@samuelkarp #1009 merged =] |
case <-agentDone: | ||
// This means that the agent stopped on its own. This is where it's appropriate to light the event log on fire | ||
// and set off all the alarms. | ||
seelog.Error("Exiting") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add more context in this exit log message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a ton of context here except that the agent stopped on its own... What would you like me to add here?
agent/app/agent_windows.go
Outdated
} | ||
|
||
// sleepCtx provides a cancelable sleep | ||
func sleepCtx(ctx context.Context, duration time.Duration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can instead use the stdlib (or golang.org/x/net) context.WithDeadline
function and wait on ctx.Done()
of that context.
golang.org/x/sys/windows/svc golang.org/x/sys/windows/svc/eventlog
09ea837
to
4aea88d
Compare
e1ea0c7
to
903ec56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the things!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Found some small suggestions and nits.
|
||
```powershell | ||
PS C:\> # Set up directories the agent uses | ||
PS C:\> New-Item -Type directory -Path $ProgramFiles\Amazon\ECS | ||
PS C:\> New-Item -Type directory -Path $ProgramData\Amazon\ECS | ||
PS C:\> New-Item -Type directory -Path ${env:ProgramFiles}\Amazon\ECS -Force |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Even thought it's case insensitive, powershell docs capitalize Env
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave this as-is for now; we can come back and fix it later.
wg.Wait() | ||
} | ||
|
||
func TestHandler_Execute_WindowsStops(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A service has 30 seconds to report back a stopped status, and a 20 seconds to respond to a shutdown event. Can you add an assert to make sure it does not exceed these timeouts?
https://msdn.microsoft.com/en-us/library/windows/desktop/ms685149(v=vs.85).aspx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit tests are run with a 25s timeout right now. This completes in about a second. Our unit tests aren't really able to simulate the worst-case right now, so I'd rather not add an assert that won't tell us anything or try to make the unit test different. Instead, I think this can be addressed in a higher-level acceptance test (probably outside the agent itself).
|
||
const ( | ||
eventLogName = "AmazonECSAgent" | ||
eventLogID = 999 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help to list the eventid in the Readme or in a comment so that it is easy to search. Even though windows event log id's are specific to the application and not always well documented, administrators like to be able to find the id quickly in a search.
@cabbruzzese Not sure why I can't seem to reply inline to your comment about the event log ID. Is it important to include it in the README even though it's a constant right now? |
Was just looking for the event id to have better visibility on search engines for future debugging. Not a blocker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
>> .\hostsetup.ps1 | ||
>> } | ||
PS C:\> # Install the agent service | ||
PS C:\> New-Service -Name "AmazonECS" ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it confusing the call the ECS Agent service "AmazonECS"? This is the agent. Would it be better to call it "AmazonECSAgent"?
Sorry if this topic came up before, feel free to ignore & point me to any existing discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named it "AmazonECS" so that we'd have the flexibility to swap out a bit in terms of implementation without requiring customers to change the name that they're using or making the name confusing. Specifically, this enables us to potentially transition to running the agent inside a container and using a service on the host to manage the container (similar to what we do with ecs-init on Linux). If we called it "AmazonECSAgent", we could still do that but it could create confusion because the service is separate from the agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AmazonECSService?
I had success when setting the service Startup Type to "Automatic (Delayed Start)". Setting it to "Automatic" caused it to try to start before networking was available and then fail because it couldn't retrieve the instance identity document. For now, I would say that we want to recommend "Automatic (Delayed Start)" and as a follow up later we should implement retries for getting the instance identity document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Had some questions but overall it looks good
// start starts the Agent execution | ||
start() int | ||
// setTerminationHandler sets the termination handler | ||
setTerminationHandler(sighandlers.TerminationHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat!
h.ecsAgent.start() // should block forever, unless there is an error | ||
// TODO: distinguish between recoverable and unrecoverable errors | ||
indicator.agentStopped() | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be a defered cancel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly. I will be changing some of this in the next PR.
<-derivedCtx.Done() | ||
} | ||
|
||
type termHandlerIndicator struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the way this indicator struct is being used. Out of curiosity -- is this implementing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I'm aware of, but I didn't do a very deep search yet.
|
||
func (t *termHandlerIndicator) wait() { | ||
t.mu.Lock() | ||
invoked := t.handlerInvoked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't completely understand how the 'invoked' state is consumed. Is this the only place 'invoked' is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. This is just to determine whether we need to wait on the channel or not.
>> .\hostsetup.ps1 | ||
>> } | ||
PS C:\> # Install the agent service | ||
PS C:\> New-Service -Name "AmazonECS" ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AmazonECSService?
|
||
// ReceiveMessage receives a log line from seelog and emits it to the Windows event log | ||
func (r *eventLogReceiver) ReceiveMessage(message string, level seelog.LogLevel, context seelog.LogContextInterface) error { | ||
switch level { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How tightly is this coupled to seelog? Are we adding significant work for if we want to swap out logging implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Seelog's interface, so it's somewhat tightly coupled. However, the Windows event log interface is pretty simple so it should be easy to change this if we switch to a different logger as long as that logger has a pluggable backend.
t.agentRunning = false | ||
} | ||
|
||
func (t *termHandlerIndicator) done() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would recommend this be called 'finish' or 'end'. This isn’t a strong preference, just that 'done()' is used by the context class for listening instead of terminating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will change in the next PR.
Merging this into the |
Summary
This set of commits enables the agent to be run as a Windows service and write logs into the Windows event log.
Note: This is marked WIP as it will fail to build because I haven't vendored the new dependencies (two more packages insidegolang.org/x/sys/windows
). I'm waiting for #1009 to be merged so I can manage dependencies with dep.Implementation details
agent
structTODO
Testing
make release
)go build -out amazon-ecs-agent.exe ./agent
) (as long as you havegolang.org/x/sys/windows on your
GOPATH)make test
) passgo test -timeout=25s ./agent/...
) pass (as long as you havegolang.org/x/sys/windows on your
GOPATH)make run-integ-tests
) pass.\scripts\run-integ-tests.ps1
) passmake run-functional-tests
) pass.\scripts\run-functional-tests.ps1
) passNew tests cover the changes: yes
Description for the changelog
Feature - The Amazon ECS agent can now be installed as a Windows service
Licensing
This contribution is under the terms of the Apache 2.0 License: yes