Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent CodeDeploy Agent SSL_connect error #208

Closed
if-igor opened this issue Apr 25, 2019 · 31 comments
Closed

Intermittent CodeDeploy Agent SSL_connect error #208

if-igor opened this issue Apr 25, 2019 · 31 comments
Labels

Comments

@if-igor
Copy link

if-igor commented Apr 25, 2019

We are running the latest version of CodeDeploy Agent on Windows Server 2019 EC2 instances. Occasionally, the agent fails, and the logs are full of NetworkingError SSL_connect. Restarting the Agent resolves the problem until the next such occurrence.

This happens across multiple regions/accounts.

`2019-04-24T19:10:47 DEBUG [codedeploy-agent(2428)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: PollHostCommand: Host Command = nil
2019-04-24T19:10:48 DEBUG [codedeploy-agent(2428)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Calling PollHostCommand:
2019-04-24T19:10:48 INFO [codedeploy-agent(2428)]: Version file found in C:/ProgramData/Amazon/CodeDeploy/.version with agent version OFFICIAL_1.0.1.1597_msi.
2019-04-24T19:11:49 INFO [codedeploy-agent(2428)]: [Aws::CodeDeployCommand::Client 200 60.093372 0 retries] poll_host_command(host_identifier:"instance_arn")

2019-04-24T19:11:49 DEBUG [codedeploy-agent(2428)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: PollHostCommand: Host Command = nil
2019-04-24T19:11:50 DEBUG [codedeploy-agent(2428)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Calling PollHostCommand:
2019-04-24T19:11:50 INFO [codedeploy-agent(2428)]: Version file found in C:/ProgramData/Amazon/CodeDeploy/.version with agent version OFFICIAL_1.0.1.1597_msi.
2019-04-24T19:11:52 INFO [codedeploy-agent(2428)]: [Aws::CodeDeployCommand::Client 0 2.197781 3 retries] poll_host_command(host_identifier:"instance_arn") Seahorse::Client::NetworkingError SSL_connect returned=1 errno=0 state=error: certificate verify failed
`

@rohkat-aws
Copy link
Contributor

amazon-archives/aws-sdk-core-ruby#166
may be this might help .

@if-igor
Copy link
Author

if-igor commented Apr 26, 2019

@rohkat-aws Thanks for the quick response. That link is for the sdk. Not sure how it relates to the CodeDeploy Agent.

@rohkat-aws
Copy link
Contributor

the sdk is used by codeploy-agent @if-igor , i remember something like this had been reported earlier

@rohkat-aws rohkat-aws added the bug label May 28, 2019
@ChristianHartTE
Copy link

Any idea when this will get picked up? We have the same issue as described by if-igor.

@amico412
Copy link

Same here, random failures in CodeDeploy and get the same SSL error in the log. Reboot, log clears up, and deployments start working again.

@tbronchain
Copy link

tbronchain commented Oct 3, 2019

Happening for me too. Quite randomly, after running for a while:

2019-10-03T00:02:32 INFO [codedeploy-agent(1000)]: Version file found in C:/ProgramData/Amazon/CodeDeploy/.version with agent version OFFICIAL_1.0.1.1597_msi.
2019-10-03T00:02:34 INFO [codedeploy-agent(1000)]: [Aws::CodeDeployCommand::Client 0 2.20307 3 retries] poll_host_command(host_identifier:"arn:aws:ec2:us-east-1:[xxx]:instance/i-[xxx]") Seahorse::Client::NetworkingError SSL_connect returned=1 errno=0 state=error: certificate verify failed

2019-10-03T00:02:34 ERROR [codedeploy-agent(1000)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Seahorse::Client::NetworkingError - SSL_connect returned=1 errno=0 state=error: certificate verify failed - C:/Windows/TEMP/ocr113C.tmp/lib/ruby/2.3.0/net/http.rb:933:in `connect_nonblock'

@advanw
Copy link

advanw commented Oct 31, 2019

Is there any update on this issue? This is preventing us from reliably deploying to our instances, as from time to time we have to restart the AWS CodeDeploy Agent service to make it work again.

@James-Pickett
Copy link

We've experienced this issue as well, after rebooting the instance it started working.

@tbronchain
Copy link

Reboot didn't fix it for me. Neither reinstalling did.
The only way I could fix it was to set a cron task to restart the agent every night - which unfortunately causes a few minutes downtime.

Would be great if someone in the agent team (or maybe it's mainly related to the API) could take a look at the issue.

@pieterdw
Copy link

I was in contact with AWS support about this. The reply was very interesting/useful so I'll share it here.

I have done some research internally and found that this error is caused by Windows SilentCleanup deleting temp folders which in turn causes the CodeDeploy agent to encounter errors. The CodeDeploy agent is not yet officially supported on Windows Server 2019 [2], however I found a workaround for this that may work here:

You can prevent the temp folder from being deleted by disabling SilentCleanup on the instance, to do this you can run the following command [3,4]:

schtasks.exe /change /TN “\Microsoft\Windows\DiskCleanup\SilentCleanup” /Disable

However I cannot guarantee that this will fully fix the error, in my tests I was not able to reproduce the error once I disabled SilentCleanup however as the error is intermittent I cannot guarantee it.
The CodeDeploy team is aware of this issue and they are working on finding a long term resolution, I have added your case to the internal ticket regarding this. Currently there is no ETA for when this will be resolved, you can keep an eye on the GitHub issue ticket [1] and the CodeDeploy release history [5] for updates.


[1] #208
[2] https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent.html#codedeploy-agent-supported-operating-systems-ec2
[3] https://support.microsoft.com/en-us/help/4015218/activex-is-automatically-deleted-in-the-windows-10-x86-environment#3-3
[4] https://pupuweb.com/windows-server-2019-bug-silentcleanup-delete-temp-tmp-folder/
[5] https://docs.aws.amazon.com/codedeploy/latest/userguide/document-history.html

@sacag
Copy link

sacag commented Dec 3, 2019

Any update on this yet? The suggestion from @pieterdw didn't work for me. I have created a cron task to stop and start the codedeployagent service every night. Let's see if that makes any difference.

@mpidlisnyi
Copy link

quick temporary solution

+++ /opt/codedeploy-agent/vendor/gems/codedeploy-commands-1.0.0/lib/aws/codedeploy_commands.rb	2019-12-09 09:07:52.282086062 -0500
@@ -5,6 +5,7 @@
 require "#{gem_root}/lib/aws/plugins/deploy_control_endpoint"
 require "#{gem_root}/lib/aws/plugins/deploy_agent_version"
+Aws.use_bundled_cert!
 version = '1.0.0'

see amazon-archives/aws-sdk-core-ruby#166

@annamataws
Copy link

Hi @if-igor, @ChristianHart, @amico412, @tbronchain, @advanw

Will you please help us by answering these questions:

  1. Are you still experiencing the issue?
  2. Do you have the latest version of the agent?
  3. Have you tried the fix of @mpidlisnyi? Did it help?
  4. Which AMI for Windows Server 2019 are you using?

Thank you

@ChristianHartTE
Copy link

@annamataws

  1. Yes
  2. We are on version 1.0.1.1597, this looks to be the newest release
  3. No, I am not sure where that snippet goes. Does it get added to a config file?
  4. We have used many Windows Server AMIs with the same issue. The latest one we have deployed is ami-02b212aba9eb84405

@annamataws
Copy link

Hi @ChristianHart ,

Thank you for the help! For number 3, @mpidlisnyi did the fix on a linux box so the path is a lot different on windows. I don't know the exact path on a Windows OS, but, if I find it, I will add it here.

@ChristianHartTE
Copy link

@annamataws any update on this? Do we have a timeline for the fix for server 2019? ++ @smiller63

@ChristianHartTE
Copy link

ChristianHartTE commented Feb 13, 2020

My work around is

  • Create a script which
    • Checks the last 50 lines of the codedeploy-agent-log.txt
    • Restarts the CodeDeploy service if the error text is found
  • Schedule this script in task scheduler with a trigger that repeats every 10 minutes

Script for reading log file

$CodedeployAgentLogFile = "C:\ProgramData\Amazon\CodeDeploy\log\codedeploy-agent-log.txt"

$CodeDeployError = Get-Content -Tail 55 "$CodedeployAgentLogFile" | Select-String "InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Seahorse::Client::NetworkingError - SSL_connect returned=1 errno=0 state=error: certificate verify failed "
$CodeDeployError2 = Get-Content -Tail 55 "$CodedeployAgentLogFile" | Select-String "InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: LoadError - cannot load such file -- seahorse/client/networking_error"

if ($CodeDeployError -or $CodeDeployError2) {
    Write-Host "Connection error found in CodeDeploy Service log, restarting codedeployagent"
    Restart-Service codedeployagent
} else {
    Write-Host "No issues found..."
}

Edit: Updated script with second error text we started seeing.

@annamataws
Copy link

annamataws commented Feb 17, 2020

Hi,

Thank you for posting your work around.
We are working on a new release of the agent that will address this issue and a few others.

@johnhaczewski
Copy link

'We are working on a new release of the agent that will address this issue and a few others.'

  • Any ETA on this new agent version?

@Helen1987
Copy link
Contributor

We'll update the ticket as soon as a new release is ready. We are aiming to deliver a release in next two monthes or so

@JKnowlesuk
Copy link

@ChristianHart Thank you for the script. That was super helpful.

@ChristianHartTE
Copy link

@JamesKnowlesEbsta last week we started seeing a slightly different error. I just updated the script in my previous comment with the change I made to handle that.

@JKnowlesuk
Copy link

JKnowlesuk commented Mar 27, 2020

@ChristianHart I also ran it out in testing so I shorten error down and it picks up other problems also, add in some logging. Although this does not create the directory. The tail 55 might lead to a double tap restart. So I have not perfected for each error. Thank you again for the idea.

<#
   WARNING : Currently need to create a log directory. 
   This Restarts Codedeploy currently catching errors from this issue raised in https://github.com/aws/aws-codedeploy-agent/issues/208
   Thanks to @Christianhart for the idea, see post. 

   Catches at present. 
   "InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: LoadError - cannot load such file -- seahorse/client/networking_error"
   "InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Seahorse::Client::NetworkingError - SSL_connect returned=1 errno=0 state=error: certificate verify failed "

   We run this a task Scheduler with Administrator every 10 mins. 
   Command 
   powershell <filelocation>
   Last updated: 27/03/2020
#>
$CodedeployAgentLogFile = "C:\ProgramData\Amazon\CodeDeploy\log\codedeploy-agent-log.txt"
$FolderPath = "C:\log"
$FilName = "codedeployrestarter.log"
$CodeDeployRestarterLog = $FolderPath + "\" + $FilName

if (!(Test-Path $CodeDeployRestarterLog))
{
New-Item -itemType File -Path $FolderPath -Name ($FilName)
}


$date1 = Get-Date
$dateStamp = $date1.ToString("f")

$CodeDeployError = Get-Content -Tail 55 "$CodedeployAgentLogFile" | Select-String "InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: "

if ($CodeDeployError) {
    Write-Host "Connection error found in CodeDeploy Service log, restarting codedeployagent"
    Add-Content $CodeDeployRestarterLog "$dateStamp : Connection error found in CodeDeploy Service log, restarting codedeployagent"
    Restart-Service codedeployagent
} else {
    Write-Host "No issues found..."
    Add-Content $CodeDeployRestarterLog "$dateStamp : No Issues"
}

@jyeagle
Copy link

jyeagle commented Jun 25, 2020

We are experiencing the same issue at our shop:

2020-06-25T00:42:30 ERROR [codedeploy-agent(1284)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Seahorse::Client::NetworkingError - SSL_connect returned=1 errno=0 state=error: certificate verify failed - C:/Windows/TEMP/ocr714E.tmp/lib/ruby/2.3.0/net/http.rb:933:in `connect_nonblock'

@educoutinho
Copy link

Same issue.

2020-07-07T13:15:08 ERROR [codedeploy-agent(1404)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Seahorse::Client::NetworkingError - SSL_connect returned=1 errno=0 state=error: certificate verify failed - C:/Windows/TEMP/ocrA51A.tmp/lib/ruby/2.3.0/net/http.rb:933:in `connect_nonblock'

Windows Server 2019 v1809 (OS Build 17763.1098)
CodeDeployAgent 1.0.1.1597
Python 3.7.4

@JKnowlesuk
Copy link

JKnowlesuk commented Jul 7, 2020

@Helen1987 Hope this message finds you well. If you need a hand with testing, or coding to move this issue along I would be happy to help. The message you last left sort of implies that you are fixing this issue ?, if that is not the case or there is something you need help with please shout and would be happy to give time. This issue is major for us and we have pretty much rolled back out of 2019 in places.

@Helen1987
Copy link
Contributor

We are in the process of releasing of a new version of CodeDeploy Agent that includes fix for Windows Server 2019. However, we turned off auto-update cron-job for the Agent in previous release. Check release notes. Make sure you are onboarded with SSM Distributor if you want to get the latest version of CodeDeploy Agent.

@vaxinate
Copy link

We are experiencing this same problem or one similar to it. I see that there is a 1.1.1 release that includes support for windows server 2019 on the releases page in this repo. Is there any timeline for when we will see this release available in the codedeploy s3 buckets?

@JKnowlesuk
Copy link

JKnowlesuk commented Jul 16, 2020

@vaxinate I believe they rolled it back, from looking at other issues. @Helen1987 should probably be able to clarify 100%

@Helen1987
Copy link
Contributor

Yeah, we had to rollback version 1.1.1, but we will deliver version 1.1.2 as soon as we can

@AnandarajuCS
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests