Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(custom-resource): implement lambda function that validate certificate and add aliases for NLB #3057

Merged
merged 19 commits into from
Nov 23, 2021
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion cf-custom-resources/lib/custom-domain-app-runner.js
Original file line number Diff line number Diff line change
Expand Up @@ -414,7 +414,6 @@ exports.withSleep = function (s) {
};
exports.reset = function () {
sleep = defaultSleep;

};
exports.withDeadlineExpired = function (d) {
exports.deadlineExpired = d;
Expand Down
301 changes: 301 additions & 0 deletions cf-custom-resources/lib/nlb-cert-validator-updater.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,301 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0

const AWS = require('aws-sdk');
const ATTEMPTS_VALIDATION_OPTIONS_READY = 10;
const ATTEMPTS_RECORD_SETS_CHANGE = 10;
const DELAY_RECORD_SETS_CHANGE = 30;
const ATTEMPTS_CERTIFICATE_VALIDATED = 19;
const DELAY_CERTIFICATE_VALIDATED = 30;
Lou1415926 marked this conversation as resolved.
Show resolved Hide resolved

let acm, envRoute53, envHostedZoneID, appName, envName, certificateDomain;
Lou1415926 marked this conversation as resolved.
Show resolved Hide resolved
let defaultSleep = function (ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
};
let sleep = defaultSleep;
let random = Math.random;

/**
* Upload a CloudFormation response object to S3.
*
* @param {object} event the Lambda event payload received by the handler function
* @param {object} context the Lambda context received by the handler function
* @param {string} responseStatus the response status, either 'SUCCESS' or 'FAILED'
* @param {string} physicalResourceId CloudFormation physical resource ID
* @param {object} [responseData] arbitrary response data object
* @param {string} [reason] reason for failure, if any, to convey to the user
* @returns {Promise} Promise that is resolved on success, or rejected on connection error or HTTP error response
*/
function report (
event,
context,
responseStatus,
physicalResourceId,
responseData,
reason
) {
return new Promise((resolve, reject) => {
const https = require("https");
const { URL } = require("url");

let reasonWithLogInfo = `${reason} (Log: ${context.logGroupName}/${context.logStreamName})`;
var responseBody = JSON.stringify({
Status: responseStatus,
Reason: reasonWithLogInfo,
PhysicalResourceId: physicalResourceId || context.logStreamName,
StackId: event.StackId,
RequestId: event.RequestId,
LogicalResourceId: event.LogicalResourceId,
Data: responseData,
});

const parsedUrl = new URL(event.ResponseURL);
const options = {
hostname: parsedUrl.hostname,
port: 443,
path: parsedUrl.pathname + parsedUrl.search,
method: "PUT",
headers: {
"Content-Type": "",
"Content-Length": responseBody.length,
},
};

https
.request(options)
.on("error", reject)
.on("response", (res) => {
res.resume();
if (res.statusCode >= 400) {
reject(new Error(`Error ${res.statusCode}: ${res.statusMessage}`));
} else {
resolve();
}
})
.end(responseBody, "utf8");
});
}

exports.handler = async function (event, context) {
const props = event.ResourceProperties;

let {LoadBalancerDNS: loadBalancerDNS,
LoadBalancerHostedZoneID: loadBalancerHostedZoneID,
ServiceName: serviceName,
DomainName: domainName,
} = props;
let aliases = new Set(props.Aliases);
Lou1415926 marked this conversation as resolved.
Show resolved Hide resolved

acm = new AWS.ACM();
envRoute53 = new AWS.Route53();
envHostedZoneID = props.EnvHostedZoneId;
envName = props.EnvName;
appName = props.AppName;
certificateDomain = `${serviceName}-nlb.${envName}.${appName}.${domainName}`;

let aliasesSorted = [...aliases].sort().join(",");
const physicalResourceID = `/${serviceName}/${aliasesSorted}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do you know if there is a limit on the length of the physical resource ID? Would you mind investigating that, I tried a quick search but couldn't find anything might be worthwhile experimenting.

  2. Using the aliases in the resource ID makes total sense because we want to delete the old certificate right if the alias changes? if so can we add that as a short comment here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer 2.:
Yes. For each unique combination of [service name + alias names], the [certificate ('s subject alternative name) + validation records + A records] needed should be the same. If there is no change in [service name + alias names], then the certificate shouldn't change, neither do the validation records and the A records; on the contrary, if a change is detected in [service name + alias names], then a replacement surely needs to happen to at least one of the three resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked offline but I'm posting here so that this info is shared:
Answering 1.: There is virtually no limit on Physical Resource ID length - tried a 800-character long ID, and it was fine. However, if the ID is too long, the response object sent from the lambda will be too big, causing the custom resource to fail.


let handler = async function() {
switch (event.RequestType) {
case "Create":
efekarakus marked this conversation as resolved.
Show resolved Hide resolved
await validateAliases(aliases, loadBalancerDNS);
const certificateARN = await requestCertificate(aliases, physicalResourceID);
const options = await waitForValidationOptionsToBeReady(certificateARN, aliases);
await activate(options, certificateARN, loadBalancerDNS, loadBalancerHostedZoneID);
break;
case "Update":
case "Delete":
default:
throw new Error(`Unsupported request type ${event.RequestType}`);
}
};

try {
await handler();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we Promise.race the handler with a 13-14 min timeout so that we avoid the scenario where the custom resource is stuck for 3h:

Set reasonable timeout periods, and report when they're about to be exceeded
If an operation doesn't run within its defined timeout period, the function raises an exception and no response is sent to AWS CloudFormation.
https://aws.amazon.com/premiumsupport/knowledge-center/best-practices-custom-cf-lambda/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I was planning to add the lambda time out later since it introduces more testing logic into this PR, but I can do it now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

await report(event, context, "SUCCESS", physicalResourceID);
} catch (err) {
console.log(`Caught error for service ${serviceName}: ${err.message}`);
await report(event, context, "FAILED", physicalResourceID, null, err.message);
}
};

/**
* Validate that the aliases are not in use.
*
* @param {Set<String>} aliases for the service.
* @param {String} loadBalancerDNS the DNS of the service's load balancer.
* @throws error if at least one of the aliases is not valid.
*/
async function validateAliases(aliases, loadBalancerDNS) {
for (let alias of aliases) {
const data = await envRoute53.listResourceRecordSets({
HostedZoneId: envHostedZoneID,
MaxItems: "1",
StartRecordName: alias,
}).promise();

let recordSet = data["ResourceRecordSets"];
if (!recordSet || recordSet.length === 0) {
continue; // The alias record does not exist in the hosted zone, hence valid.
}
if (recordSet[0].AliasTarget && recordSet[0].AliasTarget.DNSName === `${loadBalancerDNS}.`) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (recordSet[0].AliasTarget && recordSet[0].AliasTarget.DNSName === `${loadBalancerDNS}.`) {
if (recordSet[0].AliasTarget.DNSName === `${loadBalancerDNS}.`) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an ECMA 2020 syntax (by default we are using ECMA 5 I think). I tried enabling the ECMA 2020 parser but it still fails to acknowledge the optional chaining syntax. I will use this syntax after I figure out how to really enable ECMA 2020 :)

continue; // The record is an alias record and is in use by myself, hence valid.
}
throw new Error(`alias ${alias} is in use`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe alias is in used by xxx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure if we should include the record value or alias target in the error message 🤔 If the custom domain is in use, it's possible that it's in use by another service (hence the alias target is a copilot LB's DNS); however, it's also possible that the record is a non-alias record, pointing to whatever address.

}
}

/**
* Requests a public certificate from AWS Certificate Manager, using DNS validation.
*
* @param {Set<String>} aliases the subject alternative names for the certificate.
* @param {String} physicalResourceID
* @return {String} The ARN of the requested certificate.
*/
async function requestCertificate(aliases, physicalResourceID) {
const { CertificateArn } =await acm.requestCertificate({
Lou1415926 marked this conversation as resolved.
Show resolved Hide resolved
DomainName: certificateDomain,
IdempotencyToken: physicalResourceID,
SubjectAlternativeNames: aliases.size === 0? null: [...aliases],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it throw an error if we pass in an empty array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SubjectAlternativeNames needs to either be specified with an array with length >= 1, or not specified at all (by not including the key, or having null as its value). So it'll throw an error if we pass in an empty array, but it won't if we pass in a null.

Tags: [
{
Key: "copilot-application",
Value: appName,
},
{
Key: "copilot-environment",
Value: envName,
},
Lou1415926 marked this conversation as resolved.
Show resolved Hide resolved
],
ValidationMethod: "DNS",
}).promise();
return CertificateArn;
}

/**
* Wait until the validation options are ready
*
* @param certificateARN
* @param {Set<String>} aliases for the service.
*/
async function waitForValidationOptionsToBeReady(certificateARN, aliases) {
let expectedCount = aliases.size + 1; // Expect one validation option for each alias and the cert domain.

let attempt; // TODO: This wait loops could be further abstracted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess when we add update/delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in a following PR - especially because DELETE also needs to have a wait loop like this. I think it may improve code's readability if this can be abstracted.

for (attempt = 0; attempt < ATTEMPTS_VALIDATION_OPTIONS_READY; attempt++) {
let readyCount = 0;
const { Certificate } = await acm.describeCertificate({
CertificateArn: certificateARN,
}).promise();
const options = Certificate.DomainValidationOptions || [];
options.forEach(option => {
if (option.ResourceRecord && (aliases.has(option.DomainName) || option.DomainName === certificateDomain)) {
readyCount++;
}
})
if (readyCount === expectedCount) {
return options;
}

// Exponential backoff with jitter based on 200ms base
// component of backoff fixed to ensure minimum total wait time on
// slow targets.
const base = Math.pow(2, attempt);
await sleep(random() * base * 50 + base * 150);
}
throw new Error(`resource validation records are not ready after ${attempt} tries`);
}

/**
* Validate the certificate and insert the alias records
*
* @param {Array<Object>} validationOptions
* @param {String} certificateARN
* @param {String} loadBalancerDNS
* @param {String} loadBalancerHostedZone
*/
async function activate(validationOptions, certificateARN, loadBalancerDNS, loadBalancerHostedZone) {
let promises = [];
for (let option of validationOptions) {
promises.push(activateOption(option, loadBalancerDNS, loadBalancerHostedZone));
}
await Promise.all(promises);
console.log("finished upserting records");
Lou1415926 marked this conversation as resolved.
Show resolved Hide resolved

await acm.waitFor("certificateValidated", {
// Wait up to 9 minutes and 30 seconds
$waiter: {
delay: DELAY_CERTIFICATE_VALIDATED,
maxAttempts: ATTEMPTS_CERTIFICATE_VALIDATED,
},
CertificateArn: certificateARN,
}).promise();
}

/**
* Upsert the validation record for the alias, as well as adding the A record if the alias is not the default certificaite domain.
Lou1415926 marked this conversation as resolved.
Show resolved Hide resolved
*
* @param {Object} option
* @param {String} loadBalancerDNS
* @param {String} loadBalancerHostedZone
*/
async function activateOption(option, loadBalancerDNS, loadBalancerHostedZone) {
let changes = [{
Action: "UPSERT",
ResourceRecordSet: {
Name: option.ResourceRecord.Name,
Type: option.ResourceRecord.Type,
TTL: 60,
ResourceRecords: [
{
Value: option.ResourceRecord.Value,
},
],
}
}];

if (option.DomainName !== certificateDomain) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me to understand the domain name is indeed updated when any alias is set right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the option.DomainName would be the alias domain name.

The option is given by acm when it provides us with the validation options for each aliases. Each option should contain at least the following information:

  1. The domain it's validated (i.e. the alias)
  2. The validation record name
  3. The validation record value

changes.push({
Action: "UPSERT", // It is validated that if the alias is in use, it is in use by the service itself.
ResourceRecordSet: {
Name: option.DomainName,
Type: "A",
AliasTarget: {
DNSName: loadBalancerDNS,
EvaluateTargetHealth: true,
HostedZoneId: loadBalancerHostedZone,
}
}
});
}

let { ChangeInfo } = await envRoute53.changeResourceRecordSets({
ChangeBatch: {
Comment: "Validate the certificate and create A record for the alias",
Changes: changes,
},
HostedZoneId: envHostedZoneID,
}).promise();
console.log(`change info for ${option.DomainName}: ${ChangeInfo}`)

await envRoute53.waitFor('resourceRecordSetsChanged', {
// Wait up to 5 minutes
$waiter: {
delay: DELAY_RECORD_SETS_CHANGE,
maxAttempts: ATTEMPTS_RECORD_SETS_CHANGE,
},
Id: ChangeInfo.Id,
}).promise();
}


exports.withSleep = function (s) {
sleep = s;
};
exports.reset = function () {
sleep = defaultSleep;
};
exports.withDeadlineExpired = function (d) {
exports.deadlineExpired = d;
};
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,6 @@ describe("Custom Domain for App Runner Service", () => {
return LambdaTester(handler)
.event({
ResponseURL: mockResponseURL,
ResourceProperties: {},
LogicalResourceId: "mockID",
ResourceProperties: {
ServiceARN: mockServiceARN,
Expand Down
Loading