Resource name sanitization #108

DavidSeptimus · 2023-01-18T19:33:57Z

Resolves #49 (minus the actual namespacing piece)

This PR adds sanitization for the identifiers of AWS resources created via Klotho's IAC layer. It should not change any happy-path identifiers (resource names or Pulumi URNs).

Resource names are generated with tagged template literals, as in the following examples that use the sanitized() tag function:

let appName = 'my-very-long-app-name'
let resourceId = 'my-load-balancer'

// ensure that the output is a valid load balancer name
// from left to right, hash appName and then resourceId if the name is too long
sanitized(loadBalancer.nameValidation())`${h(appName)}-${h(resourceId)}-lb`
// 2d958-my-load-balancer-lb

// prioritize hashing resourceId before appName since its priority is 1 (default is 100)
sanitized(loadBalancer.nameValidation())`${h(appName)}-${h(resourceId, 1)}-lb`
// my-very-long-app-name-0ea7a-lb

When shortening resource names, raw text and template expressions of type string are not modified. Expressions of type Component are shortened from left to right based on priority (lower values are shortened first; the default priority is 100) using the Component instance's configured ShorteningStrategy. In the examples above, the h() function returns a Component with a ShorteningStrategy that converts the supplied text into a SHA-256 hash truncated to 5 characters.
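A minimal sketch of how h() and a Component could be modeled, to make the distinction between plain strings and shortenable expressions concrete. The shapes of Component and ShorteningStrategy, the hex encoding of the hash, and the parts tag function are assumptions made for illustration, not the definitions in this PR.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shapes for illustration; the PR's real definitions may differ.
type ShorteningStrategy = (text: string) => string

interface Component {
    text: string
    priority: number // lower values are shortened first; 100 is the default
    shorten: ShorteningStrategy
}

// Strategy: replace the text with the first 5 hex characters of its SHA-256 digest.
// (Hex encoding is an assumption based on the example outputs above.)
const sha256Prefix: ShorteningStrategy = (text) =>
    createHash('sha256').update(text).digest('hex').slice(0, 5)

// Wrap a string in a Component so that a sanitizer is allowed to shorten it.
function h(text: string, priority = 100): Component {
    return { text, priority, shorten: sha256Prefix }
}

// Hypothetical tag function: interleaves raw text and expressions so that a
// later step can tell shortenable Components apart from plain strings.
function parts(
    strings: TemplateStringsArray,
    ...values: (string | Component)[]
): (string | Component)[] {
    const out: (string | Component)[] = []
    strings.forEach((s, i) => {
        if (s) out.push(s)
        const v = values[i]
        if (v !== undefined) out.push(v)
    })
    return out
}

const example = parts`${h('my-very-long-app-name')}-${h('my-load-balancer', 1)}-lb`
// example holds four entries: Component, '-', Component, '-lb'
```

Only the two Component entries are candidates for shortening; the '-' and '-lb' strings would be passed through untouched.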

The process of generating a valid resource name is as follows:

1. Shorten Components from left to right, ranked by priority, until the name's length <= maxLength.
2. Sanitize the generated string in one or more passes (up to 5 passes by default):
   1. Truncate the string if it's longer than the max length (arguably, we might want this to be an error).
   2. Throw an error if the string is shorter than the min length.
   3. Apply the supplied SanitizationRules and capture any violations.
   4. For any violations, sequentially apply the violated rules' supplied FixFunc (if present) to resolve any issues.
   5. Revalidate the string if it's been fixed.
   6. Return the valid string or start another pass if the string is still invalid.
3. Return the sanitized result or throw an exception if any constraint violations are present in the result.
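The process above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in sanitizer.ts: the shapes of SanitizationRule, FixFunc and Component are guesses, shortenParts and sanitize are hypothetical names, and "ranked by priority" is read as "lowest priority value first, left to right among equals".

```typescript
// Hypothetical shapes for illustration; the PR's real definitions may differ.
type FixFunc = (s: string) => string
interface SanitizationRule {
    description: string
    validate: (s: string) => boolean
    fix?: FixFunc
}
interface Component {
    text: string
    priority: number // lower values are shortened first
    shorten: (text: string) => string
}
type Part = string | Component

// Shorten Components by ascending priority value (left to right among equals)
// until the joined name fits. Plain strings are never modified.
function shortenParts(parts: Part[], maxLength: number): string {
    const out = parts.map((p) => (typeof p === 'string' ? p : p.text))
    const order: { c: Component; i: number }[] = []
    parts.forEach((p, i) => {
        if (typeof p !== 'string') order.push({ c: p, i })
    })
    order.sort((a, b) => a.c.priority - b.c.priority || a.i - b.i)
    for (const { c, i } of order) {
        if (out.join('').length <= maxLength) break
        out[i] = c.shorten(c.text)
    }
    return out.join('')
}

// One or more sanitization passes: truncate, check the min length, validate,
// apply fixes, then revalidate on the next pass.
function sanitize(
    name: string,
    rules: SanitizationRule[],
    minLength: number,
    maxLength: number,
    maxPasses = 5
): string {
    let s = name
    const violations = (v: string) => rules.filter((r) => !r.validate(v))
    for (let pass = 0; pass < maxPasses; pass++) {
        if (s.length > maxLength) s = s.slice(0, maxLength)
        if (s.length < minLength) throw new Error(`"${s}" is shorter than ${minLength}`)
        const violated = violations(s)
        if (violated.length === 0) return s
        for (const rule of violated) {
            if (rule.fix) s = rule.fix(s)
        }
    }
    // Final check after the last pass: return the result or report what is still wrong.
    const remaining = violations(s).map((r) => r.description)
    if (remaining.length > 0 || s.length < minLength || s.length > maxLength) {
        throw new Error(`invalid name "${s}": ${remaining.join('; ')}`)
    }
    return s
}

// Stand-in strategy (keeps the first 5 characters) so the sketch needs no imports;
// the h() function in this PR uses a truncated SHA-256 hash instead.
const c = (text: string, priority = 100): Component => ({
    text,
    priority,
    shorten: (t) => t.slice(0, 5),
})
const lbName = shortenParts([c('my-very-long-app-name'), '-', c('my-load-balancer'), '-lb'], 32)
// lbName === 'my-ve-my-load-balancer-lb'
```

With a 32-character limit, shortening the first Component is already enough, so the second one is left intact. Giving the resourceId Component a priority of 1 would shorten it first instead.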

Standard checks

Unit tests: Any special considerations? We don't have a pattern for testing IAC-related code at the moment, but we should really figure that out.
Docs: Do we need to update any docs, internal or public? No
Backwards compatibility: Will this break existing apps? This PR aims to avoid changing "happy path" resource names so there should be no issue there.

DavidSeptimus · 2023-01-18T20:00:47Z

rebased and squashed old commits

ewucc

First pass, looks good in general. I have a couple of general comments:

Seems like for every new entity that we support we'll need to add that set of rules. Seems like the right way to do it, but yeah, definitely a lot of bloat to the compiled dir since I believe these will end up there.
Might just be me, but while it does seem comprehensive, it also seems like a somewhat complex process. Specifically around the weighting (0 -> 100) of components and the shortening func, which does a length subtraction vs. building up from the most important components and hashing once the length is exceeded.

The last point isn't really a negative, more of an observation, so not a big deal if no one else has similar thoughts.

pkg/infra/pulumi_aws/iac/sanitization/sanitizer.ts

DavidSeptimus · 2023-01-19T22:36:36Z

> First pass, looks good in general. I have a couple of general comments:
>
> Seems like for every new entity that we support we'll need to add that set of rules. Seems like the right way to do it, but yeah, definitely a lot of bloat to the compiled dir since I believe these will end up there.

That's true. Ideally, this functionality will move into the compiler at some point. Another option is running it through a build tool to merge all the sanitization files into a single file that we include in the output.

jhsinger-klotho · 2023-01-20T00:16:01Z

What do we do if we have, like, multiple clusters and install a controller on both? Is it on the plugin to label that resourceId cluster-resource?

jhsinger-klotho · 2023-01-20T00:18:30Z

pkg/infra/pulumi_aws/iac/load_balancing.ts

        switch (params.loadBalancerType) {
            case 'application':
                lb = new aws.lb.LoadBalancer(`${appName}-${resourceId}-alb`, {
-                    name: `${appName}-${resourceId}`,
+                    name: lbName,


Also a general thing: we likely shouldn't set the actual names of resources, because then they can't get replaced if they change. I believe we should be doing the validation on the resource id for Pulumi, so that it autogenerates the name + some uuid for uniqueness if we ever need replacements.

Let me know your thoughts on that. I think I added the name field, but I was going to remove it in my next PR because I noticed it could cause issues. We should probably check where else we do this.

jhsinger-klotho · 2023-01-20T00:19:28Z

pkg/infra/pulumi_aws/iac/elasticache.ts

@@ -46,7 +47,10 @@ export const setupElasticacheCluster = (
        retentionInDays: 0,
    })

-    const clusterName = sanitizeClusterName(appName, dbName)
+    // TODO: look into removing sanitizeClusterName when making other breaking changes to resource names
+    const clusterName = sanitized(


Same here: we need to remove clusterName as the actual name for this and memdb, but let's maybe chat about that tomorrow.

ghost

lgtm, other than the merge conflicts. I had a suggestion for a simplification, but it's not blocking. May want to get @jhsinger-klotho's quick glance too, since he took a more in-depth look a couple weeks back.

pkg/infra/pulumi_aws/iac/sanitization/aws/common.ts

ghost · 2023-01-31T22:43:44Z

pkg/infra/pulumi_aws/iac/sanitization/sanitizer.ts

+export function regexpMatch(
+    description: string,
+    pattern: RegExp,
+    fix: FixFunc | undefined = undefined


small suggestion: I think all of the calls to this are basically:

regexpMatch(desc, /^[P]+$/, (n) => n.replace(/[^P]/g, R))

... where P is a set of allowed characters and R is a replacement string. For example:

regexpMatch('', /^[\w-]+$/, (n) => n.replace(/[^\w-]/g, '-'))

If that's true, I think you could simplify this to just take the pattern and a replace char, and generate the validate and FixFunc from them:

regexpMatch('', /[^\w-]/, '-')

That's true for a lot of them, but definitely not all.
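For the rules that do fit that pattern, the suggested simplification could look roughly like the sketch below. replaceDisallowed is a hypothetical name, and the SanitizationRule and FixFunc shapes are assumptions; the existing regexpMatch signature would stay available for rules that need a custom fix.

```typescript
// Hypothetical shapes for illustration; the PR's real definitions may differ.
type FixFunc = (s: string) => string
interface SanitizationRule {
    description: string
    validate: (s: string) => boolean
    fix?: FixFunc
}

// Derive both the validator and the fix from a single "disallowed" pattern
// and a replacement string.
function replaceDisallowed(
    description: string,
    disallowed: RegExp,
    replacement: string
): SanitizationRule {
    // Use separate RegExp copies: a non-global one for test(), so no lastIndex
    // state leaks between calls, and a global one so replace() hits every match.
    const flags = disallowed.flags.replace(/[gy]/g, '')
    const anyMatch = new RegExp(disallowed.source, flags)
    const allMatches = new RegExp(disallowed.source, flags + 'g')
    return {
        description,
        validate: (s) => !anyMatch.test(s),
        fix: (s) => s.replace(allMatches, replacement),
    }
}

const rule = replaceDisallowed('letters, digits, "_" and "-" only', /[^\w-]/, '-')
```

One behavioral difference to keep in mind: /^[\w-]+$/ rejects the empty string, while this derived validator accepts it, so an equivalent rule would still need a separate min-length check.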

jhsinger-klotho · 2023-01-31T23:02:29Z

Yeah, LGTM, and if the tests are passing it's definitely a good starting point. The name issue looks to be resolved, but we can continue to remove those if we find more.

Co-authored-by: ewucc <[email protected]>

Co-authored-by: Yuval Shavit <[email protected]>

gordon-klotho · 2023-02-01T15:09:20Z