Handle existing workspace directories better (#552)
### Proposed changes

I've changed the stack reconciliation code to clean up an existing
workspace directory when it finds one, instead of treating it as a
lock and failing forever. We've seen numerous cases where the operator
leaves a workspace directory behind for some unknown reason, which
causes the stack to fail reconciliation indefinitely. The only ways to
resolve the issue are to remove the directory manually or to restart
the entire operator pod.

The operator shouldn't treat directories as locks, in my opinion. Given
that each stack is processed by at most one thread at a time, and that
Pulumi has its own state lock files (for both the SaaS and self-hosted
backends), an existing directory shouldn't block reconciliation and
cause a failure that never resolves itself.

Also, I fixed a few typos. Let me know if there's anything I've missed
here, or if there are any concerns with this approach.
JDTX0 authored Mar 12, 2024
1 parent aec0d10 commit 6ea80df
Showing 3 changed files with 8 additions and 5 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
```diff
@@ -2,6 +2,7 @@ CHANGELOG
 =========
 
 ## HEAD (unreleased)
+- Clean up stale workspace directories and don't treat them as a crude lock. [#552](https://github.com/pulumi/pulumi-kubernetes-operator/pull/552)
 - Fixed `nodeSelector`, `affinity`, and `tolerations` Helm chart values that were previously effectively ignored.
   [#548](https://github.com/pulumi/pulumi-kubernetes-operator/pull/548)
```
2 changes: 1 addition & 1 deletion docs/create-stacks-using-kubectl.md
````diff
@@ -250,7 +250,7 @@ kubectl delete secret pulumi-api-secret
 kubectl delete secret pulumi-aws-secrets
 ```
 
-Check out [`ext_s3_bucket_stack.yaml`](../stack-examples/yaml/ext_s3_bucket_stack.yaml) for an extended options exmaple.
+Check out [`ext_s3_bucket_stack.yaml`](../stack-examples/yaml/ext_s3_bucket_stack.yaml) for an extended options example.
 
 ## Troubleshooting
 
````
10 changes: 6 additions & 4 deletions pkg/controller/stack/stack_controller.go
```diff
@@ -477,7 +477,7 @@ func (r *ReconcileStack) Reconcile(ctx context.Context, request reconcile.Reques
 		return reconcile.Result{}, sess.finalize(ctx, instance)
 	}
 
-	// This makes sure the status reflects the outcome of reconcilation. Any non-error return means
+	// This makes sure the status reflects the outcome of reconciliation. Any non-error return means
 	// the object definition was observed, whether the object ended up in a ready state or not. An
 	// error return (now we have successfully fetched the object) means it is "in progress" and not
 	// ready.
@@ -1131,7 +1131,7 @@ func (sess *reconcileStackSession) resolveResourceRef(ctx context.Context, ref *
 			}
 			return string(secretVal), nil
 		}
-		return "", errors.New("Mising secret reference in ResourceRef")
+		return "", errors.New("Missing secret reference in ResourceRef")
 	default:
 		return "", fmt.Errorf("Unsupported selector type: %v", ref.SelectorType)
 	}
@@ -1262,15 +1262,17 @@ func (sess *reconcileStackSession) getPulumiHome() string {
 // thing) the go build cache does not treat new clones of the same repo as distinct files. Since a
 // stack is processed by at most one thread at a time, and stacks have unique qualified names, and
 // the workspace directory is expected to be removed after processing, this won't cause collisions; but, we
-// check anyway, treating the existence of the workspace directory as a crude lock.
+// check anyway and cleanup any left over directories from previous runs. Using the directory as a lock isn't
+// needed as Pulumi's state has locks to prevent concurrent operations
 func (sess *reconcileStackSession) MakeWorkspaceDir() (string, error) {
 	workspaceDir := filepath.Join(sess.rootDir, "workspace")
 	_, err := os.Stat(workspaceDir)
 	switch {
 	case os.IsNotExist(err):
 		break
 	case err == nil:
-		return "", fmt.Errorf("expected workspace directory %q for stack not to exist already, but it does", workspaceDir)
+		sess.logger.Debug("Found leftover workspace directory %q, cleaning it up", workspaceDir)
+		sess.CleanupWorkspaceDir()
 	case err != nil:
 		return "", fmt.Errorf("error while checking for workspace directory: %w", err)
 	}
```
