Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use reduced IAM permissions on worker nodes instance profile #991

Merged
merged 3 commits into from
Jan 23, 2025

Conversation

AndiDog
Copy link
Contributor

@AndiDog AndiDog commented Jan 14, 2025

What this PR does / why we need it

Towards giantswarm/roadmap#3795, see giantswarm/capa-iam-operator#369 which should be reviewed together with this PR

Checklist

  • Updated CHANGELOG.md.

Trigger E2E tests

Let's first roll out the operator before running tests – else it's a wasted test run.

@AndiDog AndiDog requested a review from a team as a code owner January 14, 2025 14:11
@AndiDog AndiDog force-pushed the minimum-permissions branch from 60a4653 to 1caafbf Compare January 14, 2025 14:11
@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 21, 2025

capa-iam-operator backward-compatible change is rolled out, so let's test E2E

/run cluster-test-suites

@tinkerers-ci
Copy link

tinkerers-ci bot commented Jan 21, 2025

cluster-test-suites

Run name pr-cluster-aws-991-cluster-test-suitesv49tx
Commit SHA 6979486
Result Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.

@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 21, 2025

At least the private-cluster tests failed because AWS didn't have the instance type available. Trying again, slowly.

/run cluster-test-suites TARGET_SUITES=./providers/capa/standard

@tinkerers-ci
Copy link

tinkerers-ci bot commented Jan 21, 2025

cluster-test-suites

Run name pr-cluster-aws-991-cluster-test-suitesz9ftx
Commit SHA 6979486
Result Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.

@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 21, 2025

/run cluster-test-suites TARGET_SUITES=./providers/capa/standard

@tinkerers-ci
Copy link

tinkerers-ci bot commented Jan 21, 2025

cluster-test-suites

Run name pr-cluster-aws-991-cluster-test-suitesmxzsx
Commit SHA 6979486
Result Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.

@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 21, 2025

/run cluster-test-suites TARGET_SUITES=./providers/capa/standard

@tinkerers-ci
Copy link

tinkerers-ci bot commented Jan 21, 2025

cluster-test-suites

Run name pr-cluster-aws-991-cluster-test-suitesgbwt7
Commit SHA 6979486
Result Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.

@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 21, 2025

I'm still unsure why pulling the private ECR image fails. This is the equivalent of what the E2E tests try:

crictl pull 992382781567.dkr.ecr.eu-west-2.amazonaws.com/giantswarm/alpine:latest

This started failing today, but still worked this morning. And the capa-iam-operator change happened yesterday evening, is backward-compatible and doesn't look buggy to me.

I've tested with and without the zero-permissions IAM role, and neither works.

@AndiDog AndiDog force-pushed the minimum-permissions branch from ae1765a to 027bcaf Compare January 22, 2025 11:52
@AndiDog AndiDog force-pushed the minimum-permissions branch from 027bcaf to 102625f Compare January 22, 2025 11:53
Copy link
Contributor

There were differences in the rendered Helm template, please check! ⚠️

Output
=== Differences when rendered with values file helm/cluster-aws/ci/test-auditd-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-minimal-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-eni-mode-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-pool0)
  + two map entries added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"
    alpha.aws.giantswarm.io/ipam-mode: eni



=== Differences when rendered with values file helm/cluster-aws/ci/test-lifecycle-hook-heartbeattimeout-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-minimal-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-local-registry-cache-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-mc-proxy-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-mc-proxy-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-multiple-authenticated-mirrors-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-multiple-service-account-issuers-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-network-topology-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-minimal-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-spot-instances-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-minimal-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-subnet-tags-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"



=== Differences when rendered with values file helm/cluster-aws/ci/test-wc-minimal-values.yaml ===

/metadata/labels  (infrastructure.cluster.x-k8s.io/v1beta2/AWSMachinePool/org-giantswarm/test-wc-minimal-pool0)
  + one map entry added:
    alpha.aws.giantswarm.io/reduced-instance-permissions-workers: "true"

@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 22, 2025

crictl pull 992382781567.dkr.ecr.eu-west-2.amazonaws.com/giantswarm/alpine:latest

This wasn't meant to work as it doesn't use the ECR authentication plugin that kubelet would. So it was the IAM permissions. Fixed in capa-iam-operator. Now the tests should work.

/run cluster-test-suites

@tinkerers-ci
Copy link

tinkerers-ci bot commented Jan 22, 2025

cluster-test-suites

Run name pr-cluster-aws-991-cluster-test-suitespqqnm
Commit SHA 102625f
Result Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.

@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 23, 2025

NTH restarts are due to apps coming up so slowly. Not this change's fault. Trying again.

/run cluster-test-suites TARGET_SUITES=./providers/capa/private

@AndiDog
Copy link
Contributor Author

AndiDog commented Jan 23, 2025

Oh I guess NTH isn't proxy-aware yet

FTL Unable to get AWS credentials error="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.eu-north-1.amazonaws.com/\": dial tcp 52.46.200.7:443: i/o timeout"

@tinkerers-ci
Copy link

tinkerers-ci bot commented Jan 23, 2025

cluster-test-suites

Run name pr-cluster-aws-991-cluster-test-suiteslqr6c
Commit SHA 102625f
Result Completed ✅

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.

@AndiDog AndiDog merged commit a67ddc8 into main Jan 23, 2025
14 checks passed
@AndiDog AndiDog deleted the minimum-permissions branch January 23, 2025 12:25
@AndiDog AndiDog mentioned this pull request Jan 23, 2025
@AndiDog AndiDog mentioned this pull request Jan 23, 2025
1 task
AndiDog added a commit that referenced this pull request Jan 27, 2025
* Chart: Reduce default etcd volume size to 50 GB. (#994)
* Explicitly set Ignition user data storage type to S3 bucket objects for machine pools (#981)
* Use reduced IAM permissions on worker nodes instance profile (#991)
* Explicitly set aws-node-termination-handler queue region so crash-loops are avoided, allowing faster startup (#977)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants