Skip to content
This repository was archived by the owner on Oct 22, 2021. It is now read-only.

Conversation

@andreas-kupries
Copy link
Contributor

Description

Removed all the patches which removed ulimit calls from various pre-start and other kubecf scripts.

Motivation and Context

This is in the context of tickets #504 and #505, where a first set of these patches was given to upstream for consideration. In the recent check of what they were used for the discussion introduced the possibility that the patches might not be necessary anymore, or even if, that the underlying issue could be solved differently, via ops files modifying job capabilities.

Testing without the patches strongly indicated them as not necessary anymore. Thus this PR removes them, and the relevant upstream PRs (cloudfoundry/capi-release#172, cloudfoundry/diego-release#512) will be retracted and closed.

⚠️ I am deferring the retractions until after this PR is reviewed, accepted, and merged.

How Has This Been Tested?

It was tested that both then-master f90161a and the branch from it without the ulimit patches deploy and then pass the 🐈 tests.

⚠️ The tested branch was rebased to the current master commit fef5111, it has not been re-tested.

The initial testing using an SA config on a local minikube with 16/20 G memory failed severely in the 🐈 even for the unchanged baseline, making this attempt useless.

The final testing then happened using a GKE cluster deployed via catapult. The necessary bundle of charts was created from the relevant commit via

bazel build //deploy/bundle:kubecf-bundle

The catapult configuration used was

export AUTOSCALER=false
export BACKEND=gke
export CLUSTER_NAME=_ulimit
export DEBUG_MODE=true
export ENABLE_EIRINI=false
export HA=false
export KUBECF_TEST_SUITE=cats
export SCF_OPERATOR=true
export SCF_TESTGROUP=true
export GKE_CLUSTER_NAME=ak-kubecf-505-ulimit

export SCF_CHART=/work/SUSE/dev/kubecf-1/output/kubecf-bundle.tgz
# ! Modify the above to whatever local location of the bundle

export GKE_CRED_JSON=/work/SUSE/dev/kubecf-1/output/gke-cred.json
# ! Creds pulled out of black box.

using the targets

scf, scf-login, tests-kubecf, scf-clean, clean
to deploy, test and clear the cluster in question.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code has security implications.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

mook-as
mook-as previously approved these changes Jun 18, 2020
Copy link
Contributor

@mook-as mook-as left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the commit messages to actually include reasons the patches are safe to remove (and why we're removing them), rather than just saying that they are, next time.

@andreas-kupries
Copy link
Contributor Author

Thank you for the reminder about the commit message. Will fix.

… calls from a pre-start and various other scripts.

With the exception of (`router/gorouter/jobs/patch_pre-start.sh`) none of the scripts they change are used. They are old-style start/stop control scripts ignored by the operator (which uses the BPM information instead).

The pre-start script TOH is used, however as the change happens in a different container (versus the job runtime) it is not doing anything useful, either way.
@andreas-kupries
Copy link
Contributor Author

Squashed history and amended the commit message based on Mark's comments.
Rebased.

Copy link
Contributor

@mook-as mook-as left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the problem wasn't that the ulimit calls weren't doing anything useful, it's that they failed (because we don't have the permissions to make things higher), causing the whole thing to fall over.

It appears that the operator gives us containers with big enough preset limits that we no longer fail the calls.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants