
E2E test for cpu scaler #2441

Merged
merged 1 commit into from
Jan 17, 2022

Conversation

Ritikaa96
Contributor

@Ritikaa96 Ritikaa96 commented Jan 5, 2022

This PR adds E2E tests for the CPU scaler.

Checklist

  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update our Helm chart (repo) (if applicable, i.e. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated

Fixes #2219

@Ritikaa96 Ritikaa96 requested a review from a team as a code owner January 5, 2022 10:41
@Ritikaa96
Contributor Author

Hi @tomkerkhove @JorTurFer PTAL

JorTurFer
JorTurFer previously approved these changes Jan 5, 2022
Member

@JorTurFer JorTurFer left a comment


LGTM!
Thanks for doing it ❤️
Please update the changelog (adding this PR under the improvements section) and let me trigger it before merging :)

@JorTurFer
Member

JorTurFer commented Jan 5, 2022

/run-e2e cpu.test.*
Update: You can check the progress here

@JorTurFer JorTurFer self-requested a review January 5, 2022 11:11
@JorTurFer JorTurFer dismissed their stale review January 5, 2022 11:12

pending changelog

@JorTurFer
Member

JorTurFer commented Jan 5, 2022

/run-e2e cpu.test.*
Update: You can check the progress here

@JorTurFer JorTurFer requested a review from a team January 5, 2022 13:53
Review thread on tests/scalers/cpu.test.ts (outdated, resolved)
@Ritikaa96 Ritikaa96 changed the title E2E cpu scaler E2E test for cpu scaler Jan 6, 2022
@Ritikaa96
Contributor Author

Hi, @JorTurFer , @zroubalik I've changed the image and updated the changelog.
PTAL!

@JorTurFer
Member

JorTurFer commented Jan 6, 2022

/run-e2e cpu.test.*
Update: You can check the progress here

@zroubalik
Member

/run-e2e cpu.test.* Update: You can check the progress here

@Ritikaa96 seems like the test is failing ^

@Ritikaa96
Contributor Author

/run-e2e cpu.test.* Update: You can check the progress here

@Ritikaa96 seems like the test is failing ^

Hi @zroubalik, I tried the same file and ran npx ava scalers/cpu.test.ts to test it. The test runs successfully on my system. Do you think it's because of the images used? I saw we used busybox in the graphite scaler too.

@zroubalik
Member

/run-e2e cpu.test.* Update: You can check the progress here

@Ritikaa96 seems like the test is failing ^

Hi @zroubalik, I tried the same file and ran npx ava scalers/cpu.test.ts to test it. The test runs successfully on my system. Do you think it's because of the images used? I saw we used busybox in the graphite scaler too.

I don't think that busybox is the problem. Let me rerun the test.

@zroubalik
Member

zroubalik commented Jan 11, 2022

/run-e2e cpu.test.*
Update: You can check the progress here

@Ritikaa96
Contributor Author

/run-e2e cpu.test.* Update: You can check the progress here

@zroubalik, seems like the job is now deployed successfully, whereas it wasn't in the previous run. But now the next step is failing. Any suggestions on where it could be going wrong?

@JorTurFer
Member

JorTurFer commented Jan 16, 2022

I can reproduce the error locally; let me check whether I can figure out a tip or a solution.

@JorTurFer
Member

JorTurFer commented Jan 16, 2022

Locally, the generated load expires too early, which is why the deployment never reaches 5 instances.
Maybe you could reduce the expected instance count, or extend the load time to ensure it's long enough (say, to 600 or 800 seconds) and delete the job after the scale-out step so the load finishes cleanly. Another option is using a Deployment instead of a Job for better control of the load generation (you could maintain the load even while the HPA hasn't started yet).
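A Deployment-based load generator along these lines might look like the following sketch (assuming the same busybox wget loop and php-apache service used elsewhere in this thread; the load-generator name is illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator   # illustrative name
  namespace: cpu-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-generator
  template:
    metadata:
      labels:
        app: load-generator
    spec:
      containers:
      - name: load
        image: busybox
        command: ["/bin/sh"]
        # loop forever: the load keeps running until the Deployment is
        # deleted, so it survives the window before the HPA is created
        args: ["-c", "while true; do wget -q -O- http://php-apache.cpu-test.svc/; sleep 0.1; done"]
```

Unlike a Job with activeDeadlineSeconds, this keeps generating load until it is explicitly deleted.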

Note: If you extend the load, set the maximum number of instances instead of keeping the default of 100. For this e2e test the exact final instance count isn't important (it depends on factors that can change); the important point is that it scales out/in.

Another tip to avoid waiting 8 minutes is using the advanced section of the ScaledObject; I reduced the wait to only 3 minutes just by setting scaleDown.stabilizationWindowSeconds to 0.

I have also been troubleshooting the test and my solution is this:

import * as fs from 'fs'
import * as sh from 'shelljs'
import * as tmp from 'tmp'
import test from 'ava'
import { waitForDeploymentReplicaCount } from './helpers'

const testNamespace = 'cpu-test'
const deployMentFile = tmp.fileSync()
const triggerFile = tmp.fileSync()


test.before(t => {
  sh.config.silent = true
  sh.exec(`kubectl create namespace ${testNamespace}`)

  fs.writeFileSync(deployMentFile.name, deployMentYaml)
  t.is(
    0,
    sh.exec(`kubectl apply -f ${deployMentFile.name} --namespace ${testNamespace}`).code,
    'Deploying php deployment should work.'
  )
  t.is(0, sh.exec(`kubectl rollout status deploy/php-apache -n ${testNamespace}`).code, 'Deployment php rolled out successfully')
})

test.serial('Deployment should have 1 replica on start', t => {
  const replicaCount = sh.exec(
    `kubectl get deployment.apps/php-apache --namespace ${testNamespace} -o jsonpath="{.spec.replicas}"`
  ).stdout
  t.is(replicaCount, '1', 'replica count should start out as 1')
})

test.serial(`Creating Job should work`, async t => {
  fs.writeFileSync(triggerFile.name, triggerJob)
  t.is(
    0,
    sh.exec(`kubectl apply -f ${triggerFile.name} --namespace ${testNamespace}`).code,
    'creating job should work.'
  )
})

test.serial(`Deployment should scale to 4 after 3 minutes`, async t => {
  // check replicacount on constant triggering :
  t.true(await waitForDeploymentReplicaCount(4, 'php-apache', testNamespace, 18, 10000), 'Replica count should be 4 after 180 seconds')
})

test.serial(`Deleting Job should work`, async t => {
  t.is(
    0,
    sh.exec(`kubectl delete -f ${triggerFile.name} --namespace ${testNamespace}`).code,
    'Deleting job should work.'
  )
})

test.serial(`Deployment should scale back to 1 after 3 minutes`, async t => {
  // check for the scale down :
  t.true(await waitForDeploymentReplicaCount(1, 'php-apache', testNamespace, 18, 10000), 'Replica count should be 1 after 3 minutes')
})

test.after.always.cb('clean up workload test related deployments', t => {
  const resources = [
    'deployment.apps/php-apache',
    'jobs.batch/trigger-job',
  ]
  for (const resource of resources) {
    sh.exec(`kubectl delete ${resource} --namespace ${testNamespace}`)
  }
  sh.exec(`kubectl delete namespace ${testNamespace}`)
  t.end()
})

const deployMentYaml = `apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaledobject
  labels:
    run: php-apache
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 0
  maxReplicaCount:  4
  scaleTargetRef:
    name: php-apache
  triggers:
  - type: cpu
    metadata:
      type: Utilization
      value: "50"`
const triggerJob = `apiVersion: batch/v1
kind: Job
metadata:
  name: trigger-job
  namespace: cpu-test
spec:
  template:
    spec:
      containers:
      - image: busybox
        name: test
        command: ["/bin/sh"]
        args: ["-c", "for i in $(seq 1 600);do wget -q -O- http://php-apache.cpu-test.svc/;sleep 0.1;done"]
      restartPolicy: Never
  activeDeadlineSeconds: 600
  backoffLimit: 3`

Basically, I have limited the max instances to 4, increased the load duration, deleted the load job after the scale-out, and modified the HPA behavior to scale in faster, reducing the timeout from 8 minutes to 3.

Feel free to use it, ignore it, or take some parts.

BTW, Thanks for helping with e2e tests ❤️❤️❤️❤️

@Ritikaa96
Contributor Author

Basically, I have limited the max instances to 4, increased the load duration, deleted the load job after the scale-out, and modified the HPA behavior to scale in faster, reducing the timeout from 8 minutes to 3.

Hi @JorTurFer, thanks for taking the time to test it locally.

Deleting the job mid-test and modifying the HPA behavior to scale in faster helped a lot in reducing the time; after the corrections, the whole test takes merely a few minutes. Thanks a lot for the suggestions, I wonder why I didn't think of this before.

As the more logical check, I've changed the fixed expected-final-replica-count assertion to one that focuses on whether the deployment is scaling out or not, with:

test.serial(`Deployment should scale out within the next 3 minutes`, async t => {
  // check for an increased replica count under constant triggering:
  t.true(await waitForDeploymentScaleUp(2, 'php-apache', testNamespace, 18, 10000), 'Replica count should scale up within 3 minutes')
})
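For context, waitForDeploymentScaleUp doesn't appear in the shared helpers shown earlier in this thread, so here is a minimal sketch of the polling logic such a helper might use. The function name and the injected getReplicas reader are assumptions; the real helper would presumably read the count by shelling out to kubectl via shelljs, like waitForDeploymentReplicaCount does.

```typescript
// Sketch of a scale-up poller: succeeds as soon as the replica count
// reaches *at least* `target`, rather than requiring an exact match.
// `getReplicas` is an injected reader; in the real test it would wrap
// `kubectl get deployment ... -o jsonpath="{.status.replicas}"`.
async function waitForScaleUp(
  target: number,
  getReplicas: () => number,
  iterations: number,
  intervalMs: number
): Promise<boolean> {
  for (let i = 0; i < iterations; i++) {
    if (getReplicas() >= target) {
      return true
    }
    // wait before polling again
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  return false
}
```

Injecting the reader keeps the polling logic testable without a cluster; the threshold check (`>=` instead of `===`) is what makes the assertion about "is it scaling out" rather than a specific final count.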

Also, with scaleDown.stabilizationWindowSeconds set to 0 in the ScaledObject, the deployment immediately scales down to 0, which in turn breaks its scaling up from the trigger, though adding minReplicaCount resolved that.
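A fragment of the resulting ScaledObject spec might look like this (a sketch based on the manifest shared earlier in the thread, with the minReplicaCount addition; the exact values are illustrative):

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 0   # scale in immediately when load stops
  minReplicaCount: 1   # keeps the deployment from dropping to 0 while idle
  maxReplicaCount: 4
  scaleTargetRef:
    name: php-apache
```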

PTAL! I hope this rules out the apparent reason the test failed last time.

@JorTurFer
Member

JorTurFer commented Jan 17, 2022

/run-e2e cpu.test.*
Update: You can check the progress here

Member

@JorTurFer JorTurFer left a comment


LGTM, ❤️
Let's wait till test execution 🤞
Only 2 small things:

  • Please add the missing resources that I said in the comment
  • Is waitForDeploymentScaleUp really needed? I mean, you could set maxReplicaCount to 2 and then you wouldn't need it anymore.

Review thread on tests/scalers/cpu.test.ts (outdated, resolved)
Updated cpu.test.ts, resolved conflicts in changelog.md
@Ritikaa96
Contributor Author

Hi, @JorTurFer , PTAL.

@JorTurFer
Member

JorTurFer commented Jan 17, 2022

/run-e2e cpu.test.*
Update: You can check the progress here

Member

@JorTurFer JorTurFer left a comment


LGTM!
Thanks for your contribution ❤️ ❤️ ❤️ ❤️

@JorTurFer JorTurFer merged commit 633923a into kedacore:main Jan 17, 2022
Development

Successfully merging this pull request may close these issues.

Add e2e test for CPU Scaler
3 participants