Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-1909] add gang scheduling with hugepages e2e test #653

Closed
wants to merge 2 commits into from

Conversation

FrankYang0529
Copy link
Member

What is this PR for?

Add e2e test for gang scheduling with hugepages.

What type of PR is it?

  • - Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-1909

How should this be tested?

Test in e2e test.

@FrankYang0529
Copy link
Member Author

I will rebase the branch after #651 is merged.

Run Verify_HugePage with Ubuntu hugepage setting in my repo, and it can pass.

https://github.com/FrankYang0529/yunikorn-k8shim/actions/runs/5819432739/job/15777810463?pr=1

@codecov
Copy link

codecov bot commented Aug 10, 2023

Codecov Report

Merging #653 (d64c6f3) into master (358751a) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #653   +/-   ##
=======================================
  Coverage   71.92%   71.92%           
=======================================
  Files          51       51           
  Lines        8082     8082           
=======================================
  Hits         5813     5813           
  Misses       2073     2073           
  Partials      196      196           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@FrankYang0529 FrankYang0529 changed the title [YUNIKORN-1908] add gang scheduling with hugepages e2e test [YUNIKORN-1909] add gang scheduling with hugepages e2e test Aug 11, 2023
@craigcondit
Copy link
Contributor

@FrankYang0529 #651 has been merged, feel free to rebase.

@FrankYang0529
Copy link
Member Author

@craigcondit I rebased it. Thank you.

@FrankYang0529
Copy link
Member Author

The failed e2e test case is Verify_allow_preemption_tag. It's not about gang scheduling cases.

Copy link
Contributor

@manirajv06 manirajv06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup. Better organised now.

looks good. Minor nit.

Also, do we need to cover extended resources like "nvidia.com/gpu"? If yes, we can file a separate jira.

@FrankYang0529 FrankYang0529 force-pushed the YUNIKORN-1908 branch 2 times, most recently from 484f90d to 0717bff Compare September 7, 2023 11:28
Copy link
Contributor

@manirajv06 manirajv06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. Pending jenkins run.

@manirajv06
Copy link
Contributor

Also, do we need to cover extended resources like "nvidia.com/gpu"? If yes, we can file a separate jira.

Can we file a follow up?

@FrankYang0529
Copy link
Member Author

Also, do we need to cover extended resources like "nvidia.com/gpu"? If yes, we can file a separate jira.

Can we file a follow up?

Okay, but GPU is not supported by GitHub-hosted runners now.

@craigcondit
Copy link
Contributor

craigcondit commented Sep 7, 2023

Also, do we need to cover extended resources like "nvidia.com/gpu"? If yes, we can file a separate jira.

Can we file a follow up?

Okay, but GPU is not supported by GitHub-hosted runners now.

I also think we need to keep in mind that these tests are run locally during development as well, not just on GitHub. There needs to be a short-circuit skip of the test implemented in the case where there are not huge pages resources available in the cluster. Checking that at least one node exists that has huge pages resources should be sufficient.

@FrankYang0529
Copy link
Member Author

Also, do we need to cover extended resources like "nvidia.com/gpu"? If yes, we can file a separate jira.

Can we file a follow up?

Okay, but GPU is not supported by GitHub-hosted runners now.

@manirajv06, in this case, do we still want to file a follow-up issue? In YUNIKORN-1909, it also mentions kind doesn't have GPU resources.

@FrankYang0529
Copy link
Member Author

FrankYang0529 commented Sep 7, 2023

Also, do we need to cover extended resources like "nvidia.com/gpu"? If yes, we can file a separate jira.

Can we file a follow up?

Okay, but GPU is not supported by GitHub-hosted runners now.

I also think we need to keep in mind that these tests are run locally during development as well, not just on GitHub. There needs to be a short-circuit skip of the test implemented in the case where there are not huge pages resources available in the cluster. Checking that at least one node exists that has huge pages resources should be sufficient.

@craigcondit, yeah, we have a line check whether the cluster has huge pages reosurces.

                 if !hasHugePages {
			ginkgo.Skip("Skip hugepages test as no node has hugepages")
		}

@manirajv06
Copy link
Contributor

in this case, do we still want to file a follow-up issue? In YUNIKORN-1909, it also mentions kind doesn't have GPU resources.

Ok, we can leave it.

@manirajv06 manirajv06 closed this in 1efe521 Sep 7, 2023
@FrankYang0529 FrankYang0529 deleted the YUNIKORN-1908 branch November 23, 2023 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants