
Add nightly build #1503

Merged: 10 commits merged into staging from laserprec/nightly-ci on Aug 31, 2021

Conversation

Contributor

@laserprec laserprec commented Aug 19, 2021

Description

To improve the developer experience of working with this repo, we will begin migrating our DevOps pipelines to GitHub Actions, leveraging popular DevOps tools like tox and flake8. This will be a sequence of updates to our DevOps infrastructure:

  1. Propose and draft the CI pipeline in GitHub Actions
  2. Set up self-hosted machines to run our GPU workloads
  3. Create feature parity with the existing CI pipelines (pr-gates & nightly-build) <---- (Current PR)
  4. Run tests on the appropriate dependency subsets
  5. Optimize build time for unit tests
  6. Enforce flake8 (coding style checks) on the build and clean up coding styles to pass flake8 checks
  7. Deprecate CI pipelines in ADO and switch to GitHub Actions

In this PR:

Add a nightly build pipeline that runs the full test suites: unit, integration and smoke tests:
[screenshot: nightly build pipeline run]
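For context, a scheduled nightly workflow of this shape generally looks like the following sketch; the cron time, runner labels, Python version, and tox invocation are illustrative assumptions, not the exact contents of this PR's .github/workflows/nightly.yml:

    name: nightly
    on:
      schedule:
        - cron: '0 0 * * *'     # illustrative time, not the actual schedule used in this PR
      workflow_dispatch:         # allow manual runs from the Actions UI

    jobs:
      nightly-tests:
        runs-on: ${{ matrix.os }}
        strategy:
          matrix:
            os: [ubuntu-latest]
            python: [3.6]
            test-kind: ['unit', 'smoke', 'integration']
        steps:
          - uses: actions/checkout@v2
          - uses: actions/setup-python@v2
            with:
              python-version: ${{ matrix.python }}
          - run: pip install tox
          # the tox invocation and test paths below are placeholders
          - run: tox -- tests/${{ matrix.test-kind }}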

Related Issues

#1498

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

Comment on lines 82 to 89
        test-kind: ['unit', 'smoke']
        # pytest markers configured in tox.ini. See https://docs.pytest.org/en/6.2.x/example/markers.html
        test-marker: ['not spark and not gpu']
        include:
          # Integration tests need a more powerful machine with more memory. GitHub-hosted agents only have 7GB memory.
          - os: self-hosted
            python: 3.6
            test-kind: 'integration'
Collaborator
One question about this: is this pipeline running unit, smoke and integration tests?

Contributor Author

@laserprec laserprec Aug 23, 2021

Yup! I know this extra include clause looks unusual. At the moment, the integration tests cannot run on the smaller SKUs (Standard_DS2_v2: 7GB) provided by GitHub because there isn't enough memory to run the training on the full dataset.

Ideally, all our tests shouldn't be that compute intensive, and they could be run on the default SKUs provided by the CI (unless we need to test for compatibility with GPU-enabled machines). Thus the matrix should ideally look like:

   test-kind: ['unit', 'smoke', 'integration']
   test-marker: [....]
   ....
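For illustration, a fuller version of such a matrix together with the include override used for the integration tests could look like the sketch below; the os and python values are assumptions for the example, not necessarily what the workflow file uses:

    strategy:
      matrix:
        os: [ubuntu-latest]
        python: [3.6]
        test-kind: ['unit', 'smoke']
        test-marker: ['not spark and not gpu']
        include:
          # extra matrix entry: integration tests go to the larger self-hosted machine
          - os: self-hosted
            python: 3.6
            test-kind: 'integration'
            test-marker: 'not spark and not gpu'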

I think we can have a discussion over the objective of the different unit/smoke/integration tests and whether doing training on the full dataset achieves those goals. I added a new issue for the discussion: #1507.

Collaborator

Sorry, I don't understand: you mentioned that this pipeline runs unit, smoke and integration tests, but later that the integration tests cannot run on the smaller SKUs. Some questions:

  1. Where are the integration tests running?
  2. Is there a UI where I can see an example of how the tests are running?
  3. There are 3 test kinds running for CPU (unit, smoke and integration); are they running in parallel or sequentially?
  4. Do we know if GitHub has plans to support bigger SKUs?
  5. Related to 4, Azure Pipelines also has SKUs for testing; are they the same as GitHub's, or are they bigger? (If they are bigger, then GitHub might catch up eventually.)

Contributor Author

@laserprec laserprec Aug 24, 2021

No worries!

  1. The integration tests are currently running on the self-hosted GPU agent. This is configured with the self-hosted keyword (see the sketch after this list):

         include:
           - os: self-hosted

  2. Please find all of the GitHub Actions runs here, especially the pr-gate and nightly pipelines.
  3. Currently all of them are running in parallel. However, because we have set up only 2 self-hosted agents and there are about 5 nightly steps that need to run on a self-hosted machine, we haven't scaled up the resources to run them in true parallel.
  4. As far as I know, I don't think there's a plan. Maybe the SKU offerings are different with GitHub Enterprise, but I haven't gotten too much info on that.
  5. Azure Pipelines and GitHub Actions both use the same SKU: Standard_DS2_v2.
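A minimal sketch of how the self-hosted entry ends up on the GPU agent via runs-on; the job name and hosted-runner label are illustrative:

    jobs:
      test:
        runs-on: ${{ matrix.os }}   # rows with os: self-hosted are scheduled on the self-hosted agent
        strategy:
          matrix:
            os: [ubuntu-latest]
            include:
              - os: self-hosted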

I hope that helps.

Collaborator

Got it, thanks for the clarification.

  1. Would you mind explaining in the file that some tests are being run on the hosted machines and some on the self-hosted one?
  2. For nightly, in the current pipeline we are running only smoke and integration tests, see https://github.com/microsoft/recommenders/blob/main/tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml#L30, and these two are sequential. The reason why they are sequential is that the smoke tests are just a small version of the integration tests. Is it possible to replicate this behavior in GitHub?
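For reference, sequential jobs can be expressed in GitHub Actions with the needs keyword; a minimal sketch with placeholder job names and steps, not what this PR implements:

    jobs:
      smoke:
        runs-on: ubuntu-latest
        steps:
          - run: echo "run smoke tests here"
      integration:
        needs: smoke               # starts only after the smoke job succeeds
        runs-on: ubuntu-latest
        steps:
          - run: echo "run integration tests here"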

We are making so many changes to the tests that I think we also need to review the test README; there might be info that is not up to date: https://github.com/microsoft/recommenders/blob/main/tests/README.md

Contributor Author

@laserprec laserprec Aug 24, 2021

Thanks for the great feedback!

  1. I already added a comment on these mixed runs for the integration tests and a link to a new issue (Tests runnable on default CI agent #1507) on whether we should optimize/trim these integration tests and make them fit to run on hosted VMs:

         include:
           # Integration tests need a powerful machine with more memory. GitHub-hosted agents only have 7GB memory.
           # TODO: Optimization/refactoring should be done to ensure that these tests can run on the default CI agent (#1507)
           - os: self-hosted

@miguelgfierro is there anywhere else you would like me to comment?

  2. If I understand correctly, in the existing pipeline the smoke tests run before the integration tests so we can "fail fast" and avoid running the full integration tests, which are destined to fail. In a similar manner, GitHub Actions has the fail-fast keyword, which will cancel the running jobs in the matrix if any of them fails (see the sketch below). If this isn't what you are going after and we need to run both of them sequentially, I think we would need to refactor the CI structure, because it is designed to run everything in parallel as much as possible.
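A minimal sketch of the fail-fast setting in a matrix strategy (true is also the default behavior for matrices):

    strategy:
      fail-fast: true              # cancel the remaining in-progress matrix jobs as soon as one of them fails
      matrix:
        test-kind: ['unit', 'smoke', 'integration']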

We are making so many changes to the tests that I think we also need to review the test README

This was a great document when I was onboarding :). All of the CI changes are compatible with what's written in the document in terms of test execution. One thing we can change is that we no longer need the --durations 0 flag in, for example, pytest tests/integration -m "integration and not spark and not gpu" --durations 0, because these flags are stored in tox.ini and pytest will pick them up from there without you specifying them. Of course, it doesn't hurt to repeat yourself and add them when running. 😃
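For illustration, pytest can pick up such flags from a [pytest] section in tox.ini; this is only a sketch, the repo's actual tox.ini may organize it differently and the marker descriptions are assumptions:

    [pytest]
    addopts = --durations 0
    markers =
        integration: integration tests
        smoke: smoke tests
        spark: tests that require Spark
        gpu: tests that require a GPU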

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laserprec

@miguelgfierro is there anywhere else you would like me to comment?

For new users, particularly when the pipeline is complex like this one, it saves them time to have a short summary at the top of the file. Here is an example: https://github.com/microsoft/recommenders/blob/main/tests/ci/azure_artifact_feed.yaml
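As an illustration, such a summary at the top of nightly.yml could look like the block below; the wording is hypothetical, not the comment that was eventually committed:

    # ---------------------------------------------------------------------------
    # Nightly build: runs the full test suites (unit, smoke and integration) on a
    # schedule. Unit and smoke tests run on GitHub-hosted agents; integration
    # tests run on the self-hosted GPU agents (see #1507 for making them fit on
    # the default agents).
    # ---------------------------------------------------------------------------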

@miguelgfierro
Collaborator

@laserprec do you have the time savings between the old pipeline and this one?

@laserprec
Contributor Author

laserprec commented Aug 25, 2021

@laserprec do you have the time savings between the old pipeline and this one?

@miguelgfierro, please interpret the numbers with a grain of salt; it's not an apples-to-apples comparison.
The current nightly build on ADO takes about 4hrs50min to run the 3 builds and the smoke & integration tests sequentially.

[screenshot: ADO nightly pipeline runs]
Link to pipeline runs

This new nightly build on GitHub Actions takes about 4hrs40min to run the 3 builds and the full test suites (unit/smoke/integration) partially in parallel.

[screenshot: GitHub Actions nightly pipeline run]
link to pipeline run

As you can see, the GitHub Actions run is bottlenecked by its longest-running thread, the GPU integration tests (4hr40min), which is considerably longer than the same run in ADO. This is because we are not using the same Azure SKUs to run them. On ADO we are using NC6s_v2 (which has P100 GPU but we no longer have any quota to allocate more) and in GitHub we are using NC12_promo (which has two P40 GPUs).

So at this point we don't have many immediate gains, but with the new CI in GitHub Actions we are in a better position to scale up the infrastructure for distributed loads when we have more environments to support, and we have a centralized interface to interpret the runs and optimize the individual jobs in the next step.

@miguelgfierro
Collaborator

On ADO we are using NC6s_v2 (which has P100 GPU but we no longer have any quota to allocate more) and in GitHub we are using NC12_promo (which has two P40 GPUs).

I can try to fix that and get you more quota; for the comparison we should have the same SKU. Please ping me privately and I'll try to unblock you.

Another point I see is that, as you said, this is not apples-to-apples: in the old ADO pipeline we are not running the unit tests. Do we want to have the unit tests in the nightly build?

The unit tests for Spark take 30min, whereas in the ADO pipeline they take around 14min, see this and this; do you know why?

I also noticed that this PR is triggering the nightly builds with GitHub; they shouldn't be triggered by the PR but scheduled. Can you review?

One way to scale would be to use AzureML VMs to execute the tests; this was an initiative that was tried in the past but we didn't put into production. We have some code here that might help you understand what we did. Can you estimate how much work and how difficult it would be to have AzureML VMs instead of a hosted VM on GitHub?

@laserprec
Contributor Author

Do we want to have the unit tests in the nightly build?

@miguelgfierro I think it's typical to run the full test suites on a nightly build. This way we can get the full code coverage report. Relatively speaking, the time to run the full unit tests is probably a small fraction of that of the integration tests. I don't think it hurts to run them 😄.

The unit tests for Spark take 30min, whereas in the ADO pipeline they take around 14min, see this and this; do you know why?

This is because the unit tests are running on the default SKU, with only 7GB memory. They take longer because we are running with less compute.

I also noticed that this PR is triggering the nightly builds with GitHub; they shouldn't be triggered by the PR but scheduled. Can you review?

Yup, the manual trigger will not be registered in the UI until these changes are checked into the main branch. I am just adding a custom PR trigger to make sure the nightly build runs. Now that we've confirmed it runs fine, I can remove it (TODO).
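A sketch of what such a trigger set-up can look like; the cron time and branch filter are illustrative, and workflow_dispatch is the manual trigger mentioned above, which only shows up in the Actions UI once the workflow file is on the default branch:

    on:
      schedule:
        - cron: '0 0 * * *'        # illustrative time
      workflow_dispatch:            # manual trigger; visible in the UI only after the file lands on the default branch
      # temporary trigger used to validate the nightly build from this PR; to be removed (TODO)
      pull_request:
        branches: [staging]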

We have some code here that might help you understand what we did. Can you estimate how much work and how difficult it would be to have AzureML VMs instead of a hosted VM on GitHub?

I think that's a good idea for managed compute pools for model training. At the moment, I am not aware of any good integration plugins available for AzureML and GitHub Actions. I think this is not a trivial change in terms of secret management, the complexity introduced by involving AML in the pipeline, etc. The gains, I assume, are relieving the team from managing the self-hosted VMs and potentially introducing AML as the training scenario in our NB? I think we can investigate this direction. @miguelgfierro what would be the best way to backlog this item for us to come back to?

@miguelgfierro
Collaborator

Got it, makes sense; really good suggestions and thought process.

I think we can investigate this direction. @miguelgfierro what would be the best way to backlog this item for us to come back to?

We have an open issue about this, #995; we can either reuse that one, or create a new one and close it, whatever you prefer.

@laserprec
Contributor Author

We have an open issue about this, #995; we can either reuse that one, or create a new one and close it, whatever you prefer.

Awesome! We can reuse that issue; I have already assigned it to myself. I think we can look into this after we have finished the migration. Btw, @miguelgfierro, is there anything else I haven't addressed? I would love to get your approval before merging this PR 😄.

@miguelgfierro
Collaborator

Btw, @miguelgfierro, is there anything else I haven't addressed? I would love to get your approval before merging this PR 😄.

There are still some questions that it would be good to have clarity on:

  • Clarity on the naming, see Add nightly build #1503 (comment)
  • Trying 3.7, see Add nightly build #1503 (comment)
  • Short summary at the top to help users understand the pipeline quickly, see Add nightly build #1503 (comment)

@laserprec
Contributor Author

laserprec commented Aug 27, 2021

@miguelgfierro

Clarity on the naming, see Add nightly build #1503 (comment)

I hope I gave a better explanation :p. If not, I can hop on a call.

Trying 3.7, see Add nightly build #1503 (comment)

I feel like the support for 3.7 can be its own PR; I'm trying to keep this PR modular. I have created a new issue for this: #1511. I can create something like a shadow CI job to try the build on 3.7 (which would not fail the build when related jobs fail) in upcoming changes.
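One possible shape for such a shadow job, following the documented continue-on-error pattern for experimental matrix entries; a sketch only, not necessarily how #1511 will be implemented:

    jobs:
      test:
        runs-on: ubuntu-latest
        # a failing experimental (3.7) job is reported but does not fail the workflow
        continue-on-error: ${{ matrix.experimental }}
        strategy:
          fail-fast: false
          matrix:
            python: [3.6]
            experimental: [false]
            include:
              - python: 3.7
                experimental: true
        steps:
          - uses: actions/setup-python@v2
            with:
              python-version: ${{ matrix.python }}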

Short summary at the top to help users understand the pipeline quickly, see Add nightly build #1503 (comment)

I updated the comments with more details in commit 97faf6341d. Let me know if we need more context.

@codecov-commenter

Codecov Report

Merging #1503 (97faf63) into staging (3e23511) will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           staging    #1503      +/-   ##
===========================================
- Coverage    61.96%   61.95%   -0.01%     
===========================================
  Files           84       84              
  Lines         8368     8369       +1     
===========================================
  Hits          5185     5185              
- Misses        3183     3184       +1     
Impacted Files                             Coverage Δ
recommenders/models/vae/standard_vae.py    0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e23511...97faf63. Read the comment docs.

Collaborator

@miguelgfierro miguelgfierro left a comment

This is a really good job, @laserprec.

@laserprec laserprec merged commit c5029af into staging Aug 31, 2021
@laserprec laserprec linked an issue Aug 31, 2021 that may be closed by this pull request
@miguelgfierro miguelgfierro deleted the laserprec/nightly-ci branch September 1, 2021 08:43
@miguelgfierro miguelgfierro restored the laserprec/nightly-ci branch September 1, 2021 08:43
@miguelgfierro miguelgfierro deleted the laserprec/nightly-ci branch September 1, 2021 08:43