
skip some gpt_neox tests that require 80G RAM#17923

Merged
sgugger merged 4 commits into huggingface:main from ydshieh:skip_a_gpt_neox_test
Jul 1, 2022

Conversation

@ydshieh
Collaborator

@ydshieh ydshieh commented Jun 28, 2022

What does this PR do?

GPT-NeoX requires ~80G RAM to run. Our CI runners have only 60G RAM. Skip a few tests for now.

Do you think it's better to use something like

@unittest.skipUnless(psutil.virtual_memory().total / 1024 ** 3 > 80, "GPT-NeoX requires 80G RAM for testing")

The problem is that psutil is not in the requirements.
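For context, a minimal sketch of how such a guard could look without adding psutil to the requirements, assuming the CI runners are Linux (os.sysconf is not available everywhere); the decorator name is illustrative:

import os
import unittest

def total_ram_gb():
    # Total physical memory in GiB, read via os.sysconf (Linux/Unix only),
    # so no extra dependency on psutil is needed.
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3

# Hypothetical decorator: skips the test unless the machine has ~80G of RAM.
require_80gb_ram = unittest.skipUnless(total_ram_gb() >= 80, "GPT-NeoX requires 80G RAM for testing")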

@ydshieh ydshieh requested review from sgugger and stas00 June 28, 2022 12:23
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jun 28, 2022

The documentation is not available anymore as the PR was closed or merged.

Collaborator

@sgugger sgugger left a comment


We will never have a runner with 80GB of RAM, so if there is no better alternative, we should just delete those tests.

@stas00
Contributor

stas00 commented Jun 28, 2022

What Sylvain said, and I'd ask a broader question - why are we running the same test on multiple checkpoints of the same model that differ only in size? The purpose of our test suite is not to test models on the hub, it's to test the model's code. So such tests should never be there in the first place.

  • 99% of the time the tests should be run against tiny random models, most of which reside under https://huggingface.co/hf-internal-testing - these are functional tests.
  • 1% of tests should be against the smallest non-random model to test the quality of the results. And typically these are @slow tests.

Of course, the % breakdown is symbolic, the point I was trying to convey is that most tests should be really fast in download and execution.


If there is a need to test models on the hub, there should be a separate CI whose only job is to load the models and run some basic tests on them. That CI would need a ton of CPU and GPU memory and a large number of GPUs for obvious reasons - e.g. t5-11b and other huge models.
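
For context, a minimal sketch of the kind of tiny-random functional test described in the first bullet above; the config values are illustrative, not the ones used in the actual test suite:

import torch
from transformers import GPTNeoXConfig, GPTNeoXModel

# Build a tiny randomly initialized model: fast to construct, no large download,
# and it exercises the modeling code rather than a hub checkpoint.
config = GPTNeoXConfig(vocab_size=1024, hidden_size=32, num_hidden_layers=2, num_attention_heads=4, intermediate_size=64)
model = GPTNeoXModel(config)

input_ids = torch.randint(0, config.vocab_size, (1, 8))
outputs = model(input_ids)
assert outputs.last_hidden_state.shape == (1, 8, config.hidden_size)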

@ydshieh
Collaborator Author

ydshieh commented Jun 28, 2022

Hi @stas00

The related tests here are decorated with @slow and run in the daily scheduled CI, not the push CI. And only one checkpoint is tested: GPT_NEOX_PRETRAINED_MODEL_ARCHIVE_LIST[:1].

For test_model_from_pretrained, I think we can use tiny random models in hf-internal-testing for GPTNeoX if we want to keep the test. However, we always have integration tests (like GPTNeoXModelIntegrationTest) which are important to have.

Note that on the scheduled CI, we use a cache server (FileStore on GCP), so there is no real downloading (i.e. downloading is very fast, since it happens within GCP's network).

The runners also have 16 vCPUs and 60G RAM.

@stas00
Contributor

stas00 commented Jun 28, 2022

Ah, good point, I missed [:1] - why is there a loop then?

for model_name in GPT_NEOX_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:

We should probably write out the desired smallest real model explicitly then - and perhaps it's small enough to fit?
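
For reference, a sketch of what the @slow test might look like with the checkpoint written out explicitly instead of looping over the archive list (EleutherAI/gpt-neox-20b is the only published checkpoint, hence the memory problem):

import unittest
from transformers import GPTNeoXModel
from transformers.testing_utils import slow

class GPTNeoXModelTest(unittest.TestCase):  # illustrative fragment, not the full test class
    @slow
    def test_model_from_pretrained(self):
        # Explicit checkpoint instead of looping over GPT_NEOX_PRETRAINED_MODEL_ARCHIVE_LIST[:1];
        # this still needs ~80G of RAM to load the 20B weights.
        model = GPTNeoXModel.from_pretrained("EleutherAI/gpt-neox-20b")
        self.assertIsNotNone(model)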

@sgugger
Collaborator

sgugger commented Jun 28, 2022

The main point is that GPT-NeoX does not come with a smaller pretrained model; there is only the 20B version.

@ydshieh
Collaborator Author

ydshieh commented Jun 28, 2022

Ah, good point, I missed [:1] - why is there a loop then?

for model_name in GPT_NEOX_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:

We should probably write out the desired smallest real model explicitly then - and perhaps it's small enough to fit?

I think this is from old code. We don't want to maintain ...PRETRAINED_MODEL_ARCHIVE_LIST anymore, and for some models, we do use the explicit checkpoint name.

I will just remove the 2 tests here.

@ydshieh
Collaborator Author

ydshieh commented Jun 28, 2022

Removed. I will rebase on main later to see if all tests pass.

@ydshieh
Collaborator Author

ydshieh commented Jul 1, 2022

I am ready for the merge :-)

@sgugger sgugger merged commit 14fb8a6 into huggingface:main Jul 1, 2022
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
* skip some gpt_neox tests that require 80G RAM

* remove tests

* fix quality

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
@ydshieh ydshieh deleted the skip_a_gpt_neox_test branch September 7, 2022 08:11