[fsmt test] basic config test with online model + super tiny model #7860

stas00 · 2020-10-17T01:27:45Z

This PR does:

While looking at the issue in [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies #7659, I realized fsmt didn't have a basic non-slow test that loads the tokenizer via online model files. Luckily, a totally unrelated examples test caught the issue in that PR, so this adds a very simple, quick test to the main test suite so that it runs in the normal CI.
While building this test I needed a new tiny model, so I refined the script https://github.com/huggingface/transformers/blob/master/scripts/fsmt/fsmt-make-tiny-model.py and made a new one that creates a model 50 times smaller: we are now at 60KB instead of 3MB.

thomwolf · 2020-10-17T10:01:41Z

Hi Stas, thanks, #7659 will fix this (we will now require at least one example checkpoint for each tokenizer and we test it automatically).

stas00 · 2020-10-17T16:27:49Z

@thomwolf, I'm not certain why you closed this. This is a tokenizer test that is needed - the issue caught in examples was just a flag that there was a missing test in the normal tests.

Actually, not only is it needed, I will have to expand this test to verify that the tokenizer doesn't fall back to the hardcoded default values but fetches the correct values from tokenizer_config.json. The default values also need to be changed to differ from TINY_FSMT, since otherwise the test won't be testing the right thing.

Feel free to add it as part of #7659, but please make sure the hardcoded defaults differ from those of TINY_FSMT.

I hope this makes sense.
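To make the concern concrete, here is a minimal sketch of how defaults that match the checkpoint can hide a loader bug. It does not use the real FSMTTokenizer: `ToyTokenizer`, `BrokenTokenizer`, `loads_config_values`, and the `langs` / `do_lower_case` keys are all hypothetical names chosen for illustration only.

```python
import json
import tempfile
from pathlib import Path


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; NOT the real FSMTTokenizer."""

    # Hardcoded fallbacks, used when tokenizer_config.json omits a key.
    DEFAULT_LANGS = ("en", "ru")
    DEFAULT_DO_LOWER_CASE = False

    def __init__(self, langs=DEFAULT_LANGS, do_lower_case=DEFAULT_DO_LOWER_CASE):
        self.langs = list(langs)
        self.do_lower_case = do_lower_case

    @classmethod
    def from_dir(cls, model_dir):
        # Read tokenizer_config.json if present and pass its values on.
        config_path = Path(model_dir) / "tokenizer_config.json"
        kwargs = json.loads(config_path.read_text()) if config_path.exists() else {}
        return cls(**kwargs)


class BrokenTokenizer(ToyTokenizer):
    """Same class with a deliberate bug: the config file is never read."""

    @classmethod
    def from_dir(cls, model_dir):
        return cls()


def loads_config_values(tokenizer_cls, config_values):
    """Write config_values to a temp dir, load from it, and compare attributes."""
    with tempfile.TemporaryDirectory() as model_dir:
        config_path = Path(model_dir) / "tokenizer_config.json"
        config_path.write_text(json.dumps(config_values))
        tok = tokenizer_cls.from_dir(model_dir)
        return all(getattr(tok, key) == value for key, value in config_values.items())


# A checkpoint whose values equal the hardcoded defaults cannot expose the bug.
SAME_AS_DEFAULTS = {"langs": ["en", "ru"], "do_lower_case": False}
# A checkpoint whose values differ from the defaults can.
DIFFERENT = {"langs": ["ru", "en"], "do_lower_case": True}

masked = loads_config_values(BrokenTokenizer, SAME_AS_DEFAULTS)   # bug goes unnoticed
caught = not loads_config_values(BrokenTokenizer, DIFFERENT)      # bug is detected
healthy = loads_config_values(ToyTokenizer, DIFFERENT)            # correct loader passes
```

With the broken loader, the check against `SAME_AS_DEFAULTS` still passes, which is exactly why the test checkpoint and the hardcoded defaults need to differ.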

stas00 · 2020-10-17T16:32:29Z

Hmm, but you copied TINY_FSMT in 1885ca7. How will the tests then check that the tokenizer can fetch that data if it is now hardcoded? I am not following this. Do I need to create TINY_FSMT2 with different values?

I haven't read the new code in depth, but my gut feeling is that the defaults may mask a problem.

thomwolf · 2020-10-17T20:47:01Z

Hi stas, I'll let you read the new code and then we can have a look together.

The basic idea is that we now require a full, working checkpoint so that the tokenizers can be fully tested in various conditions and the slow and fast versions compared.

Whether tokenizers load and use tokenizer_config.json is a separate question, and we should indeed address it in a subsequent PR if it's not addressed already.

stas00 · 2020-10-17T21:31:25Z

That works.

But please re-open this PR, since we need it anyway. I will add more changes to it after your big PR is merged, to ensure that the loading of the tokenizer is properly tested.

stas00 · 2020-10-21T21:36:45Z

This PR is complete now.

tests/test_tokenization_fsmt.py

LysandreJik

LGTM

basic config test with online model

54d6fcb

stas00 changed the title ~~basic config test with online model~~ [fstm test] basic config test with online model Oct 17, 2020

stas00 changed the title ~~[fstm test] basic config test with online model~~ [fsmt test] basic config test with online model Oct 17, 2020

stas00 added 2 commits October 16, 2020 18:28

typo

d7f25ac

style

246f60f

thomwolf closed this Oct 17, 2020

stas00 mentioned this pull request Oct 17, 2020

[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies #7659

Merged

2 tasks

thomwolf reopened this Oct 18, 2020

stas00 changed the title ~~[fsmt test] basic config test with online model~~ [wip] [fsmt test] basic config test with online model Oct 21, 2020

stas00 added 2 commits October 21, 2020 13:53

Merge remote-tracking branch 'origin/master' into fsmt-test-tiny

21edb6f

better test

ebd02f8

stas00 changed the title ~~[wip] [fsmt test] basic config test with online model~~ [fsmt test] basic config test with online model Oct 21, 2020

stas00 changed the title ~~[fsmt test] basic config test with online model~~ [fsmt test] basic config test with online model + super tiny model Oct 21, 2020

sshleifer approved these changes Oct 21, 2020

tests/test_tokenization_fsmt.py

sshleifer requested a review from LysandreJik October 21, 2020 23:09

sshleifer approved these changes Oct 22, 2020

tests/test_tokenization_fsmt.py

LysandreJik approved these changes Oct 22, 2020

LysandreJik merged commit 64b4d25 into huggingface:master Oct 22, 2020

stas00 deleted the fsmt-test-tiny branch October 22, 2020 15:32

stas00 mentioned this pull request Oct 22, 2020

[PretrainedConfig] Fix save pretrained config for edge case #7943

Merged

5 tasks
