Skip to content

Conversation

@stas00
Copy link
Contributor

@stas00 stas00 commented Oct 17, 2020

This PR does:

@LysandreJik, @sshleifer

@stas00 stas00 changed the title basic config test with online model [fstm test] basic config test with online model Oct 17, 2020
@stas00 stas00 changed the title [fstm test] basic config test with online model [fsmt test] basic config test with online model Oct 17, 2020
@thomwolf
Copy link
Member

Hi Stas, thanks, #7659 will fix this (we will now require at least one example checkpoint for each tokenizer and we test it automatically).

@thomwolf thomwolf closed this Oct 17, 2020
@stas00
Copy link
Contributor Author

stas00 commented Oct 17, 2020

@thomwolf, I'm not certain why you closed this. This is a tokenizer test that is needed - the issue caught in examples was just a flag that there was a missing test in the normal tests.

Actually, not only it's needed, I will have to expand this test to verify that it doesn't get the hardcoded default values, but fetches the correct values from tokenizer_config.json. And to change the default values to be different from TINY_FSMT, since otherwise it won't be testing the right thing.

Feel free to add it as part of #7659 but please make sure you used different from TINY_FSMT hardcoded defaults.

I hope this makes sense.

@stas00
Copy link
Contributor Author

stas00 commented Oct 17, 2020

Hmm, but you copied TINY_FSMT in 1885ca7, how will then the tests check it can fetch that data if this is now hardcoded? I am not following this. Do I need to create TINY_FSMT2 with different values?

I haven't read the new code in depth, but my gut feeling is that the defaults may mask a problem.

@thomwolf
Copy link
Member

Hi stas, I'll let you read the new code and then we can have a look together.

The basic idea is that we now require a full and working checkpoint for the tokenizers to be fully tested in various conditions and the slow vs. fast compared.

The question of testing that tokenizers load and use tokenizer_config.json is another question and we should indeed address it in a subsequent PR if it's not addressed already indeed.

@stas00
Copy link
Contributor Author

stas00 commented Oct 17, 2020

That works.

But please re-open this PR, since we need it anyway. I will add more changes to it after your big PR merge to ensure that the loading of the tokenizer is properly tested.

@thomwolf thomwolf reopened this Oct 18, 2020
@stas00 stas00 changed the title [fsmt test] basic config test with online model [wip] [fsmt test] basic config test with online model Oct 21, 2020
@stas00 stas00 changed the title [wip] [fsmt test] basic config test with online model [fsmt test] basic config test with online model Oct 21, 2020
@stas00 stas00 changed the title [fsmt test] basic config test with online model [fsmt test] basic config test with online model + super tiny model Oct 21, 2020
@stas00
Copy link
Contributor Author

stas00 commented Oct 21, 2020

This PR is complete now.

@sshleifer sshleifer requested a review from LysandreJik October 21, 2020 23:09
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LysandreJik LysandreJik merged commit 64b4d25 into huggingface:master Oct 22, 2020
@stas00 stas00 deleted the fsmt-test-tiny branch October 22, 2020 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants