-
Notifications
You must be signed in to change notification settings - Fork 31.3k
Pass datasets trust_remote_code #31406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass datasets trust_remote_code #31406
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
I think this is more intricate than I first thought. What about if I make another PR to pin upper version of datasets and then handle this PR more calmly? |
|
For example, the "hf-internal-testing/librispeech_asr_dummy" dataset name is passed in this test:
Should we define it in |
Sounds good!
Happy with that 👍 |
This reverts commit b767282.
This reverts commit 833fc17.
ydshieh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @albertvillanova ! LGTM, but I have 2 questions:
-
Any reason that we don't have this
trust_remote_dataset_codein all files inexamplesthat hasclass DataTrainingArguments? (it seems this PR only adds to some files) -
I would like to have @muellerzr 's comment as he deals with
examplesfolder frequently -
I guess the name
trust_remote_dataset_codeis because we have usedtrust_remote_codeforModelArguments, right? Would like @amyeroberts 's opinion on this too.
So if the argument already exists in |
Yeah, I would love the latter if |
|
Thanks for your reviews, @amyeroberts and @ydshieh. So, you would prefer to have just a single load_dataset(
data_args.dataset_name,
data_args.dataset_config_name,
trust_remote_code=model_args.trust_remote_code,Just to be sure, because I find weird to use a |
|
Hi @albertvillanova Not really. What I am thinking is:
|
|
The assignment of the argument by In this PR, I will use
|
Thanks for checking @albertvillanova. That part is rather on our side, but if you want to take it, that would be much appreciated.
Works for me :-) |
amyeroberts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for handling all of these cases!
What does this PR do?
Next release of
datasetswill raise an error for script-datasets iftrust_remote_codeis not set to True.Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.