@fyvo
@fyvo fyvo commented Apr 27, 2022

Added prompts for machine translation on WMT14 en-fr. Both directions are covered as two separate tasks; the prompt name indicates the direction. The templates are close to the ones used in the GPT-3 and T5 papers. Most prompts come in two flavours: the source language is either stated explicitly or must be inferred. One prompt is also translated into French. The WMT14 datasets are in Hugging Face Datasets but are unavailable from the promptsource interface, so the filter_english_datasets method in utils.py had to be modified locally to include these (huge) datasets.
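
As a rough illustration of the two flavours, here is a minimal sketch in plain Python. The template wording, the `render_prompt` helper, and the sample sentence pair are hypothetical and are not copied from the templates in this PR; the only thing assumed from the dataset is that each WMT14 record stores its sentence pair under a `translation` field keyed by language code.

```python
# Illustrative only: this wording is hypothetical, not the PR's actual templates.
EXPLICIT_SOURCE = "Translate the following {src_name} text to {tgt_name}: {text}"
IMPLICIT_SOURCE = "Translate the following text to {tgt_name}: {text}"

LANG_NAMES = {"en": "English", "fr": "French"}


def render_prompt(example, src, tgt, explicit_source=True):
    """Return (prompt, target) for one WMT14-style record.

    `example` is assumed to look like {"translation": {"en": ..., "fr": ...}}.
    """
    pair = example["translation"]
    # Pick the flavour: name the source language, or leave it to be inferred.
    template = EXPLICIT_SOURCE if explicit_source else IMPLICIT_SOURCE
    prompt = template.format(
        src_name=LANG_NAMES[src],  # ignored by the implicit template
        tgt_name=LANG_NAMES[tgt],
        text=pair[src],
    )
    return prompt, pair[tgt]


# Hypothetical sample record in the assumed shape.
example = {
    "translation": {
        "en": "Resumption of the session",
        "fr": "Reprise de la session",
    }
}

# en -> fr with the source language stated explicitly.
prompt, target = render_prompt(example, "en", "fr")

# fr -> en with the source language left implicit.
prompt_implicit, target_implicit = render_prompt(
    example, "fr", "en", explicit_source=False
)
```

Swapping `src` and `tgt` gives the other direction, which is why the two directions can be treated as separate tasks over the same data.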

@thinkzink
Collaborator

I would have changed "If the original version says" to "If the English version says", and I would likewise be hesitant about alternating between "passage" and "sentence" in the prompt, especially given, e.g., the questions in the dataset. But maybe that's too pedantic?

@jzf2101
Collaborator

jzf2101 commented Jun 27, 2022

@fyvo there are build errors. Could you merge with main and push again?

@VictorSanh
Member

VictorSanh commented Jun 27, 2022

@fyvo could you run the show_new_templates test locally, since you have already downloaded the datasets?

Merging the main eval-hackathon branch back into this PR won't fix the failing tests, since they are caused by a GH Actions limitation (disk space). You should still do it, though, because a few things have been updated!

If the tests pass locally, then let's merge this!

@fyvo
Author

fyvo commented Jun 28, 2022

@VictorSanh I just ran the test for wmt14/fr-en; the result is in your mailbox.

Member

@VictorSanh VictorSanh left a comment

Thank you @fyvo!
I fixed a small bug in the prompt names.
I ran all the tests locally, and they all pass except for wmt17/zh-en, wmt18/zh-en, and wmt19/zh-en. Those failures are not a problem in the prompts but in HF datasets. The prompts LGTM though, so I'm still merging.

See huggingface/datasets#4575

@VictorSanh VictorSanh merged commit 29afd13 into bigscience-workshop:eval-hackathon Jun 29, 2022
4 participants