Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Added Seq2Seq tasks #37

Merged
merged 43 commits into from
Feb 2, 2021
Merged

Added Seq2Seq tasks #37

merged 43 commits into from
Feb 2, 2021

Conversation

SeanNaren
Copy link
Contributor

@SeanNaren SeanNaren commented Feb 1, 2021

What does this PR do?

Add Translation/Summarization and general Seq2Seq Task. This is a first pass, and will need more iteration in the future:

Painpoints:

  • I didn't have enough time to train a mt5 model on translation, as this would be common for all translation tasks. This is due to requiring some additional pre-processing, and longer training times. As a stop gap, I've used a student model trained on the en/ro translation task as I think is the case in HF, but that shouldn't be what we deliver in the future.
  • Upload WMT translation sets in different languages!

Left to do:

  • Train Summarization model, ensure this works with predict
  • Clean up Translation predict output, check single string input works
  • Upload datasets/models to S3
  • Create tests replicating classification tests
  • In a separate PR (done by someone familiar with the doc structure if poss) we need to add these additional tasks

@SeanNaren SeanNaren requested review from Borda and carmocca February 1, 2021 12:35
@SeanNaren SeanNaren self-assigned this Feb 1, 2021
@codecov
Copy link

codecov bot commented Feb 1, 2021

Codecov Report

Merging #37 (cdf5859) into master (d5409bd) will decrease coverage by 0.26%.
The diff coverage is 76.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #37      +/-   ##
==========================================
- Coverage   81.17%   80.91%   -0.27%     
==========================================
  Files          32       46      +14     
  Lines        1036     1383     +347     
==========================================
+ Hits          841     1119     +278     
- Misses        195      264      +69     
Flag Coverage Δ
unittests 80.91% <76.05%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
flash/text/seq2seq/core/finetuning.py 35.71% <35.71%> (ø)
flash/text/seq2seq/summarization/metric.py 53.44% <53.44%> (ø)
flash/text/seq2seq/core/model.py 68.25% <68.25%> (ø)
flash/text/seq2seq/translation/model.py 72.22% <72.22%> (ø)
flash/core/trainer.py 82.50% <75.00%> (+10.00%) ⬆️
flash/text/seq2seq/summarization/utils.py 75.00% <75.00%> (ø)
flash/text/seq2seq/summarization/model.py 82.35% <82.35%> (ø)
flash/text/seq2seq/summarization/data.py 84.61% <84.61%> (ø)
flash/text/seq2seq/translation/data.py 84.61% <84.61%> (ø)
flash/text/seq2seq/core/data.py 85.33% <85.33%> (ø)
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5409bd...23aacf9. Read the comment docs.

@pep8speaks
Copy link

pep8speaks commented Feb 1, 2021

Hello @SeanNaren! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-02 14:18:05 UTC

tests/text/translation/test_model.py Outdated Show resolved Hide resolved
tests/text/translation/test_data.py Outdated Show resolved Hide resolved
tests/text/summarization/test_model.py Outdated Show resolved Hide resolved
tests/text/summarization/test_data.py Outdated Show resolved Hide resolved
flash_examples/predict/translate.py Outdated Show resolved Hide resolved
flash_examples/finetuning/summarization.py Outdated Show resolved Hide resolved
flash/text/seq2seq/__init__.py Outdated Show resolved Hide resolved
flash/text/seq2seq/summarization/data.py Outdated Show resolved Hide resolved
flash/text/seq2seq/core/data.py Outdated Show resolved Hide resolved
flash/text/seq2seq/core/data.py Outdated Show resolved Hide resolved
@SeanNaren SeanNaren marked this pull request as ready for review February 1, 2021 19:29
docs/source/reference/summarization.rst Show resolved Hide resolved
flash/text/seq2seq/core/model.py Outdated Show resolved Hide resolved
@SeanNaren SeanNaren requested a review from Borda February 2, 2021 13:15
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is flash_notebooks/.lock for?
why the strange notebook changes?

docs/source/reference/text_classification.rst Show resolved Hide resolved
docs/source/reference/translation.rst Show resolved Hide resolved
flash/text/seq2seq/core/model.py Outdated Show resolved Hide resolved
flash/text/seq2seq/core/model.py Outdated Show resolved Hide resolved
flash/text/seq2seq/summarization/metric.py Show resolved Hide resolved
flash/text/seq2seq/summarization/utils.py Outdated Show resolved Hide resolved
flash_examples/finetuning/translation.py Show resolved Hide resolved
flash_notebooks/generic_task.ipynb Show resolved Hide resolved
@SeanNaren SeanNaren enabled auto-merge (squash) February 2, 2021 14:24
@tchaton tchaton disabled auto-merge February 2, 2021 14:52
@tchaton tchaton enabled auto-merge (squash) February 2, 2021 14:52
@tchaton tchaton disabled auto-merge February 2, 2021 14:52
@tchaton tchaton merged commit 0446361 into master Feb 2, 2021
@tchaton tchaton deleted the feature/seq2seq branch February 2, 2021 14:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request task flash Task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants