Add `output` argument to `Trainer.predict` and remove `DataPipelineState` #1157

ethanwharris · 2022-02-07T22:56:30Z

What does this PR do?

Adds an `output` argument to `Trainer.predict`
Adds `collate_fn` and `input_transform` properties to the `DatasetProcessor` base class
Removes `ProcessState` (refactors text data to apply tokenization in the collate function instead)
Removes `DataPipelineState`
Trims `DataPipeline`
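
To illustrate the tokenization change, here is a minimal, library-free sketch of tokenizing inside a collate function rather than per sample. It is not the Flash implementation: `toy_tokenize`, `make_collate_fn`, and the `"input"` / `"target"` / `"input_ids"` keys are hypothetical names chosen for the example, and the toy tokenizer stands in for a real one.

```python
from typing import Callable, Dict, List

# Assumed signature for a batch tokenizer: (texts, max_length) -> padded id rows.
Tokenizer = Callable[[List[str], int], List[List[int]]]


def toy_tokenize(texts: List[str], max_length: int) -> List[List[int]]:
    """Hypothetical batch tokenizer: word -> id, truncated and zero-padded."""
    vocab: Dict[str, int] = {}
    rows = []
    for text in texts:
        # Ids start at 1 so that 0 can be reserved for padding.
        ids = [vocab.setdefault(w, len(vocab) + 1) for w in text.lower().split()]
        ids = ids[:max_length]
        rows.append(ids + [0] * (max_length - len(ids)))
    return rows


def make_collate_fn(tokenizer: Tokenizer, max_length: int) -> Callable[[List[dict]], Dict[str, list]]:
    """Build a collate function that tokenizes the whole batch in one call."""

    def collate(samples: List[dict]) -> Dict[str, list]:
        texts = [sample["input"] for sample in samples]
        batch: Dict[str, list] = {"input_ids": tokenizer(texts, max_length)}
        # Targets are optional so the same function also works at predict time.
        if all("target" in sample for sample in samples):
            batch["target"] = [sample["target"] for sample in samples]
        return batch

    return collate


collate = make_collate_fn(toy_tokenize, max_length=4)
batch = collate(
    [
        {"input": "a good film", "target": 1},
        {"input": "a bad film", "target": 0},
    ]
)
```

Because the tokenizer sees the whole batch at once, no per-sample state has to be stored and shared with the task, which is the motivation this PR gives for dropping `ProcessState`.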

With the state gone, nothing is configured implicitly in the background anymore. As a result, the user must now pass properties such as labels and parameters to the task explicitly.
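
The pattern can be sketched in plain Python, assuming a toy task and a toy `predict` function. `ToyClassifier`, `predict`, and the `OUTPUTS` registry are hypothetical stand-ins and not Flash classes; the sketch only shows the idea that labels are handed to the task by the caller and the output format is chosen at predict time.

```python
from typing import Callable, Dict, List, Optional, Sequence


class ToyClassifier:
    """Hypothetical task: labels are passed in explicitly, never inferred from hidden state."""

    def __init__(self, labels: Optional[Sequence[str]] = None) -> None:
        self.labels = list(labels) if labels is not None else None

    def predict_step(self, batch: List[List[float]]) -> List[int]:
        # Index of the highest score for each sample in the batch.
        return [max(range(len(scores)), key=scores.__getitem__) for scores in batch]


def to_classes(model: ToyClassifier, preds: List[int]) -> List[int]:
    return preds


def to_labels(model: ToyClassifier, preds: List[int]) -> List[str]:
    if model.labels is None:
        raise ValueError("output='labels' requires labels to be passed to the task")
    return [model.labels[index] for index in preds]


# Hypothetical registry mapping an output name to a formatting function.
OUTPUTS: Dict[str, Callable] = {"classes": to_classes, "labels": to_labels}


def predict(model: ToyClassifier, batches: List[List[List[float]]], output: str = "classes") -> List[list]:
    """Run prediction and format each batch with the output chosen by the caller."""
    transform = OUTPUTS[output]
    return [transform(model, model.predict_step(batch)) for batch in batches]


model = ToyClassifier(labels=["cat", "dog"])
batches = [[[0.9, 0.1], [0.2, 0.8]]]
as_labels = predict(model, batches, output="labels")
as_classes = predict(model, batches, output="classes")
```

A task built without labels still predicts class indices, but asking for `output="labels"` fails loudly instead of silently relying on state attached elsewhere.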

Fixes #920

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests? [not needed for typos/docs]
Did you verify new and existing tests pass locally with your changes?
If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2022-02-09T19:12:00Z

Codecov Report

Merging #1157 (055cf7d) into master (defbace) will decrease coverage by 0.29%.
The diff coverage is 94.73%.

❗ Current head 055cf7d differs from pull request most recent head 71773c7. Consider uploading reports for the commit 71773c7 to get more accurate results.

@@            Coverage Diff             @@
##           master    #1157      +/-   ##
==========================================
- Coverage   89.26%   88.96%   -0.30%     
==========================================
  Files         286      285       -1     
  Lines       13045    12745     -300     
==========================================
- Hits        11644    11339     -305     
- Misses       1401     1406       +5

Flag	Coverage Δ
unittests	`88.96% <94.73%> (-0.30%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
flash/audio/classification/cli.py	`100.00% <ø> (ø)`
flash/audio/speech_recognition/data.py	`100.00% <ø> (ø)`
flash/core/data/process.py	`64.00% <ø> (+1.93%)`	⬆️
flash/core/data/splits.py	`96.66% <ø> (-0.48%)`	⬇️
flash/core/integrations/icevision/data.py	`95.55% <ø> (-0.19%)`	⬇️
flash/core/regression.py	`86.36% <ø> (-0.60%)`	⬇️
flash/core/utilities/stages.py	`70.00% <ø> (-3.92%)`	⬇️
flash/graph/classification/cli.py	`92.30% <ø> (ø)`
flash/image/classification/adapters.py	`82.62% <ø> (ø)`
flash/image/classification/cli.py	`84.61% <ø> (ø)`
... and 99 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update defbace...71773c7.

ethanwharris added 2 commits February 7, 2022 22:52

Initial commit

f792dd5

Updates

ff297ab

ethanwharris added the Data Pipeline V2, enhancement (New feature or request), and Refactor (Functional) labels Feb 8, 2022

ethanwharris added 25 commits February 8, 2022 14:52

Fix video

99cc5a1

Fix tabular

4f701a9

Fix graph

0b8c24a

Fix audio classification

e42dbbc

Fix object detection

416bbb9

Fix pointcloud

301e1c5

Fixes

42cbc69

Fixes

8ffe5f6

CLI fixes

2aaf7c0

Merge branch 'master' into feature/predict_output

bbf2e83

Fixes

673e8c2

Fixes

db06f3e

Remove data pipeline state

f60aca9

Fixes

9347acb

Fixes

68dd199

Fixes

922863a

Fixes

87a94cd

Fixes

acb99fa

Fixes

aa1930a

Fixes

e46add9

Fixes

0fe3aef

Fixes

a701714

Try to reduce memory footprint

5c3bf98

Fix tabular serving

599b763

Batch tokenize

ab3480b

ethanwharris changed the title ~~Add output argument to Trainer.predict and remove DataPipelineState~~ [PoC] Add output argument to Trainer.predict and remove DataPipelineState Feb 9, 2022

ethanwharris added 5 commits February 9, 2022 17:16

Try to reduce memory footprint

b82d130

Drop broken test

1748f11

Drop broken test

8b0c11f

Fixes

77081f9

Docs fixes

550b246

ethanwharris marked this pull request as ready for review February 9, 2022 19:06

ethanwharris requested review from ananyahjha93, Borda, carmocca, edenlightning, justusschock, kaushikb11 and tchaton as code owners February 9, 2022 19:06

ethanwharris added this to the v0.7 milestone Feb 9, 2022

ethanwharris changed the title ~~[PoC] Add output argument to Trainer.predict and remove DataPipelineState~~ Add output argument to Trainer.predict and remove DataPipelineState Feb 9, 2022

mergify bot added the has conflicts label Feb 9, 2022

Merge branch 'master' into feature/predict_output

9f3a301

mergify bot added has conflicts and removed has conflicts labels Feb 10, 2022

Merge branch 'master' into feature/predict_output

4767d43

mergify bot removed the has conflicts label Feb 14, 2022

ethanwharris added 5 commits February 14, 2022 13:03

Fixes

29e358a

Trigger CI

aba7042

Trigger CI

81e9f6e

Trigger CI

055cf7d

Update CHANGELOG.md

71773c7

ethanwharris merged commit 796c9c8 into master Feb 14, 2022

ethanwharris deleted the feature/predict_output branch February 14, 2022 14:51
