Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

[refactor] DataPipeline 1/n #188

Merged
merged 41 commits into from
Mar 25, 2021
Merged

[refactor] DataPipeline 1/n #188

merged 41 commits into from
Mar 25, 2021

Conversation

tchaton
Copy link
Contributor

@tchaton tchaton commented Mar 22, 2021

What does this PR do?

This PR adds DataPipeline without any changes to the model.

However, as the model need to be changed, it will be done as a hard switch and the following PRs will contain all tasks at once.

        General flow:
                                             load_sample

                                   per_sample_pre_tensor_transform

                                    per_sample_to_tensor_transform

                                  per_sample_post_tensor_transform

                                ┌────────────────┴───────────────────┐
  Move Data to main worker -->  │                                    │
                    per_sample_transform_on_device                collate
                                │                                    │
                            collate                          per_batch_transform
                                │                                    │ <-- Move Data to main worker
                    per_batch_transform_on_device      per_batch_transform_on_device
                                │                                    │
                                └─────────────────┬──────────────────┘

                                          model.predict_step

                                          per_batch_transform

                                              uncollate

                                          per_sample_transform

@pep8speaks
Copy link

pep8speaks commented Mar 22, 2021

Hello @tchaton! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-03-25 19:32:55 UTC

@tchaton tchaton self-assigned this Mar 22, 2021
@tchaton tchaton added this to the 0.2 milestone Mar 22, 2021
@tchaton tchaton requested a review from edenlightning as a code owner March 22, 2021 19:15
flash/data/utils.py Outdated Show resolved Hide resolved
@aribornstein
Copy link
Contributor

LGTM Let's meger it

@tchaton tchaton enabled auto-merge (squash) March 25, 2021 11:47
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about the middle of reading

flash/data/auto_dataset.py Outdated Show resolved Hide resolved
flash/data/auto_dataset.py Outdated Show resolved Hide resolved
flash/data/auto_dataset.py Outdated Show resolved Hide resolved
flash/data/auto_dataset.py Outdated Show resolved Hide resolved
flash/data/auto_dataset.py Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
@Borda Borda disabled auto-merge March 25, 2021 17:59
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how much the test covers real cases as many return values are not checked...

flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Outdated Show resolved Hide resolved
flash/data/data_pipeline.py Show resolved Hide resolved
tests/data/test_data_pipeline.py Outdated Show resolved Hide resolved
tests/data/test_data_pipeline.py Show resolved Hide resolved
Comment on lines 626 to 638
# assert preprocess.train_load_data_called
# assert preprocess.train_per_sample_pre_tensor_transform_called
# assert preprocess.train_collate_called
assert preprocess.train_per_batch_transform_on_device_called
# assert preprocess.val_load_data_called
# assert preprocess.val_load_sample_called
# assert preprocess.val_per_sample_to_tensor_transform_called
# assert preprocess.val_collate_called
assert preprocess.val_per_batch_transform_on_device_called
# assert preprocess.test_load_data_called
# assert preprocess.test_per_sample_to_tensor_transform_called
# assert preprocess.test_per_sample_post_tensor_transform_called
# assert preprocess.predict_load_data_called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you do not use these assets, remove them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not, it is related to the previous bug. Just a reminder on how to update the test.

tests/data/test_data_pipeline.py Show resolved Hide resolved
tests/data/test_flash_datamodule.py Show resolved Hide resolved
@tchaton tchaton enabled auto-merge (squash) March 25, 2021 18:50
@tchaton tchaton disabled auto-merge March 25, 2021 18:54
@tchaton tchaton enabled auto-merge (squash) March 25, 2021 18:59
@tchaton tchaton merged commit e1a2d5d into master Mar 25, 2021
@tchaton tchaton deleted the data_pipeline_1_n branch March 25, 2021 19:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants