Dataset redesign for multi-gpu, multi-processing and multi-geometry #1057
okay, @ethanwhite, @jveitchmichaelis and @henrykironde, i've met the criteria I set out above. Clearly this is quite large and complex, but it is ready to be discussed.
out on vacation, all pieces are resolved above, and if anything lingers we can address when he returns.
@jveitchmichaelis and @henrykironde this is ready to be merged. I have confirmed and corrected edge cases.
This should be passing now. @jveitchmichaelis let's get this merged, i'm worried we will start getting behind on other PRs and end up discouraging contributions if we let this massive thing hang. We can do follow-up PRs if needed. Once all tests pass let's have one more look and be done here.
LGTM unless @henrykironde has any further comment. Maybe just fix that mosiac typo.
class_recall: a pandas dataframe of class level recall and precision with class sizes
"""

# If all empty ground truth, return 0 recall and precision
"0 precision and undefined recall"?
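To make the suggestion concrete, here is a minimal sketch of the empty-ground-truth case, assuming the usual definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The helper `precision_recall` and its arguments are hypothetical, written only for illustration, and are not DeepForest's evaluation code.

```python
import math


def precision_recall(true_positives, n_predictions, n_ground_truth):
    """Hypothetical helper (not part of DeepForest) for box-level metrics."""
    # Precision divides by the number of predictions; undefined if there are none.
    precision = true_positives / n_predictions if n_predictions > 0 else math.nan
    # Recall divides by the number of ground truth boxes; undefined if there are none.
    recall = true_positives / n_ground_truth if n_ground_truth > 0 else math.nan
    return precision, recall


# No ground truth boxes, 5 predicted boxes: every prediction is a false positive.
precision, recall = precision_recall(true_positives=0, n_predictions=5, n_ground_truth=0)
print(precision, recall)  # prints: 0.0 nan
```

With no ground truth the recall denominator is zero, so the sketch reports NaN rather than 0, while precision is a well-defined 0 as long as at least one prediction exists.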
print(f"{mosaic_df.shape[0]} predictions kept after non-max suppression")
def mosiac(predictions, iou_threshold=0.1):
Rename mosiac -> mosaic




Summary
This is a large PR that aims to create flexible dataset classes and predict_tile dataloader strategies. The deepforest.dataset.TreeDataset is one of the oldest parts of the codebase. Over time we have added other dataset classes, like TileDataset and RasterDataset, but there is no unifying structure or organization. There is a reason that single-GPU predict_tile and the TreeDataset logic lasted 4 years: changing the structure required touching nearly every file in the codebase. It was far easier to redesign the datasets knowing that we have an immediate need for refactoring for 2.0. Unpicking them in a halfway-complete process would have made it difficult for anyone else to contribute to the next steps.
Motivation
Desired Dataset Functionality
Related
I tried #1047 and found that it didn't play well with PyTorch Lightning and was limited to batch_size = 1.
Major improvements
Minor improvements
Copilot summary of code changes
This pull request introduces significant updates to the DeepForest project, including enhancements to prediction scaling, dataset handling, and configuration management. The changes focus on improving usability, performance, and modularity by refining documentation, restructuring datasets, and updating configuration files.
Enhancements to Prediction Scaling:
single, batch, window) for balancing CPU/GPU memory and utilization during inference. These strategies are configurable via the dataloader_strategy parameter.
Dataset Refactoring:
TreeDataset, TileDataset, and RasterDataset classes from src/deepforest/dataset.py and modularized the BoundingBoxDataset into a new file, src/deepforest/datasets/cropmodel.py. This improves code organization and clarity.
TileDataset in the prediction example in docs/user_guide/16_prediction.md to reflect the new file structure.
Configuration Updates:
pin_images parameter to preload_images in src/deepforest/conf/config.yaml for better clarity and added a new predict.pin_memory configuration option to control memory pinning during prediction.
Code Simplification:
src/deepforest/callbacks.py, streamlining the file and reducing unnecessary dependencies.
Faster tests
Refactored on_validation_epoch_end
Next steps
Additional issues need to be considered.