Merged
16 changes: 7 additions & 9 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -8,7 +8,7 @@ assignees: ''
---

**Describe the bug**
A clear and concise description of what the bug is. For questions and community discussion, please create a discussion (https://github.com/weecology/DeepForest/discussions).

**To Reproduce**
If possible provide a simple code example, using data from the package itself, that reproduces the behavior. The code block below is a starting point. Issues without reproducible code that we can use to explore the problem are much more difficult to understand and debug and so are much less likely to be addressed quickly. Spending some time creating a reproducible example makes it easier for us to help.
@@ -24,22 +24,20 @@ m = main.deepforest()
m.use_release()

# Use package data for simple training example
m.config["train"]["csv_file"] = get_data("example.csv")
m.config["train"]["root_dir"] = os.path.dirname(get_data("example.csv"))
m.config["train"]["fast_dev_run"] = True
m.config.train.csv_file = get_data("example.csv")
m.config.train.root_dir = os.path.dirname(get_data("example.csv"))
m.config.train.fast_dev_run = True
m.trainer.fit(m)
```

**Environment (please complete the following information):**
- OS:
- Python version and environment:

**Screenshots and Context**
If applicable, add screenshots to help explain your problem. Please paste entire code instead of a snippet!

**User Story**
Tell us about who you are and what you hope to achieve with DeepForest

“As a [type of user] I want [my goal] so that [my reason].”


2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,5 +1,5 @@

include src/deepforest/deepforest_config.yml
include src/deepforest/conf/config.yml
include src/deepforest/data/testfile_deepforest.csv
include src/deepforest/data/testfile_multi.csv
include src/deepforest/data/classes.csv
22 changes: 11 additions & 11 deletions docs/user_guide/05_model_architecture.md
@@ -1,6 +1,6 @@
# Extending DeepForest with Custom Models and Dataloaders

DeepForest allows users to specify custom model architectures if they follow certain guidelines.
To create a compliant format, follow the recipe below.

## Subclass the model.Model() structure
@@ -14,7 +14,7 @@ import torch

class Model():
"""An architecture-agnostic class that controls the basic train, eval and predict functions.
A model should optionally allow a backbone for pretraining. To add new architectures, create a new module in models/ and write a create_model function.
Then add the result to the if else statement below.
Args:
num_classes (int): number of classes in the model
@@ -30,11 +30,11 @@ class Model():

# Check input output format:
self.check_model()

def create_model(self):
"""This function converts a deepforest config file into a model. An architecture should have a list of nested arguments in config that match this function"""
raise ValueError("The create_model class method needs to be implemented. Take in args and return a pytorch nn module.")

def check_model(self):
"""
Ensure that model follows deepforest guidelines
@@ -44,7 +44,7 @@ class Model():
test_model = self.create_model()
test_model.eval()

# Create a dummy batch of 3-band data.
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]

predictions = test_model(x)
@@ -80,17 +80,17 @@ For train/test
from deepforest import main

m = main.deepforest()
existing_loader = m.load_dataset(csv_file=m.config["train"]["csv_file"],
root_dir=m.config["train"]["root_dir"],
batch_size=m.config["batch_size"])
existing_loader = m.load_dataset(csv_file=m.config.train.csv_file,
root_dir=m.config.train.root_dir,
batch_size=m.config.batch_size)

# Can be passed directly to main.deepforest(existing_train_dataloader) or reassign to existing deepforest object
m.existing_train_dataloader = existing_loader
m.create_trainer()
m.trainer.fit(m)
```

For prediction directly on a dataloader, we need a dataloader that yields images; see [TileDataset](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.dataset.TileDataset) for an example. Any dataloader can be supplied to m.trainer.predict as long as it meets this specification.

```python
import numpy as np
@@ -101,5 +101,5 @@ ds = dataset.TileDataset(tile=np.random.random((400,400,3)).astype("float32"), p
existing_loader = m.predict_dataloader(ds)

batches = m.trainer.predict(m, existing_loader)
len(batches[0]) == m.config["batch_size"] + 1
len(batches[0]) == m.config.batch_size + 1
```
40 changes: 20 additions & 20 deletions docs/user_guide/09_configuration_file.md
@@ -2,9 +2,9 @@

DeepForest uses a config.yml file to control hyperparameters related to model training and evaluation. This allows all the relevant parameters to live in one location and be easily changed when exploring new models.

DeepForest includes a default sample config file named deepforest_config.yml. Users have the option to override this file by creating their own custom config file. Initially, DeepForest scans the current working directory for the file. If it's not found there, it automatically resorts to using the default configuration.
DeepForest includes a default sample config file named config.yml. Users have the option to override this file by creating their own custom config file. Initially, DeepForest scans the current working directory for the file. If it's not found there, it automatically resorts to using the default configuration.

You can edit this file to change settings while developing models. Note that if you would like DeepForest to save the config file on reload (using deepforest.save_model),
you must update config.yml rather than the config of an already loaded model.
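The lookup order described above (a config file in the current working directory first, otherwise the packaged default) can be sketched with `pathlib`. `resolve_config_path` is a hypothetical helper written for illustration, assuming the user file is named `config.yml`; it is not DeepForest's actual loading code.

```python
from pathlib import Path


def resolve_config_path(cwd: Path, default_path: Path, name: str = "config.yml") -> Path:
    """Hypothetical helper: prefer `name` in `cwd`, else fall back to the packaged default."""
    candidate = cwd / name
    if candidate.is_file():
        # A user-supplied config in the working directory takes precedence.
        return candidate
    # Otherwise fall back to the default configuration shipped with the package.
    return default_path


# Illustrative default location, mirroring src/deepforest/conf/config.yml.
print(resolve_config_path(Path.cwd(), Path("conf") / "config.yml"))
```

Printing the resolved path is a cheap way to confirm that a custom config is actually the one being picked up.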

```
@@ -30,7 +30,7 @@ retinanet:
train:
csv_file:
root_dir:

# Optimizer initial learning rate
lr: 0.001
scheduler:
@@ -39,7 +39,7 @@ train:
# Common parameters
T_max: 10
eta_min: 0.00001
lr_lambda: "lambda epoch: 0.95 ** epoch" # For lambdaLR and multiplicativeLR
lr_lambda: "0.95 ** epoch" # For lambdaLR and multiplicativeLR
step_size: 30 # For stepLR
gamma: 0.1 # For stepLR, multistepLR, and exponentialLR
milestones: [50, 100] # For multistepLR
@@ -60,10 +60,10 @@ train:
fast_dev_run: False
# pin images to GPU memory for fast training. This depends on GPU size and number of images.
preload_images: False

validation:
# callback args
csv_file:
root_dir:
# Intersection over union evaluation
iou_threshold: 0.4
@@ -72,22 +72,22 @@ validation:
```
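In the config above, `lr_lambda` is written as a string expression in `epoch`, such as `"0.95 ** epoch"`. One way such a string could become the per-epoch multiplier that `torch.optim.lr_scheduler.LambdaLR` expects is sketched below. `make_lr_lambda` is a hypothetical helper, and how DeepForest itself parses this field is an assumption here. Because it relies on `eval`, it should only be used with trusted config files.

```python
def make_lr_lambda(expression: str):
    """Hypothetical helper: turn a string such as "0.95 ** epoch" into f(epoch) -> multiplier."""
    code = compile(expression, "<lr_lambda>", "eval")

    def lr_lambda(epoch: int) -> float:
        # Evaluate with builtins disabled and only `epoch` in scope (trusted configs only).
        return float(eval(code, {"__builtins__": {}}, {"epoch": epoch}))

    return lr_lambda


multiplier = make_lr_lambda("0.95 ** epoch")
base_lr = 0.001
# LambdaLR sets each epoch's learning rate to base_lr * lr_lambda(epoch).
print([base_lr * multiplier(epoch) for epoch in range(3)])
```

The resulting function could then be handed to `LambdaLR(optimizer, lr_lambda=multiplier)`.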
## Passing config arguments at runtime using a dict

It can often be useful to pass config args directly to a model instead of editing the config file. By using a dict containing the config keys and their values. Values provided in this dict will override values provided in deepforest_config.yml.
It can often be useful to pass config args directly to a model instead of editing the config file, using a dict containing the config keys and their values. Values provided in this dict will override values provided in config.yml.

```python
from deepforest import main

# Default model has 1 class
m = main.deepforest()
print(m.config["num_classes"])
print(m.config.num_classes)

# But we can override using config args, make sure to specify a new label dict.
m = main.deepforest(config_args={"num_classes":2}, label_dict={"Alive":0,"Dead":1})
print(m.config["num_classes"])
print(m.config.num_classes)

# These can also be nested for train and val arguments
m = main.deepforest(config_args={"train":{"epochs":7}})
print(m.config["train"]["epochs"])
print(m.config.train.epochs)
```
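In the sketch below, a nested override such as `{"train": {"epochs": 7}}` replaces only the key it names and leaves sibling keys (here `train.lr`) at their defaults, and the merged result is read with the same attribute syntax used above. It uses plain dictionaries and `types.SimpleNamespace`; `deep_merge` and `to_namespace` are hypothetical helpers with illustrative default values. DeepForest's real config object is presumably an OmegaConf `DictConfig` (the project now depends on `hydra-core`), which this sketch does not reproduce.

```python
from types import SimpleNamespace


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Hypothetical helper: recursively apply `overrides` on top of `defaults` without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse into nested sections such as "train" or "validation".
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, or sections absent from the defaults, are replaced outright.
            merged[key] = value
    return merged


def to_namespace(config: dict) -> SimpleNamespace:
    """Hypothetical helper: expose nested keys as attributes, e.g. config.train.epochs."""
    return SimpleNamespace(
        **{key: to_namespace(value) if isinstance(value, dict) else value for key, value in config.items()}
    )


# Illustrative defaults for this sketch only.
defaults = {"num_classes": 1, "train": {"epochs": 1, "lr": 0.001}}
config = to_namespace(deep_merge(defaults, {"train": {"epochs": 7}}))
print(config.train.epochs, config.train.lr, config.num_classes)  # 7 0.001 1
```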

## Dataloaders
@@ -128,7 +128,7 @@ Score threshold of predictions to keep. Predictions with less than this threshol
The score threshold can be updated at any time by modifying the config. For example, if you want to keep only predictions with scores greater than 0.3, update the config:

```python
m.config["score_thresh"] = 0.3
m.config.score_thresh = 0.3
```

This will take effect when you call predict_tile, predict_image, predict_file, or evaluate.
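Conceptually, the threshold drops every prediction whose score falls below `score_thresh`. The plain-Python sketch below applies that rule to a hypothetical list of prediction records; in DeepForest the threshold is presumably applied inside the model at prediction time, and whether a score exactly equal to the threshold is kept (`>=` versus `>`) is an assumption.

```python
def filter_by_score(predictions: list, score_thresh: float) -> list:
    """Keep predictions whose score meets the threshold (>= is an assumed boundary rule)."""
    return [p for p in predictions if p["score"] >= score_thresh]


# Hypothetical prediction records with the usual box columns plus a score.
predictions = [
    {"xmin": 10, "ymin": 10, "xmax": 50, "ymax": 60, "label": "Tree", "score": 0.85},
    {"xmin": 70, "ymin": 15, "xmax": 110, "ymax": 55, "label": "Tree", "score": 0.25},
]
print(len(filter_by_score(predictions, 0.3)))  # 1
```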
@@ -137,7 +137,7 @@ This will be updated when you can predict_tile, predict_image, predict_file, or

### csv_file

Path to the csv_file of training annotations. Annotations are `.csv` files with headers `image_path, xmin, ymin, xmax, ymax, label`. Each image_path is relative to the root_dir.
For example, this file should have entries like `myimage.tif`, not `/path/to/myimage.tif`.
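Because image_path must be relative to root_dir, it can help to scan an annotations file for absolute paths before training. `check_annotations` is a hypothetical helper using only the standard library: it verifies the header names listed above (tolerating spaces after commas) and reports any row whose image_path is absolute.

```python
import csv
import io
import os

REQUIRED_COLUMNS = ["image_path", "xmin", "ymin", "xmax", "ymax", "label"]


def check_annotations(csv_text: str) -> list:
    """Hypothetical helper: return a list of problems found in annotation CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        path = row["image_path"]
        if os.path.isabs(path):
            problems.append(f"row {row_number}: image_path {path!r} should be relative to root_dir")
    return problems


sample = (
    "image_path, xmin, ymin, xmax, ymax, label\n"
    "myimage.tif, 1, 2, 30, 40, Tree\n"
    "/path/to/myimage.tif, 5, 6, 20, 25, Tree\n"
)
print(check_annotations(sample))
```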

### root_dir
@@ -151,18 +151,18 @@ Learning rate for the training optimization. By default the optimizer is stochas
```python
from torch import optim

optim.SGD(self.model.parameters(), lr=self.config["train"]["lr"], momentum=0.9)
optim.SGD(self.model.parameters(), lr=self.config.train.lr, momentum=0.9)
```

A learning rate scheduler is used to adjust the learning rate based on validation loss. The default scheduler is ReduceLROnPlateau:

```python
import torch

self.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(self.optimizer, mode='min',
factor=0.1, patience=10,
verbose=True, threshold=0.0001,
threshold_mode='rel', cooldown=0,
min_lr=0, eps=1e-08)
```
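To make the `ReduceLROnPlateau` arguments above concrete, here is a simplified pure-Python sketch of the rule for `mode='min'` and `threshold_mode='rel'`: a validation loss counts as an improvement only if it beats the best value so far by the relative `threshold`, and once more than `patience` epochs pass without improvement the learning rate is multiplied by `factor`, never dropping below `min_lr`. It is an illustration, not PyTorch's implementation; `PlateauSketch` is a hypothetical name, and `cooldown` and `eps` handling are omitted.

```python
import math


class PlateauSketch:
    """Simplified ReduceLROnPlateau rule: mode='min', threshold_mode='rel', cooldown=0."""

    def __init__(self, lr, factor=0.1, patience=10, threshold=1e-4, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.min_lr = min_lr
        self.best = math.inf
        self.num_bad_epochs = 0

    def step(self, val_loss: float) -> float:
        # An improvement must beat the best loss by a relative margin of `threshold`.
        if val_loss < self.best * (1 - self.threshold):
            self.best = val_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        # After more than `patience` epochs without improvement, shrink the learning rate.
        if self.num_bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.num_bad_epochs = 0
        return self.lr


scheduler = PlateauSketch(lr=0.001, patience=2)
print([scheduler.step(loss) for loss in [1.0, 1.0, 1.0, 1.0]])
```

With `patience=2`, the rate is cut only on the fourth flat epoch, which is why the default `patience=10` reacts slowly on short runs.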
This default scheduler can be overridden by specifying a different scheduler in the config_args:
@@ -221,7 +221,7 @@ Optional validation dataloader to run during training.

### csv_file

Path to the csv_file of validation annotations. Annotations are `.csv` files with headers `image_path, xmin, ymin, xmax, ymax, label`. Each image_path is relative to the root_dir.
For example, this file should have entries like `myimage.tif`, not `/path/to/myimage.tif`.

### root_dir
@@ -230,5 +230,5 @@ Directory to search for images in the csv_file image_path column

### val_accuracy_interval

Compute and log the classification accuracy of the predicted results every X epochs.
This slows training somewhat and is most useful for multi-class models. To deactivate, set it to a number larger than epochs.
23 changes: 10 additions & 13 deletions docs/user_guide/11_training.md
@@ -30,21 +30,18 @@ from deepforest import get_data
# Example run with short training
annotations_file = get_data("testfile_deepforest.csv")

# Initialize a DeepForest model instance to access configuration and training methods
m = main.deepforest()

m.config["epochs"] = 1
m.config["save-snapshot"] = False
m.config["train"]["csv_file"] = annotations_file
m.config["train"]["root_dir"] = os.path.dirname(annotations_file)
m.config.train.epochs = 1
m.config["save-snapshot"] = False  # hyphenated keys need item access
m.config.train.csv_file = annotations_file
m.config.train.root_dir = os.path.dirname(annotations_file)

m.create_trainer()
```

For debugging, it's often useful to use [fast_dev_run = True from pytorch lightning](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#fast-dev-run):

```python
m.config["train"]["fast_dev_run"] = True
m.config.train.fast_dev_run = True
```

See [config](https://deepforest.readthedocs.io/en/latest/ConfigurationFile.html) for the full set of available arguments. You can also pass any [additional](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html) PyTorch Lightning argument to the trainer.
@@ -249,22 +246,22 @@ see https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/
While it is impossible to anticipate the setup for all users, there are a few guidelines. First, a GPU-enabled machine is key. Training on a CPU is possible, but it takes much longer (roughly 100x) and is best avoided unless necessary. Google Colab can be beneficial but is prone to errors. Once on the GPU, the configuration includes a "workers" argument, which is passed to PyTorch's dataloader. As the number of workers increases, data is loaded in parallel and fed to the GPU faster. Increase the workers argument gradually; we have found that the optimal number of workers varies by system.

```
m.config["workers"] = 5
m.config.workers = 5
```

This is not foolproof, and occasionally 0 workers (in which case data loading runs on the main thread) is optimal: https://stackoverflow.com/questions/73331758/can-ideal-num-workers-for-a-large-dataset-in-pytorch-be-0.

For large training runs, setting preload_images to True can be helpful.

```
m.config["preload_images"] = True
m.config.train.preload_images = True
```

This loads all data into GPU memory once, at the beginning of the run. This is great, but it requires enough memory to hold all of the images.
Similarly, increasing the batch size can speed up training. As with both of the options above, we have seen cases where performance (and accuracy) improves or degrades depending on batch size. Track experiment results carefully when altering batch size, since it directly [affects the speed of learning](https://www.baeldung.com/cs/learning-rate-batch-size).

```
m.config["batch_size"] = 10
m.config.batch_size = 10
```

Remember to call m.create_trainer() after updating the config dictionary.
@@ -311,9 +308,9 @@ from pytorch_lightning import Trainer
trainer = Trainer(
accelerator="gpu",
strategy="ddp",
devices=model.config["devices"],
devices=model.config.devices,
enable_checkpointing=False,
max_epochs=model.config["train"]["epochs"],
max_epochs=model.config.train.epochs,
logger=comet_logger
)
trainer.fit(m)
6 changes: 3 additions & 3 deletions docs/user_guide/12_evaluation.md
@@ -26,7 +26,7 @@ This was the original DeepForest metric, set to an IoU of 0.4. This means that a

There is an additional difference between ecological object detection tasks like tree crown detection and traditional computer vision tasks. Instead of a single ground truth or a small set of easily differentiated ones, an image can contain 60 or 70 overlapping objects. How do you best assign each prediction to a ground truth?

DeepForest uses the [hungarian matching algorithm](https://thinkautonomous.medium.com/computer-vision-for-tracking-8220759eee85) to assign predictions to ground truth based on maximum IoU overlap. This is slow compared to the methods above, and so isn't a good choice for running hundreds of times during model training see config["validation"]["val_accuracy_interval"] for setting the frequency of the evaluate callback for this metric.
DeepForest uses the [Hungarian matching algorithm](https://thinkautonomous.medium.com/computer-vision-for-tracking-8220759eee85) to assign predictions to ground truth based on maximum IoU overlap. This is slow compared to the methods above, so it isn't a good choice for running hundreds of times during model training; see config.validation.val_accuracy_interval for setting the frequency of the evaluate callback for this metric.
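For a handful of boxes, the one-to-one assignment that maximizes total IoU can be found by brute force over permutations. This reaches the same optimal total as the Hungarian algorithm, but in factorial rather than polynomial time, so it is only a stand-in for illustration (a real implementation would use something like `scipy.optimize.linear_sum_assignment`). Boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples, and the helper names are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_max_iou(predictions, ground_truth):
    """Brute-force one-to-one matching maximizing total IoU (stand-in for Hungarian matching).

    Assumes len(predictions) <= len(ground_truth); returns (pred_index, gt_index, iou) triples.
    """
    best_pairs, best_total = [], -1.0
    # Try every way of giving each prediction a distinct ground-truth box.
    for gt_order in permutations(range(len(ground_truth)), len(predictions)):
        pairs = [(p, g, iou(predictions[p], ground_truth[g])) for p, g in enumerate(gt_order)]
        total = sum(score for _, _, score in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    return best_pairs


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(19, 0, 29, 10), (1, 0, 11, 10)]
print(match_max_iou(preds, gts))
```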

### Empty Frame Accuracy

Expand All @@ -46,8 +46,8 @@ These metrics are largely used during training to keep track of model performanc
m = main.deepforest()
csv_file = get_data("OSBS_029.csv")
root_dir = os.path.dirname(csv_file)
m.config["validation"]["csv_file"] = csv_file
m.config["validation"]["root_dir"] = root_dir
m.config.validation.csv_file = csv_file
m.config.validation.root_dir = root_dir
results = m.trainer.validate(m)
```
This creates a dictionary containing the average IoU ('iou') as well as the IoU for each class (here there is just one class, 'Tree'), followed by the COCO mAP scores. See Further Reading above for an explanation of mAP scores. val_bbox_regression is the loss function of the object detection box head, and loss_classification is the loss function of the object classification head.
1 change: 1 addition & 0 deletions pyproject.toml
@@ -45,6 +45,7 @@ dependencies = [
"aiohttp",
"docformatter",
"huggingface_hub>=0.25.0",
"hydra-core",
"geopandas",
"matplotlib",
"nbqa",
4 changes: 2 additions & 2 deletions src/deepforest/__init__.py
@@ -15,7 +15,7 @@

def get_data(path):
"""Helper function to get package sample data."""
if path == "deepforest_config.yml":
return os.path.join(_ROOT, "deepforest_config.yml")
if path == "config.yml":
return os.path.join(_ROOT, "conf", "config.yml")
else:
return os.path.join(_ROOT, "data", path)
2 changes: 1 addition & 1 deletion src/deepforest/callbacks.py
@@ -62,7 +62,7 @@ def log_images(self, pl_module):

# Add root_dir to the dataframe
if "root_dir" not in df.columns:
df["root_dir"] = pl_module.config["validation"]["root_dir"]
df["root_dir"] = pl_module.config.validation.root_dir

# Ensure color is correctly assigned
if self.color is None: