diff --git a/docs/source/deepforest.rst b/docs/source/deepforest.rst
index 1f3b5b732..9a2a89a20 100644
--- a/docs/source/deepforest.rst
+++ b/docs/source/deepforest.rst
@@ -8,6 +8,7 @@ Subpackages
    :maxdepth: 4

    deepforest.data
+   deepforest.datasets

 Submodules
 ----------
@@ -28,14 +29,6 @@ deepforest.callbacks module
    :undoc-members:
    :show-inheritance:

-deepforest.dataset module
--------------------------
-
-.. automodule:: deepforest.dataset
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 deepforest.evaluate module
 --------------------------
diff --git a/docs/user_guide/07_scaling.md b/docs/user_guide/07_scaling.md
index 1118c2687..d7f3361e3 100644
--- a/docs/user_guide/07_scaling.md
+++ b/docs/user_guide/07_scaling.md
@@ -34,68 +34,28 @@ https://lightning.ai/docs/pytorch/latest/clouds/cluster_advanced.html#troublesho

 ## Prediction

-Often we have a large number of tiles we want to predict. DeepForest uses [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) to scale inference. This gives us access to powerful tools for scaling without any changes to user code. DeepForest automatically detects whether you are running on GPU or CPU. The parallelization strategy is to run each tile on a separate GPU, we cannot parallelize crops from within the same tile across GPUs inside of main.predict_tile(). If you set m.create_trainer(accelerator="gpu", devices=4), and run predict_tile, you will only use 1 GPU per tile. This is because we need access to all crops to create a mosiac of the predictions.
+Often we have a large number of tiles we want to predict. DeepForest uses [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) to scale inference. This gives us access to powerful tools for scaling without any changes to user code. DeepForest automatically detects whether you are running on GPU or CPU.

-### Scaling prediction across multiple GPUs
+There are three dataset strategies that *balance CPU memory, GPU memory, and GPU utilization* through their batching behavior.

-There are a few situations in which it is useful to replicate the DeepForest module across many separate Python processes. This is especially helpful when we have a series of non-interacting tasks, often called 'embarrassingly parallel' processes. In these cases, no DeepForest instance needs to communicate with another instance. Rather than coordinating GPUs with the associated annoyance of overhead and backend errors, we can just launch separate jobs and let them finish on their own. One helpful tool in Python is [Dask](https://www.dask.org/). Dask is a wonderful open-source tool for coordinating large-scale jobs. Dask can be run locally, across multiple machines, and with an arbitrary set of resources.
+```python
+prediction_single = m.predict_tile(path=path, patch_size=300, dataloader_strategy="single")
+```
+The `dataloader_strategy` parameter has three options:

-### Example Dask and DeepForest integration using SLURM
+* **single**: Loads the entire image into CPU memory and passes individual windows to GPU.

-Imagine we have a list of images we want to predict using `deepforest.main.predict_tile()`. DeepForest does not allow multi-GPU inference within each tile, as it is too much of a headache to make sure the threads return the correct overlapping window. Instead, we can parallelize across tiles, such that each GPU takes a tile and performs an action. The general structure is to create a Dask client across multiple GPUs, submit each DeepForest `predict_tile()` instance, and monitor the results.
-In this example, we are using a SLURMCluster, a common job scheduler for large clusters. There are many similar ways to create a Dask client object that will be specific to a particular organization. The following arguments are specific to the University of Florida cluster, but will be largely similar to other SLURM naming conventions. We use the extra Dask package, `dask-jobqueue`, which helps format the call.
+* **batch**: Loads the entire image into GPU memory and creates views of the image as batches. Requires the entire tile to fit into GPU memory. CPU parallelization is possible for loading images.
+* **window**: Loads only the desired window of the image from the raster dataset. This is the most memory-efficient option, but it cannot parallelize across windows due to Python's Global Interpreter Lock; workers must be set to 0.

-```python
-from dask_jobqueue import SLURMCluster
-from dask.distributed import Client
-
-cluster = SLURMCluster(processes=1,
-                       cores=10,
-                       memory="40 GB",
-                       walltime='24:00:00',
-                       job_extra=extra_args,
-                       extra=['--resources gpu=1'],
-                       nanny=False,
-                       scheduler_options={"dashboard_address": ":8787"},
-                       local_directory="/orange/idtrees-collab/tmp/",
-                       death_timeout=100)
-print(cluster.job_script())
-cluster.scale(10)
-
-dask_client = Client(cluster)
-```
+## Data Loading

-This job script gets a single GPUs with "40GB" of memory with 10 cpus. We then ask for 10 instances of this setup.
-Now that we have a dask client, we can send our custom function.
+DeepForest uses PyTorch's DataLoader for efficient data loading. One important parameter for scaling is the number of CPU workers, which controls parallel data loading using multiple CPU processes. This can be set in the config:

-```python
-import os
-from deepforest import main
-
-def function_to_parallelize(tile):
-    m = main.deepforest()
-    m.load_model("weecology/deepforest-tree") # sub in the custom logic to load your own models
-    boxes = m.predict_tile(raster_path=tile)
-    # save the predictions using the tile pathname
-    filename = "{}.csv".format(os.path.splitext(os.path.basename(tile))[0])
-    filename = os.path.join(,filename)
-    boxes.to_csv(filename)
-
-    return filename
 ```
-
-```python
-tiles = []
-futures = []
-for tile in tiles:
-    future = client.submit(function_to_parallelize, tile)
-    futures.append(future)
+m.config["workers"] = 10
 ```
+Setting workers to 0 runs data loading in the main process without multiprocessing; setting workers to 1 or more enables multiprocessing. Increase this value slowly, as IO constraints can lead to deadlocks among workers.

-We can wait to see the futures as they complete! Dask also has a beautiful visualization tool using bokeh.
-```python
-for x in futures:
-    completed_filename = x.result()
-    print(completed_filename)
-```
diff --git a/docs/user_guide/12_evaluation.md b/docs/user_guide/12_evaluation.md
index 20c56d12a..6490408fd 100644
--- a/docs/user_guide/12_evaluation.md
+++ b/docs/user_guide/12_evaluation.md
@@ -1,40 +1,71 @@
 # Evaluation

- We stress that evaluation data must be different from training data, as neural networks have millions of parameters and can easily memorize thousands of samples. Avoid random train-test splits, try to create test datasets that mimic downstream tasks. If you are predicting among temporal surveys or across imaging platforms, your train-test data should reflect these partitions. Random sampling is almost never the right choice, biological data often has high spatial, temporal or taxonomic correlation that makes it easier for your model to generalize, but will fail when pushed into new situations.
+DeepForest allows users to assess model performance compared to ground-truth data.

-DeepForest provides several evaluation metrics. There is no one-size-fits all evaluation approach, and the user needs to consider which evaluation metric best fits the task. There is significant information online about the evaluation of object detection networks. Our philosophy is to provide a user with a range of statistics and visualizations. Always visualize results and trust your judgment. Never be guided by a single metric.
+## Summary

-## Further Reading
+1. Recall - the proportion of ground-truth objects correctly covered by predictions.
+2. Precision - the proportion of predictions that overlap ground-truth.
+3. Empty-frame accuracy - the proportion of empty ground-truth images that are correctly predicted to have no objects of interest.
+4. IoU - Intersection-over-Union, a computer vision metric that assesses how tightly a bounding box prediction overlaps with its matched ground-truth.
+5. mAP - Mean-Average-Precision, a computer vision metric that assesses the performance of the model, incorporating precision, recall and the average score of true positives. See below.

-[MeanAveragePrecision in torchmetrics](https://medium.com/data-science-at-microsoft/how-to-smoothly-integrate-meanaverageprecision-into-your-training-loop-using-torchmetrics-7d6f2ce0a2b3)
+## Evaluation code

-[A general explanation of the mAP metric](https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173)
+The model's .evaluate method takes a CSV file of labels, with paths to images and the coordinates of the associated annotations, along with an IoU threshold that determines whether a prediction is close enough to a label to be considered a match.

-[Comparing Object Detection Models](https://www.comet.com/site/blog/compare-object-detection-models-from-torchvision/)
+```python
+from deepforest import main, get_data
+m = main.deepforest()
+m.load_model("weecology/deepforest-tree")
+# Sample data
+csv_file = get_data("OSBS_029.csv")
+results = m.evaluate(csv_file, iou_threshold=0.4)
+```
+
+This produces a dictionary that contains a detailed result comparison for each label, the aggregate metrics, the predictions data frame, and the ground truth data frame.
+
+## Evaluation Philosophy and Further Information
+
+ We stress that evaluation data must be different from training data, as neural networks have millions of parameters and can easily memorize thousands of samples. We also recommend creating test datasets that mimic downstream tasks instead of using random train-test splits. If you are predicting among temporal surveys or across imaging platforms, your train-test data should reflect these partitions. Random sampling is almost never the right choice, because biological data often has high spatial, temporal or taxonomic correlation that can result in overfitting or exaggerated evaluation metrics when using random splits.
+
+DeepForest provides several evaluation metrics. There is no one-size-fits-all evaluation approach, and the user needs to consider which evaluation metric best fits the task. There is significant information online about the evaluation of object detection networks. Our philosophy is to provide users with a range of statistics and visualizations. Always visualize results and trust your judgment. Never be guided by a single metric.
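+
+For example, one way to avoid random splits is to hold out whole images (or whole sites) so that boxes from a single image never appear in both partitions. Below is a minimal sketch, assuming a standard annotations CSV with an `image_path` column; the file names are hypothetical:
+
+```python
+import pandas as pd
+
+annotations = pd.read_csv("annotations.csv")
+images = annotations.image_path.unique()
+
+# Hold out the last 20% of images as a test set. In practice, group by
+# site, survey date, or sensor so the test set mimics deployment conditions.
+n_test = max(1, int(0.2 * len(images)))
+test_images = set(images[-n_test:])
+
+annotations[~annotations.image_path.isin(test_images)].to_csv("train.csv", index=False)
+annotations[annotations.image_path.isin(test_images)].to_csv("test.csv", index=False)
+```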
-## Average Intersection over Union
+## Metrics
+
+### Average Intersection over Union

 DeepForest modules use torchmetric's [IntersectionOverUnion](https://torchmetrics.readthedocs.io/en/stable/detection/intersection_over_union.html) metric. This calculates the average overlap between predictions and ground truth boxes. This can be considered a general indicator of model performance but is not sufficient on its own for model evaluation. There are lots of reasons predictions might overlap with ground truth; for example, consider a model that covers an entire image with boxes. This would have a high IoU but a low value for model utility.

-## Mean-Average-Precision (mAP)
+### Mean-Average-Precision (mAP)

 mAP is the standard COCO evaluation metric and the most common for comparing computer vision models. It is useful as a summary statistic. However, it has several limitations for an ecological use case.

 1. Not intuitive and difficult to translate to ecological applications. Read the sections above and visualize the mAP metric, which is essentially the area under the precision-recall curve at a range of IoU values.
 2. The vast majority of biological applications use a fixed cutoff to determine an object of interest in an image. Perhaps in the future we will weight tree boxes by their confidence score, but currently we do things like, "All predictions > 0.4 score are considered positive detections". This does not connect well with the mAP metric.

-## Precision and Recall at a set IoU threshold.
+For information on how to calculate mAP, see the [torchmetrics documentation](https://torchmetrics.readthedocs.io/en/stable/detection/mean_average_precision.html) and further reading below.
+
+### Precision and Recall at a set IoU threshold

 This was the original DeepForest metric, set to an IoU of 0.4. This means that all predictions that overlap a ground truth box at IoU > 0.4 are true positives. As opposed to the torchmetrics above, it is intuitive and matches downstream ecological tasks. The drawback is that it is slow, coarse, and does not fully reward the model for having high confidence scores on true positives.

 There is an additional difference between ecological object detection methods like tree crowns and traditional computer vision methods. Instead of a single or set of easily differentiated ground truths, we could have 60 or 70 objects that overlap in an image. How do you best assign each prediction to each ground truth?

-DeepForest uses the [hungarian matching algorithm](https://thinkautonomous.medium.com/computer-vision-for-tracking-8220759eee85) to assign predictions to ground truth based on maximum IoU overlap. This is slow compared to the methods above, and so isn't a good choice for running hundreds of times during model training see config.validation.val_accuracy_interval for setting the frequency of the evaluate callback for this metric.
+DeepForest uses the [Hungarian matching algorithm](https://thinkautonomous.medium.com/computer-vision-for-tracking-8220759eee85) to assign predictions to ground truth based on maximum IoU overlap. This is slow compared to the methods above, and so isn't a good choice for running hundreds of times during model training; see config.validation.val_accuracy_interval for setting the frequency of the evaluation callback for this metric.
+
+When there are no true positives, this metric is undefined.

 ### Empty Frame Accuracy

 DeepForest allows the user to pass empty frames to evaluation by setting xmin, ymin, xmax, ymax to 0. This is useful for evaluating models on data that has empty frames. The empty frame accuracy is the proportion of empty frames that contain no predictions. The 'label' column in this case is ignored, but must be one of the labels in the model to be included in the evaluation.
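+
+As a sketch, an annotation file with one empty frame might look like the following; the image names are hypothetical:
+
+```python
+import pandas as pd
+
+ground_truth = pd.DataFrame([
+    # A standard annotation
+    {"image_path": "forest_plot.tif", "xmin": 30, "ymin": 40, "xmax": 55, "ymax": 70, "label": "Tree"},
+    # An empty frame: all coordinates are 0; the label must still be a valid model label
+    {"image_path": "empty_plot.tif", "xmin": 0, "ymin": 0, "xmax": 0, "ymax": 0, "label": "Tree"},
+])
+ground_truth.to_csv("ground_truth_with_empty.csv", index=False)
+```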
-# Calculating Evaluation Metrics
+### Further Reading
+
+[MeanAveragePrecision in torchmetrics](https://medium.com/data-science-at-microsoft/how-to-smoothly-integrate-meanaverageprecision-into-your-training-loop-using-torchmetrics-7d6f2ce0a2b3)
+
+[A general explanation of the mAP metric](https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173)

-## Torchmetrics and loss scores
+[Comparing Object Detection Models](https://www.comet.com/site/blog/compare-object-detection-models-from-torchvision/)
+
+### Evaluation loss and mAP scores

 These metrics are largely used during training to keep track of model performance. They are relatively fast and will be automatically run during training.

@@ -75,7 +106,7 @@ This creates a dictionary of the average IoU ('iou') as well as 'iou' for each c

 > **_Advanced tip:_** Users can set the frequency of pytorch lightning evaluation using kwargs passed to main.deepforest.create_trainer(). For example [check_val_every_n_epochs](https://lightning.ai/docs/pytorch/stable/common/trainer.html#check-val-every-n-epoch).

-## Recall and Precision at a fixed IoU Score
+### Recall and Precision at a fixed IoU Score

 To get a recall and precision at a set IoU evaluation score, specify an annotations file using the m.evaluate method.

 ```python
@@ -113,7 +144,7 @@ results["box_precision"]
 0.781
 ```

-### Worked example of calculating IoU and recall/precision values
+## Worked example of calculating IoU and recall/precision values

 To convert overlap among predicted and ground truth bounding boxes into measures of accuracy and precision, the most common approach is to compare the overlap using the intersection-over-union metric (IoU). IoU is the ratio between the area of the overlap between the predicted polygon box and the ground truth polygon box divided by the area of the combined bounding box region.

@@ -186,9 +217,9 @@ true_positive = sum(result["match"])
 recall = true_positive / result.shape[0]
 precision = true_positive / predictions.shape[0]
 recall
-0.819672131147541
+0.81967
 precision
-0.5494505494505495
+0.54945
 ```

 This can be stated as 81.97% of the ground truth boxes are correctly matched to a predicted box at IoU threshold of 0.4, and 54.94% of predicted boxes match a ground truth box.

@@ -203,7 +234,7 @@ This is a dictionary with keys

 ```
 result.keys()
-dict_keys(['results', 'box_precision', 'box_recall', 'class_recall'])
+dict_keys(['results', 'box_precision', 'box_recall', 'class_recall', 'predictions', 'ground_df'])
 ```

 The added class_recall dataframe is mostly relevant for multi-class problems, in which the recall and precision per class is given.

@@ -214,7 +245,8 @@ result["class_recall"]
 0  Tree     1.0  0.67033         61
 ```

-### How to average evaluation metrics across images?
+## How to average evaluation metrics across images?
+
 One important decision is how to average precision and recall across multiple images. Two reasonable options might be to take all predictions and all ground truth and compute the statistic on the entire dataset. This strategy makes more sense for evaluation data that is relatively homogeneous across images. We prefer to take the average of per-image precision and recall. This helps balance the dataset if some images have many objects and others have few objects, such as when you are comparing multiple habitat types. Users are welcome to calculate their own statistics directly from the results dataframe.
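+
+For instance, per-image recall can be recomputed from the results dataframe and then averaged. A minimal sketch, assuming the `match` column marks ground truth boxes that were matched at the IoU threshold:
+
+```python
+# Proportion of matched ground truth boxes within each image
+per_image_recall = result["results"].groupby("image_path")["match"].mean()
+
+# Average of the per-image recalls
+print(per_image_recall.mean())
+```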
@@ -228,7 +260,7 @@ result["results"].head()
 34             34         4  0.595862  ...  Tree  OSBS_029.tif  True

-### Evaluating tiles too large for memory
+## Evaluating tiles too large for memory

 The evaluation method uses deepforest.predict_image for each of the paths supplied in the image_path column. This means that the entire image is passed for prediction. This will not work for large images. The deepforest.predict_tile method does a couple of things under the hood that need to be repeated for evaluation.
diff --git a/docs/user_guide/16_prediction.md b/docs/user_guide/16_prediction.md
index 73717390f..39084ceae 100644
--- a/docs/user_guide/16_prediction.md
+++ b/docs/user_guide/16_prediction.md
@@ -1,11 +1,16 @@
 # Prediction

 There are at least four ways to make predictions with DeepForest.

-1. Predict an image using the command line
-2. Predict an image using [model.predict_image](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_image)
-3. Predict a tile using [model.predict_tile](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_tile)
-4. Predict a directory of using a csv file using [model.predict_file](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_file)
-5. Predict a batch of images using [model.predict_batch](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_batch)
+
+1. Predict an image using [model.predict_image](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_image). The entire image is passed to the model.
+
+2. Predict a large image, which we call a 'tile', using [model.predict_tile](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_tile). The tile is cut into smaller windows and each window is predicted.
+
+3. Predict a directory of images listed in a csv file using [model.predict_file](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_file). Each unique image in the csv file is predicted.
+
+4. Predict a batch of images using [model.predict_batch](https://deepforest.readthedocs.io/en/latest/source/deepforest.html#deepforest.main.deepforest.predict_batch). This is useful when you have an existing dataloader from outside DeepForest that yields data in batches.
+
+In general, predict_tile is the most common choice for large images during inference.

 ## Predict an image using the command line

@@ -40,6 +45,8 @@ To see the default configuration and to check what options you can set, you can

 ## Predict an image using model.predict_image

+This is most commonly used for small images or pre-cropped windows of large tiles. Passing a large tile to predict_image will lead to poor performance; use predict_tile instead.
+
 ```python
 from deepforest import main
 from deepforest import get_data
@@ -81,6 +88,25 @@ predicted_raster = model.predict_tile(raster_path, patch_size=300, patch_overlap

 plot_results(predicted_raster)
 ```

+### dataloader-strategy
+
+An optional argument to predict_tile allows the user to control how prediction scales across tiles and how windows are created within tiles.
+
+```python
+prediction_single = m.predict_tile(path=path, patch_size=300, dataloader_strategy="single")
+```
+The `dataloader_strategy` parameter has three options:
+
+* **single**: Loads the entire image into CPU memory and passes individual windows to GPU.
+
+* **batch**: Loads the entire image into GPU memory and creates views of the image as batches. Requires the entire tile to fit into GPU memory. CPU parallelization is possible for loading images.
+
+* **window**: Loads only the desired window of the image from the raster dataset. This is the most memory-efficient option, but it cannot parallelize across windows due to Python's Global Interpreter Lock; workers must be set to 0.
+
+![](../../www/dataloader-strategy.png)
+
+The image shows that the speed of the predict_tile function depends on the strategy, the number of images, and the number of dataloader workers, which is set in the deepforest config file.

 ### Patch Size

 The *predict_tile* function is sensitive to *patch_size*, especially when using the prebuilt model on new data.

@@ -112,14 +138,14 @@ df = m.predict_file(csv_file, root_dir=os.path.dirname(csv_file))

 For existing dataloaders, the `predict_batch` function will return a list of dataframes, one for each batch. This is more efficient than using predict_image since multiple images can be processed in a single forward pass.

 ```python
-from deepforest import dataset
+from deepforest.datasets.prediction import SingleImage
 from torch.utils.data import DataLoader
 import numpy as np
 from PIL import Image

 raster_path = get_data("OSBS_029.tif")
 tile = np.array(Image.open(raster_path))
-ds = dataset.TileDataset(tile=tile, patch_overlap=0.1, patch_size=100)
+ds = SingleImage(image=tile, patch_overlap=0.1, patch_size=100)
 dl = DataLoader(ds, batch_size=3)

 # Perform prediction
diff --git a/src/deepforest/callbacks.py b/src/deepforest/callbacks.py
index a8669c1de..cf972ae43 100644
--- a/src/deepforest/callbacks.py
+++ b/src/deepforest/callbacks.py
@@ -3,20 +3,11 @@ and epoch kwargs."""

 from deepforest import visualize
-from matplotlib import pyplot as plt
-import pandas as pd
 import numpy as np
 import glob
-import tempfile
-import os

 import supervision as sv
 from pytorch_lightning import Callback
-from deepforest import dataset
-from deepforest import utilities
-from deepforest import predict
-
-import torch


 class images_callback(Callback):

@@ -51,7 +42,7 @@ def __init__(self,

     def log_images(self, pl_module):
         # It is not clear if this is per device, or per batch. If per batch, then this will not work.
-        df = pl_module.predictions[0]
+        df = pl_module.predictions

         # Limit to n images, potentially randomly selected
         if self.select_random:

@@ -88,7 +79,7 @@ def log_images(self, pl_module):
                 "skipping upload, images were saved to {}, "
                 "error was raised {}".format(self.savedir, e))

-    def on_validation_epoch_end(self, trainer, pl_module):
+    def on_validation_end(self, trainer, pl_module):
         if trainer.sanity_checking:  # optional skip
             return
diff --git a/src/deepforest/conf/config.yaml b/src/deepforest/conf/config.yaml
index b0e6204fa..693fb7d5c 100644
--- a/src/deepforest/conf/config.yaml
+++ b/src/deepforest/conf/config.yaml
@@ -60,14 +60,19 @@ train:
   epochs: 1
   # Useful debugging flag in pytorch lightning, set to True to get a single batch of training to test settings.
   fast_dev_run: False
-  # pin images to GPU memory for fast training. This depends on GPU size and number of images.
+  # preload images into CPU memory for faster training. This depends on available memory and the number of images.
preload_images: False validation: # callback args csv_file: root_dir: + preload_images: False + size: # Intersection over union evaluation iou_threshold: 0.4 val_accuracy_interval: 20 + +predict: + pin_memory: False diff --git a/src/deepforest/dataset.py b/src/deepforest/dataset.py deleted file mode 100644 index 4b64e2314..000000000 --- a/src/deepforest/dataset.py +++ /dev/null @@ -1,324 +0,0 @@ -"""Dataset model. - -https://pytorch.org/docs/stable/torchvision/models.html#object-detection-instance-segmentation-and-person-keypoint-detection - -During training, the model expects both the input tensors, as well as a -targets (list of dictionary), containing: - -boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] -format, with values between 0 and H and 0 and W - -labels (Int64Tensor[N]): the class label for each ground-truth box - -https://colab.research.google.com/github/benihime91/pytorch_retinanet/blob/master/demo.ipynb#scrollTo=0zNGhr6D7xGN -""" -import os -import pandas as pd -import numpy as np -from torch.utils.data import Dataset -import albumentations as A -from albumentations.augmentations import functional as F -from albumentations.pytorch import ToTensorV2 -import torch -import typing -from PIL import Image -import rasterio as rio -from deepforest import preprocess -from rasterio.windows import Window -from torchvision import transforms -import slidingwindow -import warnings -import shapely - - -def get_transform(augment): - """Albumentations transformation of bounding boxs.""" - if augment: - transform = A.Compose( - [A.HorizontalFlip(p=0.5), ToTensorV2()], - bbox_params=A.BboxParams(format='pascal_voc', label_fields=["category_ids"])) - - else: - transform = A.Compose([ToTensorV2()], - bbox_params=A.BboxParams(format='pascal_voc', - label_fields=["category_ids"])) - - return transform - - -class TreeDataset(Dataset): - - def __init__(self, - csv_file, - root_dir, - transforms=None, - label_dict={"Tree": 0}, - train=True, - preload_images=False): - """ - - Args: - csv_file (string): Path to a single csv file with annotations. - root_dir (string): Directory with all the images. - transform (callable, optional): Optional transform to be applied - on a sample. 
- label_dict: a dictionary where keys are labels from the csv column and values are numeric labels "Tree" -> 0 - - Returns: - If train, path, image, targets else image - """ - self.annotations = pd.read_csv(csv_file) - self.root_dir = root_dir - if transforms is None: - self.transform = get_transform(augment=train) - else: - self.transform = transforms - self.image_names = self.annotations.image_path.unique() - self.label_dict = label_dict - self.train = train - self.image_converter = A.Compose([ToTensorV2()]) - self.preload_images = preload_images - - # Pin data to memory if desired - if self.preload_images: - print("Pinning dataset to GPU memory") - self.image_dict = {} - for idx, x in enumerate(self.image_names): - img_name = os.path.join(self.root_dir, x) - image = np.array(Image.open(img_name).convert("RGB")) / 255 - self.image_dict[idx] = image.astype("float32") - - def __len__(self): - return len(self.image_names) - - def __getitem__(self, idx): - - # Read image if not in memory - if self.preload_images: - image = self.image_dict[idx] - else: - img_name = os.path.join(self.root_dir, self.image_names[idx]) - image = np.array(Image.open(img_name).convert("RGB")) / 255 - image = image.astype("float32") - - if self.train: - # select annotations - image_annotations = self.annotations[self.annotations.image_path == - self.image_names[idx]] - targets = {} - - if "geometry" in image_annotations.columns: - targets["boxes"] = np.array([ - shapely.wkt.loads(x).bounds for x in image_annotations.geometry - ]).astype("float32") - else: - targets["boxes"] = image_annotations[["xmin", "ymin", "xmax", - "ymax"]].values.astype("float32") - - # Labels need to be encoded - targets["labels"] = image_annotations.label.apply( - lambda x: self.label_dict[x]).values.astype(np.int64) - - # If image has no annotations, don't augment - if np.sum(targets["boxes"]) == 0: - boxes = torch.zeros((0, 4), dtype=torch.float32) - labels = torch.zeros(0, dtype=torch.int64) - # channels last - image = np.rollaxis(image, 2, 0) - image = torch.from_numpy(image).float() - targets = {"boxes": boxes, "labels": labels} - - return self.image_names[idx], image, targets - - augmented = self.transform(image=image, - bboxes=targets["boxes"], - category_ids=targets["labels"].astype(np.int64)) - image = augmented["image"] - - boxes = np.array(augmented["bboxes"]) - boxes = torch.from_numpy(boxes).float() - labels = np.array(augmented["category_ids"]) - labels = torch.from_numpy(labels.astype(np.int64)) - targets = {"boxes": boxes, "labels": labels} - - return self.image_names[idx], image, targets - - else: - # Mimic the train augmentation - converted = self.image_converter(image=image) - return converted["image"] - - -class TileDataset(Dataset): - - def __init__(self, - tile: typing.Optional[np.ndarray], - preload_images: bool = False, - patch_size: int = 400, - patch_overlap: float = 0.05): - """ - - Args: - tile: an in memory numpy array. - patch_size (int): The size for the crops used to cut the input raster into smaller pieces. This is given in pixels, not any geographic unit. - patch_overlap (float): The horizontal and vertical overlap among patches - preload_images (bool): If true, the entire dataset is loaded into memory. This is useful for small datasets, but not recommended for large datasets since both the tile and the crops are stored in memory. - - Returns: - ds: a pytorch dataset - """ - if not tile.shape[2] == 3: - raise ValueError( - "Only three band raster are accepted. Channels should be the final dimension. 
Input tile has shape {}. Check for transparent alpha channel and remove if present" - .format(tile.shape)) - - self.image = tile - self.preload_images = preload_images - self.windows = preprocess.compute_windows(self.image, patch_size, patch_overlap) - - if self.preload_images: - self.crops = [] - for window in self.windows: - crop = self.image[window.indices()] - crop = preprocess.preprocess_image(crop) - self.crops.append(crop) - - def __len__(self): - return len(self.windows) - - def __getitem__(self, idx): - # Read image if not in memory - if self.preload_images: - crop = self.crops[idx] - else: - crop = self.image[self.windows[idx].indices()] - crop = preprocess.preprocess_image(crop) - - return crop - - -class RasterDataset: - """Dataset for predicting on raster windows. - - Args: - raster_path (str): Path to raster file - patch_size (int): Size of windows to predict on - patch_overlap (float): Overlap between windows as fraction (0-1) - Returns: - A dataset of raster windows - """ - - def __init__(self, raster_path, patch_size, patch_overlap): - self.raster_path = raster_path - self.patch_size = patch_size - self.patch_overlap = patch_overlap - - # Get raster shape without keeping file open - with rio.open(raster_path) as src: - width = src.shape[0] - height = src.shape[1] - - # Check is tiled - if not src.is_tiled: - raise ValueError( - "Out-of-memory dataset is selected, but raster is not tiled, " - "leading to entire raster being read into memory and defeating " - "the purpose of an out-of-memory dataset. " - "\nPlease run: " - "\ngdal_translate -of GTiff -co TILED=YES " - "to create a tiled raster") - # Generate sliding windows - self.windows = slidingwindow.generateForSize( - height, - width, - dimOrder=slidingwindow.DimOrder.ChannelHeightWidth, - maxWindowSize=patch_size, - overlapPercent=patch_overlap) - self.n_windows = len(self.windows) - - def __len__(self): - return self.n_windows - - def __getitem__(self, idx): - """Get a window of the raster. - - Args: - idx (int): Index of window to get - - Returns: - crop (torch.Tensor): A tensor of shape (3, height, width) - """ - window = self.windows[idx] - - # Open, read window, and close for each operation - with rio.open(self.raster_path) as src: - window_data = src.read(window=Window(window.x, window.y, window.w, window.h)) - - # Convert to torch tensor and rearrange dimensions - window_data = torch.from_numpy(window_data).float() # Convert to torch tensor - window_data = window_data / 255.0 # Normalize - - return window_data # Already in (C, H, W) format from rasterio - - -def bounding_box_transform(augment=False): - data_transforms = [] - data_transforms.append(transforms.ToTensor()) - data_transforms.append(resnet_normalize) - data_transforms.append(transforms.Resize([224, 224])) - if augment: - data_transforms.append(transforms.RandomHorizontalFlip(0.5)) - return transforms.Compose(data_transforms) - - -resnet_normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], - std=[0.229, 0.224, 0.225]) - - -class BoundingBoxDataset(Dataset): - """An in memory dataset for bounding box predictions. 
- - Args: - df: a pandas dataframe with image_path and xmin,xmax,ymin,ymax columns - transform: a function to apply to the image - root_dir: the directory where the image is stored - - Returns: - rgb: a tensor of shape (3, height, width) - """ - - def __init__(self, df, root_dir, transform=None, augment=False): - self.df = df - - if transform is None: - self.transform = bounding_box_transform(augment=augment) - else: - self.transform = transform - - unique_image = self.df['image_path'].unique() - assert len(unique_image - ) == 1, "There should be only one unique image for this class object" - - # Open the image using rasterio - self.src = rio.open(os.path.join(root_dir, unique_image[0])) - - def __len__(self): - return len(self.df) - - def __getitem__(self, idx): - row = self.df.iloc[idx] - xmin = row['xmin'] - xmax = row['xmax'] - ymin = row['ymin'] - ymax = row['ymax'] - - # Read the RGB data - box = self.src.read(window=Window(xmin, ymin, xmax - xmin, ymax - ymin)) - box = np.rollaxis(box, 0, 3) - - if self.transform: - image = self.transform(box) - else: - image = box - - return image diff --git a/src/deepforest/datasets/cropmodel.py b/src/deepforest/datasets/cropmodel.py new file mode 100644 index 000000000..0d0bdb6ab --- /dev/null +++ b/src/deepforest/datasets/cropmodel.py @@ -0,0 +1,72 @@ +# Standard library imports +import os + +# Third party imports +import numpy as np +import rasterio as rio +from rasterio.windows import Window +from torch.utils.data import Dataset +from torchvision import transforms + + +def bounding_box_transform(augment=False): + data_transforms = [] + data_transforms.append(transforms.ToTensor()) + data_transforms.append(resnet_normalize) + data_transforms.append(transforms.Resize([224, 224])) + if augment: + data_transforms.append(transforms.RandomHorizontalFlip(0.5)) + return transforms.Compose(data_transforms) + + +resnet_normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]) + + +class BoundingBoxDataset(Dataset): + """An in memory dataset for bounding box predictions. 
+ + Args: + df: a pandas dataframe with image_path and xmin,xmax,ymin,ymax columns + transform: a function to apply to the image + root_dir: the directory where the image is stored + + Returns: + rgb: a tensor of shape (3, height, width) + """ + + def __init__(self, df, root_dir, transform=None, augment=False): + self.df = df + + if transform is None: + self.transform = bounding_box_transform(augment=augment) + else: + self.transform = transform + + unique_image = self.df['image_path'].unique() + assert len(unique_image + ) == 1, "There should be only one unique image for this class object" + + # Open the image using rasterio + self.src = rio.open(os.path.join(root_dir, unique_image[0])) + + def __len__(self): + return len(self.df) + + def __getitem__(self, idx): + row = self.df.iloc[idx] + xmin = row['xmin'] + xmax = row['xmax'] + ymin = row['ymin'] + ymax = row['ymax'] + + # Read the RGB data + box = self.src.read(window=Window(xmin, ymin, xmax - xmin, ymax - ymin)) + box = np.rollaxis(box, 0, 3) + + if self.transform: + image = self.transform(box) + else: + image = box + + return image diff --git a/src/deepforest/datasets/prediction.py b/src/deepforest/datasets/prediction.py new file mode 100644 index 000000000..4f7733d20 --- /dev/null +++ b/src/deepforest/datasets/prediction.py @@ -0,0 +1,464 @@ +# Standard library imports +import os +from typing import List + +# Third party imports +import numpy as np +import rasterio as rio +import slidingwindow +import torch +from PIL import Image +from rasterio.windows import Window +from torch.nn import functional as F +from torch.utils.data import Dataset +from torch.utils.data import default_collate +import pandas as pd + +from deepforest.utilities import format_geometry + +# Local imports +from deepforest import preprocess + + +# Base prediction class +class PredictionDataset(Dataset): + """This is the base class for all prediction datasets. It defines the + interface for all prediction datasets. It flexibly accepts a single image + or a list of images, a single path or a list of paths, and a patch_size and + patch_overlap. + + Args: + image (PIL.Image.Image): A single image. + path (str): A single image path. + images (List[PIL.Image.Image]): A list of images. + paths (List[str]): A list of image paths. + patch_size (int): The size of the patches to extract. + patch_overlap (float): The overlap between patches. + size (int): The size of the image to resize to. Optional, if not provided, the image is not resized. + """ + + def __init__(self, + image=None, + path=None, + images=None, + paths=None, + patch_size=400, + patch_overlap=0, + size=None): + self.image = image + self.images = images + self.path = path + self.paths = paths + self.patch_size = patch_size + self.patch_overlap = patch_overlap + self.size = size + self.items = self.prepare_items() + + def _load_and_preprocess_image(self, image_path, image=None, size=None): + """Load and preprocess an image. + + Datasets should load using PIL and transpose the image to (C, H, + W) before main.model.forward() is called. + """ + if image is None: + image = Image.open(image_path) + else: + image = image + image = np.array(image) + if not image.shape[2] == 3: + raise ValueError( + "Only three band raster are accepted. Input tile has shape {}. 
Check for transparent alpha channel and remove if present"
+                .format(image.shape))
+
+        image = np.transpose(image, (2, 0, 1))
+        image = self.preprocess_crop(image, size)
+
+        return image
+
+    def preprocess_crop(self, image, size=None):
+        """Preprocess a crop to a float32 tensor between 0 and 1."""
+        image = np.array(image)
+        image = image / 255.0
+        image = image.astype(np.float32)
+
+        if size is not None:
+            image = self.resize_image(image, size)
+
+        return image
+
+    def resize_image(self, image, size):
+        """Resize a (C, H, W) float image to (C, size, size)."""
+        # np.resize repeats or truncates array data rather than rescaling pixels,
+        # so interpolate with torch instead
+        tensor = torch.from_numpy(image).unsqueeze(0)
+        tensor = F.interpolate(tensor,
+                               size=(size, size),
+                               mode="bilinear",
+                               align_corners=False)
+        return tensor.squeeze(0).numpy()
+
+    def prepare_items(self):
+        """Prepare the items for the dataset.
+
+        This is used for special cases before the main.model.forward()
+        is called.
+        """
+        raise NotImplementedError("Subclasses must implement this method")
+
+    def __len__(self):
+        return len(self.items)
+
+    def __getitem__(self, idx):
+        """Get the item at the given index."""
+        return self.get_crop(idx)
+
+    def collate_fn(self, batch):
+        """Collate the batch into a single tensor."""
+        # Check if all images in batch have same dimensions
+        try:
+            return default_collate(batch)
+        except RuntimeError as e:
+            raise RuntimeError(
+                "Images in batch have different dimensions. Set validation.size in config.yaml to resize all images to a common size."
+            ) from e
+
+    def get_crop_bounds(self, idx):
+        """Get the crop bounds at the given index, needed to mosaic
+        predictions."""
+        raise NotImplementedError("Subclasses must implement this method")
+
+    def get_crop(self, idx):
+        """Get the crop of the image at the given index."""
+        raise NotImplementedError("Subclasses must implement this method")
+
+    def get_image_basename(self, idx):
+        """Get the basename of the image at the given index."""
+        raise NotImplementedError("Subclasses must implement this method")
+
+    def determine_geometry_type(self, batched_result):
+        """Determine the geometry type of the batched result."""
+        # Assumes that all geometries are the same in a batch
+        if "boxes" in batched_result.keys():
+            geom_type = "box"
+        elif "points" in batched_result.keys():
+            geom_type = "point"
+        elif "polygons" in batched_result.keys():
+            geom_type = "polygon"
+        else:
+            raise ValueError("Unknown geometry type, prediction keys are {}".format(
+                batched_result.keys()))
+
+        return geom_type
+
+    def format_batch(self, batch, idx, sub_idx=None):
+        """Format the batch into a single dataframe.
+
+        Args:
+            batch (list): The batch to format.
+            idx (int): The index of the batch.
+            sub_idx (int): The index of the subbatch. If None, the index is the subbatch index.
+        """
+        if sub_idx is None:
+            sub_idx = idx
+        geom_type = self.determine_geometry_type(batch)
+        result = format_geometry(batch, geom_type=geom_type)
+        if result is None:
+            return None
+        result["window_xmin"] = self.get_crop_bounds(sub_idx)[0]
+        result["window_ymin"] = self.get_crop_bounds(sub_idx)[1]
+        result["image_path"] = self.get_image_basename(idx)
+
+        return result
+
+    def postprocess(self, batched_result):
+        """Postprocess the batched result into a single dataframe.
+
+        In the case of sub-batches, the index is the sub-batch index.
+ """ + formatted_result = [] + for idx, batch in enumerate(batched_result): + if isinstance(batch, list): + for sub_idx, sub_batch in enumerate(batch): + result = self.format_batch(sub_batch, idx, sub_idx) + if result is not None: + formatted_result.append(result) + else: + result = self.format_batch(batch, idx) + if result is not None: + formatted_result.append(result) + + if len(formatted_result) > 0: + formatted_result = pd.concat(formatted_result) + else: + formatted_result = pd.DataFrame() + + # reset index + formatted_result = formatted_result.reset_index(drop=True) + + return formatted_result + + +class SingleImage(PredictionDataset): + """Take in a single image path, preprocess and batch together.""" + + def __init__(self, path=None, image=None, patch_size=400, patch_overlap=0): + super().__init__(path=path, + image=image, + patch_size=patch_size, + patch_overlap=patch_overlap) + + def prepare_items(self): + self.image = self._load_and_preprocess_image(self.path, self.image) + self.windows = preprocess.compute_windows(self.image, self.patch_size, + self.patch_overlap) + + def __len__(self): + return len(self.windows) + + def window_list(self): + return [x.getRect() for x in self.windows] + + def get_crop(self, idx): + crop = self.image[self.windows[idx].indices()] + + return crop + + def get_image_basename(self, idx): + if self.path is not None: + return os.path.basename(self.path) + else: + return None + + def get_crop_bounds(self, idx): + return self.windows[idx].getRect() + + +class FromCSVFile(PredictionDataset): + """Take in a csv file with image paths and preprocess and batch + together.""" + + def __init__(self, csv_file: str, root_dir: str, size: int = None): + self.csv_file = csv_file + self.root_dir = root_dir + super().__init__(size=size) + self.prepare_items() + + def prepare_items(self): + self.annotations = pd.read_csv(self.csv_file) + self.image_names = self.annotations.image_path.unique() + self.image_paths = [os.path.join(self.root_dir, x) for x in self.image_names] + + def __len__(self): + return len(self.image_paths) + + def get_crop(self, idx): + image = self._load_and_preprocess_image(self.image_paths[idx], size=self.size) + return image + + def get_image_basename(self, idx): + return os.path.basename(self.image_paths[idx]) + + def get_crop_bounds(self, idx): + return None + + def format_batch(self, batch, idx, sub_idx=None): + """Format the batch into a single dataframe. + + Args: + batch (list): The batch to format. + idx (int): The index of the batch. + sub_idx (int): The index of the subbatch. If None, the index is the subbatch index. + """ + if sub_idx is None: + sub_idx = idx + geom_type = self.determine_geometry_type(batch) + result = format_geometry(batch, geom_type=geom_type) + if result is None: + return None + result["image_path"] = self.get_image_basename(idx) + + return result + + +class MultiImage(PredictionDataset): + """Take in a list of image paths, preprocess and batch together.""" + + def __init__(self, paths: List[str], patch_size: int, patch_overlap: float): + """ + Args: + paths (List[str]): List of image paths. + patch_size (int): Size of the patches to extract. + patch_overlap (float): Overlap between patches. 
+ """ + # Runtime type checking + if not isinstance(paths, list): + raise TypeError(f"paths must be a list, got {type(paths)}") + + self.paths = paths + self.patch_size = patch_size + self.patch_overlap = patch_overlap + + image = self._load_and_preprocess_image(self.paths[0]) + self.image_height = image.shape[1] + self.image_width = image.shape[2] + + def create_overlapping_views(self, input_tensor, size, overlap): + """Creates overlapping views of a 4D tensor. + + Args: + input_tensor (torch.Tensor): A 4D tensor of shape [N, C, H, W]. + size (int): The size of the sliding window (square). + overlap (int): The overlap between adjacent windows. + + Returns: + torch.Tensor: A tensor containing all the overlapping views. + The output shape is [N * num_windows, C, size, size]. + """ + # Get the input tensor shape + N, C, H, W = input_tensor.shape + + # Calculate step size based on overlap + step = size - overlap + + # Calculate number of patches needed in each dimension + n_patches_h = (H - overlap) // step + 1 + n_patches_w = (W - overlap) // step + 1 + + # Calculate total padded dimensions needed + H_padded = n_patches_h * step + overlap + W_padded = n_patches_w * step + overlap + + # Calculate padding needed + padding_h = max(0, H_padded - H) + padding_w = max(0, W_padded - W) + + # Pad the input tensor + padded_tensor = F.pad(input_tensor, (0, padding_w, 0, padding_h)) + + # Use unfold to create views of the tensor + # This creates views rather than copies + unfolded_h = padded_tensor.unfold(2, size, step) # unfold height dimension + unfolded = unfolded_h.unfold(3, size, step) # unfold width dimension + + # Reshape to [N * num_windows, C, size, size] + # This is still a view operation + output = unfolded.permute(0, 2, 3, 1, 4, 5).reshape(-1, C, size, size) + + return output + + def _create_patches(self, image): + image_tensor = torch.tensor(image).unsqueeze(0) # Convert to (N, C, H, W) + patch_overlap_size = int(self.patch_size * self.patch_overlap) + patches = self.create_overlapping_views(image_tensor, self.patch_size, + patch_overlap_size) + + return patches + + def window_list(self): + """Get the original positions of patches in the image. + + Returns: + list: List of tuples containing (x, y, w, h) coordinates of each patch + """ + H = self.image_height + W = self.image_width + + patch_overlap_size = int(self.patch_size * self.patch_overlap) + step = self.patch_size - patch_overlap_size + + # Calculate number of patches needed in each dimension + n_patches_h = (H - patch_overlap_size) // step + 1 + n_patches_w = (W - patch_overlap_size) // step + 1 + + # Generate window coordinates matching the unfolded tensor views + windows = [] + for i in range(n_patches_h): + for j in range(n_patches_w): + y = i * step + x = j * step + # Only add window if it contains any real data + if (x < W and y < H): + windows.append((x, y, self.patch_size, self.patch_size)) + + return windows + + def collate_fn(self, batch): + # Comes pre-batched + return batch + + def __len__(self): + return len(self.paths) + + def get_crop(self, idx): + image = self._load_and_preprocess_image(self.paths[idx]) + return self._create_patches(image) + + def get_image_basename(self, idx): + return os.path.basename(self.paths[idx]) + + def get_crop_bounds(self, idx): + return self.window_list()[idx] + + +class TiledRaster(PredictionDataset): + """Dataset for predicting on raster windows. + + This dataset is useful for predicting on a large raster that is too large to fit into memory. 
+
+    Args:
+        path (str): Path to raster file
+        patch_size (int): Size of windows to predict on
+        patch_overlap (float): Overlap between windows as fraction (0-1)
+    Returns:
+        A dataset of raster windows
+    """
+
+    def __init__(self, path, patch_size, patch_overlap):
+        self.path = path
+        self.patch_size = patch_size
+        self.patch_overlap = patch_overlap
+
+        # Validate the path before opening the raster
+        if path is None:
+            raise ValueError("path is required for an out-of-memory raster dataset")
+
+        self.prepare_items()
+
+    def prepare_items(self):
+        # Get raster shape without keeping file open
+        with rio.open(self.path) as src:
+            # rasterio's shape is (height, width)
+            height = src.shape[0]
+            width = src.shape[1]
+
+            # Check is tiled
+            if not src.is_tiled:
+                raise ValueError(
+                    "Out-of-memory dataset is selected, but raster is not tiled, "
+                    "leading to entire raster being read into memory and defeating "
+                    "the purpose of an out-of-memory dataset. "
+                    "\nPlease run: "
+                    "\ngdal_translate -of GTiff -co TILED=YES "
+                    "to create a tiled raster")

+        # Generate sliding windows
+        self.windows = slidingwindow.generateForSize(
+            width,
+            height,
+            dimOrder=slidingwindow.DimOrder.ChannelHeightWidth,
+            maxWindowSize=self.patch_size,
+            overlapPercent=self.patch_overlap)
+
+    def __len__(self):
+        return len(self.windows)
+
+    def window_list(self):
+        return [x.getRect() for x in self.windows]
+
+    def get_crop(self, idx):
+        window = self.windows[idx]
+        with rio.open(self.path) as src:
+            window_data = src.read(window=Window(window.x, window.y, window.w, window.h))
+
+        # Convert to torch tensor and rearrange dimensions
+        window_data = torch.from_numpy(window_data).float()  # Convert to torch tensor
+        window_data = window_data / 255.0  # Normalize
+
+        return window_data
+
+    def get_image_basename(self, idx):
+        return os.path.basename(self.path)
+
+    def get_crop_bounds(self, idx):
+        return self.window_list()[idx]
diff --git a/src/deepforest/datasets/training.py b/src/deepforest/datasets/training.py
new file mode 100644
index 000000000..b09331048
--- /dev/null
+++ b/src/deepforest/datasets/training.py
@@ -0,0 +1,132 @@
+"""Dataset model for object detection tasks."""
+
+# Standard library imports
+import os
+from typing import Dict, List, Optional, Union
+
+# Third party imports
+import numpy as np
+import pandas as pd
+import torch
+from torch.utils.data import Dataset
+from PIL import Image
+import albumentations as A
+from albumentations.pytorch import ToTensorV2
+import shapely
+
+
+def get_transform(augment: bool) -> A.Compose:
+    """Create Albumentations transformation for bounding boxes."""
+    bbox_params = A.BboxParams(format='pascal_voc', label_fields=["category_ids"])
+
+    if augment:
+        return A.Compose([A.HorizontalFlip(p=0.5), ToTensorV2()], bbox_params=bbox_params)
+    else:
+        return A.Compose([ToTensorV2()], bbox_params=bbox_params)
+
+
+class BoxDataset(Dataset):
+
+    def __init__(self,
+                 csv_file,
+                 root_dir,
+                 transforms=None,
+                 augment=True,
+                 label_dict={"Tree": 0},
+                 preload_images=False):
+        """
+        Args:
+            csv_file (string): Path to a single csv file with annotations.
+            root_dir (string): Directory with all the images.
+            transform (callable, optional): Optional transform to be applied
+                on a sample.
+            label_dict: a dictionary where keys are labels from the csv column and values are numeric labels "Tree" -> 0
+            augment: if True, apply augmentations to the images
+            preload_images: if True, preload the images into memory
+        Returns:
+            List of images and targets. Targets are dictionaries with keys "boxes" and "labels".
+            Boxes are numpy arrays with shape (N, 4) and labels are numpy arrays with shape (N,).
+        """
+        self.annotations = pd.read_csv(csv_file)
+        self.root_dir = root_dir
+        if transforms is None:
+            self.transform = get_transform(augment=augment)
+        else:
+            self.transform = transforms
+        self.image_names = self.annotations.image_path.unique()
+        self.label_dict = label_dict
+        self.preload_images = preload_images
+
+        # Preload data into memory if desired
+        if self.preload_images:
+            print("Preloading dataset into memory")
+            self.image_dict = {}
+            for idx, x in enumerate(self.image_names):
+                self.image_dict[idx] = self.load_image(idx)
+
+    def __len__(self):
+        return len(self.image_names)
+
+    def collate_fn(self, batch):
+        """Collate function for DataLoader."""
+        images = [item[0] for item in batch]
+        targets = [item[1] for item in batch]
+        image_names = [item[2] for item in batch]
+
+        return images, targets, image_names
+
+    def load_image(self, idx):
+        img_name = os.path.join(self.root_dir, self.image_names[idx])
+        image = np.array(Image.open(img_name).convert("RGB")) / 255
+        image = image.astype("float32")
+        return image
+
+    def __getitem__(self, idx):
+
+        # Read image if not in memory
+        if self.preload_images:
+            image = self.image_dict[idx]
+        else:
+            image = self.load_image(idx)
+
+        # select annotations
+        image_annotations = self.annotations[self.annotations.image_path ==
+                                             self.image_names[idx]]
+        targets = {}
+
+        if "geometry" in image_annotations.columns:
+            targets["boxes"] = np.array([
+                shapely.wkt.loads(x).bounds for x in image_annotations.geometry
+            ]).astype("float32")
+        else:
+            targets["boxes"] = image_annotations[["xmin", "ymin", "xmax",
+                                                  "ymax"]].values.astype("float32")
+
+        # Labels need to be encoded
+        targets["labels"] = image_annotations.label.apply(
+            lambda x: self.label_dict[x]).values.astype(np.int64)
+
+        # If image has no annotations, don't augment
+        if np.sum(targets["boxes"]) == 0:
+            boxes = torch.zeros((0, 4), dtype=torch.float32)
+            labels = torch.zeros(0, dtype=torch.int64)
+            # channels last
+            image = np.rollaxis(image, 2, 0)
+            image = torch.from_numpy(image).float()
+            targets = {"boxes": boxes, "labels": labels}
+
+            return image, targets, self.image_names[idx]
+
+        # Apply augmentations
+        augmented = self.transform(image=image,
+                                   bboxes=targets["boxes"],
+                                   category_ids=targets["labels"].astype(np.int64))
+        image = augmented["image"]
+
+        # Convert boxes to tensor
+        boxes = np.array(augmented["bboxes"])
+        boxes = torch.from_numpy(boxes).float()
+        labels = np.array(augmented["category_ids"])
+        labels = torch.from_numpy(labels.astype(np.int64))
+        targets = {"boxes": boxes, "labels": labels}
+
+        return image, targets, self.image_names[idx]
diff --git a/src/deepforest/evaluate.py b/src/deepforest/evaluate.py
index fe2fb8ca1..b77a6f1cd 100644
--- a/src/deepforest/evaluate.py
+++ b/src/deepforest/evaluate.py
@@ -92,7 +92,9 @@ def __evaluate_wrapper__(predictions, ground_df, iou_threshold, numeric_to_label
             "results": None,
             "box_recall": 0,
             "box_precision": np.nan,
-            "class_recall": None
+            "class_recall": None,
+            "predictions": predictions,
+            "ground_df": ground_df
         }
         return results

@@ -113,7 +115,7 @@ def __evaluate_wrapper__(predictions, ground_df, iou_threshold, numeric_to_label
             "Geometry type {} not implemented".format(prediction_geometry))

     # replace classes if not Null
-    if not results is None:
+    if results["results"] is not None:
         results["results"]["predicted_label"] = results["results"][
             "predicted_label"].apply(lambda x: numeric_to_label_dict[x]
                                      if not pd.isnull(x) else x)

@@ -141,6 +143,18 @@ def evaluate_boxes(predictions, ground_df, iou_threshold=0.4):
     box_precision: proportion of predictions that are true positive, regardless of class
     class_recall: a pandas dataframe of class level recall and precision with class sizes
     """
+
+    # If ground truth is empty, recall is undefined (None) and precision is 0
+    if ground_df.empty:
+        return {
+            "results": None,
+            "box_recall": None,
+            "box_precision": 0,
+            "class_recall": None,
+            "predictions": predictions,
+            "ground_df": ground_df
+        }
+
     # Run evaluation on all plots
     results = []
     box_recalls = []

@@ -191,7 +205,9 @@ def evaluate_boxes(predictions, ground_df, iou_threshold=0.4):
         "results": results,
         "box_precision": box_precision,
         "box_recall": box_recall,
-        "class_recall": class_recall
+        "class_recall": class_recall,
+        "predictions": predictions,
+        "ground_df": ground_df
     }
diff --git a/src/deepforest/main.py b/src/deepforest/main.py
index fa31012a0..d3e662deb 100644
--- a/src/deepforest/main.py
+++ b/src/deepforest/main.py
@@ -7,17 +7,23 @@ import numpy as np
 import pandas as pd
 import pytorch_lightning as pl
-import rasterio as rio
 import torch
 from PIL import Image
 from pytorch_lightning.callbacks import LearningRateMonitor
+from copy import deepcopy
+import pytorch_lightning as L
 from torch import optim
 from torchmetrics.detection import IntersectionOverUnion, MeanAveragePrecision
 from torchmetrics.classification import BinaryAccuracy
 from huggingface_hub import PyTorchModelHubMixin

-from deepforest import dataset, visualize, get_data, utilities, predict
+from deepforest import utilities, predict
+
 from deepforest import evaluate as evaluate_iou
+from deepforest.datasets import prediction, training
+from deepforest.utilities import format_geometry
+
+import geopandas as gpd

 from omegaconf import DictConfig

@@ -44,8 +50,8 @@ def __init__(
         self,
         num_classes: int = 1,
         label_dict: dict = {"Tree": 0},
-        transforms=None,
         model=None,
+        transforms=None,
         existing_train_dataloader=None,
         existing_val_dataloader=None,
         config: DictConfig = None,

@@ -106,7 +112,7 @@ def __init__(
         # Add user supplied transforms
         if transforms is None:
-            self.transforms = dataset.get_transform
+            self.transforms = None
         else:
             self.transforms = transforms

@@ -136,6 +142,7 @@ def load_model(self, model_name="weecology/deepforest-tree", revision='main'):
         self.label_dict = loaded_model.label_dict
         self.model = loaded_model.model
         self.numeric_to_label_dict = loaded_model.numeric_to_label_dict
+
         # Set bird-specific settings if loading the bird model
         if model_name == "weecology/deepforest-bird":
             self.config.retinanet.score_thresh = 0.3

@@ -223,6 +230,7 @@ def create_trainer(self, logger=None, callbacks=[], **kwargs):
             # Disable validation, don't use trainer defaults
             limit_val_batches = 0
             num_sanity_val_steps = 0
+
         # Check for model checkpoint object
         checkpoint_types = [type(x).__qualname__ for x in callbacks]
         if 'ModelCheckpoint' in checkpoint_types:

@@ -264,12 +272,13 @@ def save_model(self, path):
     def load_dataset(self,
                      csv_file,
                      root_dir=None,
-                     augment=False,
                      shuffle=True,
-                     batch_size=1,
-                     train=False):
-        """Create a tree dataset for inference Csv file format is .csv file
-        with the columns "image_path", "xmin","ymin","xmax","ymax" for the
+                     transforms=None,
+                     augment=True,
+                     preload_images=False,
+                     batch_size=1):
+        """Create a dataset for inference or training. The CSV format is a .csv
+        file with the columns "image_path", "xmin","ymin","xmax","ymax" for the
         image name and bounding box position.
Image_path is the relative filename, not absolute path, which is in
        the root_dir directory. One bounding box per line.
@@ -277,16 +286,19 @@ def load_dataset(self,
         Args:
             csv_file: path to csv file
             root_dir: directory of images. If none, uses "image_dir" in config
-            augment: Whether to create a training dataset, this activates data augmentations
-
+            shuffle: whether to shuffle the dataset
+            transforms: Albumentations transforms
+            batch_size: batch size
+            preload_images: if True, preload the images into memory
+            augment: if True, apply augmentations to the images
+
         Returns:
             ds: a pytorch dataset
         """
-        ds = dataset.TreeDataset(csv_file=csv_file,
+        ds = training.BoxDataset(csv_file=csv_file,
                                  root_dir=root_dir,
-                                 transforms=self.transforms(augment=augment),
+                                 transforms=transforms,
                                  label_dict=self.label_dict,
-                                 preload_images=self.config.train.preload_images)
+                                 augment=augment,
+                                 preload_images=preload_images)
         if len(ds) == 0:
             raise ValueError(
                 f"Dataset from {csv_file} is empty. Check CSV for valid entries and columns."
@@ -296,7 +308,7 @@ def load_dataset(self,
             ds,
             batch_size=batch_size,
             shuffle=shuffle,
-            collate_fn=utilities.collate_fn,
+            collate_fn=ds.collate_fn,
             num_workers=self.config.workers,
         )
@@ -314,7 +326,9 @@ def train_dataloader(self):
         loader = self.load_dataset(csv_file=self.config.train.csv_file,
                                    root_dir=self.config.train.root_dir,
                                    augment=True,
+                                   preload_images=self.config.train.preload_images,
                                    shuffle=True,
+                                   transforms=self.transforms,
                                    batch_size=self.config.batch_size)
         return loader
@@ -331,15 +345,19 @@ def val_dataloader(self):
         if self.existing_val_dataloader:
             return self.existing_val_dataloader
+
         if self.config.validation.csv_file is not None:
-            loader = self.load_dataset(csv_file=self.config.validation.csv_file,
-                                       root_dir=self.config.validation.root_dir,
-                                       augment=False,
-                                       shuffle=False,
-                                       batch_size=self.config.batch_size)
+            loader = self.load_dataset(
+                csv_file=self.config.validation.csv_file,
+                root_dir=self.config.validation.root_dir,
+                augment=False,
+                shuffle=False,
+                preload_images=self.config.validation.preload_images,
+                batch_size=self.config.batch_size)
+
         return loader
-    def predict_dataloader(self, ds):
+    def predict_dataloader(self, ds, batch_size=None):
         """Create a PyTorch dataloader for prediction.
         Args:
@@ -348,36 +366,29 @@ def predict_dataloader(self, ds):
         Returns:
             torch.utils.data.DataLoader: A dataloader object that can be used for prediction.
         """
+        if batch_size is None:
+            batch_size = self.config.batch_size
+
         loader = torch.utils.data.DataLoader(ds,
-                                             batch_size=self.config.batch_size,
+                                             batch_size=batch_size,
                                              shuffle=False,
-                                             num_workers=self.config.workers)
-
+                                             num_workers=self.config.workers,
+                                             collate_fn=ds.collate_fn)
         return loader
     def predict_image(self,
                       image: typing.Optional[np.ndarray] = None,
-                      path: typing.Optional[str] = None,
-                      return_plot: bool = False,
-                      color: typing.Optional[tuple] = (0, 165, 255),
-                      thickness: int = 1):
+                      path: typing.Optional[str] = None):
         """Predict a single image with a deepforest model.
-        Deprecation warning: The 'return_plot', and related 'color' and 'thickness' arguments
-        are deprecated and will be removed in 2.0. Use visualize.plot_results on the result instead.
-
         Args:
             image: a float32 numpy array of an RGB image in channels-last format
             path: optional path to read image from disk instead of passing image arg
-            return_plot: return a plot of the image with predictions overlaid (deprecated)
-            color: color of the bounding box as a tuple of BGR color (deprecated)
-            thickness: thickness of the rectangle border line in px (deprecated)
         Returns:
             result: A pandas dataframe of predictions (Default)
-            img: The input with predictions overlaid (Optional)
         """
-        # Ensure we are in eval mode
         self.model.eval()
@@ -397,31 +408,16 @@ def predict_image(self,
                 f"(image.astype('float32')")
             image = image.astype("float32")
-        image = torch.tensor(image, device=self.device).permute(2, 0, 1)
-        image = image / 255
-
         result = predict._predict_image_(model=self.model,
                                          image=image,
                                          path=path,
-                                         nms_thresh=self.config.nms_thresh,
-                                         return_plot=return_plot,
-                                         thickness=thickness,
-                                         color=color)
-
-        if return_plot:
-            # Add deprecated warning
-            warnings.warn(
-                "return_plot is deprecated and will be removed in 2.0. Use visualize.plot_results on the result instead."
-            )
+                                         nms_thresh=self.config.nms_thresh)
-            return result
+        # If there were no predictions, return None
+        if result is None:
+            return None
         else:
-            #If there were no predictions, return None
-            if result is None:
-                return None
-            else:
-                result["label"] = result.label.apply(
-                    lambda x: self.numeric_to_label_dict[x])
+            result["label"] = result.label.apply(lambda x: self.numeric_to_label_dict[x])
         if path is None:
             result = utilities.read_file(result)
@@ -434,85 +430,62 @@ def predict_image(self,
         return result
-    def predict_file(self, csv_file, root_dir, savedir=None, color=None, thickness=1):
-        """Create a dataset and predict entire annotation file Csv file format
+    def predict_file(self,
+                     csv_file,
+                     root_dir,
+                     crop_model=None,
+                     size=None,
+                     batch_size=None):
+        """Create a dataset and predict an entire annotation file. The format
         is a .csv file with the columns "image_path", "xmin","ymin","xmax","ymax" for the
         image name and bounding box position.
         Image_path is the relative filename, not absolute path, which is in the root_dir directory. One bounding box per line.
-        Deprecation warning: The return_plot argument is deprecated and will be removed in 2.0. Use visualize.plot_results on the result instead.
-
         Args:
             csv_file: path to csv file
             root_dir: directory of images. If none, uses "image_dir" in config
-            (deprecated) savedir: directory to save images with bounding boxes
-            (deprecated) color: color of the bounding box as a tuple of BGR color, e.g. orange annotations is (0, 165, 255)
-            (deprecated) thickness: thickness of the rectangle border line in px
-
+            crop_model: a deepforest.model.CropModel object to predict on crops
+            size: optional size to resize images to; if not provided, images are not resized
Returns:
            df: pandas dataframe with bounding boxes, label and scores for each image in the csv file
        """
-        df = utilities.read_file(csv_file)
-        ds = dataset.TreeDataset(csv_file=csv_file,
-                                 root_dir=root_dir,
-                                 transforms=None,
-                                 train=False)
-        dataloader = self.predict_dataloader(ds)
-
+        ds = prediction.FromCSVFile(csv_file=csv_file, root_dir=root_dir, size=size)
+        dataloader = self.predict_dataloader(ds, batch_size=batch_size)
         results = predict._dataloader_wrapper_(model=self,
+                                               crop_model=crop_model,
                                                trainer=self.trainer,
-                                               annotations=df,
                                                dataloader=dataloader,
-                                               root_dir=root_dir,
-                                               nms_thresh=self.config.nms_thresh,
-                                               color=color,
-                                               savedir=savedir,
-                                               thickness=thickness)
+                                               root_dir=root_dir)
         results.root_dir = root_dir
         return results
     def predict_tile(self,
-                     raster_path=None,
                      path=None,
                      image=None,
                      patch_size=400,
                      patch_overlap=0.05,
                      iou_threshold=0.15,
-                     in_memory=True,
-                     return_plot=False,
-                     mosaic=True,
-                     sigma=0.5,
-                     thresh=0.001,
-                     color=None,
-                     thickness=1,
-                     crop_model=None,
-                     crop_transform=None,
-                     crop_augment=False):
+                     dataloader_strategy="single",
+                     crop_model=None):
        """For images too large to input into the model, predict_tile cuts the
        image into overlapping windows, predicts trees on each window and
        reassembles them into a single array.
        Args:
-            raster_path: [Deprecated] Use 'path' instead
-            path: Path to image on disk
-            image (array): Numpy image array in BGR channel order following openCV convention
+            path: Path or list of paths to images on disk. If a single string is provided, it will be converted to a list.
+            image (array): Numpy image array in BGR channel order following openCV convention. Not possible in combination with dataloader_strategy='batch'.
             patch_size: patch size for each window
             patch_overlap: patch overlap among windows
             iou_threshold: Minimum iou overlap among predictions between windows to be suppressed
-            in_memory: If true, the entire dataset is loaded into memory
-            mosaic: Return a single prediction dataframe (True) or a tuple of image crops and predictions (False)
-            sigma: variance of Gaussian function used in Gaussian Soft NMS
-            thresh: the score thresh used to filter bboxes after soft-nms performed
+            dataloader_strategy: "single", "batch", or "window".
+                - "single" loads the entire image into CPU memory and passes individual windows to the GPU; it cannot be parallelized.
+                - "batch" loads the entire image into GPU memory and creates views of the image as a batch; it requires the entire tile to fit into GPU memory. CPU parallelization is possible for loading images.
+                - "window" loads only the desired window of the image from the raster dataset. It is the most memory-efficient option, but cannot parallelize across windows.
             crop_model: a deepforest.model.CropModel object to predict on crops
-            crop_transform: a torchvision.transforms object to apply to crops
-            crop_augment: a boolean to apply augmentations to crops
-            return_plot: return a plot of the image with predictions overlaid (deprecated)
-            color: color of the bounding box as a tuple of BGR color (deprecated)
-            thickness: thickness of the rectangle border line in px (deprecated)
         Returns:
-            pd.DataFrame or tuple: Predictions dataframe or (predictions, crops) tuple
+            pd.DataFrame: Predictions dataframe, or None if no predictions are made
        """
         self.model.eval()
         self.model.nms_thresh = self.config.nms_thresh
-        # if 'raster_path' is used, give a deprecation warning and use 'path' instead
-        if raster_path is not None:
-            warnings.warn(
-                "The 'raster_path' argument is deprecated and will be removed in 2.0. Use 'path' instead.",
-                DeprecationWarning)
-            path = raster_path
-
-        # if more than one GPU present, use only the first available gpu
-        if torch.cuda.device_count() > 1:
-            # Get available gpus and regenerate trainer
-            warnings.warn(
-                "More than one GPU detected. Using only the first GPU for predict_tile.")
-            self.config.devices = 1
-            self.create_trainer()
-
-        if (path is None) and (image is None):
-            raise ValueError(
-                "Both tile and tile_path are None. Either supply a path to a tile on disk, or read one into memory!"
-            )
-
-        if in_memory:
-            if path is None:
-                image = image
-            else:
-                image = rio.open(path).read()
-                image = np.moveaxis(image, 0, 2)
+        # Check if path or image is provided
+        if dataloader_strategy == "single":
+            if path is None and image is None:
+                raise ValueError(
+                    "Either path or image must be provided for single tile prediction")
-            ds = dataset.TileDataset(tile=image,
-                                     patch_overlap=patch_overlap,
-                                     patch_size=patch_size)
-        else:
+        if dataloader_strategy == "batch":
             if path is None:
-                raise ValueError("path is required if in_memory is False")
-
-            # Check for workers config when using out of memory dataset
-            if self.config.workers > 0:
                 raise ValueError(
-                    "workers must be 0 when using out-of-memory dataset (in_memory=False). Set config['workers']=0 and recreate trainer self.create_trainer()."
+                    "path argument must be provided when using dataloader_strategy='batch'"
                 )
-            ds = dataset.RasterDataset(raster_path=path,
+        # Convert single path to list for consistent handling
+        if isinstance(path, str):
+            paths = [path]
+        elif path is None:
+            paths = [None]
+        else:
+            paths = path
+
+        image_results = []
+        if dataloader_strategy in ["single", "window"]:
+            for image_path in paths:
+                if dataloader_strategy == "single":
+                    ds = prediction.SingleImage(path=image_path,
+                                                image=image,
+                                                patch_overlap=patch_overlap,
+                                                patch_size=patch_size)
+                else:
+                    # Check for workers config when using out of memory dataset
+                    if self.config.workers > 0:
+                        raise ValueError(
+                            "workers must be 0 when using the out-of-memory dataset (dataloader_strategy='window'). Set config['workers']=0 and recreate the trainer with self.create_trainer()."
+                        )
+                    ds = prediction.TiledRaster(path=image_path,
                                                patch_overlap=patch_overlap,
                                                patch_size=patch_size)
+
+                batched_results = self.trainer.predict(self, self.predict_dataloader(ds))
+
+                # Flatten list from batched prediction
+                prediction_list = []
+                for batch in batched_results:
+                    for images in batch:
+                        prediction_list.append(images)
+                image_results.append(ds.postprocess(prediction_list))
+
+            results = pd.concat(image_results)
+
+        elif dataloader_strategy == "batch":
+            ds = prediction.MultiImage(paths=paths,
                                        patch_overlap=patch_overlap,
                                        patch_size=patch_size)
-        batched_results = self.trainer.predict(self, self.predict_dataloader(ds))
+            batched_results = self.trainer.predict(self, self.predict_dataloader(ds))
-        # Flatten list from batched prediction
-        results = []
-        for batch in batched_results:
-            for boxes in batch:
-                results.append(boxes)
-
-        if mosaic:
-            results = predict.mosiac(results,
-                                     ds.windows,
-                                     sigma=sigma,
-                                     thresh=thresh,
-                                     iou_threshold=iou_threshold)
-            results["label"] = results.label.apply(
-                lambda x: self.numeric_to_label_dict[x])
-            if path:
-                results["image_path"] = os.path.basename(path)
-            if return_plot:
-                # Add deprecated warning
-                warnings.warn("return_plot is deprecated and will be removed in 2.0. "
-                              "Use visualize.plot_results on the result instead.")
-                # Draw predictions on BGR
-                if path:
-                    tile = rio.open(path).read()
-                else:
-                    tile = image
-                drawn_plot = tile[:, :, ::-1]
-                drawn_plot = visualize.plot_predictions(tile,
-                                                        results,
-                                                        color=color,
-                                                        thickness=thickness)
-                return drawn_plot
-        else:
-            for df in results:
-                df["label"] = df.label.apply(lambda x: self.numeric_to_label_dict[x])
+            # Flatten list from batched prediction
+            prediction_list = []
+            for batch in batched_results:
+                for images in batch:
+                    prediction_list.append(images)
+            image_results.append(ds.postprocess(prediction_list))
+            results = pd.concat(image_results)
-            # TODO this is the 2nd time the crops are generated? Could be more efficient, but memory intensive
-            self.crops = []
-            if path is None:
-                image = image
-            else:
-                image = rio.open(path).read()
-                image = np.moveaxis(image, 0, 2)
-
-            for window in ds.windows:
-                crop = image[window.indices()]
-                self.crops.append(crop)
-
-            return list(zip(results, self.crops))
-
-        if crop_model is not None and not isinstance(crop_model, list):
-            crop_model = [crop_model]
-
-        if crop_model:
-            is_single_model = len(
-                crop_model) == 1  # Flag to check if only one model is passed
-            for i, crop_model in enumerate(crop_model):
-                results = predict._predict_crop_model_(crop_model=crop_model,
-                                                       results=results,
-                                                       raster_path=path,
-                                                       trainer=self.trainer,
-                                                       transform=crop_transform,
-                                                       augment=crop_augment,
-                                                       model_index=i,
-                                                       is_single_model=is_single_model)
+        else:
+            raise ValueError(f"Invalid dataloader_strategy: {dataloader_strategy}")
         if results.empty:
             warnings.warn("No predictions made, returning None")
             return None
-        if path is None:
-            warnings.warn(
-                "An image was passed directly to predict_tile, the results.root_dir attribute will be None in the output dataframe, to use visualize.plot_results, please assign results.root_dir = "
-            )
-            results = utilities.read_file(results)
-
+        # Perform mosaic for each image_path, or all at once if image_path is None
+        mosaic_results = []
+        if results["image_path"].isnull().all():
+            mosaic_results.append(predict.mosiac(results, iou_threshold=iou_threshold))
         else:
-            root_dir = os.path.dirname(path)
-            results = utilities.read_file(results, root_dir=root_dir)
+            for image_path in results["image_path"].unique():
+                image_results = results[results["image_path"] == image_path]
+                image_mosaic = predict.mosiac(image_results, iou_threshold=iou_threshold)
+                image_mosaic["image_path"] = image_path
+                mosaic_results.append(image_mosaic)
+
+        mosaic_results = pd.concat(mosaic_results)
+        mosaic_results["label"] = mosaic_results.label.apply(
+            lambda x: self.numeric_to_label_dict[x])
+
+        if paths[0] is not None:
+            root_dir = os.path.dirname(paths[0])
+        else:
+            print(
+                "No image path provided, root_dir will be None because the image was passed directly to predict_tile"
+            )
+            root_dir = None
+
+        if crop_model is not None:
+            cropmodel_results = []
+            for path in paths:
+                image_result = mosaic_results[mosaic_results.image_path ==
+                                              os.path.basename(path)]
+                if image_result.empty:
+                    continue
+                image_result.root_dir = os.path.dirname(path)
+                cropmodel_result = predict._crop_models_wrapper_(
+                    crop_model, self.trainer, image_result)
+                cropmodel_results.append(cropmodel_result)
+            cropmodel_results = pd.concat(cropmodel_results)
         else:
+            cropmodel_results = mosaic_results
-        return results
+        formatted_results = utilities.read_file(cropmodel_results, root_dir=root_dir)
+
+        return formatted_results
     def training_step(self, batch, batch_idx):
        """Train on a loaded
dataset."""
@@ -653,7 +612,7 @@
         self.model.train()
         # allow for empty data if data augmentation is generated
-        path, images, targets = batch
+        images, targets, image_names = batch
         loss_dict = self.model.forward(images, targets)
         # sum of regression and classification loss
@@ -670,11 +629,7 @@
     def validation_step(self, batch, batch_idx):
         """Evaluate a batch."""
-        try:
-            path, images, targets = batch
-        except:
-            print("Empty batch encountered, skipping")
-            return None
+        images, targets, image_names = batch
         # Get loss from "train" mode, but don't allow optimization. Torchvision has a 'train' mode that returns a loss and an 'eval' mode that returns predictions. The names are confusing, but this is the correct way to get the loss.
         self.model.train()
@@ -684,13 +639,19 @@
         # sum of regression and classification loss
         losses = sum([loss for loss in loss_dict.values()])
-        self.model.eval()
-        # Can we avoid another forward pass here? https://discuss.pytorch.org/t/how-to-get-losses-and-predictions-at-the-same-time/167223
-        preds = self.model.forward(images)
+        # Log loss
+        for key, value in loss_dict.items():
+            try:
+                self.log("val_{}".format(key), value, on_epoch=True)
+            except MisconfigurationException:
+                pass
+
+        # In eval mode, return predictions to calculate prediction metrics
+        self.model.eval()
+        with torch.no_grad():
+            preds = self.model.forward(images, targets)
-        # Calculate intersection-over-union
         if len(targets) > 0:
-            # Remove empty targets
             # Remove empty targets and corresponding predictions
             filtered_preds = []
             filtered_targets = []
@@ -702,30 +663,12 @@
             self.iou_metric.update(filtered_preds, filtered_targets)
             self.mAP_metric.update(filtered_preds, filtered_targets)
-        # Log loss
-        for key, value in loss_dict.items():
-            try:
-                self.log("val_{}".format(key), value, on_epoch=True)
-            except MisconfigurationException:
-                pass
-
-        for index, result in enumerate(preds):
-            # Skip empty predictions
-            if result["boxes"].shape[0] == 0:
-                self.predictions.append(
-                    pd.DataFrame({
-                        "image_path": [path[index]],
-                        "xmin": [None],
-                        "ymin": [None],
-                        "xmax": [None],
-                        "ymax": [None],
-                        "label": [None],
-                        "score": [None]
-                    }))
-            else:
-                boxes = visualize.format_geometry(result)
-                boxes["image_path"] = path[index]
-                self.predictions.append(boxes)
+        # Store formatted predictions for epoch-level evaluation
+        for i, result in enumerate(preds):
+            formatted_result = format_geometry(result)
+            if formatted_result is not None:
+                formatted_result["image_path"] = image_names[i]
+                self.predictions.append(formatted_result)
         return losses
@@ -754,161 +697,142 @@ def calculate_empty_frame_accuracy(self, ground_df, predictions_df):
         if len(empty_images) == 0:
             return None
-        # Get non-empty predictions for empty images
-        non_empty_predictions = predictions_df.loc[predictions_df.xmin.notnull()]
-        predictions_for_empty_images = non_empty_predictions.loc[
-            non_empty_predictions.image_path.isin(empty_images)]
+        if predictions_df.empty:
+            # If there are empty ground-truth frames but no predictions, empty-frame accuracy is 100%
+            empty_accuracy = 1
+        else:
+            # Get non-empty predictions for empty images
+            non_empty_predictions = predictions_df.loc[predictions_df.xmin.notnull()]
+            predictions_for_empty_images = non_empty_predictions.loc[
+                non_empty_predictions.image_path.isin(empty_images)]
+
+            # Create prediction tensor - 1 if model predicted
objects, 0 if predicted empty
+            predictions = torch.zeros(len(empty_images))
+            for index, image in enumerate(empty_images):
+                if len(predictions_for_empty_images.loc[
+                        predictions_for_empty_images.image_path == image]) > 0:
+                    predictions[index] = 1
+
+            # Ground truth tensor - all zeros since these are empty frames
+            gt = torch.zeros(len(empty_images))
+
+            # Calculate accuracy using metric
+            self.empty_frame_accuracy.update(predictions, gt)
+            empty_accuracy = self.empty_frame_accuracy.compute()
+
+        # Log empty frame accuracy
+        try:
+            self.log("empty_frame_accuracy", empty_accuracy)
+        except MisconfigurationException:
+            pass
-        # Create prediction tensor - 1 if model predicted objects, 0 if predicted empty
-        predictions = torch.zeros(len(empty_images))
-        for index, image in enumerate(empty_images):
-            if len(predictions_for_empty_images.loc[
-                    predictions_for_empty_images.image_path == image]) > 0:
-                predictions[index] = 1
+        return empty_accuracy
-        # Ground truth tensor - all zeros since these are empty frames
-        gt = torch.zeros(len(empty_images))
-        predictions = torch.tensor(predictions)
+    def log_epoch_metrics(self):
+        if len(self.iou_metric.groundtruth_labels) > 0:
+            output = self.iou_metric.compute()
+            try:
+                # This is a bug in lightning, it claims this is a warning but it is not. https://github.com/Lightning-AI/pytorch-lightning/pull/9733/files
+                self.log_dict(output)
+            except:
+                pass
-        # Calculate accuracy using metric
-        self.empty_frame_accuracy.update(predictions, gt)
-        empty_accuracy = self.empty_frame_accuracy.compute()
+            self.iou_metric.reset()
+            output = self.mAP_metric.compute()
-        return empty_accuracy
+            # Remove classes from output dict
+            output = {key: value for key, value in output.items() if key != "classes"}
+            try:
+                self.log_dict(output)
+            except MisconfigurationException:
+                pass
+            self.mAP_metric.reset()
+
+        # Log empty frame accuracy if it has been updated
+        if self.empty_frame_accuracy._update_called:
+            empty_accuracy = self.empty_frame_accuracy.compute()
+
+            # Log empty frame accuracy
+            try:
+                self.log("empty_frame_accuracy", empty_accuracy)
+            except MisconfigurationException:
+                pass
     def on_validation_epoch_end(self):
-        """Compute metrics."""
+        """Compute metrics and predictions at the end of the validation
+        epoch."""
+        if self.trainer.sanity_checking:  # optional skip
+            return
-        #Evaluate every n epochs
         if self.current_epoch % self.config.validation.val_accuracy_interval == 0:
-
-            if len(self.predictions) == 0:
-                return None
+            if len(self.predictions) > 0:
+                self.predictions = pd.concat(self.predictions)
             else:
-                self.predictions_df = pd.concat(self.predictions)
-
-            # If non-empty ground truth, evaluate IoU and mAP
-            if len(self.iou_metric.groundtruth_labels) > 0:
-                output = self.iou_metric.compute()
-                try:
-                    # This is a bug in lightning, it claims this is a warning but it is not.
https://github.com/Lightning-AI/pytorch-lightning/pull/9733/files - self.log_dict(output) - except: - pass + self.predictions = pd.DataFrame() - self.iou_metric.reset() - output = self.mAP_metric.compute() - - # Remove classes from output dict - output = { - key: value for key, value in output.items() if not key == "classes" - } - try: - self.log_dict(output) - except MisconfigurationException: - pass - self.mAP_metric.reset() + results = self.evaluate(self.config.validation.csv_file, + root_dir=self.config.validation.root_dir, + size=self.config.validation.size, + predictions=self.predictions) - #Create a geospatial column - ground_df = utilities.read_file(self.config.validation.csv_file) - ground_df["label"] = ground_df.label.apply(lambda x: self.label_dict[x]) + # Log epoch metrics + self.log_epoch_metrics() + self.__evaluation_logs__(results) - # If there are empty frames, evaluate empty frame accuracy separately - empty_accuracy = self.calculate_empty_frame_accuracy( - ground_df, self.predictions_df) - - if empty_accuracy is not None: - try: - self.log("empty_frame_accuracy", empty_accuracy) - except: - pass - - # Remove empty predictions from the rest of the evaluation - self.predictions_df = self.predictions_df.loc[ - self.predictions_df.xmin.notnull()] - if self.predictions_df.empty: - warnings.warn("No predictions made, skipping detection evaluation") - geom_type = utilities.determine_geometry_type(ground_df) - if geom_type == "box": - result = { - "box_recall": 0, - "box_precision": 0, - "class_recall": pd.DataFrame() - } - else: - # Remove empty ground truth - ground_df = ground_df.loc[~(ground_df.xmin == 0)] - if ground_df.empty: - results = {} - results["empty_frame_accuracy"] = empty_accuracy - return results - - results = evaluate_iou.__evaluate_wrapper__( - predictions=self.predictions_df, - ground_df=ground_df, - iou_threshold=self.config.validation.iou_threshold, - numeric_to_label_dict=self.numeric_to_label_dict) - - if empty_accuracy is not None: - results["empty_frame_accuracy"] = empty_accuracy - - # Log each key value pair of the results dict - if not results["class_recall"] is None: - for key, value in results.items(): - if key in ["class_recall"]: - for index, row in value.iterrows(): - try: - self.log( - "{}_Recall".format( - self.numeric_to_label_dict[row["label"]]), - row["recall"]) - self.log( - "{}_Precision".format( - self.numeric_to_label_dict[row["label"]]), - row["precision"]) - except MisconfigurationException: - pass - elif key in ["predictions", "results"]: - # Don't log dataframes of predictions or IoU results per epoch - pass - else: - try: - self.log(key, value) - except MisconfigurationException: - pass + return results def predict_step(self, batch, batch_idx): - batch_results = self.model(batch) + """Predict a batch of images with the deepforest model. If batch is a + list, concatenate the images, predict and then split the results, + useful for main.predict_tile. - results = [] - for result in batch_results: - boxes = visualize.format_boxes(result) - results.append(boxes) - return results + Args: + batch (torch.Tensor or np.ndarray): A batch of images with shape (B, C, H, W). + batch_idx (int): The index of the batch. 
+
+        Returns:
+            A list of prediction dictionaries, one per image. If the input batch
+            was a list, a list of per-item prediction lists is returned instead.
+        """
+        split_results = False
+        # If batch is a list, concatenate the images, predict and then split the results
+        if isinstance(batch, list):
+            original_list_length = len(batch)
+            combined_batch = torch.cat(batch, dim=0)
+            split_results = True
+        else:
+            combined_batch = batch
+
+        batch_results = self.model(combined_batch)
+
+        # If batch is a list, split the results
+        if split_results:
+            results = []
+            batch_size = len(batch_results) // original_list_length
+            for i in range(original_list_length):
+                start_idx = i * batch_size
+                end_idx = start_idx + batch_size
+                results.append(batch_results[start_idx:end_idx])
+            return results
+        else:
+            return batch_results
     def predict_batch(self, images, preprocess_fn=None):
         """Predict a batch of images with the deepforest model.
         Args:
-            images (torch.Tensor or np.ndarray): A batch of images with shape (B, C, H, W) or (B, H, W, C).
+            images (torch.Tensor or np.ndarray): A batch of images with shape (B, C, H, W).
             preprocess_fn (callable, optional): A function to preprocess images before prediction.
                 If None, assumes images are preprocessed.
         Returns:
             List[pd.DataFrame]: A list of dataframes with predictions for each image.
         """
-        self.model.eval()
         # Convert to tensor if input is an array
         if isinstance(images, np.ndarray):
             images = torch.tensor(images, device=self.device)
-        #check input format
-        if images.dim() == 4 and images.shape[-1] == 3:
-            #Convert channels_last (B, H, W, C) to channels_first (B, C, H, W)
-            images = images.permute(0, 3, 1, 2)
-
         # Apply preprocessing if available
         if preprocess_fn:
             images = preprocess_fn(images)
@@ -918,7 +842,13 @@ def predict_batch(self, images, preprocess_fn=None):
         predictions = self.predict_step(images, 0)
         # Convert predictions to dataframes
-        results = [utilities.read_file(pred) for pred in predictions if pred is not None]
+        results = []
+        for pred in predictions:
+            if len(pred["boxes"]) == 0:
+                continue
+            geom_type = utilities.determine_geometry_type(pred)
+            result = utilities.format_geometry(pred, geom_type=geom_type)
+            results.append(utilities.read_file(result))
         return results
@@ -982,21 +912,39 @@ def configure_optimizers(self):
         else:
             return optimizer
-    def evaluate(self, csv_file, iou_threshold=None):
+    def evaluate(self,
+                 csv_file,
+                 iou_threshold=None,
+                 root_dir=None,
+                 size=None,
+                 batch_size=None,
+                 predictions=None):
         """Compute intersection-over-union and precision/recall for a given
         iou_threshold.
         Args:
             csv_file: location of a csv file with columns "name","xmin","ymin","xmax","ymax","label"
             iou_threshold: float [0,1] intersection-over-union threshold for true positive
+            root_dir: directory of images. If None, uses the directory of the csv_file.
+            batch_size: int, the batch size to use for prediction. If None, uses the batch size of the model.
+            size: int, the size to resize the images to. If None, no resizing is done.
+            predictions: list of predictions to use for evaluation. If None, predictions are generated from the model.
Returns:
            dict: Results dictionary containing precision, recall and other metrics
        """
+        self.model.eval()
         ground_df = utilities.read_file(csv_file)
         ground_df["label"] = ground_df.label.apply(lambda x: self.label_dict[x])
-        predictions = self.predict_file(csv_file=csv_file,
-                                        root_dir=os.path.dirname(csv_file))
+
+        if root_dir is None:
+            root_dir = os.path.dirname(csv_file)
+
+        if predictions is None:
+            # Generate predictions from the model with predict_file
+            predictions = self.predict_file(csv_file,
+                                            root_dir,
+                                            size=size,
+                                            batch_size=batch_size)
         if iou_threshold is None:
             iou_threshold = self.config.validation.iou_threshold
@@ -1007,4 +955,51 @@
             iou_threshold=iou_threshold,
             numeric_to_label_dict=self.numeric_to_label_dict)
+        # Empty frame accuracy
+        empty_accuracy = self.calculate_empty_frame_accuracy(ground_df, predictions)
+        results["empty_frame_accuracy"] = empty_accuracy
+
+        self.__evaluation_logs__(results)
+
         return results
+
+    def __evaluation_logs__(self, results):
+        """Log metrics from evaluation results."""
+        # Log scalar metrics, skipping dataframes and missing values
+        for key, value in results.items():
+            if type(value) in [pd.DataFrame, gpd.GeoDataFrame]:
+                pass
+            elif value is None:
+                pass
+            else:
+                try:
+                    self.log(key, value)
+                except MisconfigurationException:
+                    pass
+
+        # Log per-class recall and precision. Scalar metrics were logged above;
+        # dataframes of predictions or IoU results are not logged per epoch.
+        if results["class_recall"] is not None:
+            for index, row in results["class_recall"].iterrows():
+                try:
+                    self.log(
+                        "{}_Recall".format(
+                            self.numeric_to_label_dict[row["label"]]),
+                        row["recall"])
+                    self.log(
+                        "{}_Precision".format(
+                            self.numeric_to_label_dict[row["label"]]),
+                        row["precision"])
+                except MisconfigurationException:
+                    pass
diff --git a/src/deepforest/model.py b/src/deepforest/model.py
index 277e1de93..67407a9a0 100644
--- a/src/deepforest/model.py
+++ b/src/deepforest/model.py
@@ -8,7 +8,6 @@
 from torchvision.datasets import ImageFolder
 import numpy as np
 import rasterio
-from torch.utils.data import Dataset
 import torch.nn.functional as F
 import cv2
@@ -33,9 +32,6 @@ def __init__(self, config):
         # Check for required properties and formats
         self.config = config
-        # Check input output format:
-        self.check_model()
-
     def create_model(self):
        """This function converts a deepforest config file into a model.
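Stepping back from the patch for a moment: the reworked `evaluate()` above now accepts precomputed predictions, so validation epochs can score the detections they already produced instead of re-running inference. A minimal usage sketch, not part of the patch, assuming a packaged example CSV and a downloadable release model:

```python
# Hedged sketch: the two entry points into the new evaluate() signature.
import os
from deepforest import main, get_data

m = main.deepforest()
m.load_model("weecology/deepforest-tree")
m.create_trainer()

csv_file = get_data("OSBS_029.csv")

# Option 1: let evaluate() generate predictions internally via predict_file()
results = m.evaluate(csv_file, iou_threshold=0.4)

# Option 2: reuse predictions produced elsewhere, skipping a second inference pass
preds = m.predict_file(csv_file=csv_file, root_dir=os.path.dirname(csv_file))
results = m.evaluate(csv_file, predictions=preds)

print(results["box_recall"], results["box_precision"])
```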
@@ -114,30 +110,19 @@ def __init__(self,
         self.num_classes = num_classes
         self.num_workers = num_workers
         self.label_dict = label_dict
+
         if label_dict is not None:
             self.numeric_to_label_dict = {v: k for k, v in label_dict.items()}
         else:
             self.numeric_to_label_dict = None
         self.save_hyperparameters()
-
-        if num_classes is not None:
-            if model is None:
-                self.model = simple_resnet_50(num_classes=num_classes)
+        if model is None:
+            if num_classes is not None:
+                self.create_model(num_classes)
             else:
-                self.model = model
-
-            self.accuracy = torchmetrics.Accuracy(average='none',
-                                                  num_classes=num_classes,
-                                                  task="multiclass")
-            self.total_accuracy = torchmetrics.Accuracy(num_classes=num_classes,
-                                                        task="multiclass")
-            self.precision_metric = torchmetrics.Precision(num_classes=num_classes,
-                                                           task="multiclass")
-            self.metrics = torchmetrics.MetricCollection({
-                "Class Accuracy": self.accuracy,
-                "Accuracy": self.total_accuracy,
-                "Precision": self.precision_metric
-            })
+                print(
+                    "No model was created because neither model nor num_classes was provided. Use load_from_disk(recreate_model=True) to create a model from a data directory."
+                )
         else:
             self.model = model
@@ -145,6 +130,23 @@ def __init__(self,
         self.batch_size = batch_size
         self.lr = lr
+    def create_model(self, num_classes):
+        """Create a model with the given number of classes."""
+        self.accuracy = torchmetrics.Accuracy(average='none',
+                                              num_classes=num_classes,
+                                              task="multiclass")
+        self.total_accuracy = torchmetrics.Accuracy(num_classes=num_classes,
+                                                    task="multiclass")
+        self.precision_metric = torchmetrics.Precision(num_classes=num_classes,
+                                                       task="multiclass")
+        self.metrics = torchmetrics.MetricCollection({
+            "Class Accuracy": self.accuracy,
+            "Accuracy": self.total_accuracy,
+            "Precision": self.precision_metric
+        })
+
+        self.model = simple_resnet_50(num_classes=num_classes)
+
     def on_save_checkpoint(self, checkpoint):
         checkpoint['label_dict'] = self.label_dict
@@ -173,15 +175,29 @@ def on_load_checkpoint(self, checkpoint):
         self.label_dict = checkpoint['label_dict']
         self.numeric_to_label_dict = {v: k for k, v in self.label_dict.items()}
-    def load_from_disk(self, train_dir, val_dir):
+    def load_from_disk(self, train_dir, val_dir, recreate_model=False):
+        """Load the training and validation datasets from disk.
+
+        Args:
+            train_dir (str): The directory containing the training dataset.
+            val_dir (str): The directory containing the validation dataset.
+            recreate_model (bool): Whether to recreate the model with the new number of classes.
+
+        Returns:
+            None
+        """
         self.train_ds = ImageFolder(root=train_dir,
                                     transform=self.get_transform(augment=True))
         self.val_ds = ImageFolder(root=val_dir,
                                   transform=self.get_transform(augment=False))
         self.label_dict = self.train_ds.class_to_idx
+        # Create a reverse mapping from numeric indices to class labels
         self.numeric_to_label_dict = {v: k for k, v in self.label_dict.items()}
+        if recreate_model:
+            self.create_model(num_classes=len(self.label_dict))
+
     def get_transform(self, augment):
        """Returns the data transformation pipeline for the model.
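With the model.py changes above, `CropModel` can defer building its classifier head until the classes are discovered on disk. A hedged sketch, not part of the patch; the crop directories are hypothetical, and it assumes `num_classes` may be passed as None to trigger the deferred path:

```python
# Hedged sketch: deferred CropModel creation via load_from_disk(recreate_model=True).
from deepforest.model import CropModel

crop_model = CropModel(num_classes=None)  # no classifier head is built yet
crop_model.load_from_disk(train_dir="crops/train",
                          val_dir="crops/val",
                          recreate_model=True)

# The head is now sized to whatever classes ImageFolder found on disk
print(crop_model.label_dict)  # e.g. {"Alive": 0, "Dead": 1}
```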
diff --git a/src/deepforest/predict.py b/src/deepforest/predict.py
index 32b7837d4..d2e4db0d9 100644
--- a/src/deepforest/predict.py
+++ b/src/deepforest/predict.py
@@ -7,17 +7,15 @@
 from torchvision.ops import nms
 import typing
-from deepforest import visualize, dataset
+from deepforest import utilities
+from deepforest.datasets import cropmodel
 from deepforest.utilities import read_file
 def _predict_image_(model,
                     image: typing.Optional[np.ndarray] = None,
                     path: typing.Optional[str] = None,
-                    nms_thresh: float = 0.15,
-                    return_plot: bool = False,
-                    thickness: int = 1,
-                    color: typing.Optional[tuple] = (0, 165, 255)):
+                    nms_thresh: float = 0.15):
     """Predict a single image with a deepforest model.
     Args:
@@ -25,13 +23,14 @@ def _predict_image_(model,
         image: a tensor of shape (channels, height, width)
         path: optional path to read image from disk instead of passing image arg
         nms_thresh: Non-max suppression threshold, see config.nms_thresh
-        return_plot: Return image with plotted detections
-        thickness: thickness of the rectangle border line in px
-        color: color of the bounding box as a tuple of BGR color, e.g. orange annotations is (0, 165, 255)
     Returns:
-        df: A pandas dataframe of predictions (Default)
-        img: The input with predictions overlaid (Optional)
+        df: A pandas dataframe of predictions, or None if no predictions are made
     """
+
+    image = torch.tensor(image).permute(2, 0, 1)
+    image = image / 255
+
     with torch.no_grad():
         prediction = model(image.unsqueeze(0))
@@ -39,53 +38,54 @@ def _predict_image_(model,
     if len(prediction[0]["boxes"]) == 0:
         return None
-    df = visualize.format_boxes(prediction[0])
-    df = across_class_nms(df, iou_threshold=nms_thresh)
+    df = utilities.format_boxes(prediction[0])
-    if return_plot:
-        # Bring to gpu
-        image = image.cpu()
+    if df.label.nunique() > 1:
+        df = across_class_nms(df, iou_threshold=nms_thresh)
-        # Cv2 likes no batch dim, BGR image and channels last, 0-255
-        image = np.array(image.squeeze(0))
-        image = np.rollaxis(image, 0, 3)
-        image = image[:, :, ::-1] * 255
-        image = image.astype("uint8")
-        image = visualize.plot_predictions(image, df, color=color, thickness=thickness)
-
-        return image
-    else:
-        if path:
-            df["image_path"] = os.path.basename(path)
+    # Add image path if provided
+    if path is not None:
+        df["image_path"] = os.path.basename(path)
     return df
-def mosiac(boxes, windows, sigma=0.5, thresh=0.001, iou_threshold=0.1):
-    # transform the coordinates to original system
-    for index, _ in enumerate(boxes):
-        xmin, ymin, xmax, ymax = windows[index].getRect()
-        boxes[index].xmin += xmin
-        boxes[index].xmax += xmin
-        boxes[index].ymin += ymin
-        boxes[index].ymax += ymin
+def transform_coordinates(boxes):
+    """Transform box coordinates from window space to original image space.
-    predicted_boxes = pd.concat(boxes)
-    print(
-        f"{predicted_boxes.shape[0]} predictions in overlapping windows, applying non-max suppression"
-    )
-    # move prediction to tensor
-    boxes = torch.tensor(predicted_boxes[["xmin", "ymin", "xmax", "ymax"]].values,
-                         dtype=torch.float32)
-    scores = torch.tensor(predicted_boxes.score.values, dtype=torch.float32)
-    labels = predicted_boxes.label.values
-    # Performs non-maximum suppression (NMS) on the boxes according to
-    # their intersection-over-union (IoU).
-    bbox_left_idx = nms(boxes=boxes, scores=scores, iou_threshold=iou_threshold)
+    Args:
+        boxes: DataFrame of predictions with xmin, ymin, xmax, ymax, window_xmin, window_ymin columns
+    Returns:
+        DataFrame with transformed coordinates
+    """
+    boxes = boxes.copy()
+    boxes["xmin"] += boxes["window_xmin"]
+    boxes["xmax"] += boxes["window_xmin"]
+    boxes["ymin"] += boxes["window_ymin"]
+    boxes["ymax"] += boxes["window_ymin"]
+
+    return boxes
+
+
+def apply_nms(boxes, scores, labels, iou_threshold):
+    """Apply non-maximum suppression to boxes.
+
+    Args:
+        boxes: tensor of shape (N, 4) containing box coordinates
+        scores: tensor of shape (N,) containing confidence scores
+        labels: array of shape (N,) containing labels
+        iou_threshold: IoU threshold for NMS
+
+    Returns:
+        DataFrame with filtered boxes
+    """
+    bbox_left_idx = nms(boxes=boxes, scores=scores, iou_threshold=iou_threshold)
     bbox_left_idx = bbox_left_idx.numpy()
-    new_boxes, new_labels, new_scores = boxes[bbox_left_idx].type(
-        torch.int), labels[bbox_left_idx], scores[bbox_left_idx]
+
+    new_boxes = boxes[bbox_left_idx].type(torch.int)
+    new_labels = labels[bbox_left_idx]
+    new_scores = scores[bbox_left_idx]
     # Recreate box dataframe
     image_detections = np.concatenate([
@@ -95,12 +95,36 @@
     ],
                                       axis=1)
-    mosaic_df = pd.DataFrame(image_detections,
-                             columns=["xmin", "ymin", "xmax", "ymax", "label", "score"])
+    return pd.DataFrame(image_detections,
+                        columns=["xmin", "ymin", "xmax", "ymax", "label", "score"])
+
+
+def mosiac(predictions, iou_threshold=0.1):
+    """Mosaic predictions from overlapping windows.
+
+    Args:
+        predictions: A pandas dataframe containing predictions from overlapping windows from a single image.
+        iou_threshold: The IoU threshold for non-max suppression.
+
+    Returns:
+        A pandas dataframe of predictions.
+    """
+    predicted_boxes = transform_coordinates(predictions)
+    print(
+        f"{predicted_boxes.shape[0]} predictions in overlapping windows, applying non-max suppression"
+    )
+
+    # Convert to tensors
+    boxes = torch.tensor(predicted_boxes[["xmin", "ymin", "xmax", "ymax"]].values,
+                         dtype=torch.float32)
+    scores = torch.tensor(predicted_boxes.score.values, dtype=torch.float32)
+    labels = predicted_boxes.label.values
-    print(f"{mosaic_df.shape[0]} predictions kept after non-max suppression")
+    # Apply NMS
+    filtered_boxes = apply_nms(boxes, scores, labels, iou_threshold)
+    print(f"{filtered_boxes.shape[0]} predictions kept after non-max suppression")
-    return mosaic_df
+    return filtered_boxes
 def across_class_nms(predicted_boxes, iou_threshold=0.15):
@@ -133,34 +157,19 @@
     return new_df
-def _dataloader_wrapper_(model,
-                         trainer,
-                         dataloader,
-                         root_dir,
-                         annotations,
-                         nms_thresh,
-                         savedir=None,
-                         color=None,
-                         thickness=1):
-    """Create a dataset and predict entire annotation file.
-
-    Csv file format is .csv file with the columns "image_path", "xmin","ymin","xmax","ymax" for the image name and bounding box position.
-    Image_path is the relative filename, not absolute path, which is in the root_dir directory. One bounding box per line.
+def _dataloader_wrapper_(model, trainer, dataloader, root_dir, crop_model):
+    """Predict a dataloader of images and postprocess into a single dataframe.
+
     Args:
         model: deepforest.main object
         trainer: a pytorch lightning trainer object
         dataloader: pytorch dataloader object
         root_dir: directory of images.
If none, uses "image_dir" in config
-        annotations: a pandas dataframe of annotations
-        nms_thresh: Non-max suppression threshold, see config.nms_thresh
-        savedir: Optional. Directory to save image plots.
-        color: color of the bounding box as a tuple of BGR color, e.g. orange annotations is (0, 165, 255)
-        thickness: thickness of the rectangle border line in px
+        crop_model: Optional. A list of crop models to be used for prediction.
     Returns:
         results: pandas dataframe with bounding boxes, label and scores for each image in the csv file
     """
-    paths = annotations.image_path.unique()
     batched_results = trainer.predict(model, dataloader)
     # Flatten list from batched prediction
@@ -169,28 +178,32 @@
         for images in batch:
             prediction_list.append(images)
-    results = []
-    for index, prediction in enumerate(prediction_list):
-        # If there is more than one class, apply NMS Loop through images and apply cross
-        if len(prediction.label.unique()) > 1:
-            prediction = across_class_nms(prediction, iou_threshold=nms_thresh)
-
-        prediction["image_path"] = paths[index]
-        results.append(prediction)
+    # Postprocess predictions
+    results = dataloader.dataset.postprocess(prediction_list)
-    results = pd.concat(results, ignore_index=True)
     if results.empty:
-        results["geometry"] = None
         return results
-    results = read_file(results, root_dir)
+    # Optionally run crop models over each image's predictions
+    processed_results = []
+    for image_path in results.image_path.unique():
+        image_results = results[results.image_path == image_path].copy()
-    if savedir:
-        visualize.plot_prediction_dataframe(results,
-                                            root_dir=root_dir,
-                                            savedir=savedir,
-                                            color=color,
-                                            thickness=thickness)
+        if crop_model:
+            # Flag to check if only one model is passed
+            is_single_model = len(crop_model) == 1
+
+            for i, single_crop_model in enumerate(crop_model):
+                image_results = _predict_crop_model_(crop_model=single_crop_model,
+                                                     results=image_results,
+                                                     path=image_path,
+                                                     trainer=trainer,
+                                                     model_index=i,
+                                                     is_single_model=is_single_model)
+
+            processed_results.append(image_results)
+
+    # Keep the crop model outputs if they were generated
+    if crop_model and processed_results:
+        results = pd.concat(processed_results, ignore_index=True)
+
+    results = read_file(results, root_dir)
     return results
@@ -198,7 +211,7 @@ def _predict_crop_model_(crop_model,
                          trainer,
                          results,
-                         raster_path,
+                         path,
                          transform=None,
                          augment=False,
                          model_index=0,
Returns:
@@ -224,11 +237,10 @@ def _predict_crop_model_(crop_model,
         results = results[results.ymin != results.ymax]
     # Create dataset
-    bounding_box_dataset = dataset.BoundingBoxDataset(
-        results,
-        root_dir=os.path.dirname(raster_path),
-        transform=transform,
-        augment=augment)
+    bounding_box_dataset = cropmodel.BoundingBoxDataset(results,
+                                                        root_dir=os.path.dirname(path),
+                                                        transform=transform,
+                                                        augment=augment)
     # Create dataloader
     crop_dataloader = crop_model.predict_dataloader(bounding_box_dataset)
@@ -259,3 +271,31 @@
     return results
+
+
+def _crop_models_wrapper_(crop_models, trainer, results, transform=None, augment=False):
+    """Run one or more crop models over the images referenced in results.
+
+    Args:
+        crop_models: a single crop model or a list of crop models
+        trainer: a pytorch lightning trainer object
+        results: dataframe of detections with an image_path column and a root_dir attribute
+
+    Returns:
+        crop_results: dataframe of detections with crop model labels and scores appended
+    """
+    if crop_models is not None and not isinstance(crop_models, list):
+        crop_models = [crop_models]
+
+    # Run predictions
+    crop_results = []
+    if crop_models:
+        is_single_model = len(
+            crop_models) == 1  # Flag to check if only one model is passed
+        for i, crop_model in enumerate(crop_models):
+            for path in results.image_path.unique():
+                path = os.path.join(results.root_dir, path)
+                crop_result = _predict_crop_model_(crop_model=crop_model,
+                                                   results=results,
+                                                   path=path,
+                                                   trainer=trainer,
+                                                   model_index=i,
+                                                   transform=transform,
+                                                   augment=augment,
+                                                   is_single_model=is_single_model)
+                crop_results.append(crop_result)
+
+    # Concatenate results
+    crop_results = pd.concat(crop_results)
+
+    return crop_results
diff --git a/src/deepforest/preprocess.py b/src/deepforest/preprocess.py
index aa3aa95e0..04657f113 100644
--- a/src/deepforest/preprocess.py
+++ b/src/deepforest/preprocess.py
@@ -11,7 +11,6 @@
 from PIL import Image
 import torch
 import warnings
-import rasterio
 import geopandas as gpd
 from deepforest.utilities import read_file, determine_geometry_type
 from shapely import geometry
@@ -38,7 +37,7 @@ def compute_windows(numpy_image, patch_size, patch_overlap):
     """Create a sliding window object from a raster tile.
     Args:
-        numpy_image (array): Raster object as numpy array to cut into crops
+        numpy_image (array): Raster object as numpy array to cut into crops, channels first
     Returns:
         windows (list): a sliding windows object
@@ -47,9 +46,14 @@ def compute_windows(numpy_image, patch_size, patch_overlap):
     if patch_overlap > 1:
         raise ValueError("Patch overlap {} must be between 0 - 1".format(patch_overlap))
+    # Check that image is channels first
+    if numpy_image.shape[0] != 3:
+        raise ValueError("Image is not channels first, shape is {}".format(
+            numpy_image.shape))
+
     # Generate overlapping sliding windows
     windows = slidingwindow.generate(numpy_image,
-                                     slidingwindow.DimOrder.HeightWidthChannel,
+                                     slidingwindow.DimOrder.ChannelHeightWidth,
                                      patch_size, patch_overlap)
     return (windows)
@@ -118,7 +122,8 @@ def save_crop(base_dir, image_name, index, crop):
     if not os.path.exists(base_dir):
         os.makedirs(base_dir)
-    # Convert NumPy array to PIL image
+    # Convert NumPy array to PIL image, PIL expects channels last
+    crop = np.moveaxis(crop, 0, 2)
     im = Image.fromarray(crop)
     # Extract the basename of the image
@@ -152,7 +157,6 @@ def split_raster(annotations_file=None,
         path_to_raster: (str): Path to a tile that can be read by rasterio on disk
         annotations_file (str or pd.DataFrame): A pandas dataframe or path to annotations csv file to transform to cropped images. In the format -> image_path, xmin, ymin, xmax, ymax, label. If None, allow_empty is ignored and the function will only return the cropped images.
save_dir (str): Directory to save images
-        base_dir (str): Directory to save images
         patch_size (int): Maximum dimensions of square window
         patch_overlap (float): Percent of overlap among windows 0->1
         allow_empty: If True, include images with no annotations
@@ -164,12 +168,6 @@ def split_raster(annotations_file=None,
         If annotations_file is provided, a pandas dataframe with annotations file for training. A copy of this file is written to save_dir as a side effect. If not, a list of filenames of the cropped images.
     """
-    # Set deprecation warning for base_dir and set to save_dir
-    if base_dir:
-        warnings.warn(
-            "base_dir argument will be deprecated in 2.0. The naming is confusing, the rest of the API uses 'save_dir' to refer to location of images. Please use 'save_dir' argument.",
-            DeprecationWarning)
-        save_dir = base_dir
     # Load raster as image
     if numpy_image is None and path_to_raster is None:
         raise IOError("Supply a raster either as a path_to_raster or if ready "
                       "from existing in-memory numpy object, as numpy_image=")
     if path_to_raster:
-        numpy_image = rasterio.open(path_to_raster).read()
-        numpy_image = np.moveaxis(numpy_image, 0, 2)
+        numpy_image = Image.open(path_to_raster)
+        numpy_image = np.array(numpy_image)
     else:
         if image_name is None:
             raise IOError("If passing a numpy_image, please also specify an image_name"
                           " to match the column in the annotation.csv file")
-    # Confirm that raster is H x W x C, if not, convert, assuming image is wider/taller than channels
-    if numpy_image.shape[0] < numpy_image.shape[-1]:
-        warnings.warn(
-            "Input rasterio had shape {}, assuming channels first. Converting to channels last"
-            .format(numpy_image.shape), UserWarning)
-        numpy_image = np.moveaxis(numpy_image, 0, 2)
+    # If it's channels last (H x W x C), convert to channels first (C x H x W)
+    if numpy_image.shape[2] in [3, 4]:
+        print(
+            "Image shape is {}, assuming this is channels last, converting to channels first"
+            .format(numpy_image.shape))
+        numpy_image = numpy_image.transpose(2, 0, 1)
     # Check that it's 3 bands; the image is now channels first
-    bands = numpy_image.shape[2]
+    bands = numpy_image.shape[0]
     if not bands == 3:
         warnings.warn(
-            "Input rasterio had non-3 band shape of {}, ignoring "
-            "alpha channel".format(numpy_image.shape), UserWarning)
+            "Input image had non-3 band shape of {}, selecting first 3 bands".format(
+                numpy_image.shape), UserWarning)
         try:
-            numpy_image = numpy_image[:, :, :3].astype("uint8")
+            numpy_image = numpy_image[:3, :, :].astype("uint8")
         except:
             raise IOError("Input file {} has {} bands. "
                           "DeepForest only accepts 3 band RGB rasters in the order "
-                          "(height, width, channels). "
+                          "(channels, height, width). "
                           "Selecting the first three bands failed, "
                           "please reshape manually.
If the image was cropped and "
-                          "saved as a .jpg, please ensure that no alpha channel "
+                          "saved, please ensure that no alpha channel "
                           "was used.".format(path_to_raster, bands))
     # Check that patch size is greater than image size
-    height, width = numpy_image.shape[0], numpy_image.shape[1]
+    height, width = numpy_image.shape[1], numpy_image.shape[2]
     if any(np.array([height, width]) < patch_size):
         raise ValueError("Patch size of {} is larger than the image dimensions {}".format(
             patch_size, [height, width]))
@@ -274,7 +272,7 @@ def validate_annotations(annotations, numpy_image, path_to_raster):
         )
     if hasattr(annotations, 'total_bounds'):
-        raster_height, raster_width = numpy_image.shape[0], numpy_image.shape[1]
+        raster_height, raster_width = numpy_image.shape[1], numpy_image.shape[2]
         ann_bounds = annotations.total_bounds
         if (ann_bounds[0] < -raster_width * 0.1 or  # xmin
diff --git a/src/deepforest/utilities.py b/src/deepforest/utilities.py
index c0bd740ba..655542feb 100644
--- a/src/deepforest/utilities.py
+++ b/src/deepforest/utilities.py
@@ -6,16 +6,12 @@
 import rasterio
 import shapely
 import xmltodict
-import yaml
 from tqdm import tqdm
 from typing import Union
 from PIL import Image
 from deepforest import _ROOT
 import json
-import urllib.request
-from huggingface_hub import hf_hub_download
-from huggingface_hub.errors import RevisionNotFoundError, HfHubHTTPError
 from omegaconf import DictConfig, OmegaConf
@@ -288,6 +284,58 @@ def determine_geometry_type(df):
     return geometry_type
+def format_geometry(predictions, scores=True, geom_type=None):
+    """Format a retinanet prediction into a pandas dataframe for a single image.
+
+    Args:
+        predictions: a dictionary with keys 'boxes' and 'labels' coming from a retinanet
+        scores: Whether boxes come with scores, during prediction, or without scores, as in during training.
+        geom_type: the geometry type of the predictions. If None, it is inferred with determine_geometry_type.
+    Returns:
+        df: a pandas dataframe, or None if there are no boxes
+    """
+
+    # Detect geometry type
+    if geom_type is None:
+        geom_type = determine_geometry_type(predictions)
+
+    if geom_type == "box":
+        df = format_boxes(predictions, scores=scores)
+        if df is None:
+            return None
+
+    elif geom_type == "polygon":
+        raise ValueError("Polygon predictions are not yet supported for formatting")
+    elif geom_type == "point":
+        raise ValueError("Point predictions are not yet supported for formatting")
+
+    return df
+
+
+def format_boxes(prediction, scores=True):
+    """Format a retinanet prediction into a pandas dataframe for a single
+    image.
+
+    Args:
+        prediction: a dictionary with keys 'boxes' and 'labels' coming from a retinanet
+        scores: Whether boxes come with scores, during prediction, or without scores, as in during training.
+    Returns:
+        df: a pandas dataframe, or None if there are no boxes
+    """
+    if len(prediction["boxes"]) == 0:
+        return None
+
+    df = pd.DataFrame(prediction["boxes"].cpu().detach().numpy(),
+                      columns=["xmin", "ymin", "xmax", "ymax"])
+    df["label"] = prediction["labels"].cpu().detach().numpy()
+
+    if scores:
+        df["score"] = prediction["scores"].cpu().detach().numpy()
+
+    df['geometry'] = df.apply(
+        lambda x: shapely.geometry.box(x.xmin, x.ymin, x.xmax, x.ymax), axis=1)
+    return df
+
+
 def read_coco(json_file):
    """Read a COCO format JSON file and return a pandas dataframe.
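For orientation, a minimal sketch, not part of the patch, of the relocated `utilities.format_boxes` on a fabricated retinanet-style output dictionary (the tensors here are invented for illustration):

```python
# Hedged sketch: formatting a raw detection dictionary into a dataframe.
import torch
from deepforest import utilities

prediction = {
    "boxes": torch.tensor([[10.0, 20.0, 50.0, 80.0]]),
    "labels": torch.tensor([0]),
    "scores": torch.tensor([0.9]),
}

df = utilities.format_boxes(prediction, scores=True)
print(df[["xmin", "ymin", "xmax", "ymax", "label", "score"]])
```

Note that the function returns None for an empty "boxes" tensor, which is why callers such as `validation_step` check the result before appending it.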
@@ -612,7 +660,6 @@ def image_to_geo_coordinates(gdf, root_dir=None, flip_y_axis=False): with rasterio.open(rgb_path) as dataset: bounds = dataset.bounds left, bottom, right, top = bounds - pixelSizeX, pixelSizeY = dataset.res crs = dataset.crs transform = dataset.transform diff --git a/src/deepforest/visualize.py b/src/deepforest/visualize.py index a10ade1ff..5a86bc970 100644 --- a/src/deepforest/visualize.py +++ b/src/deepforest/visualize.py @@ -10,129 +10,9 @@ import random import warnings import supervision as sv -import shapely from deepforest.utilities import determine_geometry_type -def view_dataset(ds, savedir=None, color=None, thickness=1): - """Plot annotations on images for debugging purposes. - - Args: - ds: a deepforest pytorch dataset, see deepforest.dataset or deepforest.load_dataset() to start from a csv file - savedir: optional path to save figures. If none (default) images will be interactively plotted - color: color of the bounding box as a tuple of BGR color, e.g. orange annotations is (0, 165, 255) - thickness: thickness of the rectangle border line in px - """ - for i in iter(ds): - image_path, image, targets = i - df = format_boxes(targets[0], scores=False) - image = np.moveaxis(image[0].numpy(), 0, 2) - image = plot_predictions(image, df, color=color, thickness=thickness) - - if savedir: - cv2.imwrite("{}/{}".format(savedir, image_path[0]), image) - else: - plt.imshow(image) - - -def format_geometry(predictions, scores=True): - """Format a retinanet prediction into a pandas dataframe for a batch of images - Args: - predictions: a list of dictionaries with keys 'boxes' and 'labels' coming from a retinanet - scores: Whether boxes come with scores, during prediction, or without scores, as in during training. - Returns: - df: a pandas dataframe - """ - # Detect geometry type - geom_type = determine_geometry_type(predictions) - - if geom_type == "box": - df = format_boxes(predictions, scores=scores) - df['geometry'] = df.apply( - lambda x: shapely.geometry.box(x.xmin, x.ymin, x.xmax, x.ymax), axis=1) - elif geom_type == "polygon": - raise ValueError("Polygon predictions are not yet supported for formatting") - elif geom_type == "point": - raise ValueError("Point predictions are not yet supported for formatting") - - return df - - -def format_boxes(prediction, scores=True): - """Format a retinanet prediction into a pandas dataframe for a single - image. - - Args: - prediction: a dictionary with keys 'boxes' and 'labels' coming from a retinanet - scores: Whether boxes come with scores, during prediction, or without scores, as in during training. - Returns: - df: a pandas dataframe - """ - df = pd.DataFrame(prediction["boxes"].cpu().detach().numpy(), - columns=["xmin", "ymin", "xmax", "ymax"]) - df["label"] = prediction["labels"].cpu().detach().numpy() - - if scores: - df["score"] = prediction["scores"].cpu().detach().numpy() - - return df - - -def plot_prediction_and_targets(image, predictions, targets, image_name, savedir): - """Plot an image, its predictions, and its ground truth targets for - debugging. 
- - Args: - image: torch tensor, RGB color order - targets: torch tensor - Returns: - figure_path: path on disk with saved figure - """ - image = np.array(image)[:, :, ::-1].copy() - prediction_df = format_boxes(predictions) - image = plot_predictions(image, prediction_df) - target_df = format_boxes(targets, scores=False) - image = plot_predictions(image, target_df) - figure_path = "{}/{}.png".format(savedir, image_name) - cv2.imwrite(figure_path, image) - - return figure_path - - -def plot_prediction_dataframe(df, - root_dir, - savedir, - color=None, - thickness=1, - ground_truth=None): - """For each row in dataframe, call plot predictions and save plot files to - disk. For multi-class labels, boxes will be colored by labels. Ground truth - boxes will all be same color, regardless of class. - - Args: - df: a pandas dataframe with image_path, xmin, xmax, ymin, ymax and label columns. The image_path column should be the relative path from root_dir, not the full path. - root_dir: relative dir to look for image names from df.image_path - ground_truth: an optional pandas dataframe in same format as df holding ground_truth boxes - savedir: save the plot to an optional directory path. - Returns: - written_figures: list of filenames written - """ - written_figures = [] - for name, group in df.groupby("image_path"): - image = np.array(Image.open("{}/{}".format(root_dir, name)))[:, :, ::-1].copy() - image = plot_predictions(image, group) - - if ground_truth is not None: - annotations = ground_truth[ground_truth.image_path == name] - image = plot_predictions(image, annotations, color=color, thickness=thickness) - - figure_name = "{}/{}.png".format(savedir, os.path.splitext(name)[0]) - written_figures.append(figure_name) - cv2.imwrite(figure_name, image) - - return written_figures - - def plot_points(image, points, color=None, radius=5, thickness=1): """Plot points on an image Args: diff --git a/tests/conftest.py b/tests/conftest.py index 45e8a4ab7..9a33e8c5e 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -34,7 +34,7 @@ def ROOT(): return _ROOT -@pytest.fixture() +@pytest.fixture(scope="session") def two_class_m(): m = main.deepforest(num_classes=2, label_dict={"Alive": 0, "Dead": 1}) m.config.train.csv_file = get_data("testfile_multi.csv") @@ -50,7 +50,7 @@ def two_class_m(): return m -@pytest.fixture() +@pytest.fixture(scope="session") def m(download_release): m = main.deepforest() m.config.train.csv_file = get_data("example.csv") diff --git a/tests/profile_predict_tile.py b/tests/profile_predict_tile.py new file mode 100644 index 000000000..977670186 --- /dev/null +++ b/tests/profile_predict_tile.py @@ -0,0 +1,111 @@ +import os +import time +import torch +import numpy as np +from deepforest import main +import psutil +import gc +import glob +from tabulate import tabulate + +def get_memory_usage(): + """Get current memory usage in MB""" + process = psutil.Process(os.getpid()) + return process.memory_info().rss / 1024 / 1024 # Convert to MB + +def profile_predict_tile(model, paths, device, workers=0, patch_size=1500, patch_overlap=0.05, num_runs=2, dataloader_strategy="single"): + """Profile predict_tile function for a given device and worker configuration""" + print(f"\nProfiling predict_tile on {device} with {workers} workers using {dataloader_strategy} strategy...") + + # Update worker configuration + model.config["workers"] = workers + + # Time profiling + times = [] + for i in range(num_runs): + start_time = time.time() + if dataloader_strategy == "batch": + #change batch size to 1 
+            model.config["batch_size"] = 2
+            model.predict_tile(
+                paths=paths,
+                patch_size=patch_size,
+                patch_overlap=patch_overlap,
+                dataloader_strategy=dataloader_strategy
+            )
+        else:
+            for path in paths:
+                model.predict_tile(
+                    path=path,
+                    patch_size=patch_size,
+                    patch_overlap=patch_overlap,
+                    dataloader_strategy=dataloader_strategy
+                )
+        end_time = time.time()
+        times.append(end_time - start_time)
+        print(f"Run {i+1}/{num_runs}: {times[-1]:.2f} seconds")
+
+    # Clean up
+    gc.collect()
+    if device == "cuda":
+        torch.cuda.empty_cache()
+
+    return {
+        "device": device,
+        "workers": workers,
+        "strategy": dataloader_strategy,
+        "mean_time": np.mean(times),
+        "std_time": np.std(times),
+    }
+
+def run():
+    # Initialize model
+    m = main.deepforest()
+    m.create_model()
+    m.load_model("Weecology/deepforest-bird")
+    m.config["train"]["fast_dev_run"] = False
+    m.config["batch_size"] = 3
+    strategies = ["single", "batch"]
+
+    # Get test data
+    paths = glob.glob("/blue/ewhite/b.weinstein/BOEM/JPG_20241220_145900/*.jpg")[:20]
+
+    # Test configurations
+    worker_configs = [0, 5]
+    devices = ["cuda"]
+
+    # Run all configurations
+    results = []
+    for strategy in strategies:
+        for device in devices:
+            if strategy == "single":
+                # Only run workers = 0 for single strategy
+                m.create_trainer()  # Recreate trainer for each configuration
+                result = profile_predict_tile(m, paths, device, workers=0, dataloader_strategy=strategy)
+                results.append(result)
+            else:
+                for workers in worker_configs:
+                    m.create_trainer()  # Recreate trainer for each configuration
+                    result = profile_predict_tile(m, paths, device, workers, dataloader_strategy=strategy)
+                    results.append(result)
+    # Create comparison table
+    table_data = []
+    headers = ["Device", "Workers", "Strategy", "Mean Time (s)", "Std Time (s)"]
+
+    for result in results:
+        table_data.append([
+            result["device"],
+            result["workers"],
+            result["strategy"],
+            f"{result['mean_time']:.2f}",
+            f"{result['std_time']:.2f}",
+        ])
+
+    # Print results
+    print("\nProfiling Results Comparison:")
+    print("persistent workers")
+    print("=" * 140)
+    print(tabulate(table_data, headers=headers, tablefmt="grid"))
+
+if __name__ == "__main__":
+    run()
\ No newline at end of file
diff --git a/tests/test_FasterRCNN.py b/tests/test_FasterRCNN.py
index 793e31509..a8c86db72 100644
--- a/tests/test_FasterRCNN.py
+++ b/tests/test_FasterRCNN.py
@@ -38,6 +38,9 @@ def test_load_backbone(config):
     x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
     prediction = resnet_backbone(x)
 
+def test_check_model(config):
+    r = FasterRCNN.Model(config)
+    r.check_model()
 
 # This test still fails, do we want a way to pass kwargs directly to method,
 # instead of being limited by config structure?
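Reviewer note: the profiling harness above points at cluster-specific imagery. A minimal local sketch of the same strategy comparison, mirroring the parametrization in tests/test_main.py::test_predict_tile (uses package sample data via get_data; timings are illustrative only):

```python
import time
from deepforest import main, get_data

# Load the released tree model once and reuse it across strategies
m = main.deepforest()
m.load_model("weecology/deepforest-tree")
m.create_trainer()

# single/window take one path; batch takes a list of paths
cases = [("single", get_data("test_tiled.tif")),
         ("window", get_data("test_tiled.tif")),
         ("batch", [get_data("OSBS_029.png")])]

for strategy, path in cases:
    start = time.time()
    boxes = m.predict_tile(path=path, patch_size=300, patch_overlap=0,
                           dataloader_strategy=strategy)
    print(f"{strategy}: {len(boxes)} boxes in {time.time() - start:.2f}s")
```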
diff --git a/tests/test_IoU.py b/tests/test_IoU.py
index 20eda8a1d..f08644f00 100644
--- a/tests/test_IoU.py
+++ b/tests/test_IoU.py
@@ -10,8 +10,7 @@ import geopandas as gpd
 import pandas as pd
 
-
-def test_compute_IoU(m, tmpdir):
+def test_compute_IoU(m):
     csv_file = get_data("OSBS_029.csv")
     predictions = m.predict_file(csv_file=csv_file, root_dir=os.path.dirname(csv_file))
     ground_truth = pd.read_csv(csv_file)
@@ -25,11 +24,7 @@ def test_compute_IoU(m, tmpdir):
     ground_truth.label = 0
     predictions.label = 0
 
-    visualize.plot_prediction_dataframe(df=predictions,
-                                        ground_truth=ground_truth,
-                                        root_dir=os.path.dirname(csv_file),
-                                        savedir=tmpdir)
     result = IoU.compute_IoU(ground_truth, predictions)
 
     assert result.shape[0] == ground_truth.shape[0]
-    assert sum(result.IoU) > 10
+    assert sum(result.IoU) > 10
\ No newline at end of file
diff --git a/tests/test_callbacks.py b/tests/test_callbacks.py
index 1e897e734..283304ad7 100644
--- a/tests/test_callbacks.py
+++ b/tests/test_callbacks.py
@@ -1,11 +1,7 @@
 # test callbacks
-from deepforest import main
 from deepforest import callbacks
 import glob
-import os
-import pytest
 from pytorch_lightning.callbacks import ModelCheckpoint
-from deepforest import get_data
 
 
 def test_log_images(m, tmpdir):
diff --git a/tests/test_datasets_cropmodel.py b/tests/test_datasets_cropmodel.py
new file mode 100644
index 000000000..de228eab4
--- /dev/null
+++ b/tests/test_datasets_cropmodel.py
@@ -0,0 +1,20 @@
+import pandas as pd
+import os
+from deepforest import get_data
+from deepforest.datasets.cropmodel import BoundingBoxDataset
+
+def test_bounding_box_dataset():
+    # Create a sample dataframe
+    df = pd.read_csv(get_data("OSBS_029.csv"))
+
+    # Create the BoundingBoxDataset object
+    ds = BoundingBoxDataset(df, root_dir=os.path.dirname(get_data("OSBS_029.png")))
+
+    # Check the length of the dataset
+    assert len(ds) == df.shape[0]
+
+    # Get an item from the dataset
+    item = ds[0]
+
+    # Check the shape of the RGB tensor
+    assert item.shape == (3, 224, 224)
\ No newline at end of file
diff --git a/tests/test_datasets_prediction.py b/tests/test_datasets_prediction.py
new file mode 100644
index 000000000..7b9788ad8
--- /dev/null
+++ b/tests/test_datasets_prediction.py
@@ -0,0 +1,39 @@
+# test dataset model
+from deepforest import get_data
+from deepforest.datasets.prediction import TiledRaster, SingleImage, MultiImage, FromCSVFile
+import os
+
+def test_TiledRaster():
+    tile_path = get_data("test_tiled.tif")
+    ds = TiledRaster(path=tile_path,
+                     patch_size=300,
+                     patch_overlap=0)
+    assert len(ds) == 16
+
+    # assert crop shape
+    assert ds[1].shape == (3, 300, 300)
+
+def test_SingleImage_path():
+    ds = SingleImage(
+        path=get_data("OSBS_029.png"),
+        patch_size=300,
+        patch_overlap=0)
+
+    assert len(ds) == 4
+    assert ds[0].shape == (3, 300, 300)
+
+    for i in range(len(ds)):
+        assert ds.get_crop(i).shape == (3, 300, 300)
+
+def test_MultiImage():
+    ds = MultiImage(paths=[get_data("OSBS_029.png"), get_data("OSBS_029.png")],
+                    patch_size=300,
+                    patch_overlap=0)
+    assert len(ds) == 2
+    # Each image yields a 2 x 2 grid of windows, so each item is a batch of 4 crops
+    assert ds[0].shape == (4, 3, 300, 300)
+
+def test_FromCSVFile():
+    ds = FromCSVFile(csv_file=get_data("example.csv"),
+                     root_dir=os.path.dirname(get_data("example.csv")))
+    assert len(ds) == 1
diff --git a/tests/test_dataset.py b/tests/test_datasets_training.py
similarity index 55%
rename from tests/test_dataset.py
rename to tests/test_datasets_training.py
index e971b7211..75e80c1c7 100644
--- a/tests/test_dataset.py
+++ b/tests/test_datasets_training.py
@@ -1,6 
+1,5 @@ # test dataset model from deepforest import get_data -from deepforest import dataset from deepforest import utilities import os import pytest @@ -8,12 +7,8 @@ import pandas as pd import numpy as np import tempfile -import rasterio as rio -from deepforest.dataset import BoundingBoxDataset -from deepforest.dataset import RasterDataset -from torch.utils.data import DataLoader - +from deepforest.datasets.training import BoxDataset def single_class(): csv_file = get_data("example.csv") @@ -31,16 +26,16 @@ def raster_path(): return get_data(path='OSBS_029.tif') @pytest.mark.parametrize("csv_file,label_dict", [(single_class(), {"Tree": 0}), (multi_class(), {"Alive": 0, "Dead": 1})]) -def test_tree_dataset(csv_file, label_dict): +def test_BoxDataset(csv_file, label_dict): root_dir = os.path.dirname(get_data("OSBS_029.png")) - ds = dataset.TreeDataset(csv_file=csv_file, root_dir=root_dir, label_dict=label_dict) + ds = BoxDataset(csv_file=csv_file, root_dir=root_dir, label_dict=label_dict) raw_data = pd.read_csv(csv_file) assert len(ds) == len(raw_data.image_path.unique()) for i in range(len(ds)): # Between 0 and 1 - path, image, targets = ds[i] + image, targets, paths = ds[i] assert image.max() <= 1 assert image.min() >= 0 assert targets["boxes"].shape == (raw_data.shape[0], 4) @@ -48,7 +43,6 @@ def test_tree_dataset(csv_file, label_dict): assert targets["labels"].dtype == torch.int64 assert len(np.unique(targets["labels"])) == len(raw_data.label.unique()) - def test_single_class_with_empty(tmpdir): """Add fake empty annotations to test parsing """ csv_file1 = get_data("example.csv") @@ -66,27 +60,27 @@ def test_single_class_with_empty(tmpdir): df.to_csv("{}_test_empty.csv".format(tmpdir)) root_dir = os.path.dirname(get_data("OSBS_029.png")) - ds = dataset.TreeDataset(csv_file="{}_test_empty.csv".format(tmpdir), + ds = BoxDataset(csv_file="{}_test_empty.csv".format(tmpdir), root_dir=root_dir, label_dict={"Tree": 0}) assert len(ds) == 2 # First image has annotations - assert not torch.sum(ds[0][2]["boxes"]) == 0 + assert not torch.sum(ds[0][1]["boxes"]) == 0 # Second image has no annotations - assert torch.sum(ds[1][2]["boxes"]) == 0 + assert torch.sum(ds[1][1]["boxes"]) == 0 @pytest.mark.parametrize("augment", [True, False]) -def test_tree_dataset_transform(augment): +def test_BoxDataset_transform(augment): csv_file = get_data("example.csv") root_dir = os.path.dirname(csv_file) - ds = dataset.TreeDataset(csv_file=csv_file, + ds = BoxDataset(csv_file=csv_file, root_dir=root_dir, - transforms=dataset.get_transform(augment=augment)) + augment=augment) for i in range(len(ds)): # Between 0 and 1 - path, image, targets = ds[i] + image, targets, path = ds[i] assert image.max() <= 1 assert image.min() >= 0 assert targets["boxes"].shape == (79, 4) @@ -102,9 +96,8 @@ def test_collate(): """Due to data augmentations the dataset class may yield empty bounding box annotations""" csv_file = get_data("example.csv") root_dir = os.path.dirname(csv_file) - ds = dataset.TreeDataset(csv_file=csv_file, - root_dir=root_dir, - transforms=dataset.get_transform(augment=False)) + ds = BoxDataset(csv_file=csv_file, + root_dir=root_dir) for i in range(len(ds)): # Between 0 and 1 @@ -117,9 +110,8 @@ def test_empty_collate(): """Due to data augmentations the dataset class may yield empty bounding box annotations""" csv_file = get_data("example.csv") root_dir = os.path.dirname(csv_file) - ds = dataset.TreeDataset(csv_file=csv_file, - root_dir=root_dir, - transforms=dataset.get_transform(augment=False)) + ds = 
BoxDataset(csv_file=csv_file, + root_dir=root_dir) for i in range(len(ds)): # Between 0 and 1 @@ -128,15 +120,15 @@ def test_empty_collate(): len(collated_batch[0]) == 2 -def test_dataloader(): +def test_BoxDataset_format(): csv_file = get_data("example.csv") root_dir = os.path.dirname(csv_file) - ds = dataset.TreeDataset(csv_file=csv_file, root_dir=root_dir, train=False) - image = next(iter(ds)) + ds = BoxDataset(csv_file=csv_file, root_dir=root_dir) + image, targets, path = next(iter(ds)) + # Assert image is channels first format assert image.shape[0] == 3 - def test_multi_image_warning(): tmpdir = tempfile.gettempdir() csv_file1 = get_data("example.csv") @@ -148,66 +140,11 @@ def test_multi_image_warning(): df.to_csv(csv_file) root_dir = os.path.dirname(csv_file1) - ds = dataset.TreeDataset(csv_file=csv_file, - root_dir=root_dir, - transforms=dataset.get_transform(augment=False)) + ds = BoxDataset(csv_file=csv_file, + root_dir=root_dir) for i in range(len(ds)): # Between 0 and 1 batch = ds[i] collated_batch = utilities.collate_fn([None, batch, batch]) - len(collated_batch[0]) == 2 - - -@pytest.mark.parametrize("preload_images", [True, False]) -def test_tile_dataset(preload_images): - tile_path = get_data("2019_YELL_2_528000_4978000_image_crop2.png") - tile = rio.open(tile_path).read() - tile = np.moveaxis(tile, 0, 2) - ds = dataset.TileDataset(tile=tile, - preload_images=preload_images, - patch_size=100, - patch_overlap=0) - assert len(ds) > 0 - - # assert crop shape - assert ds[1].shape == (3, 100, 100) - - -def test_bounding_box_dataset(): - # Create a sample dataframe - df = pd.read_csv(get_data("OSBS_029.csv")) - - # Create the BoundingBoxDataset object - ds = BoundingBoxDataset(df, root_dir=os.path.dirname(get_data("OSBS_029.png"))) - - # Check the length of the dataset - assert len(ds) == df.shape[0] - - # Get an item from the dataset - item = ds[0] - - # Check the shape of the RGB tensor - assert item.shape == (3, 224, 224) - -def test_raster_dataset(): - """Test the RasterDataset class""" - - # Test initialization and context manager - ds = RasterDataset(get_data("test_tiled.tif"), patch_size=256, patch_overlap=0.1) - - # Test basic properties - assert hasattr(ds, 'windows') - - # Test first window - first_crop = ds[0] - assert isinstance(first_crop, torch.Tensor) - assert first_crop.dtype == torch.float32 - assert first_crop.shape[0] == 3 # RGB channels first - assert 0 <= first_crop.min() <= first_crop.max() <= 1.0 # Check normalization - - # Test with DataLoader - dataloader = DataLoader(ds, batch_size=2, num_workers=0) - batch = next(iter(dataloader)) - assert batch.shape[0] == 2 # Batch size - assert batch.shape[1] == 3 # Channels first + len(collated_batch[0]) == 2 \ No newline at end of file diff --git a/tests/test_evaluate.py b/tests/test_evaluate.py index 6d6b1e4ac..6e3279a8c 100644 --- a/tests/test_evaluate.py +++ b/tests/test_evaluate.py @@ -147,4 +147,4 @@ def test_point_recall(): results = evaluate.point_recall(ground_df=ground_df, predictions=predictions) assert results["box_recall"] == 0.5 - assert results["class_recall"].recall[0] == 1 + assert results["class_recall"].recall[0] == 1 \ No newline at end of file diff --git a/tests/test_main.py b/tests/test_main.py index bd68048e1..8033127ed 100644 --- a/tests/test_main.py +++ b/tests/test_main.py @@ -14,9 +14,10 @@ import albumentations as A from albumentations.pytorch import ToTensorV2 -from deepforest import main, get_data, dataset, model -from deepforest.visualize import format_geometry -from 
deepforest.utilities import read_file +from deepforest import main, get_data, model +from deepforest.utilities import read_file, format_geometry +from deepforest.datasets import prediction +from deepforest.visualize import plot_results from pytorch_lightning import Trainer from pytorch_lightning.callbacks import Callback @@ -69,6 +70,7 @@ def m(download_release): m.create_trainer() m.load_model("weecology/deepforest-tree") + return m @@ -95,10 +97,10 @@ def path(): return get_data(path='OSBS_029.tif') +@pytest.fixture() def big_file(): tmpdir = tempfile.gettempdir() csv_file = get_data("OSBS_029.csv") - image_path = get_data("OSBS_029.png") df = pd.read_csv(csv_file) big_frame = [] @@ -115,6 +117,9 @@ def big_file(): return "{}/annotations.csv".format(tmpdir) +def test_m_has_tree_model_loaded(m): + boxes = m.predict_image(path=get_data("OSBS_029.tif")) + assert not boxes.empty def test_tensorboard_logger(m, tmpdir): # Check if TensorBoard is installed @@ -202,9 +207,7 @@ def test_validation_step_empty(): val_dataloader = m.val_dataloader() batch = next(iter(val_dataloader)) m.predictions = [] - val_loss = m.validation_step(batch, 0) - assert len(m.predictions) == 1 - assert m.predictions[0].xmin.isna().all() + val_predictions = m.validation_step(batch, 0) assert m.iou_metric.compute()["iou"] == 0 def test_validate(m): @@ -295,6 +298,7 @@ def test_predict_image_fromfile(m): assert set(prediction.columns) == { "xmin", "ymin", "xmax", "ymax", "label", "score", "image_path", "geometry" } + assert not prediction.empty def test_predict_image_fromarray(m): @@ -313,20 +317,17 @@ def test_predict_image_fromarray(m): assert set(prediction.columns) == {"xmin", "ymin", "xmax", "ymax", "label", "score", "geometry"} assert not hasattr(prediction, 'root_dir') -def test_predict_big_file(m, tmpdir): +def test_predict_big_file(m, big_file): m.config.train.fast_dev_run = False m.create_trainer() - csv_file = big_file() - original_file = pd.read_csv(csv_file) - df = m.predict_file(csv_file=csv_file, - root_dir=os.path.dirname(csv_file)) + df = m.predict_file(csv_file=big_file, + root_dir=os.path.dirname(big_file)) assert set(df.columns) == { 'label', 'score', 'image_path', 'geometry', "xmin", "ymin", "xmax", "ymax" } -def test_predict_small_file(m, tmpdir): +def test_predict_small_file(m): csv_file = get_data("OSBS_029.csv") - original_file = pd.read_csv(csv_file) df = m.predict_file(csv_file, root_dir=os.path.dirname(csv_file)) assert set(df.columns) == { 'label', 'score', 'image_path', 'geometry', "xmin", "ymin", "xmax", "ymax" @@ -336,11 +337,10 @@ def test_predict_small_file(m, tmpdir): def test_predict_dataloader(m, batch_size, path): m.config.batch_size = batch_size tile = np.array(Image.open(path)) - ds = dataset.TileDataset(tile=tile, patch_overlap=0.1, patch_size=100) + ds = prediction.SingleImage(image=tile, path=path, patch_overlap=0.1, patch_size=100) dl = m.predict_dataloader(ds) batch = next(iter(dl)) - batch.shape[0] == batch_size - + assert batch.shape[0] == batch_size def test_predict_tile_empty(path): # Random weights @@ -348,20 +348,23 @@ def test_predict_tile_empty(path): predictions = m.predict_tile(path=path, patch_size=300, patch_overlap=0) assert predictions is None -@pytest.mark.parametrize("in_memory", [True, False]) -def test_predict_tile(m, path, in_memory): +@pytest.mark.parametrize("dataloader_strategy", ["single", "window", "batch"]) +def test_predict_tile(m, path, dataloader_strategy): m.create_model() m.config.train.fast_dev_run = False m.create_trainer() + 
m.load_model("weecology/deepforest-tree")
 
-    if in_memory:
-        path = path
+    if dataloader_strategy == "single":
+        image_path = path
+    elif dataloader_strategy == "window":
+        image_path = get_data("test_tiled.tif")
     else:
-        path = get_data("test_tiled.tif")
+        image_path = [path]
 
-    prediction = m.predict_tile(path=path,
+    prediction = m.predict_tile(path=image_path,
                                 patch_size=300,
-                                in_memory=in_memory,
+                                dataloader_strategy=dataloader_strategy,
                                 patch_overlap=0.1)
 
     assert isinstance(prediction, pd.DataFrame)
@@ -370,36 +373,51 @@
     }
     assert not prediction.empty
 
-# test equivalence for in_memory=True and False
-def test_predict_tile_equivalence(m):
-    path = get_data("test_tiled.tif")
-    in_memory_prediction = m.predict_tile(path=path, patch_size=300, patch_overlap=0, in_memory=True)
-    not_in_memory_prediction = m.predict_tile(path=path, patch_size=300, patch_overlap=0, in_memory=False)
-    assert in_memory_prediction.equals(not_in_memory_prediction)
+    # Assert there are predictions in each corner of the image
+    assert prediction.xmin.min() < 50
+    assert prediction.xmin.max() > 350
+    assert prediction.ymin.min() < 50
+    assert prediction.ymin.max() > 350
 
-def test_predict_tile_from_array(m, path):
-    # test predict numpy image
-    image = np.array(Image.open(path))
+    plot_results(prediction)
+
+
+# Test predict_tile over a list of image paths using the batch dataloader strategy
+def test_predict_tile_serial_single(m):
+    path1 = get_data("OSBS_029.png")
+    path2 = get_data("SOAP_031.png")
+
     m.create_model()
     m.config.train.fast_dev_run = False
     m.create_trainer()
-    prediction = m.predict_tile(image=image,
-                                patch_size=300)
+    m.load_model("weecology/deepforest-tree")
+    prediction = m.predict_tile(path=[path1, path2], patch_size=300, patch_overlap=0, dataloader_strategy="batch")
+    assert prediction.image_path.unique().tolist() == [os.path.basename(path1), os.path.basename(path2)]
 
-    assert not prediction.empty
+    # view the predictions of each image
+    prediction_1 = prediction[prediction.image_path == os.path.basename(path1)]
+    prediction_1.root_dir = os.path.dirname(path1)
+    prediction_2 = prediction[prediction.image_path == os.path.basename(path2)]
+    prediction_2.root_dir = os.path.dirname(path2)
+
+    plot_results(prediction_1)
+    plot_results(prediction_2)
+
+# Test equivalence of the in-memory (single) and out-of-memory (window) dataloader strategies
+def test_predict_tile_equivalence(m):
+    path = get_data("test_tiled.tif")
+    in_memory_prediction = m.predict_tile(path=path, patch_size=300, patch_overlap=0, dataloader_strategy="single")
+    not_in_memory_prediction = m.predict_tile(path=path, patch_size=300, patch_overlap=0, dataloader_strategy="window")
+    # Assert same number of predictions
+    assert len(in_memory_prediction) == len(not_in_memory_prediction)
 
-def test_predict_tile_no_mosaic(m, path):
-    # test no mosaic, return a tuple of crop and prediction
+def test_predict_tile_from_array(m, path):
+    image = np.array(Image.open(path))
     m.config.train.fast_dev_run = False
     m.create_trainer()
-    prediction = m.predict_tile(path=path,
-                                patch_size=300,
-                                patch_overlap=0,
-                                mosaic=False)
-    assert len(prediction) == 4
-    assert len(prediction[0]) == 2
-    assert prediction[0][1].shape == (300, 300, 3)
+    prediction = m.predict_tile(image=image, patch_size=300)
+    assert not prediction.empty
 
 def test_evaluate(m, tmpdir):
     csv_file = get_data("OSBS_029.csv")
@@ -516,7 +534,7 @@ def get_transform(augment):
     root_dir = os.path.dirname(csv_file)
 
     train_ds = m.load_dataset(csv_file, root_dir=root_dir, augment=True)
-    path, image, target = 
next(iter(train_ds)) + image, target, path = next(iter(train_ds)) assert m.transforms.__doc__ == "This is the new transform" #TODO: Fix this test to check that predictions change as checking @@ -593,19 +611,16 @@ def test_load_existing_train_dataloader(m, tmpdir, existing_loader): # Inspect original for comparison of batch size m.config.train.csv_file = "{}/train.csv".format(tmpdir.strpath) m.config.train.root_dir = tmpdir.strpath - m.create_trainer(fast_dev_run=True) - m.trainer.fit(m) - batch = next(iter(m.trainer.train_dataloader)) + batch = next(iter(m.train_dataloader())) assert len(batch[0]) == m.config.batch_size # Existing train dataloader m.config.train.csv_file = "{}/train.csv".format(tmpdir.strpath) m.config.train.root_dir = tmpdir.strpath m.existing_train_dataloader = existing_loader - m.train_dataloader() m.create_trainer(fast_dev_run=True) m.trainer.fit(m) - batch = next(iter(m.trainer.train_dataloader)) + batch = next(iter(m.train_dataloader())) assert len(batch[0]) == m.config.batch_size + 1 @@ -613,16 +628,15 @@ def test_existing_val_dataloader(m, tmpdir, existing_loader): m.config.validation["csv_file"] = "{}/train.csv".format(tmpdir.strpath) m.config.validation["root_dir"] = tmpdir.strpath m.existing_val_dataloader = existing_loader - m.val_dataloader() m.create_trainer() m.trainer.validate(m) - batch = next(iter(m.trainer.val_dataloaders)) + batch = next(iter(m.val_dataloader())) assert len(batch[0]) == m.config.batch_size + 1 def test_existing_predict_dataloader(m, tmpdir): # Predict datasets yield only images - ds = dataset.TileDataset(tile=np.random.random((400, 400, 3)).astype("float32"), + ds = prediction.TiledRaster(path=get_data("test_tiled.tif"), patch_overlap=0.1, patch_size=100) existing_loader = m.predict_dataloader(ds) @@ -697,7 +711,6 @@ def test_predict_tile_with_crop_model(m, config): patch_size = 400 patch_overlap = 0.05 iou_threshold = 0.15 - mosaic = True # Set up the crop model crop_model = model.CropModel(num_classes=2, label_dict = {"Dead":0, "Alive":1}) @@ -708,7 +721,6 @@ def test_predict_tile_with_crop_model(m, config): patch_size=patch_size, patch_overlap=patch_overlap, iou_threshold=iou_threshold, - mosaic=mosaic, crop_model=crop_model) # Assert the result @@ -727,10 +739,6 @@ def test_predict_tile_with_crop_model_empty(): """If the model return is empty, the crop model should return an empty dataframe""" path = get_data("SOAP_061.png") m = main.deepforest() - patch_size = 400 - patch_overlap = 0.05 - iou_threshold = 0.15 - mosaic = True # Set up the crop model crop_model = model.CropModel(num_classes=2, label_dict = {"Dead": 0, "Alive": 1}) @@ -739,10 +747,9 @@ def test_predict_tile_with_crop_model_empty(): m.config.train.fast_dev_run = False m.create_trainer() result = m.predict_tile(path=path, - patch_size=patch_size, - patch_overlap=patch_overlap, - iou_threshold=iou_threshold, - mosaic=mosaic, + patch_size=400, + patch_overlap=0.05, + iou_threshold=0.15, crop_model=crop_model) @@ -754,7 +761,6 @@ def test_predict_tile_with_multiple_crop_models(m, config): patch_size = 400 patch_overlap = 0.05 iou_threshold = 0.15 - mosaic = True # Create multiple crop models crop_model = [model.CropModel(num_classes=2, label_dict={"Dead":0, "Alive":1}), model.CropModel(num_classes=3, label_dict={"Dead":0, "Alive":1, "Sapling":2})] @@ -766,7 +772,6 @@ def test_predict_tile_with_multiple_crop_models(m, config): patch_size=patch_size, patch_overlap=patch_overlap, iou_threshold=iou_threshold, - mosaic=mosaic, crop_model=crop_model) # Assert result type @@ 
-786,10 +791,6 @@ def test_predict_tile_with_multiple_crop_models_empty():
     """If no predictions are made, result should be empty"""
     path = get_data("SOAP_061.png")
     m = main.deepforest()
-    patch_size = 400
-    patch_overlap = 0.05
-    iou_threshold = 0.15
-    mosaic = True
 
     # Create multiple crop models
     crop_model_1 = model.CropModel(num_classes=2, label_dict={"Dead":0, "Alive":1})
@@ -798,50 +799,46 @@
     m.config.train.fast_dev_run = False
     m.create_trainer()
     result = m.predict_tile(path=path,
-                            patch_size=patch_size,
-                            patch_overlap=patch_overlap,
-                            iou_threshold=iou_threshold,
-                            mosaic=mosaic,
+                            patch_size=400,
+                            patch_overlap=0.05,
+                            iou_threshold=0.15,
                             crop_model=[crop_model_1, crop_model_2])
 
     assert result is None or result.empty  # Ensure empty result is handled properly
 
 def test_batch_prediction(m, path):
     # Prepare input data
-    tile = np.array(Image.open(path))
-    ds = dataset.TileDataset(tile=tile, patch_overlap=0.1, patch_size=300)
+    ds = prediction.SingleImage(path=path, patch_overlap=0.1, patch_size=300)
     dl = DataLoader(ds, batch_size=3)
 
     # Perform prediction
     predictions = []
     for batch in dl:
-        prediction = m.predict_batch(batch)
-        predictions.append(prediction)
-
-    # Check results
-    assert len(predictions) == len(dl)
-    for batch_pred in predictions:
-        for image_pred in batch_pred:
-            assert isinstance(image_pred, pd.DataFrame)
-            assert "label" in image_pred.columns
-            assert "score" in image_pred.columns
-            assert "geometry" in image_pred.columns
+        batch_predictions = m.predict_batch(batch)
+        predictions.extend(batch_predictions)
+
+    # Check results
+    assert len(predictions) == len(ds)
+    for image_pred in predictions:
+        assert isinstance(image_pred, pd.DataFrame)
+        assert "label" in image_pred.columns
+        assert "score" in image_pred.columns
+        assert "geometry" in image_pred.columns
 
 def test_batch_inference_consistency(m, path):
-    tile = np.array(Image.open(path))
-    ds = dataset.TileDataset(tile=tile, patch_overlap=0.1, patch_size=300)
+    ds = prediction.SingleImage(path=path, patch_overlap=0.1, patch_size=300)
    dl = DataLoader(ds, batch_size=4)
 
     batch_predictions = []
     for batch in dl:
-        prediction = m.predict_batch(batch)
-        batch_predictions.extend(prediction)
+        batch_prediction = m.predict_batch(batch)
+        batch_predictions.extend(batch_prediction)
 
     single_predictions = []
     for image in ds:
-        image = image.permute(1,2,0).numpy() * 255
-        prediction = m.predict_image(image=image)
-        single_predictions.append(prediction)
+        image = np.rollaxis(image, 0, 3) * 255
+        single_prediction = m.predict_image(image=image)
+        single_predictions.append(single_prediction)
 
     batch_df = pd.concat(batch_predictions, ignore_index=True)
     single_df = pd.concat(single_predictions, ignore_index=True)
@@ -853,14 +850,15 @@
     pd.testing.assert_frame_equal(batch_df[["xmin", "ymin", "xmax", "ymax"]],
                                   single_df[["xmin", "ymin", "xmax", "ymax"]], check_dtype=False)
 
-def test_epoch_evaluation_end(m):
+def test_epoch_evaluation_end(m, tmpdir):
+    """Test the epoch evaluation end method by scoring perfect predictions against a matching ground truth file."""
     preds = [{
         'boxes': torch.tensor([
             [690.3572, 902.9113, 781.1031, 996.5151],
             [998.1990, 655.7919, 172.4619, 321.8518]
         ]),
         'scores': torch.tensor([
-            0.6740, 0.6625
+            1.0, 1.0
         ]),
         'labels': torch.tensor([
             0, 0
@@ -873,8 +871,21 @@
     boxes = format_geometry(preds[0])
     boxes["image_path"] = "test"
-    m.predictions = [boxes]
-    m.on_validation_epoch_end()
+
+    predictions = boxes.copy()
+    assert m.iou_metric.compute()["iou"] == 
1.0
+
+    # write a csv file to the tmpdir
+    boxes["label"] = "Tree"
+    m.predictions = [predictions]
+    boxes.to_csv(tmpdir.strpath + "/predictions.csv", index=False)
+    m.config.validation.csv_file = tmpdir.strpath + "/predictions.csv"
+    m.config.validation.root_dir = tmpdir.strpath
+
+    results = m.on_validation_epoch_end()
+
+    assert results["box_precision"] == 1.0
+    assert results["box_recall"] == 1.0
 
 def test_epoch_evaluation_end_empty(m):
     """If the model returns an empty prediction, the metrics should not fail"""
@@ -909,7 +920,10 @@ def test_empty_frame_accuracy_all_empty_with_predictions(m, tmpdir):
 
     m.create_trainer()
     results = m.trainer.validate(m)
-    assert results[0]["empty_frame_accuracy"] == 0
+
+    # This is a bit of a preference: with no predictions, empty frame accuracy should be 0, precision 0, and accuracy None.
+    assert results[0]["empty_frame_accuracy"] == 0.0
+    assert results[0]["box_precision"] == 0.0
 
 def test_empty_frame_accuracy_mixed_frames_with_predictions(m, tmpdir):
     """Test empty frame accuracy with a mix of empty and non-empty frames.
@@ -929,8 +943,9 @@
     # Save the ground truth to a temporary file
     ground_df.to_csv(tmpdir.strpath + "/ground_truth.csv", index=False)
 
-    m.config.validation["csv_file"] = tmpdir.strpath + "/ground_truth.csv"
-    m.config.validation["root_dir"] = os.path.dirname(get_data("testfile_deepforest.csv"))
+    m.config.validation.csv_file = tmpdir.strpath + "/ground_truth.csv"
+    m.config.validation.root_dir = os.path.dirname(get_data("testfile_deepforest.csv"))
+    m.config.validation.size = 400
 
     m.create_trainer()
     results = m.trainer.validate(m)
diff --git a/tests/test_model.py b/tests/test_model.py
index 9970f8af7..c98f723d4 100644
--- a/tests/test_model.py
+++ b/tests/test_model.py
@@ -5,19 +5,16 @@ import pandas as pd
 import os
 
 from torchvision import transforms
-import pytorch_lightning as pl
-import numpy as np
-from deepforest.predict import _predict_crop_model_
 
 # The model object is architecture agnostic container.
 def test_model_no_args(config):
     with pytest.raises(ValueError):
-        model.Model(config)
+        model.Model.create_model(config)
 
 @pytest.fixture()
 def crop_model():
     crop_model = model.CropModel(num_classes=2)
-    
+
     return crop_model
 
 @pytest.fixture()
@@ -35,8 +32,7 @@ def crop_model_data(crop_model, tmpdir):
 
     return None
 
-def test_crop_model(
-    crop_model):  # Use pytest tempdir fixture to create a temporary directory
+def test_crop_model(crop_model):
     # Test forward pass
     x = torch.rand(4, 3, 224, 224)
     output = crop_model.forward(x)
@@ -69,6 +65,11 @@ def test_crop_model_train(crop_model, tmpdir, crop_model_data):
     crop_model.trainer.fit(crop_model)
     crop_model.trainer.validate(crop_model)
 
+def test_crop_model_recreate_model(tmpdir, crop_model_data):
+    crop_model = model.CropModel()
+    crop_model.load_from_disk(train_dir=tmpdir, val_dir=tmpdir, recreate_model=True)
+    assert crop_model.model is not None
+    assert crop_model.model.fc.out_features == 2
 
 def test_crop_model_custom_transform():
     # Create a dummy instance of CropModel
diff --git a/tests/test_multiprocessing.py b/tests/test_multiprocessing.py
index 9f67d7813..2f0236917 100644
--- a/tests/test_multiprocessing.py
+++ b/tests/test_multiprocessing.py
@@ -1,12 +1,11 @@
 # Ensure that multiprocessing is behaving as expected.
 from deepforest import main, get_data
-from deepforest import dataset
+from deepforest.datasets import prediction
 import pytest
 import os
 
-
-@pytest.mark.parametrize("num_workers", [0, 2])
+@pytest.mark.parametrize("num_workers", [2])
 def test_predict_tile_workers(m, num_workers):
     # Default workers is 0
     original_workers = m.config.workers
@@ -15,21 +14,72 @@
     m.config.workers = num_workers
     csv_file = get_data("OSBS_029.csv")
     # make a dataset
-    ds = dataset.TreeDataset(csv_file=csv_file,
-                             root_dir=os.path.dirname(csv_file),
-                             transforms=None,
-                             train=False)
+    ds = prediction.FromCSVFile(csv_file=csv_file,
+                                root_dir=os.path.dirname(csv_file))
     dataloader = m.predict_dataloader(ds)
     assert dataloader.num_workers == num_workers
 
-@pytest.mark.parametrize("num_workers", [0, 2])
-def test_predict_tile_workers_config(num_workers):
-    m = main.deepforest(config_args={"workers": num_workers})
+
+@pytest.mark.parametrize("num_workers", [2])
+@pytest.mark.parametrize("dataset_class", [
+    prediction.FromCSVFile,
+    prediction.SingleImage,
+    prediction.MultiImage,
+    prediction.TiledRaster,
+])
+def test_dataset_tile_workers_config(m, num_workers, dataset_class):
     csv_file = get_data("OSBS_029.csv")
-    # make a dataset
-    ds = dataset.TreeDataset(csv_file=csv_file,
-                             root_dir=os.path.dirname(csv_file),
-                             transforms=None,
-                             train=False)
+    root_dir = os.path.dirname(csv_file)
+
+    # Create dataset based on class
+    if dataset_class == prediction.FromCSVFile:
+        ds = dataset_class(csv_file=csv_file, root_dir=root_dir)
+    elif dataset_class == prediction.SingleImage:
+        image_path = os.path.join(root_dir, "OSBS_029.png")
+        ds = dataset_class(path=image_path)
+    elif dataset_class == prediction.MultiImage:
+        image_path = os.path.join(root_dir, "OSBS_029.png")
+        ds = dataset_class(paths=[image_path], patch_size=400, patch_overlap=0.1)
+    else:  # TiledRaster
+        image_path = os.path.join(root_dir, "test_tiled.tif")
+        ds = dataset_class(path=image_path, patch_size=400, patch_overlap=0.1)
+
     dataloader = m.predict_dataloader(ds)
     assert dataloader.num_workers == num_workers
+
+
+def test_multi_process_dataloader_strategy_single(m):
+    root_dir = os.path.dirname(get_data("OSBS_029.csv"))
+    image_path = os.path.join(root_dir, "OSBS_029.png")
+
+    results = m.predict_tile(
+        path=image_path,
+        dataloader_strategy="single",
+        patch_size=400,
+        patch_overlap=0,
+    )
+    assert len(results) > 0
+
+def test_multi_process_dataloader_strategy_batch(m):
+    root_dir = os.path.dirname(get_data("OSBS_029.csv"))
+    image_path = os.path.join(root_dir, "OSBS_029.png")
+
+    results = m.predict_tile(
+        path=[image_path],
+        dataloader_strategy="batch",
+        patch_size=400,
+        patch_overlap=0,
+    )
+    assert len(results) > 0
+
+def test_multi_process_dataloader_strategy_window(m):
+    root_dir = os.path.dirname(get_data("OSBS_029.csv"))
+    image_path = os.path.join(root_dir, "test_tiled.tif")
+
+    # The window strategy reads crops directly from the raster and cannot be
+    # parallelized, so predict_tile should raise a ValueError when workers > 0
+    with pytest.raises(ValueError):
+        results = m.predict_tile(
+            path=image_path,
+            dataloader_strategy="window",
+            patch_size=400,
+            patch_overlap=0,
+        )
diff --git a/tests/test_preprocess.py b/tests/test_preprocess.py
index 27e11e3f1..53aa9f275 100644
--- a/tests/test_preprocess.py
+++ b/tests/test_preprocess.py
@@ -42,7 +42,11 @@ def geodataframe():
 @pytest.fixture()
 def image(config):
     raster = Image.open(config.path_to_raster)
-    return np.array(raster)
+
+    # Convert to channels first
+    raster = np.array(raster)
+    raster = np.moveaxis(raster, 2, 0)
+    return raster
 
 
 def test_compute_windows(config, image):
@@ -287,15 +291,6 @@ 
def test_split_raster_from_csv(tmpdir):
                               patch_size=300)
     assert not split_annotations.empty
 
-    # Plot labels
-    images = visualize.plot_prediction_dataframe(split_annotations,
-                                                 root_dir=tmpdir,
-                                                 savedir=tmpdir)
-
-    for image in images:
-        im = Image.open(image)
-        im.show()
-
 
 def test_split_raster_from_shp(tmpdir):
     annotations = get_data("2018_SJER_3_252000_4107000_image_477.csv")
@@ -312,26 +307,4 @@ def test_split_raster_from_shp(tmpdir):
                                                root_dir=os.path.dirname(path_to_raster),
                                                patch_size=300)
 
-    assert not split_annotations.empty
-
-    # Plot labels
-    images = visualize.plot_prediction_dataframe(split_annotations,
-                                                 root_dir=tmpdir,
-                                                 savedir=tmpdir)
-
-    for image in images:
-        im = Image.open(image)
-        im.show()
-
-
-# def test_view_annotation_split(tmpdir, config):
-#     """Test that the split annotations can be visualized and mantain location, turn show to True for debugging interactively"""
-#     annotations = get_data("2019_YELL_2_541000_4977000_image_crop.xml")
-#     gdf = utilities.read_file(annotations)
-#     path_to_raster = get_data("2019_YELL_2_541000_4977000_image_crop.png")
-#     split_annotations = preprocess.split_raster(gdf, path_to_raster=path_to_raster, save_dir=tmpdir, patch_size=300, patch_overlap=0.5)
-#     images = visualize.plot_prediction_dataframe(split_annotations, root_dir=tmpdir, savedir=tmpdir)
-#     # View the images
-#     for image in images:
-#         im = Image.open(image)
-#         im.show()
+    assert not split_annotations.empty
\ No newline at end of file
diff --git a/tests/test_retinanet.py b/tests/test_retinanet.py
index 1a588e6af..f2b29e683 100644
--- a/tests/test_retinanet.py
+++ b/tests/test_retinanet.py
@@ -33,6 +33,10 @@ def test_retinanet(config):
 
     assert r
 
+def test_retinanet_check_model(config):
+    r = retinanet.Model(config)
+    r.check_model()
+
 def test_load_backbone(config):
     r = retinanet.Model(config)
     resnet_backbone = r.load_backbone()
diff --git a/tests/test_utilities.py b/tests/test_utilities.py
index f2a692e8c..92ef0788a 100644
--- a/tests/test_utilities.py
+++ b/tests/test_utilities.py
@@ -7,6 +7,7 @@
 from shapely import geometry
 import geopandas as gpd
 import json
+import torch
 
 from deepforest import get_data
 from deepforest import visualize
@@ -348,14 +349,6 @@ def test_geo_to_image_coordinates_UTM_N(tmpdir):
     assert image_coords[image_coords.intersects(numpy_window)].shape[0] == pd.read_csv(
         annotations).shape[0]
 
-    images = visualize.plot_prediction_dataframe(image_coords,
-                                                 root_dir=os.path.dirname(path_to_raster),
-                                                 savedir=tmpdir)
-    # Confirm the image coordinates are correct
-    for image in images:
-        im = Image.open(image)
-        im.show()
-
 
 def test_geo_to_image_coordinates_UTM_S(tmpdir):
     """Read in a csv file, make a projected shapefile, convert to image coordinates and view the results"""
@@ -383,14 +376,6 @@
     numpy_window = geometry.box(0, 0, width, height)
     assert image_coords[image_coords.intersects(numpy_window)].shape[0] == gpd.read_file(annotations).shape[0]
 
-    images = visualize.plot_prediction_dataframe(image_coords,
-                                                 root_dir=os.path.dirname(path_to_raster),
-                                                 savedir=tmpdir)
-    # Confirm the image coordinates are correct
-    for image in images:
-        im = Image.open(image)
-        im.show()
-
 
 def test_image_to_geo_coordinates(tmpdir):
     annotations = get_data("2018_SJER_3_252000_4107000_image_477.csv")
@@ -398,16 +383,10 @@
 
     # Convert to image coordinates
     gdf = utilities.read_file(annotations)
-    images = visualize.plot_prediction_dataframe(gdf,
-                                                 root_dir=os.path.dirname(path_to_raster),
savedir=tmpdir) # Confirm it has no crs assert gdf.crs is None - # Confirm the image coordinates are correct - for image in images: - im = Image.open(image) - im.show(title="before") - # Convert to geo coordinates src = rio.open(path_to_raster) geo_coords = utilities.image_to_geo_coordinates(gdf) @@ -428,18 +407,10 @@ def test_image_to_geo_coordinates_boxes(tmpdir): # Convert to image coordinates gdf = utilities.read_file(annotations) - images = visualize.plot_prediction_dataframe(gdf, - root_dir=os.path.dirname(path_to_raster), - savedir=tmpdir) # Confirm it has no crs assert gdf.crs is None - # Confirm the image coordinates are correct - for image in images: - im = Image.open(image) - im.show(title="before") - # Convert to geo coordinates src = rio.open(path_to_raster) geo_coords = utilities.image_to_geo_coordinates(gdf) @@ -461,18 +432,10 @@ def test_image_to_geo_coordinates_points(tmpdir): # Convert to image coordinates gdf = utilities.read_file(annotations) gdf["geometry"] = gdf.geometry.centroid - images = visualize.plot_prediction_dataframe(gdf, - root_dir=os.path.dirname(path_to_raster), - savedir=tmpdir) # Confirm it has no crs assert gdf.crs is None - # Confirm the image coordinates are correct - for image in images: - im = Image.open(image) - im.show(title="before") - # Convert to geo coordinates src = rio.open(path_to_raster) geo_coords = utilities.image_to_geo_coordinates(gdf) @@ -495,18 +458,10 @@ def test_image_to_geo_coordinates_polygons(tmpdir): gdf = utilities.read_file(annotations) # Skew boxes to make them polygons gdf["geometry"] = gdf.geometry.skew(7, 7) - images = visualize.plot_prediction_dataframe(gdf, - root_dir=os.path.dirname(path_to_raster), - savedir=tmpdir) # Confirm it has no crs assert gdf.crs is None - # Confirm the image coordinates are correct - for image in images: - im = Image.open(image) - im.show(title="before") - # Convert to geo coordinates src = rio.open(path_to_raster) geo_coords = utilities.image_to_geo_coordinates(gdf) @@ -577,3 +532,140 @@ def test_read_coco_json(tmpdir): for geom in df.geometry: assert geom.is_valid assert isinstance(geom, shapely.geometry.Polygon) + + +def test_format_geometry_box(): + """Test formatting box geometry from model predictions""" + # Create a mock prediction with box coordinates + prediction = { + "boxes": torch.tensor([[10, 20, 30, 40], [50, 60, 70, 80]]), + "labels": torch.tensor([0, 0]), + "scores": torch.tensor([1.0, 0.8]) + } + + # Format geometry + result = utilities.format_geometry(prediction) + + # Check output format + assert isinstance(result, pd.DataFrame) + assert list(result.columns) == ["xmin", "ymin", "xmax", "ymax", "label", "score", "geometry"] + assert len(result) == 2 + + # Check values + assert result.iloc[0]["xmin"] == 10 + assert result.iloc[0]["ymin"] == 20 + assert result.iloc[0]["xmax"] == 30 + assert result.iloc[0]["ymax"] == 40 + assert result.iloc[0]["label"] == 0 + assert result.iloc[0]["score"] == 1.0 + + +def test_format_geometry_empty(): + """Test formatting empty predictions""" + # Create empty prediction + prediction = { + "boxes": torch.tensor([]), + "labels": torch.tensor([]), + "scores": torch.tensor([]) + } + + # Format geometry + result = utilities.format_geometry(prediction) + + # Check output format + assert result is None + +def test_format_geometry_multi_class(): + """Test formatting predictions with multiple classes""" + # Create predictions with different classes + prediction = { + "boxes": torch.tensor([[10, 20, 30, 40], [50, 60, 70, 80]]), + "labels": 
torch.tensor([0, 1]), # Different classes + "scores": torch.tensor([0.9, 0.8]) + } + + # Format geometry + result = utilities.format_geometry(prediction) + + # Check output format + assert isinstance(result, pd.DataFrame) + assert list(result.columns) == ["xmin", "ymin", "xmax", "ymax", "label", "score", "geometry"] + assert len(result) == 2 + + # Check values + assert result.iloc[0]["label"] == 0 + assert result.iloc[1]["label"] == 1 + + +def test_format_geometry_invalid_input(): + """Test handling of invalid input""" + # Test with missing required keys + prediction = { + "boxes": torch.tensor([[10, 20, 30, 40]]), + "labels": torch.tensor([0]) + # Missing scores + } + + with pytest.raises(KeyError): + utilities.format_geometry(prediction) + + # Test with mismatched lengths + prediction = { + "boxes": torch.tensor([[10, 20, 30, 40], [50, 60, 70, 80]]), + "labels": torch.tensor([0]), # Only one label + "scores": torch.tensor([0.9, 0.8]) + } + + with pytest.raises(ValueError): + utilities.format_geometry(prediction) + + +def test_format_geometry_with_geometry_column(): + """Test formatting predictions and adding geometry column""" + # Create predictions + prediction = { + "boxes": torch.tensor([[10, 20, 30, 40], [50, 60, 70, 80]]), + "labels": torch.tensor([0, 0]), + "scores": torch.tensor([0.9, 0.8]) + } + + # Format geometry + result = utilities.format_geometry(prediction) + + # Check output format + assert isinstance(result, pd.DataFrame) + assert "geometry" in result.columns + assert len(result) == 2 + + # Check geometry values + assert isinstance(result.iloc[0]["geometry"], geometry.Polygon) + assert result.iloc[0]["geometry"].bounds == (10, 20, 30, 40) + + +def test_format_geometry_point(): + """Test formatting point predictions""" + # Create a mock prediction with point coordinates + prediction = { + "points": torch.tensor([[10, 20], [50, 60]]), + "labels": torch.tensor([0, 0]), + "scores": torch.tensor([0.9, 0.8]) + } + + # Format geometry should raise ValueError since point predictions are not supported + with pytest.raises(ValueError, match="Point predictions are not yet supported for formatting"): + utilities.format_geometry(prediction, geom_type="point") + + +def test_format_geometry_polygon(): + """Test formatting polygon predictions""" + # Create a mock prediction with polygon coordinates + prediction = { + "polygon": torch.tensor([[[10, 20], [30, 20], [30, 40], [10, 40], [10, 20]], + [[50, 60], [70, 60], [70, 80], [50, 80], [50, 60]]]), + "labels": torch.tensor([0, 0]), + "scores": torch.tensor([0.9, 0.8]) + } + + # Format geometry should raise ValueError since polygon predictions are not supported + with pytest.raises(ValueError, match="Polygon predictions are not yet supported for formatting"): + utilities.format_geometry(prediction, geom_type="polygon") \ No newline at end of file diff --git a/tests/test_visualize.py b/tests/test_visualize.py index 6d47be5d2..7104a576d 100644 --- a/tests/test_visualize.py +++ b/tests/test_visualize.py @@ -1,6 +1,6 @@ # Test visualize from deepforest import visualize -from deepforest.utilities import read_file +from deepforest import utilities from deepforest import get_data import os import pytest @@ -11,59 +11,6 @@ from shapely import geometry import cv2 - -def test_format_boxes(m): - ds = m.val_dataloader() - batch = next(iter(ds)) - paths, images, targets = batch - for path, image, target in zip(paths, images, targets): - target_df = visualize.format_boxes(target, scores=False) - assert list(target_df.columns.values) == ["xmin", "ymin", 
"xmax", "ymax", "label"] - assert not target_df.empty - - -# Test different color labels -@pytest.mark.parametrize("label", [0, 1, 20]) -def test_plot_predictions(m, tmpdir, label): - ds = m.val_dataloader() - batch = next(iter(ds)) - paths, images, targets = batch - for path, image, target in zip(paths, images, targets): - target_df = visualize.format_boxes(target, scores=False) - target_df["image_path"] = path - image = np.array(image)[:, :, ::-1] - image = np.rollaxis(image, 0, 3) - target_df.label = label - image = visualize.plot_predictions(image, target_df) - - assert image.dtype == "uint8" - - -def test_plot_prediction_dataframe(m, tmpdir): - ds = m.val_dataloader() - batch = next(iter(ds)) - paths, images, targets = batch - for path, image, target in zip(paths, images, targets): - target_df = visualize.format_boxes(target, scores=False) - target_df["image_path"] = path - filenames = visualize.plot_prediction_dataframe( - df=target_df, savedir=tmpdir, root_dir=m.config.validation.root_dir) - - assert all([os.path.exists(x) for x in filenames]) - - -def test_plot_predictions_and_targets(m, tmpdir): - ds = m.val_dataloader() - batch = next(iter(ds)) - paths, images, targets = batch - m.model.eval() - predictions = m.model(images) - for path, image, target, prediction in zip(paths, images, targets, predictions): - image = image.permute(1, 2, 0) - save_figure_path = visualize.plot_prediction_and_targets( - image, prediction, target, image_name=os.path.basename(path), savedir=tmpdir) - assert os.path.exists(save_figure_path) - def test_predict_image_and_plot(m, tmpdir): sample_image_path = get_data("OSBS_029.png") results = m.predict_image(path=sample_image_path) @@ -78,10 +25,9 @@ def test_predict_tile_and_plot(m, tmpdir): assert os.path.exists(os.path.join(tmpdir, "OSBS_029.png")) - def test_multi_class_plot( tmpdir): results = pd.read_csv(get_data("testfile_multi.csv")) - results = read_file(results, root_dir=os.path.dirname(get_data("SOAP_061.png"))) + results = utilities.read_file(results, root_dir=os.path.dirname(get_data("SOAP_061.png"))) visualize.plot_results(results, savedir=tmpdir) assert os.path.exists(os.path.join(tmpdir, "SOAP_061.png")) @@ -98,7 +44,7 @@ def test_convert_to_sv_format(): 'image_path': ['image1.jpg', 'image1.jpg'] } df = pd.DataFrame(data) - df = read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) + df = utilities.read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) # Call the function detections = visualize.convert_to_sv_format(df) @@ -126,7 +72,7 @@ def test_plot_annotations(tmpdir): "score": [0.9, 0.8] } df = pd.DataFrame(data) - gdf = read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) + gdf = utilities.read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) gdf.root_dir = os.path.dirname(get_data("OSBS_029.tif")) # Call the function @@ -148,7 +94,7 @@ def test_plot_results_box(m, tmpdir): "score": [0.9, 0.8] } df = pd.DataFrame(data) - gdf = read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) + gdf = utilities.read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) gdf.root_dir = os.path.dirname(get_data("OSBS_029.tif")) # Call the function @@ -167,7 +113,7 @@ def test_plot_results_point(m, tmpdir): 'score': [0.9, 0.8] } df = pd.DataFrame(data) - gdf = read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) + gdf = utilities.read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) gdf.root_dir = os.path.dirname(get_data("OSBS_029.tif")) # Call the function @@ 
-185,7 +131,7 @@ def test_plot_results_point_no_label(m, tmpdir): 'image_path': [get_data("OSBS_029.tif"), get_data("OSBS_029.tif")], } df = pd.DataFrame(data) - gdf = read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) + gdf = utilities.read_file(df, root_dir=os.path.dirname(get_data("OSBS_029.tif"))) gdf.root_dir = os.path.dirname(get_data("OSBS_029.tif")) # Call the function diff --git a/www/dataloader-strategy.png b/www/dataloader-strategy.png new file mode 100644 index 000000000..e61816c07 Binary files /dev/null and b/www/dataloader-strategy.png differ
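Reviewer note: the new tests above pin down the contract of utilities.format_geometry, which this diff moves out of visualize (replacing visualize.format_boxes/format_geometry). A minimal usage sketch with illustrative tensor values:

```python
import torch
from deepforest import utilities

# A fake single-image prediction in the dict format the tests exercise
prediction = {
    "boxes": torch.tensor([[10.0, 20.0, 30.0, 40.0]]),
    "labels": torch.tensor([0]),
    "scores": torch.tensor([0.9]),
}

df = utilities.format_geometry(prediction)
# Columns: xmin, ymin, xmax, ymax, label, score, geometry
print(df)
print(df.geometry.iloc[0].bounds)  # (10.0, 20.0, 30.0, 40.0)
```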