As we support many model types, each produces different kinds of outputs during inference, including intermediate ones (e.g., centroids in the first stage of top-down models) as well as optional ones (e.g., confidence maps, since they're expensive to transfer off the GPU).
Historically, organizing these tensors was hard, particularly in graph mode in TensorFlow, since autograph barely supported data containers other than tuples (e.g., dictionaries, dataclasses, or arbitrary Python classes), in part because it's harder to traverse them and do shape and dtype tracing since Python is dynamically typed.
It would be great to take advantage of the fact that we are no longer constrained by these limitations and organize our output a bit better.
Right now, we're using dictionaries, which are helpful but a bit brittle. For example, see the output of our stage-2 (centered instance) model in `sleap-nn/sleap_nn/inference/inference.py`, lines 340 to 345 at `f093ce2`.
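To make the brittleness concrete, here's a hedged sketch of what a dict-shaped output of that kind looks like; the keys below are illustrative stand-ins, not the real ones from our model:

```python
import torch

# Illustrative only: hypothetical keys standing in for the real stage-2 dict.
outputs = {
    "instance_peaks": torch.rand(4, 13, 2),   # (batch, nodes, xy) landmark coords
    "instance_peak_vals": torch.rand(4, 13),  # per-node peak confidences
    "centroids": torch.rand(4, 2),            # carried over from stage 1
}

# Brittle: a misspelled key is a runtime KeyError that no linter or IDE catches.
peaks = outputs["instance_peaks"]
```

Nothing checks the keys until runtime, and nothing documents which ones are optional.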
Compare that to using a dataclass like this from ultralytics:
```python
class Results(SimpleClass):
    """A class for storing and manipulating inference results.

    Attributes:
        orig_img (numpy.ndarray): Original image as a numpy array.
        orig_shape (tuple): Original image shape in (height, width) format.
        boxes (Boxes, optional): Object containing detection bounding boxes.
        masks (Masks, optional): Object containing detection masks.
        probs (Probs, optional): Object containing class probabilities for classification tasks.
        keypoints (Keypoints, optional): Object containing detected keypoints for each object.
        speed (dict): Dictionary of preprocess, inference, and postprocess speeds (ms/image).
        names (dict): Dictionary of class names.
        path (str): Path to the image file.

    Methods:
        update(boxes=None, masks=None, probs=None, obb=None): Updates object attributes with new detection results.
        cpu(): Returns a copy of the Results object with all tensors on CPU memory.
        numpy(): Returns a copy of the Results object with all tensors as numpy arrays.
        cuda(): Returns a copy of the Results object with all tensors on GPU memory.
        to(*args, **kwargs): Returns a copy of the Results object with tensors on a specified device and dtype.
        new(): Returns a new Results object with the same image, path, and names.
        plot(...): Plots detection results on an input image, returning an annotated image.
        show(): Show annotated results to screen.
        save(filename): Save annotated results to file.
        verbose(): Returns a log string for each task, detailing detections and classifications.
        save_txt(txt_file, save_conf=False): Saves detection results to a text file.
        save_crop(save_dir, file_name=Path("im.jpg")): Saves cropped detection images.
        tojson(normalize=False): Converts detection results to JSON format.
    """
```
These get produced as outputs from their high-level APIs (~= `Predictor`s in our repo):
```python
def predict(
    self,
    source: Union[str, Path, int, list, tuple, np.ndarray, torch.Tensor] = None,
    stream: bool = False,
    predictor=None,
    **kwargs,
) -> list:
    """Performs predictions on the given image source using the YOLO model.

    This method facilitates the prediction process, allowing various configurations through keyword
    arguments. It supports predictions with custom predictors or the default predictor method. The
    method handles different types of image sources and can operate in a streaming mode. It also
    provides support for SAM-type models through 'prompts'.

    The method sets up a new predictor if not already present and updates its arguments with each
    call. It also issues a warning and uses default assets if the 'source' is not provided. The
    method determines if it is being called from the command line interface and adjusts its behavior
    accordingly, including setting defaults for confidence threshold and saving behavior.

    Args:
        source (str | int | PIL.Image | np.ndarray, optional): The source of the image for making
            predictions. Accepts various types, including file paths, URLs, PIL images, and numpy
            arrays. Defaults to ASSETS.
        stream (bool, optional): Treats the input source as a continuous stream for predictions.
            Defaults to False.
        predictor (BasePredictor, optional): An instance of a custom predictor class for making
            predictions. If None, the method uses a default predictor. Defaults to None.
        **kwargs (any): Additional keyword arguments for configuring the prediction process. These
            arguments allow for further customization of the prediction behavior.

    Returns:
        (List[ultralytics.engine.results.Results]): A list of prediction results, encapsulated in
            the Results class.

    Raises:
        AttributeError: If the predictor is not properly set up.
    """
```
For us, it would be handy to be able to store arbitrary tensors that belong together, like indexing information, which would help with tasks like re-batching in top-down models.
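As a hedged sketch of why that helps (names like `crop_sample_inds` are hypothetical here), carrying an index tensor alongside the crops makes regrouping stage-2 predictions by source frame a one-liner:

```python
import torch

# Hedged sketch: suppose stage 1 found 5 instances across a batch of 3 frames,
# and we kept a (hypothetical) crop_sample_inds tensor alongside the crops.
crop_sample_inds = torch.tensor([0, 0, 1, 2, 2])  # source frame of each crop
instance_peaks = torch.rand(5, 13, 2)             # stage-2 output, one row per crop

# Re-batching: regroup per-crop predictions by the frame they came from.
per_frame = [instance_peaks[crop_sample_inds == i] for i in range(3)]
assert [p.shape[0] for p in per_frame] == [2, 1, 2]
```

A structured container keeps that index tensor attached to the outputs it indexes, instead of relying on a loose dict key.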
It would also enable handy utilities like a `to()` method that moves all of the tensors it contains to the same device, for example:
```python
def to(self, map_location):
    """Move instance to different device or change dtype. (See `torch.to` for more info).

    Args:
        map_location: Either the device or dtype for the instance to be moved.

    Returns:
        self: reference to the instance moved to correct device/dtype.
    """
    if map_location is not None and map_location != "":
        self._gt_track_id = self._gt_track_id.to(map_location)
        self._pred_track_id = self._pred_track_id.to(map_location)
        self._bbox = self._bbox.to(map_location)
        self._crop = self._crop.to(map_location)
        self._features = self._features.to(map_location)
        self.device = map_location
    return self
```
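A further benefit of a dataclass container is that `to()` can be written once, generically over the declared fields, instead of listing each tensor by hand. A minimal self-contained sketch (field names hypothetical, with a stub `FakeTensor` standing in for `torch.Tensor` so it runs anywhere):

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


class FakeTensor:
    """Stand-in for torch.Tensor, just enough to demo a generic .to()."""

    def __init__(self, device: str = "cpu"):
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


@dataclass
class InferenceOutput:
    # Hypothetical field names, purely for illustration.
    instance_peaks: FakeTensor
    instance_peak_vals: FakeTensor
    confmaps: Optional[FakeTensor] = None  # optional outputs pass through as None

    def to(self, device: str) -> "InferenceOutput":
        """Return a copy with every tensor-valued field moved to `device`."""
        moved = {}
        for f in fields(self):
            value = getattr(self, f.name)
            moved[f.name] = value.to(device) if isinstance(value, FakeTensor) else value
        return replace(self, **moved)


out = InferenceOutput(FakeTensor(), FakeTensor()).to("cuda:0")
assert out.instance_peaks.device == "cuda:0"
assert out.confmaps is None
```

With real tensors the `isinstance` check would target `torch.Tensor` instead, and adding a new output field requires no change to `to()` at all.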