Commit

Merge branch 'main' into arm-adjust-hue-fix
NicolasHug authored Oct 11, 2024
2 parents b6d16e6 + ed55b03 commit 784bd76
Showing 23 changed files with 327 additions and 183 deletions.
1 change: 1 addition & 0 deletions .github/workflows/build-wheels-windows.yml
@@ -25,6 +25,7 @@ jobs:
os: windows
test-infra-repository: pytorch/test-infra
test-infra-ref: main
with-xpu: enable
build:
needs: generate-matrix
strategy:
24 changes: 0 additions & 24 deletions .github/workflows/update-viablestrict.yml

This file was deleted.

1 change: 1 addition & 0 deletions README.md
@@ -21,6 +21,7 @@ versions.
| `torch` | `torchvision` | Python |
| ------------------ | ------------------ | ------------------- |
| `main` / `nightly` | `main` / `nightly` | `>=3.9`, `<=3.12` |
| `2.5` | `0.20` | `>=3.9`, `<=3.12` |
| `2.4` | `0.19` | `>=3.8`, `<=3.12` |
| `2.3` | `0.18` | `>=3.8`, `<=3.12` |
| `2.2` | `0.17` | `>=3.8`, `<=3.11` |
113 changes: 67 additions & 46 deletions docs/source/io.rst
@@ -3,46 +3,98 @@ Decoding / Encoding images and videos

.. currentmodule:: torchvision.io

The :mod:`torchvision.io` package provides functions for performing IO
operations. They are currently specific to reading and writing images and
videos.
The :mod:`torchvision.io` module provides utilities for decoding and encoding
images and videos.

Images
------
Image Decoding
--------------

Torchvision currently supports decoding JPEG, PNG, WEBP and GIF images. JPEG
decoding can also be done on CUDA GPUs.

For encoding, JPEG (cpu and CUDA) and PNG are supported.
The main entry point is the :func:`~torchvision.io.decode_image` function, which
you can use as an alternative to ``PIL.Image.open()``. It will decode images
straight into image Tensors, thus saving you the conversion and allowing you to
run transforms/preproc natively on tensors.

.. code::
from torchvision.io import decode_image
img = decode_image("path_to_image", mode="RGB")
img.dtype # torch.uint8
# Or
raw_encoded_bytes = ... # read encoded bytes from your file system
img = decode_image(raw_encoded_bytes, mode="RGB")
:func:`~torchvision.io.decode_image` will automatically detect the image format,
and call the corresponding decoder. You can also use the lower-level
format-specific decoders which can be more powerful, e.g. if you want to
encode/decode JPEGs on CUDA.
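
As a rough illustration of what "automatically detect the image format" can mean in practice, the sketch below guesses the format from the leading magic bytes of the encoded data. It is not torchvision's implementation: ``sniff_image_format`` is a hypothetical helper name, and the only assumption is the standard file signatures for JPEG, PNG, GIF and WEBP.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess an image format from its magic bytes.

    Hypothetical helper for illustration only; this is not part of
    torchvision and not how decode_image is necessarily implemented.
    """
    # JPEG streams start with the SOI marker FF D8, followed by another FF marker.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # GIF files start with the ASCII version tag "GIF87a" or "GIF89a".
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WEBP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # Unknown or truncated input.
    return None
```

A dispatcher built on such a check would pick the matching format-specific decoder and could raise an error when the result is ``None``.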

.. autosummary::
:toctree: generated/
:template: function.rst

read_image
decode_image
encode_jpeg
decode_jpeg
write_jpeg
encode_png
decode_gif
decode_webp
encode_png
decode_png
write_png
read_file
write_file

.. autosummary::
:toctree: generated/
:template: class.rst

ImageReadMode

Obsolete decoding function:

.. autosummary::
:toctree: generated/
:template: function.rst

read_image

Image Encoding
--------------

For encoding, JPEG (cpu and CUDA) and PNG are supported.


.. autosummary::
:toctree: generated/
:template: function.rst

encode_jpeg
write_jpeg
encode_png
write_png

IO operations
-------------

.. autosummary::
:toctree: generated/
:template: function.rst

read_file
write_file

Video
-----

.. warning::

Torchvision supports video decoding through different APIs listed below,
some of which are still in BETA stage. In the near future, we intend to
centralize PyTorch's video decoding capabilities within the `torchcodec
<https://github.com/pytorch/torchcodec>`_ project. We encourage you to try
it out and share your feedback, as the torchvision video decoders will
eventually be deprecated.

.. autosummary::
:toctree: generated/
:template: function.rst
@@ -52,45 +104,14 @@ Video
write_video


Fine-grained video API
^^^^^^^^^^^^^^^^^^^^^^
**Fine-grained video API**

In addition to the :func:`read_video` function, we provide a high-performance
lower-level API that offers more fine-grained control.
It does all this whilst fully supporting TorchScript.

.. betastatus:: fine-grained video API

.. autosummary::
:toctree: generated/
:template: class.rst

VideoReader


Example of inspecting a video:

.. code:: python
import torchvision
video_path = "path to a test video"
# Constructor allocates memory and a threaded decoder
# instance per video. At the moment it takes two arguments:
# path to the video file, and a wanted stream.
reader = torchvision.io.VideoReader(video_path, "video")
# The information about the video can be retrieved using the
# `get_metadata()` method. It returns a dictionary for every stream, with
# duration and other relevant metadata (often frame rate)
reader_md = reader.get_metadata()
# metadata is structured as a dict of dicts with following structure
# {"stream_type": {"attribute": [attribute per stream]}}
#
# following would print out the list of frame rates for every present video stream
print(reader_md["video"]["fps"])
# we explicitly select the stream we would like to operate on. In
# the constructor we select a default video stream, but
# in practice, we can set whichever stream we would like
reader.set_current_stream("video:0")
16 changes: 8 additions & 8 deletions docs/source/models.rst
@@ -226,10 +226,10 @@ Here is an example of how to use the pre-trained image classification models:

.. code:: python
from torchvision.io import read_image
from torchvision.io import decode_image
from torchvision.models import resnet50, ResNet50_Weights
img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
img = decode_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.DEFAULT
@@ -283,10 +283,10 @@ Here is an example of how to use the pre-trained quantized image classification

.. code:: python
from torchvision.io import read_image
from torchvision.io import decode_image
from torchvision.models.quantization import resnet50, ResNet50_QuantizedWeights
img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
img = decode_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
# Step 1: Initialize model with the best available weights
weights = ResNet50_QuantizedWeights.DEFAULT
@@ -339,11 +339,11 @@ Here is an example of how to use the pre-trained semantic segmentation models:

.. code:: python
from torchvision.io.image import read_image
from torchvision.io.image import decode_image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image
img = read_image("gallery/assets/dog1.jpg")
img = decode_image("gallery/assets/dog1.jpg")
# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
@@ -411,12 +411,12 @@ Here is an example of how to use the pre-trained object detection models:
.. code:: python
from torchvision.io.image import read_image
from torchvision.io.image import decode_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image
img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
img = decode_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
# Step 1: Initialize model with the best available weights
weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
10 changes: 5 additions & 5 deletions gallery/others/plot_repurposing_annotations.py
@@ -66,12 +66,12 @@ def show(imgs):
# We will take images and masks from the `PennFudan Dataset <https://www.cis.upenn.edu/~jshi/ped_html/>`_.


from torchvision.io import read_image
from torchvision.io import decode_image

img_path = os.path.join(ASSETS_DIRECTORY, "FudanPed00054.png")
mask_path = os.path.join(ASSETS_DIRECTORY, "FudanPed00054_mask.png")
img = read_image(img_path)
mask = read_image(mask_path)
img = decode_image(img_path)
mask = decode_image(mask_path)


# %%
@@ -181,8 +181,8 @@ def __getitem__(self, idx):
img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])

img = read_image(img_path)
mask = read_image(mask_path)
img = decode_image(img_path)
mask = decode_image(mask_path)

img = F.convert_image_dtype(img, dtype=torch.float)
mask = F.convert_image_dtype(mask, dtype=torch.float)
6 changes: 3 additions & 3 deletions gallery/others/plot_scripted_tensor_transforms.py
@@ -21,7 +21,7 @@
import torch.nn as nn

import torchvision.transforms as v1
from torchvision.io import read_image
from torchvision.io import decode_image

plt.rcParams["savefig.bbox"] = 'tight'
torch.manual_seed(1)
@@ -39,8 +39,8 @@
# :class:`torch.nn.Sequential` instead of
# :class:`~torchvision.transforms.v2.Compose`:

dog1 = read_image(str(ASSETS_PATH / 'dog1.jpg'))
dog2 = read_image(str(ASSETS_PATH / 'dog2.jpg'))
dog1 = decode_image(str(ASSETS_PATH / 'dog1.jpg'))
dog2 = decode_image(str(ASSETS_PATH / 'dog2.jpg'))

transforms = torch.nn.Sequential(
v1.RandomCrop(224),
10 changes: 5 additions & 5 deletions gallery/others/plot_visualization_utils.py
@@ -42,11 +42,11 @@ def show(imgs):
# image of dtype ``uint8`` as input.

from torchvision.utils import make_grid
from torchvision.io import read_image
from torchvision.io import decode_image
from pathlib import Path

dog1_int = read_image(str(Path('../assets') / 'dog1.jpg'))
dog2_int = read_image(str(Path('../assets') / 'dog2.jpg'))
dog1_int = decode_image(str(Path('../assets') / 'dog1.jpg'))
dog2_int = decode_image(str(Path('../assets') / 'dog2.jpg'))
dog_list = [dog1_int, dog2_int]

grid = make_grid(dog_list)
@@ -362,9 +362,9 @@ def show(imgs):
#

from torchvision.models.detection import keypointrcnn_resnet50_fpn, KeypointRCNN_ResNet50_FPN_Weights
from torchvision.io import read_image
from torchvision.io import decode_image

person_int = read_image(str(Path("../assets") / "person1.jpg"))
person_int = decode_image(str(Path("../assets") / "person1.jpg"))

weights = KeypointRCNN_ResNet50_FPN_Weights.DEFAULT
transforms = weights.transforms()
4 changes: 2 additions & 2 deletions gallery/transforms/plot_transforms_getting_started.py
@@ -21,14 +21,14 @@
plt.rcParams["savefig.bbox"] = 'tight'

from torchvision.transforms import v2
from torchvision.io import read_image
from torchvision.io import decode_image

torch.manual_seed(1)

# If you're trying to run that on Colab, you can download the assets and the
# helpers from https://github.com/pytorch/vision/tree/main/gallery/
from helpers import plot
img = read_image(str(Path('../assets') / 'astronaut.jpg'))
img = decode_image(str(Path('../assets') / 'astronaut.jpg'))
print(f"{type(img) = }, {img.dtype = }, {img.shape = }")

# %%
2 changes: 2 additions & 0 deletions packaging/windows/internal/vc_env_helper.bat
@@ -28,6 +28,8 @@ if "%VSDEVCMD_ARGS%" == "" (

@echo on

if "%CU_VERSION%" == "xpu" call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

set DISTUTILS_USE_SDK=1

set args=%1