Spike: Investigate conversion/processing optimisations #36

jrdh · 2022-11-18T13:05:18Z

Now that we have a stable platform which is working well, it might be worth investigating whether to add optimisations for particular file formats.

Currently, the server works like so when processing a typical IIIF image request:

1. Load the original image into memory from wherever it resides
2. Convert the source image to JPEG format and save it in the configured source directory
3. Load the JPEG source image from the source directory into memory
4. Apply crops, scaling etc. according to the IIIF request options
5. Write the resulting image to the configured cache directory
6. Send the resulting image back to the client who made the request

There are two caching layers as part of these steps:

- Before processing a request, the cache directory is checked. If the request has already been processed and the result is cached, the response is served immediately, skipping straight to step 6.
- When a request is made for a part of an image that hasn't been processed yet (e.g. a different crop or other modification), the new result is produced from the cached source image if it's available. In this scenario we can skip to step 3 immediately, which avoids redownloading the original image.
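To make the flow and its two cache checks concrete, here is a minimal sketch in Python. It is not the server's actual code: `handle_request`, `fetch_original`, `to_jpeg`, `process` and the directory layout are all hypothetical names chosen for illustration, and `request_key` is assumed to be a filesystem-safe string derived from the IIIF request options.

```python
from pathlib import Path
from typing import Callable


def handle_request(
    image_id: str,
    request_key: str,
    source_dir: Path,
    cache_dir: Path,
    fetch_original: Callable[[str], bytes],
    to_jpeg: Callable[[bytes], bytes],
    process: Callable[[bytes, str], bytes],
) -> bytes:
    """Serve one request, using both caching layers (hypothetical sketch)."""
    # Cache layer 1: an already processed result lets us skip straight to step 6.
    cached_result = cache_dir / image_id / request_key
    if cached_result.exists():
        return cached_result.read_bytes()

    # Cache layer 2: a cached JPEG source lets us skip steps 1 and 2.
    source_path = source_dir / f"{image_id}.jpg"
    if not source_path.exists():
        original = fetch_original(image_id)          # step 1
        source_path.parent.mkdir(parents=True, exist_ok=True)
        source_path.write_bytes(to_jpeg(original))   # step 2

    jpeg = source_path.read_bytes()                  # step 3
    result = process(jpeg, request_key)              # step 4
    cached_result.parent.mkdir(parents=True, exist_ok=True)
    cached_result.write_bytes(result)                # step 5
    return result                                    # step 6
```

Passing the fetch, convert and process steps in as callables keeps the sketch independent of any particular image library; a format-specific pipeline would only need to swap those three functions.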

We convert all original images to JPEG and store them so that we can write a common processing flow from step 3 onwards, regardless of where the image came from or what format it is in. JPEG was chosen because a) it is the likely output format requested by the user (it's the IIIF default) and b) the jpegtran library was identified as particularly quick at the operations likely to make up a large share of IIIF requests (primarily scaling and cropping).

It may be that for certain formats it is beneficial to adjust some of this processing and that is essentially the point of this issue - what opportunities are there for optimisations? For example, if the original image is a TIFF with pyramid/tiled images in it, can we make use of those derivatives for faster processing time? Are there fast TIFF libraries out there that could mean we write a TIFF processing pipeline to mirror the JPEG one that already exists?
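As one illustration of how pyramid levels might be used, the sketch below picks the smallest level that still covers a requested output size, so that less data needs to be decoded and downscaled. This is an assumption about one possible approach, not something the server does today: `pick_pyramid_level` is a hypothetical helper, the level sizes would have to be read from the TIFF by whatever library is chosen, and requests involving a crop would first need to be translated into the equivalent full-image size.

```python
from typing import Sequence, Tuple


def pick_pyramid_level(
    levels: Sequence[Tuple[int, int]],
    target_width: int,
    target_height: int,
) -> int:
    """Return the index of the smallest level covering the target size.

    `levels` holds (width, height) for each pyramid level, in any order.
    If no level is large enough, the largest available level is returned.
    """
    if not levels:
        raise ValueError("no pyramid levels supplied")

    # Levels big enough to downscale from, keyed by pixel count.
    candidates = [
        (w * h, i)
        for i, (w, h) in enumerate(levels)
        if w >= target_width and h >= target_height
    ]
    if candidates:
        return min(candidates)[1]

    # Nothing covers the request: fall back to the largest level.
    return max(range(len(levels)), key=lambda i: levels[i][0] * levels[i][1])
```

For example, with levels of 8000x6000, 4000x3000, 2000x1500 and 1000x750, a request for a 1500x1000 output would select the 2000x1500 level rather than the full-resolution image.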


jrdh · 2022-11-18T13:09:44Z

Additionally, it sounds like a lot of the herbarium sheet images are pyramidal TIFFs, so it would be good to find an example to test with.

jrdh added this to Science Platform Development Tasks Nov 18, 2022

alycejenni self-assigned this Feb 24, 2023

alycejenni moved this to Todo in Science Platform Development Tasks Dec 18, 2023
