Spike: Investigate conversion/processing optimisations #36

Open
jrdh opened this issue Nov 18, 2022 · 1 comment

jrdh commented Nov 18, 2022

Now that we have a stable platform which is working well, it may be worth investigating whether to add optimisations for particular file formats.

Currently, the server works like so when processing a typical IIIF image request:

  1. Load the original image into memory from wherever it resides
  2. Convert the source image to JPEG format and save in the configured source directory
  3. Load the JPEG source image from the source directory into memory
  4. Apply crops, scaling etc according to the IIIF request options
  5. Write the resulting image to the configured cache directory
  6. Send the resulting image back to the client who made the request

There are two caching layers as part of these steps:

  • Before processing a request, the cache directory is checked. If the request has already been processed and the result is cached, the response is served immediately, skipping straight to step 6.
  • When a request is made for a part of an image that hasn't been processed yet (e.g. a different crop or other modification), the new result is produced from the cached source image if it's available. In this scenario we can skip straight to step 3, which avoids redownloading the original image.
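
As a rough illustration of how the two cache layers decide which step to start from, here is a minimal sketch. It is not the server's actual code: `serve_request`, `request_key`, the directory layout, and the three injected callables (`fetch_original`, `to_jpeg`, `apply_ops`) are all hypothetical names chosen for the example, and `request_key` is assumed to already be a filesystem-safe string derived from the IIIF request options.

```python
from pathlib import Path
from typing import Callable


def serve_request(
    image_id: str,
    request_key: str,
    source_dir: Path,
    cache_dir: Path,
    fetch_original: Callable[[str], bytes],
    to_jpeg: Callable[[bytes], bytes],
    apply_ops: Callable[[bytes, str], bytes],
) -> bytes:
    """Return the response bytes, starting from the latest step a cache allows."""
    # Hypothetical layout: one cached file per (image, request) pair.
    cached_result = cache_dir / image_id / f"{request_key}.jpg"
    source_jpeg = source_dir / f"{image_id}.jpg"

    # Cache layer 1: this exact request was processed before, go to step 6.
    if cached_result.exists():
        return cached_result.read_bytes()

    # Cache layer 2: if the converted source exists, steps 1 and 2 are skipped.
    if not source_jpeg.exists():
        original = fetch_original(image_id)  # step 1: load the original
        source_jpeg.parent.mkdir(parents=True, exist_ok=True)
        source_jpeg.write_bytes(to_jpeg(original))  # step 2: convert and store

    source = source_jpeg.read_bytes()  # step 3: load the JPEG source
    result = apply_ops(source, request_key)  # step 4: crop, scale, etc.
    cached_result.parent.mkdir(parents=True, exist_ok=True)
    cached_result.write_bytes(result)  # step 5: write to the cache
    return result  # step 6: send back to the client
```

Passing the fetch, convert and process steps in as callables keeps the sketch independent of any particular image library; a format-specific optimisation would amount to swapping in a different `to_jpeg` or `apply_ops` for some originals.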

We convert all original images to JPEG and store them so that we can write a common processing flow from step 3 onwards, regardless of where the image came from or what format it is in. JPEG was chosen because a) it is the likely output format requested by the user (it's the IIIF default) and b) the jpegtran library was identified as particularly quick at the operations likely to make up a large share of IIIF requests (primarily scaling and cropping).

For certain formats it may be beneficial to adjust some of this processing, and that is essentially the point of this issue: what opportunities are there for optimisations? For example, if the original image is a TIFF with pyramid/tiled images in it, can we make use of those derivatives for faster processing? Are there fast TIFF libraries out there that would let us write a TIFF processing pipeline to mirror the existing JPEG one?
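
To make the pyramid idea concrete, one possible approach is to pick the smallest pyramid level that still covers the requested output size and scale down from there, rather than from the full-resolution image. The sketch below only covers that selection step for full-image scaling. `pick_pyramid_level` is a hypothetical helper, and how the level sizes are read from a real TIFF is left out, since it depends on the library chosen.

```python
from typing import Sequence, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def pick_pyramid_level(levels: Sequence[Size], target: Size) -> int:
    """Return the index of the smallest level that covers the target size.

    A level covers the target when it is at least as wide and as tall.
    If no level is big enough, the largest level is returned instead.
    Raises ValueError when `levels` is empty.
    """
    target_w, target_h = target
    # Pair each covering level's pixel area with its index so min() picks the smallest.
    candidates = [
        (w * h, i)
        for i, (w, h) in enumerate(levels)
        if w >= target_w and h >= target_h
    ]
    if candidates:
        return min(candidates)[1]
    # No level covers the target: fall back to the largest one available.
    return max(range(len(levels)), key=lambda i: levels[i][0] * levels[i][1])


# Example pyramid sizes (made up for illustration), largest first.
example_levels = [(8000, 6000), (4000, 3000), (2000, 1500), (1000, 750)]
chosen = pick_pyramid_level(example_levels, (1200, 900))
```

Here `chosen` is the index of the 2000x1500 level, so a 1200x900 response would be scaled from an image with a sixteenth of the pixels of the full-resolution original. Region (crop) requests would need the target expressed at full-image scale before selection, which this sketch does not attempt.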

jrdh commented Nov 18, 2022

Additionally, it sounds like a lot of the herbarium sheet images are pyramidal TIFFs, so it would be good to find an example to test with.
