Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for true streaming #33

Closed
hey-red opened this issue May 16, 2019 · 27 comments
Closed

Add support for true streaming #33

hey-red opened this issue May 16, 2019 · 27 comments
Labels
enhancement New feature or request
Milestone

Comments

@hey-red
Copy link

hey-red commented May 16, 2019

Is it possible with libvips?

kleisauke added a commit that referenced this issue May 19, 2019
@kleisauke
Copy link
Owner

True streaming is not yet supported within libvips. There has been quite a bit of talk of adding this, and there's a branch that adds this, but it's never been merged for various reasons. See:
lovell/sharp#30 (comment).

In the meantime, I've added support for loading and saving from and to streams (commit e23b10b):

If you want to test this, you can use the nightly version of NetVips. Add the https://ci.appveyor.com/nuget/net-vips feed in the <packageSources> section of your NuGet.config:

<packageSources>
  <add key="netvips-nightly" value="https://ci.appveyor.com/nuget/net-vips" />
</packageSources>

And update NetVips to 1.1.0.134-develop. When true streaming is supported within libvips it will use the native methods instead (for e.g. VipsStream, VipsStreamInput and VipsStreamOutput).

@kleisauke kleisauke added the enhancement New feature or request label May 19, 2019
@hey-red
Copy link
Author

hey-red commented May 19, 2019

Thx. I have similar approach with temp file.
I'm interensting how Magick.NET do this. I think imagemagick provides stream API itself(maybe with buffer or temp file?)

@kleisauke
Copy link
Owner

I'm interensting how Magick.NET do this. I think imagemagick provides stream API itself(maybe with buffer or temp file?)

I had a quick look at their code to see how they handle streams. It looks like they use temporary files:
MagickImage_ReadStream
CustomStreamToImage - AcquireUniqueFileResource
CustomStreamToImage - Read from stream write to temporary file

MagickImage_WriteStream
ImageToCustomStream - AcquireUniqueFileResource
ImageToCustomStream - Read from temporary file write to stream

So no "true streaming" either.

kleisauke added a commit that referenced this issue Jun 6, 2019
@kleisauke kleisauke added this to the 1.1.0 milestone Jun 6, 2019
@kleisauke
Copy link
Owner

NetVips v1.1.0 is now available.

@jcupitt
Copy link
Contributor

jcupitt commented Oct 12, 2019

libvips has a PR now to add true streaming:

libvips/libvips#1443

Feedback etc. very welcome.

@kleisauke
Copy link
Owner

The true-streaming branch adds support for this feature which will be addressed in the next release of NetVips.

@jcupitt The test results look good! The only problem I found is loading a WebP image with a seekable stream that is not backed by a file descriptor. It look like vips_streami_map only supports pipes and seekable descriptors.

@kleisauke kleisauke reopened this Nov 30, 2019
@kleisauke kleisauke added the blocked-upstream-dependency Upstream dependency needs to be updated label Nov 30, 2019
@kleisauke kleisauke changed the title Streams support? Add support for true streaming Nov 30, 2019
@jcupitt
Copy link
Contributor

jcupitt commented Nov 30, 2019

I'll have a look, thanks Kleis.

Do you have benchmark results you could share? I've not seen any numbers yet :(

jcupitt added a commit to libvips/libvips that referenced this issue Nov 30, 2019
The logic was a bit wonky. Thanks Kleis.

See kleisauke/net-vips#33 (comment)
@jcupitt
Copy link
Contributor

jcupitt commented Nov 30, 2019

You're right, the logic for map was a bit messed up. It should be better now.

@jcupitt
Copy link
Contributor

jcupitt commented Dec 1, 2019

Update: I've checked php-vips quickly and it seems OK. The add-streams branch of ruby-vips adds support for streams. ruby-vips master now works with the new metadata change restrictions. I've opened an issue on image_processing (the gem behind libvips on rails etc.) telling them about streams. Phew.

@kleisauke
Copy link
Owner

Thanks John! I needed to apply this patch to make it work for seekable streams:

Details
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Kleis Auke Wolthuizen <[email protected]>
Date: Sun, 1 Dec 2019 13:00:00 +0100
Subject: [PATCH] Ensure vips_streami_map works for seekable streams


diff --git a/libvips/iofuncs/streami.c b/libvips/iofuncs/streami.c
index 1111111..2222222 100644
--- a/libvips/iofuncs/streami.c
+++ b/libvips/iofuncs/streami.c
@@ -825,7 +825,7 @@ vips_streami_is_mappable( VipsStreami *streami )
 	return( streami->data ||
 		VIPS_STREAM( streami )->filename ||
 		(!streami->is_pipe && 
-		 VIPS_STREAM( streami )->descriptor) );
+		 VIPS_STREAM( streami )->descriptor != -1) );
 }
 
 /**
@@ -860,6 +860,16 @@ vips_streami_map( VipsStreami *streami, size_t *length_out )
 				return( NULL );
 		}
 		else {
+			/* Seekable streams must be converted.
+			 */
+			if( !streami->is_pipe ) {
+				if( vips_streami_rewind( streami ) )
+					return( NULL );
+
+				streami->header_bytes = g_byte_array_new();
+				streami->is_pipe = TRUE;
+			}
+
 			if( vips_streami_pipe_to_memory( streami ) )
 				return( NULL );
 		}

I also discovered a small bug on Windows (couldn't reproduce this on Linux, weird). See:

$ vips.exe copy x.jpg x.v
VipsForeignSave: "x.v" is not a known stream format
$ echo Exit Code is %errorlevel%
Exit Code is 1
$ if exist x.v echo x.v found
x.v found

I guess this call should be clamped with vips_error_freeze() and vips_error_thaw() to hide any errors (just like vips_image_new_from_file). vips_image_new_from_buffer needs this perhaps too.

By the way,
Why does vips_image_new_from_file, vips_image_new_from_buffer vips_image_write_to_file and vips_image_write_to_buffer first try to use the streaming API? All stream loaders and savers (*_[load/save]_*_stream) listed here also have a buffer *_[load/save]_*_buffer and file loader *_[load/save]_*_file along with them. Plus the old file and buffer loaders uses the streaming API internal (I think).

Could we maybe do the opposite in vips_image_new_from_stream() and vips_image_write_to_stream()? For example:

For the loader:

  • Sniff with the streaming API, if that fails;
    • Sniff with the old buffer loader, if that succeeds;
      • mmap the stream (vips_streami_map), if that succeeds;
        • Use the buffer loader.

For the saver:

  • Search format with the streaming API, if that fails;
    • Search format with the old buffer saver, if that succeeds;
      • Write image to buffer (vips_image_write_to_buffer), if that succeeds;
        • Write the buffer to the stream (in chunks).

This ensures that the streaming API can also utilize gifload, tiffsave, pdfload, heif[load/save] magick[load/save], etc. For images.weserv.nl this would also be useful, since I like to use the streaming API, but don't want to deprecate .gif (input and output), .tiff (output), .pdf (input) and .heic (input).

@kleisauke
Copy link
Owner

Regarding benchmarks; I'll do some benchmarking over the next week (it might be interesting to benchmark with S3 / network streams as well).

@kleisauke kleisauke modified the milestones: 1.1.0, 1.2.0 Jan 6, 2020
kleisauke added a commit that referenced this issue Jan 7, 2020
@kleisauke
Copy link
Owner

A pre-release version of NetVips v1.2.0-rc1 and NetVips.Native v8.9.0-rc4 (which contains the pre-compiled libvips 8.9.0-rc4 binaries for Linux, macOS and Windows) is now available on NuGet. This release adds support for true streaming. See the blogpost and the tutorial for more information.

Any feedback is more than welcome!

@jcupitt
Copy link
Contributor

jcupitt commented Jan 7, 2020

That looks great! The syntax for the custom handler looks really nice.

Progress handling looks good too. Do you expose signal_connect lower down the stack?

@hey-red
Copy link
Author

hey-red commented Jan 7, 2020

I have some problems with reading(detecting) gif.

using var input = File.OpenRead("test.gif");
using var output = File.OpenWrite("test.jpg");
using var image = Image.NewFromStream(input, access: Enums.Access.Sequential);

image.WriteToStream(output, ".jpg");

with random gifs and get following exception

NetVips.VipsException : unable to load from source
VipsForeignLoad: source is not in a known format

Windows 10, NetVips 1.2.0-rc-1, NetVips.Native v8.9.0-rc4.

Ex gif
test

@jcupitt
Copy link
Contributor

jcupitt commented Jan 8, 2020

The C version of new_from_stream has quite a bit of logic now to fall back to the older loaders for ones which have not yet been rewritten for the new stream API (eg. GIF):

https://github.com/libvips/libvips/blob/master/libvips/iofuncs/image.c#L2240

But it's inside a C func with varargs, so it's awkward to use from language bindings. All bindings will have to implement their equivalent of that.

Perhaps we should have a VipsOperation called new_from_source that implemented this. Then all bindings would get this code for free.

@jcupitt
Copy link
Contributor

jcupitt commented Jan 8, 2020

Oh hang one, sorry, that won't work either, since the arguments to new_from_source change with each loader.

@kleisauke
Copy link
Owner

Do you expose signal_connect lower down the stack?

It's available in the public API, if that's what you mean. The logic is quite simple and similar to pyvips:
https://github.com/kleisauke/net-vips/blob/master/src/NetVips/GObject.cs#L54-L90

It uses a custom marshaller for EvalDelegate and it keeps a reference to the callbacks that are passed (to prevent them being GCed). I noticed that pyvips only allows signals that are backend by a marshaller.
https://github.com/libvips/pyvips/blob/master/pyvips/gobject.py#L167

NetVips doesn't have this requirement, allowing you to do this (for example).

public delegate void PostClose(IntPtr objectPtr, IntPtr userDataPtr);

static void OnPostClose(IntPtr objectPtr, IntPtr userDataPtr)
{
    Console.WriteLine($"postclose: userData = {userDataPtr}");
}

static void Main(string[] args)
{
    var im = Image.NewFromFile("lichtenstein.jpg", access: "sequential");
    im.SignalConnect("postclose", (PostClose)OnPostClose, (IntPtr)42);
    im.Dispose(); // g_object_unref()
}
postclose: userData = 42

But it's inside a C func with varargs, so it's awkward to use from language bindings. All bindings will have to implement their equivalent of that.

I'll see if I can implement the equivalent in the binding. I think this will be easy for the load functions (simply call vips_source_map_blob and use that to call *load_buffer) but a bit more difficult for the save function (because it needs to write the buffer in chunks).

Do you know what loader is the "biggest sniffer" for determining the fall back loader with vips_foreign_find_load_buffer? I guess it was svgload but that one is already streaming based:
https://github.com/libvips/libvips/blob/b2334d8d620ba3e042b19478cabf21742caf3817/libvips/foreign/svgload.c#L367

@kleisauke
Copy link
Owner

I thought about an awful case. Consider this stream that doesn't support seeking (e.g. network streams or pipes):

GIF89a[1GB minus 7 bytes of crafted data]

Might use 999 MiB of memory (when using a fallback mechanism) since it maps the whole stream to memory because vips_foreign_find_load_buffer detect it as GIF image (untested, perhaps giflib ignores huge files) and it keeps below the vips_pipe_read_limit (which is 1 GiB).

Perhaps we should make vips_pipe_read_limit configurable and also apply this limit to vips_source_length?

@jcupitt
Copy link
Contributor

jcupitt commented Jan 10, 2020

Ooops hehe. Going too fast.

I think we'll need an 8.9.1 pretty soon, openslideload is broken too :(

@jcupitt
Copy link
Contributor

jcupitt commented Jan 10, 2020

Biggest sniffer: magickload, by far, since GetImageMagick() will (in turn) run all of its sniffers. I think the others are mostly fairly fast.

openslideload can be a bit slow since it has to load quite a bit of TIFF header to be sure. tiffload can be a little slow too since it must load the first directory to make certain this isn't a DNG etc. file.

Yes, perhaps pipe read limit needs to be a parameter. It should be easy to add to VipsSource I suppose.

@jcupitt
Copy link
Contributor

jcupitt commented Jan 10, 2020

I think webp load is the only one in 8.9 that has this problem.

I don't know what a sane limit on webp size would be. I'd say 10MB for the pixel data, since they can only be 16k x 16k pixels, for now anyway. Perhaps another 10 for crazy XMP metadata blocks.

svgload has an unlimited option, maybe webpload needs that as well.

jcupitt added a commit to libvips/libvips that referenced this issue Jan 10, 2020
webpload must force sources into memory to read them, but huge objects
on pipes could fill memory. We had a 1GB limit, but that's probably too
large.

This patch adds a configurable limit and wepload sets it to 20MB.

See kleisauke/net-vips#33 (comment)
@jcupitt
Copy link
Contributor

jcupitt commented Jan 10, 2020

I made a branch adding a pipe read limit property, does that look reasonable? We'd probably need unlimited on webpload as well.

kleisauke added a commit that referenced this issue Jan 13, 2020
@kleisauke
Copy link
Owner

I made a branch adding a fallback for NewFromSource (which is also used by NewFromStream), see commit c745fae. These 2 testcases seems to work now:

// file-based loader fallback
using (var input = Source.NewFromFile(Helper.GifFile))
{
    var img = Image.NewFromSource(input, access: Enums.Access.Sequential);
    GifValid(img);
}

// buffer-based loader fallback
using (var input = File.OpenRead(Helper.GifFile))
{
    var img = Image.NewFromStream(input, access: Enums.Access.Sequential);
    GifValid(img);
}

Biggest sniffer: magickload, by far, since GetImageMagick() will (in turn) run all of its sniffers. I think the others are mostly fairly fast.

Ah I see. My idea was to first pass a chunk of the stream to vips_foreign_find_load_buffer for determining the loader. If that fails, the whole stream will not be immediately mapped to a blob (to reduce memory usage). This will obviously not work because some loaders needs the entire buffer.

I don't know what a sane limit on webp size would be. I'd say 10MB for the pixel data, since they can only be 16k x 16k pixels, for now anyway. Perhaps another 10 for crazy XMP metadata blocks.

How about animated WebP images? I think some of them could exceed 10MiB.

svgload has an unlimited option, maybe webpload needs that as well.

If the limit could be configured, then an unlimited flag seems a bit superfluous. For example, you can also set pipe_read_limit to -1.

I made a branch adding a pipe read limit property, does that look reasonable?

Looks good! Doesn't it make more sense to add a global function (like the vips_cache_set_max() functions)? I don't know how useful it is to set it per image / stream.

@jcupitt
Copy link
Contributor

jcupitt commented Jan 13, 2020

I was trying to reuse the svgload behaviour, but you're right, perhaps a global option is better.

@kleisauke
Copy link
Owner

NetVips v1.2.0 and NetVips.Native v8.9.1 is now available.

The stream-fallback branch has been merged within this release, so it should fallback to the old file- and buffer loaders when the loader does not support streaming (for e.g. gifload, pdfload, etc.). This fallback mechanism will be removed in a next release when all loaders are streaming based, possibly within libvips v8.10(?).

@jcupitt Should I create a new issue for configuring the pipe_read_limit property?

@jcupitt
Copy link
Contributor

jcupitt commented Jan 30, 2020

Good idea, it'll get forgotten otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants