-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Byte-range request performance problems with large files #3840
Comments
@TheWizz Ideally to serve this kinds of requests, I think we would want the resource to be in a memory mapped file, so that it can have true random access. So you may need to adjust your resource cache parameters to allow this.... although if you have a lot of GB files, perhaps you will run out of address space for even memory mapped files. Tell us a bit more about the actual sizes and number of such resources, together with how many concurrent requests and if they are for the same resource. We'll have a look at what can be done, as it has been a while since we checked out range requests... certainly not since the switch to PathResources. |
Thanks fort your reply! Well, since we're building sort of a general purpose server, with special focus on media files, we don't really know how many or how large files our customers will serve. Hence, we need a performant and scalable general solution. Using a plain SEEK operation may not be as performant as a memory mapped file, but should be far better than the current "read-2048-bytes-at-a-time-to-skip-ahead" method, I would think. My plan at this point is to see if I can make it use the FileResource instead, even though it's deprecated. Hopefully, such a solution may solve my immediate problem, giving some time to come up with a more contemporary solution. What would be the easiest way to "plug in" the FileResource instead of the PathResources (assuming that might be made to work for serving plain files)? -JM |
JM/@TheWizz , @Override
public InputStream getInputStream() throws IOException
{
/* Mimic behavior from old FileResource class and its
* usage of java.io.FileInputStream(File) which will trigger
* an IOException on construction if the path is a directory
*/
if (Files.isDirectory(path))
throw new IOException(path + " is a directory");
try
{
File file = getFile();
if (file != null)
return new FileInputStream(file);
}
catch (Exception e)
{
LOG.ignore(e);
}
return Files.newInputStream(path, StandardOpenOption.READ);
} or perhaps simpler as the alternative probably is not needed. We will put something like this in the next release. |
Reverted to use FileInputStream Signed-off-by: Greg Wilkins <[email protected]>
Thanks! I can confirm that your fix solves our problem. |
Note also that we have opened https://bugs.openjdk.java.net/browse/JDK-8227080 |
Reverted to use FileInputStream Signed-off-by: Greg Wilkins <[email protected]>
Reopening ... Why are We have Resource.getReadableByteChannel(). Oddly the PathResource.getReableByteChannel() is using a wonky implementation. If this were changed to ... public ReadableByteChannel getReadableByteChannel() throws IOException
{
return Files.newByteChannel(path, StandardOpenOption.READ, StandardOpenOption.SPARSE);
} Then you get a |
Here's some average timing for the various techniques across 2 different JVMs on 3 different OS's.
The |
@joakime, so the plan is to revert to using Should the I'm assuming you will work on this, so I'm assigning it to you. Assign it back to me if you want me to work on it. This would be good to fix at least in part in 9.4.20. |
To keep the discussion in one place ... Added testcase from discussion at Issue #3906 ... Results in the following exception ...
|
SPARSE has very little meaning on static files on Linux / OSX. However, SPARSE is used on NTFS and that FileSystem definitely takes advantage of it, optimizing for reads more efficiently. Either way, I'll retest without SPARSE too. |
I think the reason InputStream is selected is due to the bugs that PR #3889 fixes for large static resources. Prior to the fixes in PR #3889, if the static file was over Integer.MAX_VALUE in size, then InputStream was always selected when serving the whole file. |
OK, so I'm confused now. If we revert to |
Should we be calling |
IMO, We should never be calling We should, however, make sure that our implementation of |
So if we can use So sure, it would be best to avoid calling Isn't the best thing to do in the short term to re-implement getInputStream along the lines I'm suggesting so we will get encapsulation and fast ranges.... this can be quickly released in 9.4.20. We can then rework the range code to be channel based rather than input stream based at a more suitable pace for that larger change. P.S. I'm still confused as to what |
Java 14 has the InputStream.skip() slowness fixed per https://bugs.openjdk.java.net/browse/JDK-8227080
Our range code is rather messy, opening up a new InputStream against the same Resource for each supplied range on the same request. (only needed if the next range is before the current one). And the range code attempts to work around cached resources? why? that's not the job of the range code, but the the HttpContent, or the Resource. If that work around wasn't there, then the Range code just delegates to The original goal that I've started working on is, is if |
I'm not so sure the cached resource "hack" comments are accurate. They are conditional on Plus it is not so much a hack, but an optimisation. We don't want repetitive calls to Either way, this code is going to need very careful consideration, review and testing. I can't see it being changed in time for a 9.4.20 release. Thus I think we should still do a quick fix of check for a null |
Yet, that's what the current implementation seems to do already, opening a new InputStream for each range requested. See line 817
I considered this, but the mime multipart output of names / boundaries / etc on the HttpServletResponse would mean that Resource needs to know HTTP specifics (or at least mime specifics), right? (Out of curiosity, is this multipart behavior also present on HTTP/2 requests for byte ranges?) Would almost need a |
+ Reverting toFile().getInputStream() on PathResource + Adding RangeWriter concept for managing open resource across multiple range writes + RangeWriter implementation delegates to HttpContent behaviors Lookup is : - Direct Buffer - Indirect Buffer - ReadableByteChannel (as SeekableByteChannel) - InputStream + Adding unit tests for all RangeWriter implementation to ensure that they behave the same way everywhere. + Making ResourceService use new RangeWriter implementation + Existing DefaultServletRangeTest still works as-is Signed-off-by: Joakim Erdfelt <[email protected]>
+ break out if progress isn't made, loop if not enough progress is made Signed-off-by: Joakim Erdfelt <[email protected]>
+ Construction of PathResource now tests if path belongs to the Default FileSystem or not. This important info for later actions against the PathResource that would need to know the File object for the Path object. Non-Default FileSystem == null Default FileSystem == Path.toFile() Signed-off-by: Joakim Erdfelt <[email protected]>
+ Reset progress on any positive skip value + Throw IOException(EOF) for any negative skip value Signed-off-by: Joakim Erdfelt <[email protected]>
+ Using techniques from SeekableByteChannelRangeWriter with variant for -1 count parameter Signed-off-by: Joakim Erdfelt <[email protected]>
+ Removing unnecessary variables (per PR review) Signed-off-by: Joakim Erdfelt <[email protected]>
Signed-off-by: Joakim Erdfelt <[email protected]>
…yterange Issue #3840 Static resource byte-range support performance
Results of tests of byte-range support on Jetty 9.2.28, Jetty 9.4.19, and Jetty 9.4.x HEAD The ZipFs (non-default FileSystem error 500's have been mentioned in #3906 (comment) and will be addressed in Issue #3906 with another PR) Single Range
Multiple Ranges
|
Fixed in |
Following Jetty fix jetty/jetty.project#3845 due to issue jetty/jetty.project#3840
I use org.eclipse.jetty.servlet.DefaultServlet to serve video files, which may be large (several GB). Many browsers use byte-range requests to play video. When serving a large file in this way, there are significant performance problems caused by the PathResource used to read the file. Its getInputStream method calls Files.newInputStream which calls newInputStream through the FileSystemProvider, which calls Channels.newInputStream, finally returning a sun.nio.ch.ChannelInputStream. This works fine when reading the stream from its beginning. However, to read byte ranges, ChannelInputStream uses InputStream#skip to "skip ahead" in the stream. This is done by repeatedly reading data into a 2048 byte large buffer in a loop. Needless to say, when skipping a few hundred MB into a 2 GB video file, this is very inefficient compared to doing a "seek" operation on the file.
I'm using Jetty 9.4.11. As far as I can tell, nothing has changed in more recent versions here. Is this a known issue with Jetty? If so, is there any work-around or tweak that can be applied to make byte-range requests use a "seek" operation instead of a sequential "read-2048-bytes-at-a-time-to-skip-ahead" loop? Presumably, the (deprecated) FileResource behaves better here (hard to tell since its skip method is implemented in native code, but presumably it does a seek operation on the underlying file).
-JM
The text was updated successfully, but these errors were encountered: