
Files larger than 2GiB are hoarded but undownloadable #818

Closed
1 task done
viravera opened this issue Jan 3, 2025 · 0 comments
Labels
bug Something isn't working

Comments


viravera commented Jan 3, 2025

Describe the Bug

Due to a limitation in Node.js (fs.readFile cannot read a file larger than 2 GiB into a single buffer), Hoarder fails to serve files larger than 2 GiB. It downloads them correctly and stores them in the relevant asset.bin, but when the hoarded video is accessed, it fails to serve with the following error:

 ⨯ RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3200564360) is greater than 2 GiB
    at readFileHandle (node:internal/fs/promises:537:11)
    at async Promise.all (index 0)
    at async g (/app/apps/web/.next/server/chunks/1479.js:1:1697)
    at async q (/app/apps/web/.next/server/app/api/assets/[assetId]/route.js:1:2345)
    at async /app/node_modules/next/dist/compiled/next-server/app-route.runtime.prod.js:6:36957
    at async eC.execute (/app/node_modules/next/dist/compiled/next-server/app-route.runtime.prod.js:6:27552)
    at async eC.handle (/app/node_modules/next/dist/compiled/next-server/app-route.runtime.prod.js:6:38291)
    at async doRender (/app/node_modules/next/dist/server/base-server.js:1352:42)
    at async cacheEntry.responseCache.get.routeKind (/app/node_modules/next/dist/server/base-server.js:1574:28)
    at async NextNodeServer.renderToResponseWithComponentsImpl (/app/node_modules/next/dist/server/base-server.js:1482:28) {
  code: 'ERR_FS_FILE_TOO_LARGE'
}

Steps to Reproduce

With CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE set to -1, download any sufficiently large file, such as this video: https://www.youtube.com/watch?v=qn6OqefuH08

Expected Behaviour

Hoarder should be able to serve the file without error.

Screenshots or Additional Context

For now, I could set CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE to 2000 as a workaround, but then I would not get the maximum-quality video, so I'd prefer not to do that. I should also note that the current method of loading the whole file into memory causes Hoarder to use significantly more server RAM than necessary while it is serving files. The error comes from app/api/assets/[assetId]/route.ts, and I think the author is already aware of this, because there is a relevant TODO noting that, ideally, the file shouldn't be loaded fully into memory in the first place.

The correct fix would be to stream the file rather than loading it into memory. Currently the asset is loaded into memory by the readAsset function in packages/shared/assetdb.ts. I'm not confident enough in my knowledge of JavaScript to do this myself, but I believe the correct solution would be something like the approach in the link below:

vercel/next.js#15453 (comment)

I will replicate the relevant part here. First, you'd need to add a small module that makes a stream from a file, like this:

// @see https://github.com/vercel/next.js/discussions/15453#discussioncomment-6226391
import fs from "fs";

/**
 * Node.js Readable streams are async iterables, so "for await ... of"
 * consumes the stream chunk by chunk without buffering the whole file.
 * Syntax taken from https://github.com/MattMorgis/async-stream-generator
 */
async function* nodeStreamToIterator(stream: fs.ReadStream) {
    for await (const chunk of stream) {
        yield chunk;
    }
}

/**
 * Taken from the Next.js docs:
 * https://nextjs.org/docs/app/building-your-application/routing/router-handlers#streaming
 * Itself adapted from the MDN docs:
 * https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#convert_async_iterator_to_stream
 */
function iteratorToStream(iterator: AsyncIterator<Buffer>): ReadableStream {
    return new ReadableStream({
        async pull(controller) {
            const { value, done } = await iterator.next();

            if (done) {
                controller.close();
            } else {
                // The conversion to Uint8Array is important here,
                // otherwise the stream is not readable.
                // @see https://github.com/vercel/next.js/issues/38736
                controller.enqueue(new Uint8Array(value));
            }
        },
    });
}

export function streamFile(path: string): ReadableStream {
    const downloadStream = fs.createReadStream(path);
    return iteratorToStream(nodeStreamToIterator(downloadStream));
}

Then you would replace the readFile calls in readAsset with streamFile calls. I believe the stream can be passed directly to the NextResponse constructor in route.ts to serve the full file, but for the ranged case of the GET request you might need to open the stream at the correct offset first (I am not certain about this).

Device Details

Docker container on a Proxmox server

Exact Hoarder Version

v0.20.0

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem
@MohamedBassem MohamedBassem added the bug Something isn't working label Jan 5, 2025