Feature request: allow specifying the size of the chunks provided by File#createReadStream() #860

Closed
Dinduks opened this issue Sep 18, 2019 · 6 comments
Labels: api: storage, needs more info, type: feature request

Comments

@Dinduks

Dinduks commented Sep 18, 2019

Hello,

At the moment, the chunks emitted by the File#createReadStream() method are 16 KB long.
I believe that's Node's default highWaterMark value.

The issue with this is that reading a big file consumes too much memory.

For a 250 MB file, that's about 16,000 chunks, and the process consumes 2 GB of memory before it crashes.

If you are okay with it, I'd like to submit a PR that makes the chunk size configurable.

I will look into createReadStream() code tonight and see what I can do.

Meanwhile, any help or guidance is welcome.

Thanks,

Samy

yoshi-automation added the "triage me" label Sep 19, 2019
callmehiphop added the "type: feature request" label and removed the "triage me" label Sep 19, 2019
@AVaksman
Contributor

AVaksman commented Sep 19, 2019

@Dinduks Thank you for reporting this.
Could you please give more details and share your environment details?

Environment details

  • OS:
  • Node.js version:
  • npm version:
  • @google-cloud/storage version:
  • related modules versions:

Please share a code snippet that causes the issue.

I've tried to reproduce this with a 400 MB file in a couple of ways, without success:

  • file.download into a file/buffer
  • file.createReadStream into a file/buffer

AVaksman added the "needs more info" label Sep 20, 2019
@Dinduks
Author

Dinduks commented Sep 20, 2019

@AVaksman Hi Alex.

Indeed, there is no issue with @google-cloud/storage itself.

The memory issues come from the thousands of operations run inside my .on('data', () => {}) handler on the stream created by createReadStream.

I think being able to receive chunks bigger than 16 KB would solve this problem by reducing the number of times the .on('data', () => {}) handler runs.

Meanwhile, I'll try to optimize those operations.

I could provide the sample if you think it's relevant.

Also:

@jkwlui
Member

jkwlui commented Sep 26, 2019

highWaterMark does not dictate the size of the chunk:

nodejs/node#8855

@Dinduks
Author

Dinduks commented Sep 27, 2019

Ok @jkwlui.
Is there anything you would suggest to increase the size of the data chunks provided by this library?

@jkwlui
Member

jkwlui commented Sep 27, 2019

Would you mind providing us with a simple reproduction code that we can try and replicate? You mentioned using stream-buffers inside the event callback, so if you could provide an example of that it'd be great, thanks!

@stephenplusplus
Contributor

@Dinduks please feel free to provide more information, but I believe this is an issue to solve at the application level as opposed to our level in the library. You could look into using something like throttle, or otherwise queueing the operations you're running in the .on() handler in steps. You can also use the start and end options when reading a file to handle only one byte range of the file at a time.

I'm going to close the issue, but I'm happy to re-open if there's more we should do on our side.
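
As a rough illustration of the start/end suggestion, the sketch below reads a file one byte range at a time and waits for each range to be handled before requesting the next. `byteRanges`, `processInRanges`, and `handleRange` are hypothetical names introduced here; the only library call assumed is createReadStream({ start, end }), whose byte offsets are treated as inclusive.

```javascript
// Splits the bytes 0..totalSize-1 into inclusive { start, end } ranges of at
// most `rangeSize` bytes each.
function byteRanges(totalSize, rangeSize) {
  const ranges = [];
  for (let start = 0; start < totalSize; start += rangeSize) {
    ranges.push({ start, end: Math.min(start + rangeSize, totalSize) - 1 });
  }
  return ranges;
}

// `file` is assumed to be a File from @google-cloud/storage, or any object with
// a compatible createReadStream({ start, end }) method.
// `handleRange` is a hypothetical application callback that receives one
// Buffer per range, plus the range's start and end offsets.
async function processInRanges(file, totalSize, rangeSize, handleRange) {
  for (const { start, end } of byteRanges(totalSize, rangeSize)) {
    const parts = [];
    // Collect the (small) network chunks that make up this range.
    for await (const chunk of file.createReadStream({ start, end })) {
      parts.push(chunk);
    }
    // Awaiting here means the next range is not requested until this one has
    // been fully handled, so at most one range is held in memory at a time.
    await handleRange(Buffer.concat(parts), start, end);
  }
}
```

The total size has to be known up front; it could likely be taken from the object's metadata. Because each range is a separate read, a smaller `rangeSize` lowers peak memory at the cost of more requests.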

google-cloud-label-sync bot added the "api: storage" label Jan 31, 2020
6 participants