Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache tarballs streamed through pacote #74

Closed
wants to merge 1 commit into from

Conversation

mjsir911
Copy link
Contributor

@mjsir911 mjsir911 commented Apr 1, 2021

Tell fetch not to cache it's downloaded tarballs, leave that to us

Not sure the implications of changing the return type of _istream / adding a cacache to the end of the pipe, too unfamiliar with streams

Fixes #73 & Fixes npm/cli#2160

Also see npm/cli#2976 for my first attempt & some discussions as to how we got here.

Also this behavior should probably be documented somewhere so it doesn't slip away again 😅

Tell `fetch` not to cache it's downloaded tarballs, leave that to us

Not sure the implications of changing the return type of _istream /
adding a `cacache` to the end of the pipe, too unfamiliar with streams

Fixes npm#73 & Fixes npm/cli#2160

Also see npm/cli#2976 for my first attempt & some discussions as to how
we got here.
`pacote:tarball:${this.from}`,
this.opts
)
return stream.on('error', er => istream.emit('error', er)).pipe(istream).pipe(cstream)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the pipe-ing is somewhat correct, would a MinipassPipeline be better here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this should almost certainly be return new Pipeline(stream, istream, cstream)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, wait, no. I think cstream might not be a Passthrough?

I'll pull this locally and poke at it a bit, sorry for the uninformed comment 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have a fix for this, will squash into the commit when landing.

Suggested change
return stream.on('error', er => istream.emit('error', er)).pipe(istream).pipe(cstream)
const pipeline = new Pipeline(stream, istream)
stream.pipe(cacache.put.stream(
this.opts.cache,
`pacote:tarball:${this.from}`,
this.opts
))
return pipeline

@mjsir911 mjsir911 changed the title Cache the tarballs in pacote Cache tarballs streamed through pacote Apr 1, 2021
@isaacs
Copy link
Contributor

isaacs commented Apr 13, 2021

So, in principle, I think this is probably the right direction. We still let mfh do the caching of everything other than tarballs, but explicitly cache tarballs in the single choke-point in the logical flow.

I'll pull locally and take a look now. At the very least, it will need tests to assert that file: and other tarballs are being cached as expected.

Thanks for updating your use case information over in npm/cli#2976 (comment) It's helpful to know the end goal, and that does make sense. We may want to update npm cache add <spec> to take multiple specs and add them all, I don't think that would be too terrible of a CLI change, and would likely make your bash script quite a bit faster.

Copy link
Contributor

@isaacs isaacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as "request changes", but I have the required changes ready to land, so don't sweat it. I'll open up a separate PR to review, and mark this as closed once that lands.

Thanks!

`pacote:tarball:${this.from}`,
this.opts
)
return stream.on('error', er => istream.emit('error', er)).pipe(istream).pipe(cstream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have a fix for this, will squash into the commit when landing.

Suggested change
return stream.on('error', er => istream.emit('error', er)).pipe(istream).pipe(cstream)
const pipeline = new Pipeline(stream, istream)
stream.pipe(cacache.put.stream(
this.opts.cache,
`pacote:tarball:${this.from}`,
this.opts
))
return pipeline

@@ -29,6 +29,7 @@ class RemoteFetcher extends Fetcher {
spec: this.spec,
integrity: this.integrity,
algorithms: [ this.pickIntegrityAlgorithm() ],
cache: null, // leave the caching of the tarball up to _istream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also prevent make-fetch-happen from seeing the tarballs that pacote has cached. The options seem to be:

  1. Accept extra requests being made when the tarball already exists in cache (not really acceptable)
  2. Let it write twice into the cache (cacache prevents data integrity problems by using atomic writes, but still not great, because it's double disk work when fetching a remote tarball)
  3. Only pipe into the cstream for fetchers that don't expect their fetches to be cached anyway (a little bit dirty, because it requires that the fetcher knows what mfh is going to do)
  4. Provide a cacheManager object that reads from cache, but only writes packuments to cache, not tarballs (probably the "cleanest" solution, but... oof, that's a lot of duplicated code).

I'm back to thinking (3) makes the most sense here.

isaacs pushed a commit that referenced this pull request Apr 13, 2021
Tell `fetch` not to cache it's downloaded tarballs, leave that to us

Also see npm/cli#2976 for my first attempt & some discussions as to how
we got here.

Fix: #73
Fix: npm/cli#2160
Close: npm/cli#2976

PR-URL: #74
Credit: @mjsir911
Close: #74
Reviewed-by: @isaacs

EDIT(@isaacs): Updated to add test, allow make-fetch-happen to use the
cache, pipe into cache properly, and avoid cache double-writes.
isaacs pushed a commit that referenced this pull request Apr 13, 2021
Tell `fetch` not to cache it's downloaded tarballs, leave that to us

Also see npm/cli#2976 for my first attempt & some discussions as to how
we got here.

Fix: #73
Fix: npm/cli#2160
Close: npm/cli#2976

PR-URL: #74
Credit: @mjsir911
Close: #74
Reviewed-by: @isaacs

EDIT(@isaacs): Updated to add test, allow make-fetch-happen to use the
cache, pipe into cache properly, and avoid cache double-writes.
@isaacs isaacs mentioned this pull request Apr 13, 2021
@isaacs
Copy link
Contributor

isaacs commented Apr 13, 2021

@mjsir911 Can you take a look at #75?

isaacs pushed a commit that referenced this pull request Apr 15, 2021
Tell `fetch` not to cache it's downloaded tarballs, leave that to us

Also see npm/cli#2976 for my first attempt & some discussions as to how
we got here.

Fix: #73
Fix: npm/cli#2160
Close: npm/cli#2976

PR-URL: #74
Credit: @mjsir911
Close: #74
Reviewed-by: @isaacs

EDIT(@isaacs): Updated to add test, allow make-fetch-happen to use the
cache, pipe into cache properly, and avoid cache double-writes.
@isaacs
Copy link
Contributor

isaacs commented Apr 15, 2021

Closed in favor of #75

@isaacs isaacs closed this Apr 15, 2021
isaacs pushed a commit that referenced this pull request Apr 15, 2021
Tell `fetch` not to cache it's downloaded tarballs, leave that to us

Also see npm/cli#2976 for my first attempt & some discussions as to how
we got here.

Fix: #73
Fix: npm/cli#2160
Close: npm/cli#2976

PR-URL: #74
Credit: @mjsir911
Close: #74
Reviewed-by: @isaacs

EDIT(@isaacs): Updated to add test, allow make-fetch-happen to use the
cache, pipe into cache properly, and avoid cache double-writes.
isaacs pushed a commit that referenced this pull request Apr 22, 2021
Tell `fetch` not to cache it's downloaded tarballs, leave that to us

Also see npm/cli#2976 for my first attempt & some discussions as to how
we got here.

Fix: #73
Fix: npm/cli#2160
Close: npm/cli#2976
Close: #74

EDIT(@isaacs): Updated to add test, allow make-fetch-happen to use the
cache, pipe into cache properly, and avoid cache double-writes.

PR-URL: #75
Credit: @mjsir911, @isaacs
Close: #75
Reviewed-by: @wraithgar, @mjsir911
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] pacote does not add to cache [BUG] cache add tarball file does not add to cache
2 participants