Support local cache server/daemon #6690

Closed
NullVoxPopuli opened this issue Dec 4, 2023 · 7 comments
Comments

@NullVoxPopuli

Which project is this feature idea for?

Turborepo

Describe the feature you'd like to request

Turborepo already has a daemon, so I think it kinda makes sense to support a local cache server hooking in to that daemon.

I'm currently using https://github.com/ducktors/turborepo-remote-cache
Which works great, but at the moment, for uploading to S3, we can't have devs manage their own S3 keys.

Describe the solution you'd like

Ideally, we want to

  • have C.I. with only write access to s3
  • developers only have read access to s3

this is easily managed with AWS IAM -- but we need to be able to run a cache server locally as a daemon.
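
For illustration, the read/write split above could look roughly like the two IAM policy documents below. This is only a sketch: the bucket name `my-turbo-cache` is hypothetical, and a given storage provider may need additional S3 actions beyond `s3:GetObject` and `s3:PutObject`.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET_ARN = "arn:aws:s3:::my-turbo-cache"


def cache_policy(actions):
    """Build an IAM policy document allowing `actions` on every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": f"{BUCKET_ARN}/*",
            }
        ],
    }


# Developers: download cached artifacts only.
developer_policy = cache_policy(["s3:GetObject"])
# CI: upload artifacts. Add "s3:GetObject" if CI should also restore from the cache.
ci_policy = cache_policy(["s3:PutObject"])

if __name__ == "__main__":
    print(json.dumps(developer_policy, indent=2))
```

The policies themselves are the easy part; the open question in this issue is which process on a developer machine holds the read-only credentials.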

Additionally, we need a way to force the daemon to be booted before turbo runs anything.
there are situations right now where turbo will run without the daemon, and as a monorepo grows, we really need that to run.

Describe alternatives you've considered

afaict there are two alternatives

@NullVoxPopuli added the needs: triage and story labels Dec 4, 2023
@mehulkar
Contributor

mehulkar commented Dec 4, 2023

This seems like a mix of a few different asks:

  • Read/write permissions to a remote cache. We added a tiny bit of support for that here: feat: add read only remote cache #6624, with a similar use case
  • local cache server. I don't understand what this would do?
  • "hooking local cache server into daemon". I'm also not sure what this means. Are you wanting the daemon to do cache uploads in the background? The current role of the daemon is purely for performance. It does file watching, so we can do things like: opt out of cache restoration or update the package graph in the background
  • force the daemon to be booted before turbo runs. What problem are you trying to solve with this also? The daemon is intended to be an implementation detail. Do you have performance issues?

(If I'm understanding correctly, and these are all separate things, it would be best to file separate issues, but let's start by getting on the same page first!)

@mehulkar added the needs: author input label and removed the needs: triage label Dec 4, 2023
@NullVoxPopuli
Author

NullVoxPopuli commented Dec 4, 2023

we added a tiny bit of support for that here

does each developer have an api key somehow? how is access managed?

local cache server. I don't understand what this would do?

communicate with a storage provider, as mentioned in https://ducktors.github.io/turborepo-remote-cache/supported-storage-providers.html

"hooking local cache server into daemon". I'm also not sure what this means.

the demo PR is a good example.

Are you wanting the daemon to do cache uploads in the background?

a daemon, yeah. because ultimately the existence of a local cache server which communicates with remote storage should be transparent to developers.

force the daemon to be booted before turbo runs. What problem are you trying to solve with this also?

See the demo PR -- it is what uploads and downloads cache from the remote storage.

The daemon is intended to be an implementation detail.

yes, exactly, end-users shouldn't be aware of it.

Do you have performance issues?

only usability and security issues (part of what I'm trying to address by having each developer manage their keys to the "remote storage", having their own s3 keys).

I didn't mention it specifically, but we're not using Vercel hosted cache. That could be where a disconnect could come from in this convo.

Some big companies prefer to manage their own infra because it's way cheaper. (S3 for storing turbo's cache assets, for example -- but with S3, we need a strategy to make it so only CI has write access, and developers only have read access -- which is the crux of the whole problem -- it's totally possible there is an easier way to achieve that goal)

If I'm understanding correctly, and these are all separate things,

not quite <3

@NicholasLYang
Contributor

From reading your PR, I'm a little confused by "manage the daemon myself (gross, lots of code)". The PR doesn't seem to be a huge amount of code? Of course it's a proof of concept but I'm a little confused as to where the code will be needed. That could just be my naive estimation.

As for alternatives, I do think you could accomplish this by hosting your own remote cache and creating authorization rules that prevent some users from writing to the cache. Granted, that would be "gross, lots of code" and perhaps some hosting burden.

We have discussed the possibility of adding authorization to the cache. We can definitely discuss it more and return with some decisions on our end.

To answer the actual question:

Additionally, we need a way to force the daemon to be booted before turbo runs anything.
there are situations right now where turbo will run without the daemon, and as a monorepo grows, we really need that to run.

You can run turbo daemon start to force it to run. You could preface commands with turbo daemon start to ensure that the daemon runs, or do something a little trickier with a git hook. But it wouldn't make sense to start the daemon and immediately run, because the whole conceit of the daemon is that it does work before the command is called.
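
As a rough sketch of "prefacing" a command, a small wrapper could start the daemon and then invoke turbo. The function name `run_with_daemon` and the injectable `runner` parameter are made up for this example; only `turbo daemon start` comes from the comment above. As noted, starting the daemon immediately before a run gives it no head start, so this only guarantees that the daemon exists.

```python
import subprocess


def run_with_daemon(turbo_args, runner=subprocess.run):
    """Start the turbo daemon, then run turbo with `turbo_args`.

    `runner` is injectable so the wrapper can be exercised without turbo installed.
    """
    # `turbo daemon start` is the command mentioned above.
    runner(["turbo", "daemon", "start"], check=True)
    # The actual task invocation, e.g. ["run", "build"].
    return runner(["turbo", *turbo_args], check=True)
```

Calling `run_with_daemon(["run", "build"])` would run the two commands in order and raise if either exits non-zero.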

For adding hooks into the daemon lifecycle, we can discuss that at our next team meeting.

Hope that was helpful! TL;DR: We'll discuss your requests and get back to you.

@NicholasLYang added the needs: team input label Dec 5, 2023
@gsoltis removed the needs: team input label Feb 12, 2024
@NicholasLYang
Contributor

Hi @NullVoxPopuli, we discussed your proposal. Here are our thoughts:

  • We're a little confused by the requirement that the cache server needs to run locally. Could this be solved with a hosted cache server that can issue read-only tokens?
  • Otherwise, we could look into some sort of lifecycle hooks for Turborepo's daemon. However, as it currently stands, Turborepo's daemon is meant as a performance optimization and therefore is not guaranteed to run all the time.
  • Therefore, it might make sense for you to run the local cache server as a separate daemon completely independent from Turborepo's daemon. In addition, turbo uses a separate daemon for each repository, which may not make sense for a remote cache (where you'd probably want one locally run cache server per machine and not per repository).

@NullVoxPopuli
Author

NullVoxPopuli commented Feb 12, 2024

Could this be solved with a hosted cache server that can issue read-only tokens?

ye!
I've been working under the assumption that a cache server could only ever have one token, which is why I've been prototyping with a different sort-of-proxy server (ie: working around the issue)

If a cache server could work with any number of tokens / with per-token access (read-only by default, opt one token in to write), that would be STELLAR

@NicholasLYang
Contributor

Perhaps I'm mistaken, but I think the tokens are really up to your cache server's implementation. If you want, you could store some users as having read-only permissions and others as having read and write. From there, the cache server can read the token and return either enabled or disabled for the caching status. Now there are some limitations in that we currently don't support the cache server returning a caching status that indicates it's read-only (we can look into that!), but if you use the --remote-cache-read-only flag, the client won't even try to write in the first place.

Basically, here's the flow:

  • When turbo makes a request to insert an item to the cache, read the token and map it to the user
  • Check if the user has permission to insert to cache (probably with a field in your database)
  • If yes, insert, otherwise return error
  • For CI, run with --remote-cache-read-only so turbo doesn't try to insert to cache in the first place.
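
The server side of that flow could be sketched roughly as follows. Everything here is an assumption made for illustration: the `TOKENS` table stands in for a database lookup, `CacheServer` is an in-memory stand-in with no HTTP layer, and the status codes are ordinary HTTP conventions rather than something Turborepo is confirmed to require.

```python
from dataclasses import dataclass, field

# Hypothetical token table; a real server would look this up in its database.
TOKENS = {
    "ci-token": {"user": "ci", "can_write": True},
    "dev-token": {"user": "alice", "can_write": False},
}


@dataclass
class CacheServer:
    """In-memory stand-in for a remote cache with per-token write permission."""

    artifacts: dict = field(default_factory=dict)

    def get(self, token, artifact_hash):
        """Any known token may read; returns (status, body)."""
        if token not in TOKENS:
            return 401, None
        if artifact_hash not in self.artifacts:
            return 404, None
        return 200, self.artifacts[artifact_hash]

    def put(self, token, artifact_hash, body):
        """Map the token to a user, check the write flag, then insert or reject."""
        user = TOKENS.get(token)
        if user is None:
            return 401  # unknown token
        if not user["can_write"]:
            return 403  # read-only token: refuse the insert
        self.artifacts[artifact_hash] = body
        return 200
```

With this shape, tokens are read-only unless `can_write` is set, which matches the "read-only by default, opt one token in to write" idea from earlier in the thread.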

@NullVoxPopuli
Author

yeah, that's about what @fox1t was saying here: ducktors/turborepo-remote-cache#199 (comment)

I'll close this since it seems everyone is in agreement on the design for how this feature should work 🎉

thanks!!
