Support local cache server/daemon #6690

NullVoxPopuli · 2023-12-04T17:05:54Z

Which project is this feature idea for?

Turborepo

Describe the feature you'd like to request

Turborepo already has a daemon, so I think it kinda makes sense to support a local cache server hooking in to that daemon.

I'm currently using https://github.com/ducktors/turborepo-remote-cache
Which works great, but at the moment, for uploading to S3, we can't have devs manage their own S3 keys.

Describe the solution you'd like

Ideally, we want to

only CI has write access to S3
developers have read-only access to S3

this is easily managed with AWS IAM -- but we need to be able to run a cache server locally as a daemon.
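The read/write split described above could be expressed as two IAM policy documents. The sketch below builds them as plain Python dictionaries. The bucket name `example-turbo-cache` and the helper `s3_policy` are hypothetical, and it assumes CI also needs to read the artifacts it uploads, which the issue does not state.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-turbo-cache"


def s3_policy(actions):
    """Build an IAM policy document allowing `actions` on objects in BUCKET."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": f"arn:aws:s3:::{BUCKET}/*",
            }
        ],
    }


# Developers: read-only access to cached artifacts.
developer_policy = s3_policy(["s3:GetObject"])

# CI: may upload artifacts (assumed here to also read them back).
ci_policy = s3_policy(["s3:GetObject", "s3:PutObject"])

if __name__ == "__main__":
    print(json.dumps(developer_policy, indent=2))
    print(json.dumps(ci_policy, indent=2))
```

The policies only cover the storage side; something still has to hold the credentials and talk to S3 on the developer's machine, which is the role the local cache server plays in this request.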

Additionally, we need a way to force the daemon to be booted before turbo runs anything.
there are situations right now where turbo will run without the daemon, and as a monorepo grows, we really need that to run.

Describe alternatives you've considered

afaict there are two alternatives

manage the daemon myself (gross, lots of code): Demo a local daemonized cache with turborepo NullVoxPopuli/limber#475
use nx 😬


mehulkar · 2023-12-04T19:43:12Z

This seems like a mix of a few different asks:

Read/write permissions to a remote cache. We added a tiny bit of support for that here: feat: add read only remote cache #6624, with a similar use case
local cache server. I don't understand what this would do?
"hooking local cache server into daemon". I'm also not sure what this means. Are you wanting the daemon to do cache uploads in the background? The current role of the daemon is purely for performance. It does file watching, so we can do things like: opt out of cache restoration or update the package graph in the background
force the daemon to be booted before turbo runs. What problem are you trying to solve with this also? The daemon is intended to be an implementation detail. Do you have performance issues?

(If I'm understanding correctly, and these are all separate things, it would be best to file separate issues, but let's start by getting on the same page first!)

NullVoxPopuli · 2023-12-04T21:44:15Z

we added a tiny bit of support for that here

does each developer have an api key somehow? how is access managed?

local cache server. I don't understand what this would do?

communicate with a storage provider, as mentioned in https://ducktors.github.io/turborepo-remote-cache/supported-storage-providers.html

"hooking local cache server into daemon". I'm also not sure what this means.

the demo PR is a good example.

Are you wanting the daemon to do cache uploads in the background?

a daemon, yeah. because ultimately the existence of a local cache server which communicates with remote storage should be transparent to developers.

force the daemon to be booted before turbo runs. What problem are you trying to solve with this also?

See the demo PR -- it is what uploads and downloads cache from the remote storage.

The daemon is intended to be an implementation detail.

yes, exactly, end-users shouldn't be aware of it.

Do you have performance issues?

only usability and security issues (part of what I'm trying to address by having each developer manage their keys to the "remote storage", having their own s3 keys).

I didn't mention it specifically, but we're not using Vercel hosted cache. That could be where the disconnect in this convo comes from.

Some big companies prefer to manage their own infra because it's way cheaper. (S3 for storing turbo's cache assets, for example -- but with S3, we need a strategy to make it so only CI has write access, and developers only have read access -- which is the crux of the whole problem -- it's totally possible there is an easier way to achieve that goal)

If I'm understanding correctly, and these are all separate things,

not quite <3

NicholasLYang · 2023-12-05T14:37:30Z

From reading your PR, I'm a little confused by "manage the daemon myself (gross, lots of code)". The PR doesn't seem to be a huge amount of code? Of course it's a proof of concept but I'm a little confused as to where the code will be needed. That could just be my naive estimation.

As for alternatives, I do think you could accomplish this by hosting your own remote cache and creating authorization rules that prevent some users from writing to the cache. Granted, that would be "gross, lots of code" and perhaps some hosting burden.

We have discussed the possibility of adding authorization to the cache. We can definitely discuss it more and return with some decisions on our end.

To answer the actual question:

Additionally, we need a way to force the daemon to be booted before turbo runs anything.
there are situations right now where turbo will run without the daemon, and as a monorepo grows, we really need that to run.

You can run turbo daemon start to force it to run. You could preface commands with turbo daemon start to ensure that the daemon runs, or do something a little trickier with a git hook. But it wouldn't make sense to start the daemon and immediately run, because the whole conceit of the daemon is that it does work before the command is called.
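As a rough sketch of prefacing commands with turbo daemon start, a small wrapper script could start the daemon and then hand off to turbo. The wrapper and its function names are hypothetical; only `turbo daemon start` comes from the comment above, and the script assumes `turbo` is on PATH.

```python
import subprocess
import sys


def build_commands(turbo_args):
    """Return the commands to run: daemon start first, then the real turbo call."""
    return [
        ["turbo", "daemon", "start"],
        ["turbo", *turbo_args],
    ]


def main(argv):
    start_daemon, run_turbo = build_commands(argv)
    # A daemon that fails to start should not block the build,
    # so its exit code is ignored.
    subprocess.run(start_daemon, check=False)
    # The wrapper's exit code is whatever the real turbo command returns.
    return subprocess.run(run_turbo, check=False).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, `turbo_wrapper.py`, this would be invoked as `python turbo_wrapper.py run build`. As the comment points out, starting the daemon immediately before a run leaves it little time to do useful work, so this only guarantees that the daemon is up, not that it has warmed anything.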

For adding hooks into the daemon lifecycle, we can discuss that at our next team meeting.

Hope that was helpful! TL;DR: We'll discuss your requests and get back to you.

NicholasLYang · 2024-02-12T18:33:22Z

Hi @NullVoxPopuli, we discussed your proposal. Here are our thoughts:

We're a little confused by the requirement that the cache server needs to run locally. Could this be solved with a hosted cache server that can issue read-only tokens?
Otherwise, we could look into some sort of lifecycle hooks for Turborepo's daemon. However, as it currently stands, Turborepo's daemon is meant as a performance optimization and therefore is not guaranteed to run all the time.
Therefore, it might make sense for you to run the local cache server as a separate daemon completely independent from Turborepo's daemon. In addition, turbo uses a separate daemon for each repository, which may not make sense for a remote cache (where you'd probably want one locally run cache server per machine and not per repository).

NullVoxPopuli · 2024-02-12T21:03:45Z

Could this be solved with a hosted cache server that can issue read-only tokens?

ye!
I've been working under the assumption that a cache server could only ever have one token, which is why I've been prototyping with a different sort-of-proxy server (ie: working around the issue)

If a cache server could work with any number of tokens / with per-token access (read-only by default, opt one token in to write), that would be STELLAR ✨

NicholasLYang · 2024-02-13T16:00:52Z

Perhaps I'm mistaken, but I think the tokens are really up to your cache server's implementation. If you want, you could store some users as having read-only permissions and others as having read and write. From there, the cache server can read the token and return either enabled or disabled for the caching status. There are some limitations: we currently don't support the cache server returning a caching status that indicates it's read-only (we can look into that!), but if you use the --remote-cache-read-only flag, the client won't even try to write in the first place.

Basically, here's the flow:

When turbo makes a request to insert an item to the cache, read the token and map it to the user
Check if the user has permission to insert to cache (probably with a field in your database)
If yes, insert, otherwise return error
For developers (the read-only clients), run with --remote-cache-read-only so turbo doesn't try to insert to cache in the first place.
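The flow above can be sketched as a small authorization check inside a self-hosted cache server. Everything here is hypothetical: the token table, the `handle_insert` and `handle_fetch` functions, and the HTTP-style status codes are illustrative assumptions, not Turborepo's or any existing cache server's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    can_write: bool = False  # read-only by default


# Hypothetical token table; a real server would look this up in a database.
TOKENS = {
    "ci-token": User("ci", can_write=True),
    "dev-token": User("developer"),
}

# In-memory stand-in for the artifact store (e.g. an S3 bucket).
ARTIFACTS = {}


def handle_insert(token, artifact_hash, body):
    """Insert an artifact if the token's user may write; return an HTTP-style status."""
    user = TOKENS.get(token)
    if user is None:
        return 401  # unknown token
    if not user.can_write:
        return 403  # known user, but read-only
    ARTIFACTS[artifact_hash] = body
    return 200


def handle_fetch(token, artifact_hash):
    """Return (status, body); any known token may read."""
    if token not in TOKENS:
        return 401, None
    body = ARTIFACTS.get(artifact_hash)
    return (200, body) if body is not None else (404, None)
```

With this shape, write access is opt-in per token, and a client that should never write can additionally pass --remote-cache-read-only so the insert request is never sent.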

NullVoxPopuli · 2024-02-13T17:16:33Z

yeah, that's about what @fox1t was saying here: ducktors/turborepo-remote-cache#199 (comment)

I'll close this since it seems everyone is in agreement on design for how this feature should work 🎉

thanks!!

NullVoxPopuli added the needs: triage and story labels Dec 4, 2023

mehulkar added the needs: author input label and removed the needs: triage label Dec 4, 2023

NicholasLYang added the needs: team input label Dec 5, 2023

mehulkar added the owned-by: turborepo label Dec 14, 2023

gsoltis assigned NicholasLYang Feb 12, 2024

gsoltis removed the needs: team input label Feb 12, 2024

NullVoxPopuli closed this as completed Feb 13, 2024
