What happens if a VCS tag is deleted/modified? #113

Closed
arschles opened this issue Mar 26, 2018 · 10 comments
Labels
proxy Work to do on the module proxy

Comments

@arschles
Member

arschles commented Mar 26, 2018

Our current design is approximately the following:

  • Olympus holds module/version metadata
  • Athens proxy holds cached module/version metadata and source

As we've written elsewhere, if a proxy doesn't have code cached locally, it'll go to the VCS to download it. That works without a hitch as long as the content of the VCS version (on GitHub, a git tag) never changes. If someone does change it, though, some proxies will end up with different code than others. Consider this scenario:

  1. Client executes vgo get github.com/arschles/[email protected]
  2. Proxy fetches version list from Olympus, does not find v0.1.0, returns 404
  3. vgo downloads v0.1.0 from the VCS
  4. Olympus starts background metadata-fetch job, finds v0.1.0
  5. Client executes vgo get github.com/arschles/[email protected]
  6. Proxy now finds v0.1.0 in Olympus's version list but still has a local cache miss for the code, so it returns 404
  7. v0.1.0 is deleted on GitHub
  8. Proxy starts a background job to download github.com/arschles/[email protected] and doesn't find it

The resulting state is that Olympus says that v0.1.0 exists, but no proxy has it. Ordering can vary here, as can the operations on the tag (edit contents vs. delete). If you edited the tag in step 7 (instead of deleting it), the proxy would end up downloading code for v0.1.0 that was different from what Olympus saw when it fetched its metadata.

Because metadata fetch and code fetch are not atomic, we effectively allow tags to be mutable for the window between the two fetches.
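
To make that window concrete, here is a minimal sketch of the non-atomic path, assuming hypothetical `MetadataStore` and `VCS` interfaces (none of these names are real Athens code):

```go
package proxy

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical interfaces; the real Athens and Olympus APIs differ.
type MetadataStore interface {
	HasVersion(ctx context.Context, mod, ver string) (bool, error)
}

type VCS interface {
	DownloadZip(ctx context.Context, mod, ver string) ([]byte, error)
}

var ErrNotFound = errors.New("module version not found")

type Proxy struct {
	olympus MetadataStore
	vcs     VCS
}

// fetchModule shows the two non-atomic steps: a metadata check against
// Olympus, then a separate code fetch from the VCS. A tag that is deleted
// or re-pointed between the two steps produces the inconsistency above.
func (p *Proxy) fetchModule(ctx context.Context, mod, ver string) ([]byte, error) {
	known, err := p.olympus.HasVersion(ctx, mod, ver)
	if err != nil {
		return nil, err
	}
	if !known {
		return nil, ErrNotFound // the client falls back to the VCS directly
	}

	// The tag can be deleted or re-pointed right here.

	zip, err := p.vcs.DownloadZip(ctx, mod, ver)
	if err != nil {
		// Olympus says the version exists, but the code is gone or changed.
		return nil, fmt.Errorf("%s@%s listed in Olympus but not fetchable: %v", mod, ver, err)
	}
	return zip, nil
}
```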

We're planning to say that nobody should modify their tags in place, but we're not currently protecting against that case.

A Solution

As I wrote above, we need to ensure that metadata and code fetch are atomic. To completely prevent the inconsistencies, we'd need to ensure atomicity across all proxies, public and private. That's a big task that we can't achieve, because anybody is allowed to run their own proxy. This proposal takes an incremental step: making metadata and code fetch atomic in the public proxy & Olympus infrastructure.

I believe the best way to make code and metadata fetch atomic is to put them in the same place. That means the "public proxy" and "central repository" would become the same entity, and code and metadata would be stored in the same place. I'm calling this combined system Zeus here to disambiguate from the proxy and Olympus.

If we made this change, here's what would happen on a cache miss:

  1. vgo get github.com/arschles/[email protected]
  2. Zeus returns 404
  3. vgo fetches from the VCS
  4. Zeus starts a background job to fetch that version and its code, atomically
  5. vgo get github.com/arschles/[email protected]
  6. Zeus returns 200 and vgo downloads code
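
As a rough sketch of step 4 above, the key point is that the version metadata and the zip are written in one operation, so Zeus never advertises a version it cannot serve. All names here (`Zeus`, `FetchVersion`, `SaveVersionAtomic`) are hypothetical:

```go
package zeus

import (
	"context"
	"time"
)

// Hypothetical types; none of these names exist in the current codebase.
type VersionInfo struct {
	Version string
	Time    time.Time
}

type Storage interface {
	// SaveVersionAtomic persists metadata and code together: either both
	// become visible or neither does.
	SaveVersionAtomic(ctx context.Context, mod, ver string, info VersionInfo, zip []byte) error
}

type VCS interface {
	FetchVersion(ctx context.Context, mod, ver string) (VersionInfo, []byte, error)
}

type Zeus struct {
	storage Storage
	vcs     VCS
}

// fillCacheMiss is the background job from step 4: fetch the tag's metadata
// and its code in the same job, then persist them in a single atomic write,
// so Zeus never lists a version it cannot serve.
func (z *Zeus) fillCacheMiss(ctx context.Context, mod, ver string) error {
	info, zip, err := z.vcs.FetchVersion(ctx, mod, ver)
	if err != nil {
		return err
	}
	return z.storage.SaveVersionAtomic(ctx, mod, ver, info, zip)
}
```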

This change fixes the following two cases:

  • When proxies cannot download code for versions that Olympus has in its list
  • When proxies later download code that differs from what it was when Olympus fetched the version list

This change does not fix the following cases:

  • When the vgo get caller has different code than Zeus has
    • We can ameliorate this by:
      • Crawling GitHub and other VCSs (as godoc.org does) to seed the Zeus cache
      • Allowing and encouraging people to upload their tags to Zeus (this solution is out of scope)
  • When other proxies have different code than the vgo get caller and Zeus
    • We can ameliorate this by allowing other proxies to upstream to Olympus for some or all packages.
  • When a VCS tag is deleted or modified and Zeus has the old data
    • We are already planning to say "don't change your tags," so I think we can add "we're not going to respect your changes" and we're covered here

cc/ @michalpristas @bketelsen

@michalpristas
Member

Isn't a CDN exactly for that? We can have backing storage connected to a CDN. Talking MS stack, it would be Azure Storage with Azure CDN; I'm sure Amazon and Google have the same thing.
The flow would be like this:
proxy -- cache miss --> Olympus (MS)
Olympus registers the cache miss in its append log and prepares the zip in its backing storage
Olympus syncs the metadata about the package, as well as the CDN address, so the other vendors (Google, Amazon) can store it in their own storage.

Or we can have only one CDN to avoid data duplication.
In that case the proxy notifies Olympus about the cache miss,
Olympus prepares storage connected to the CDN,
and Olympus communicates the cache miss to the other instances.
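
A rough sketch of that flow, with hypothetical `AppendLog`, `CDNStorage`, and `Peers` interfaces (the real storage and sync APIs would differ):

```go
package olympus

import "context"

// Hypothetical types; the log, storage, and sync APIs here are illustrative,
// not real Olympus code.
type CacheMissEvent struct {
	Module  string
	Version string
}

type AppendLog interface {
	Append(ctx context.Context, e CacheMissEvent) error
}

type CDNStorage interface {
	// PutZip stores the zip in CDN-backed storage (e.g. Azure Storage behind
	// Azure CDN) and returns the public CDN address.
	PutZip(ctx context.Context, mod, ver string, zip []byte) (cdnURL string, err error)
}

type Peers interface {
	// SyncMetadata shares the package metadata and CDN address with the
	// other vendors' instances (Google, Amazon, ...).
	SyncMetadata(ctx context.Context, mod, ver, cdnURL string) error
}

type VCS interface {
	DownloadZip(ctx context.Context, mod, ver string) ([]byte, error)
}

type Olympus struct {
	log     AppendLog
	storage CDNStorage
	peers   Peers
	vcs     VCS
}

func (o *Olympus) onCacheMiss(ctx context.Context, mod, ver string) error {
	// 1. Register the cache miss in the append log.
	if err := o.log.Append(ctx, CacheMissEvent{Module: mod, Version: ver}); err != nil {
		return err
	}
	// 2. Prepare the zip in the CDN-backed storage.
	zip, err := o.vcs.DownloadZip(ctx, mod, ver)
	if err != nil {
		return err
	}
	cdnURL, err := o.storage.PutZip(ctx, mod, ver, zip)
	if err != nil {
		return err
	}
	// 3. Sync the metadata plus the CDN address so other instances can
	// record or mirror it.
	return o.peers.SyncMetadata(ctx, mod, ver, cdnURL)
}
```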

@bketelsen
Contributor

bketelsen commented Mar 27, 2018 via email

@michalpristas
Member

I'm going to paste here a message from a chat with @arschles; let's discuss it here, since it is all related.

But I have a different flow in mind:
client -> proxy (cache miss) -- 404 --> client
client -- get --> VCS

proxy -- cache miss --> Olympus
Olympus fetches the code and stores it in the CDN

client -- get --> proxy (still a cache miss, as no synchronization is performed at all)
proxy asks Olympus (hey, do you know this package?)
Olympus: for sure, it's here: cdn.org/owner/package/version
proxy -- meta with redirect --> client
proxy stores the pkg locally

client -- get --> proxy (do you have this package?)
proxy (of course my little fella, here you go) -- serves directly
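
A minimal sketch of the proxy side of this flow, assuming hypothetical `Store` and `OlympusClient` interfaces (not existing code):

```go
package proxy

import (
	"context"
	"io/ioutil"
	"net/http"
)

// Hypothetical interfaces; the Olympus lookup and local store APIs are assumptions.
type Store interface {
	GetZip(ctx context.Context, mod, ver string) (zip []byte, found bool, err error)
	SaveZip(ctx context.Context, mod, ver string, zip []byte) error
}

type OlympusClient interface {
	// Lookup returns the CDN address for a module version Olympus knows about.
	Lookup(ctx context.Context, mod, ver string) (cdnURL string, found bool, err error)
}

type CachingProxy struct {
	store   Store
	olympus OlympusClient
}

func (p *CachingProxy) ServeModule(w http.ResponseWriter, r *http.Request, mod, ver string) {
	ctx := r.Context()

	// "do you have this package" -- serve directly on a local hit.
	if zip, ok, err := p.store.GetZip(ctx, mod, ver); err == nil && ok {
		w.Write(zip)
		return
	}

	// Local miss: ask Olympus. If it knows the package, redirect the client
	// to the CDN and populate the local cache in the background.
	cdnURL, found, err := p.olympus.Lookup(ctx, mod, ver)
	if err != nil || !found {
		http.NotFound(w, r) // the client falls back to the VCS on its own
		return
	}
	http.Redirect(w, r, cdnURL, http.StatusFound)
	go p.populate(context.Background(), mod, ver, cdnURL)
}

// populate stores the package locally so the next request is served directly.
func (p *CachingProxy) populate(ctx context.Context, mod, ver, cdnURL string) {
	resp, err := http.Get(cdnURL)
	if err != nil {
		return
	}
	defer resp.Body.Close()
	zip, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return
	}
	p.store.SaveZip(ctx, mod, ver, zip)
}
```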

pros:

  • this would simplify things like gossiping, leader election... and development of Olympus overall (no ordering consistency, just relaxed/eventual content consistency)
  • proxy would be populated only with packages it was asked for - no ambiguous packages, lower memory consumption (ideal for private proxies, which would not require configuration for 'blacklisting' unwanted repos)

cons:

  • the proxy would be populated only after the second request for a package, which might be a blocker

@michalpristas michalpristas added proxy Work to do on the module proxy and removed proxy labels Mar 27, 2018
@arschles
Member Author

Olympus registers the cache miss in its append log and prepares the zip in its backing storage

I agree with this 😄. If we put the zip in Olympus, I'm happy.

For some reason I was under the impression that after our last call, we were only going to store metadata but not code in Olympus.

Or we can have only one CDN to avoid data duplication.

We should use multiple CDNs, because being cloud agnostic is one of our goals.

Olympus communicates the cache miss to the other instances

This is the hard part. We need to decide what kind of read consistency we want across all the Olympus instances (you wrote about that below 😄) and how we're going to achieve it. There's a WIP document about this here.

this would simplify things like gossiping, leader election... and development of Olympus overall (no ordering consistency, just relaxed/eventual content consistency)

It would solve consistency in the proxies for public packages, but I’m not clear how it would allow us to be eventually consistent in the Olympus deployment. Can you expand on that?

proxy would be populated only with packages it was asked for - no ambiguous packages, lower memory consumption

That sounds good to me

ideal for private proxies, which would not require configuration for 'blacklisting' unwanted repos

Can you expand on how we don’t need ‘blacklisting’ in the private proxies? Seems like we still don’t want those proxies to allow serving unwanted modules

the proxy would be populated only after the second request for a package, which might be a blocker

I don't think it's a blocker, because on the first request for a public package the proxy will redirect to Olympus, which is the source of truth anyway and is append-only by default.

@michalpristas
Member

this would simplify things like gossiping, leader election... and development of Olympus overall (no ordering consistency, just relaxed/eventual content consistency)

It would solve consistency in the proxies for public packages, but I’m not clear how it would allow us to be eventually consistent in the Olympus deployment. Can you expand on that?

What I had in mind is that when you treat the proxy as a 'know it all' that downloads all modules with a UUID after XX, you need to solve ordering on and across Olympus instances.

When the proxy is a 'cache of previously missed packages', it does not need to know the order; it just needs Olympus to serve information reliably (all Olympus instances need to have the same set of information, but the order in which these are processed does not matter).

Can you expand on how we don’t need ‘blacklisting’ in the private proxies? Seems like we still don’t want those proxies to allow serving unwanted modules

You are right, I was prematurely optimistic after coffee.

But it looks like we have an agreement on how things should work.

@arschles
Member Author

arschles commented Mar 27, 2018

What I had in mind is that when you treat the proxy as a 'know it all' that downloads all modules with a UUID after XX, you need to solve ordering on and across Olympus instances.

So are you thinking that this UUID has ordering implied in it? (we've talked about using a vector clock here to express causal ordering in the event log)
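
For other readers, a vector clock in this context would look roughly like the following generic sketch (not code from this project): each Olympus instance ticks its own entry when it appends an event, and comparing two clocks tells you whether one event causally precedes the other or whether they are concurrent.

```go
package eventlog

// VectorClock is a generic sketch, not project code.
// It maps an Olympus instance ID to that instance's logical counter.
type VectorClock map[string]uint64

// Tick records a local event on the given instance.
func (vc VectorClock) Tick(instance string) {
	vc[instance]++
}

// Merge folds in a clock received from another instance, keeping the
// maximum counter seen for each instance.
func (vc VectorClock) Merge(other VectorClock) {
	for id, n := range other {
		if n > vc[id] {
			vc[id] = n
		}
	}
}

// Before reports whether vc causally precedes other: vc is less than or
// equal to other everywhere and strictly less somewhere. If neither
// a.Before(b) nor b.Before(a) holds, the two events are concurrent.
func (vc VectorClock) Before(other VectorClock) bool {
	strictlyLess := false
	for id, n := range vc {
		if n > other[id] {
			return false
		}
		if n < other[id] {
			strictlyLess = true
		}
	}
	for id, n := range other {
		if _, ok := vc[id]; !ok && n > 0 {
			strictlyLess = true
		}
	}
	return strictlyLess
}
```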

all Olympus instances need to have the same set of information, but the order in which these are processed does not matter

Seems like this is where we need to do work in the multi-cloud Olympus deployment to ensure read consistency. Do I have it wrong? I think I'm missing something...

I just want to avoid this:

  1. client 1 at t0: vgo get github.com/arschles/[email protected] -> cache miss in Olympus
  2. client 1 at t1: vgo get github.com/arschles/[email protected] -> cache hit in Olympus
  3. client 2 at t2: vgo get github.com/arschles/[email protected] -> cache miss in Olympus

... can you help me understand how what you wrote above prevents that case?

You are right, I was prematurely optimistic after coffee.

I completely understand that phenomenon 😄

@michalpristas
Member

No no, you're not missing anything. This is where work needs to be done.

@arschles
Member Author

Alrighty 😄 - we will talk offline about the consistency problem.

@michalpristas
Member

I think we can close this, as it is mostly related to cross-Olympus consistency.

@arschles
Member Author

arschles commented Oct 11, 2018

@michalpristas 👍

Background for other readers: as of #772, we're not going to try and build a registry for the time being
