What happens if a VCS tag is deleted/modified? #113

Closed
arschles opened this issue Mar 26, 2018 · 10 comments
Labels
proxy Work to do on the module proxy

Comments

@arschles
Member

arschles commented Mar 26, 2018

Our current design is approximately the following:

  • Olympus holds module/version metadata
  • Athens proxy holds cached module/version metadata and source

As we've written elsewhere, if a proxy doesn't have code cached locally, it'll go to the VCS to download it. That works without a hitch as long as the content of the VCS version (on GitHub, a git tag) never changes. If someone does change it, though, some proxies will end up with different code than others. Consider this scenario:

  1. Client executes vgo get github.com/arschles/[email protected]
  2. Proxy fetches version list from Olympus, does not find v0.1.0, returns 404
  3. vgo downloads v0.1.0 from the VCS
  4. Olympus starts background metadata-fetch job, finds v0.1.0
  5. Client executes vgo get github.com/arschles/[email protected]
  6. Proxy now finds v0.1.0 in Olympus's version list but still has a local cache miss for the code, so it returns 404
  7. v0.1.0 is deleted on GitHub
  8. Proxy starts a background job to download github.com/arschles/[email protected] and doesn't find it

The resulting state is that Olympus says that v0.1.0 exists, but no proxy has it. Ordering can vary here, as can the operations on the tag (edit contents vs. delete). If you edited the tag in step 7 (instead of deleting it), the proxy would end up downloading code for v0.1.0 that was different from what Olympus saw when it fetched its metadata.

Because metadata fetch and code fetch are not atomic, we effectively allow tags to be mutable for the window between the two fetches.
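
To make that window concrete, here is a minimal sketch of the non-atomic path, assuming hypothetical `MetadataStore` and `VCS` interfaces (none of these names are real Athens code):

```go
package proxy

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical interfaces; the real Athens and Olympus APIs differ.
type MetadataStore interface {
	HasVersion(ctx context.Context, mod, ver string) (bool, error)
}

type VCS interface {
	DownloadZip(ctx context.Context, mod, ver string) ([]byte, error)
}

var ErrNotFound = errors.New("module version not found")

type Proxy struct {
	olympus MetadataStore
	vcs     VCS
}

// fetchModule shows the two non-atomic steps: a metadata check against
// Olympus, then a separate code fetch from the VCS. A tag that is deleted
// or re-pointed between the two steps produces the inconsistency above.
func (p *Proxy) fetchModule(ctx context.Context, mod, ver string) ([]byte, error) {
	known, err := p.olympus.HasVersion(ctx, mod, ver)
	if err != nil {
		return nil, err
	}
	if !known {
		return nil, ErrNotFound // the client falls back to the VCS directly
	}

	// The tag can be deleted or re-pointed right here.

	zip, err := p.vcs.DownloadZip(ctx, mod, ver)
	if err != nil {
		// Olympus says the version exists, but the code is gone or changed.
		return nil, fmt.Errorf("%s@%s listed in Olympus but not fetchable: %v", mod, ver, err)
	}
	return zip, nil
}
```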

We're planning to say that nobody should modify their tags in place, but we're not currently protecting against that case.

A Solution

As I wrote above, we need to ensure that metadata and code fetch are atomic. To completely prevent the inconsistencies, we'd need to ensure atomicity across all proxies, public and private. That's a big task that we can't achieve, because anybody is allowed to run their own proxy. This proposal takes an incremental step: making metadata and code fetch atomic in the public proxy & Olympus infrastructure.

I believe the best way to make code and metadata fetch atomic is to put them in the same place. That means the "public proxy" and "central repository" would become the same entity, and code and metadata would be stored in the same place. I'm calling this combined system Zeus here to disambiguate from the proxy and Olympus.

If we made this change, here's what would happen on a cache miss:

  1. vgo get github.com/arschles/[email protected]
  2. Zeus returns 404
  3. vgo fetches from the VCS
  4. Zeus starts a background job to fetch that version and its code, atomically
  5. vgo get github.com/arschles/[email protected]
  6. Zeus returns 200 and vgo downloads code
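
As a rough sketch of step 4 above, the key point is that the version metadata and the zip are written in one operation, so Zeus never advertises a version it cannot serve. All names here (`Zeus`, `FetchVersion`, `SaveVersionAtomic`) are hypothetical:

```go
package zeus

import (
	"context"
	"time"
)

// Hypothetical types; none of these names exist in the current codebase.
type VersionInfo struct {
	Version string
	Time    time.Time
}

type Storage interface {
	// SaveVersionAtomic persists metadata and code together: either both
	// become visible or neither does.
	SaveVersionAtomic(ctx context.Context, mod, ver string, info VersionInfo, zip []byte) error
}

type VCS interface {
	FetchVersion(ctx context.Context, mod, ver string) (VersionInfo, []byte, error)
}

type Zeus struct {
	storage Storage
	vcs     VCS
}

// fillCacheMiss is the background job from step 4: fetch the tag's metadata
// and its code in the same job, then persist them in a single atomic write,
// so Zeus never lists a version it cannot serve.
func (z *Zeus) fillCacheMiss(ctx context.Context, mod, ver string) error {
	info, zip, err := z.vcs.FetchVersion(ctx, mod, ver)
	if err != nil {
		return err
	}
	return z.storage.SaveVersionAtomic(ctx, mod, ver, info, zip)
}
```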

This change fixes the following two cases:

  • When proxies cannot download code for versions that Olympus has in its list
  • When proxies later download code that differs from what it was when Olympus fetched the version list

This change does not fix the following cases:

  • When the vgo get caller has different code than Zeus has
    • We can ameliorate this by:
      • Crawling GitHub and other VCSs (as godoc.org does) to seed the Zeus cache
      • Allowing and encouraging people to upload their tags to Zeus (this solution is out of scope)
  • When other proxies have different code than the vgo get caller and Zeus
    • We can ameliorate this by allowing other proxies to upstream to Olympus for some or all packages.
  • When a VCS tag is deleted or modified and Zeus has the old data
    • We are already planning to say "don't change your tags," so I think we can add "we're not going to respect your changes" and we're covered here

cc/ @michalpristas @bketelsen

@michalpristas
Member

Isn't a CDN exactly for that? We can have backing storage connected to a CDN. Talking MS stack, it would be Azure Storage with Azure CDN; I'm sure Amazon and Google have the same thing.
The flow would be like this:
proxy -- cache miss --> Olympus (MS)
Olympus registers the cache miss in its append log and prepares the zip in its backing storage
Olympus syncs the metadata about the package, as well as the CDN address, so the other vendors (Google, Amazon) can store it in their own storage.

Or we can have only one CDN to avoid data duplication.
In that case the proxy notifies Olympus about the cache miss,
Olympus prepares storage connected to the CDN,
and Olympus communicates the cache miss to the other instances.
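
A rough sketch of that flow, with hypothetical `AppendLog`, `CDNStorage`, and `Peers` interfaces (the real storage and sync APIs would differ):

```go
package olympus

import "context"

// Hypothetical types; the log, storage, and sync APIs here are illustrative,
// not real Olympus code.
type CacheMissEvent struct {
	Module  string
	Version string
}

type AppendLog interface {
	Append(ctx context.Context, e CacheMissEvent) error
}

type CDNStorage interface {
	// PutZip stores the zip in CDN-backed storage (e.g. Azure Storage behind
	// Azure CDN) and returns the public CDN address.
	PutZip(ctx context.Context, mod, ver string, zip []byte) (cdnURL string, err error)
}

type Peers interface {
	// SyncMetadata shares the package metadata and CDN address with the
	// other vendors' instances (Google, Amazon, ...).
	SyncMetadata(ctx context.Context, mod, ver, cdnURL string) error
}

type VCS interface {
	DownloadZip(ctx context.Context, mod, ver string) ([]byte, error)
}

type Olympus struct {
	log     AppendLog
	storage CDNStorage
	peers   Peers
	vcs     VCS
}

func (o *Olympus) onCacheMiss(ctx context.Context, mod, ver string) error {
	// 1. Register the cache miss in the append log.
	if err := o.log.Append(ctx, CacheMissEvent{Module: mod, Version: ver}); err != nil {
		return err
	}
	// 2. Prepare the zip in the CDN-backed storage.
	zip, err := o.vcs.DownloadZip(ctx, mod, ver)
	if err != nil {
		return err
	}
	cdnURL, err := o.storage.PutZip(ctx, mod, ver, zip)
	if err != nil {
		return err
	}
	// 3. Sync the metadata plus the CDN address so other instances can
	// record or mirror it.
	return o.peers.SyncMetadata(ctx, mod, ver, cdnURL)
}
```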

@bketelsen
Contributor

bketelsen commented Mar 27, 2018 via email

@michalpristas
Member

I'm going to paste here a message from a chat with @arschles; let's discuss it here, since it is all related.

But I have a different flow in mind:
client -> proxy (cache miss) -- 404 --> client
client -- get --> VCS

proxy -- cache miss --> Olympus
Olympus fetches the code and stores it in the CDN

client -- get --> proxy (still a cache miss, as no synchronization is performed at all)
proxy asks Olympus (hey, do you know this package?)
Olympus: for sure, it's here: cdn.org/owner/package/version
proxy -- meta with redirect --> client
proxy stores the pkg locally

client -- get --> proxy (do you have this package?)
proxy (of course my little fella, here you go) -- serves directly
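
A minimal sketch of the proxy side of this flow, assuming hypothetical `Store` and `OlympusClient` interfaces (not existing code):

```go
package proxy

import (
	"context"
	"io/ioutil"
	"net/http"
)

// Hypothetical interfaces; the Olympus lookup and local store APIs are assumptions.
type Store interface {
	GetZip(ctx context.Context, mod, ver string) (zip []byte, found bool, err error)
	SaveZip(ctx context.Context, mod, ver string, zip []byte) error
}

type OlympusClient interface {
	// Lookup returns the CDN address for a module version Olympus knows about.
	Lookup(ctx context.Context, mod, ver string) (cdnURL string, found bool, err error)
}

type CachingProxy struct {
	store   Store
	olympus OlympusClient
}

func (p *CachingProxy) ServeModule(w http.ResponseWriter, r *http.Request, mod, ver string) {
	ctx := r.Context()

	// "do you have this package" -- serve directly on a local hit.
	if zip, ok, err := p.store.GetZip(ctx, mod, ver); err == nil && ok {
		w.Write(zip)
		return
	}

	// Local miss: ask Olympus. If it knows the package, redirect the client
	// to the CDN and populate the local cache in the background.
	cdnURL, found, err := p.olympus.Lookup(ctx, mod, ver)
	if err != nil || !found {
		http.NotFound(w, r) // the client falls back to the VCS on its own
		return
	}
	http.Redirect(w, r, cdnURL, http.StatusFound)
	go p.populate(context.Background(), mod, ver, cdnURL)
}

// populate stores the package locally so the next request is served directly.
func (p *CachingProxy) populate(ctx context.Context, mod, ver, cdnURL string) {
	resp, err := http.Get(cdnURL)
	if err != nil {
		return
	}
	defer resp.Body.Close()
	zip, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return
	}
	p.store.SaveZip(ctx, mod, ver, zip)
}
```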

pros:

  • this would simplify things like gossiping, leader election... and development of Olympus overall (no ordering consistency, just relaxed/eventual content consistency)
  • proxy would be populated only with packages it was asked for - no ambiguous packages, lower memory consumption (ideal for private proxies, which would not require configuration for 'blacklisting' unwanted repos)

cons:

  • the proxy would be populated only after the second request for a package, which might be a blocker

@michalpristas michalpristas added proxy Work to do on the module proxy and removed proxy labels Mar 27, 2018
@arschles
Member Author

Olympus registers the cache miss in its append log and prepares the zip in its backing storage

I agree with this 😄. If we put the zip in Olympus, I'm happy.

For some reason I was under the impression that after our last call, we were only going to store metadata but not code in Olympus.

Or we can have only one CDN to avoid data duplication.

We should use multiple CDNs, because being cloud agnostic is one of our goals.

Olympus communicates the cache miss to the other instances

This is the hard part. We need to decide what kind of read consistency we want across all the Olympus instances (you wrote about that below 😄) and how we're going to achieve it. There's a WIP document about this here.

this would simplify things like gossiping, leader election... and development of Olympus overall (no ordering consistency, just relaxed/eventual content consistency)

It would solve consistency in the proxies for public packages, but I’m not clear how it would allow us to be eventually consistent in the Olympus deployment. Can you expand on that?

proxy would be populated only with packages it was asked for - no ambiguous packages, lower memory consumption

That sounds good to me

ideal for private proxies, which would not require configuration for 'blacklisting' unwanted repos

Can you expand on how we don’t need ‘blacklisting’ in the private proxies? Seems like we still don’t want those proxies to allow serving unwanted modules

the proxy would be populated only after the second request for a package, which might be a blocker

I don't think it's a blocker, because on the first request for a public package the proxy will redirect to Olympus, which is the source of truth anyway and is append-only by default.

@michalpristas
Member

this would simplify things like gossiping, leader election... and development of Olympus overall (no ordering consistency, just relaxed/eventual content consistency)

It would solve consistency in the proxies for public packages, but I’m not clear how it would allow us to be eventually consistent in the Olympus deployment. Can you expand on that?

What I had in mind is that when you treat the proxy as a 'know it all' that downloads all modules with a UUID after XX, you need to solve ordering on and across Olympus instances.

When the proxy is a 'cache of previously missed packages', it does not need to know the order; it just needs Olympus to serve information reliably (all Olympus instances need to have the same set of information, but the order in which these are processed does not matter).

Can you expand on how we don’t need ‘blacklisting’ in the private proxies? Seems like we still don’t want those proxies to allow serving unwanted modules

You are right, I was prematurely optimistic after coffee.

But it looks like we have an agreement on how things should work.

@arschles
Member Author

arschles commented Mar 27, 2018

What I had in mind is that when you treat the proxy as a 'know it all' that downloads all modules with a UUID after XX, you need to solve ordering on and across Olympus instances.

So are you thinking that this UUID has ordering implied in it? (we've talked about using a vector clock here to express causal ordering in the event log)
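
For other readers, a vector clock in this context would look roughly like the following generic sketch (not code from this project): each Olympus instance ticks its own entry when it appends an event, and comparing two clocks tells you whether one event causally precedes the other or whether they are concurrent.

```go
package eventlog

// VectorClock is a generic sketch, not project code.
// It maps an Olympus instance ID to that instance's logical counter.
type VectorClock map[string]uint64

// Tick records a local event on the given instance.
func (vc VectorClock) Tick(instance string) {
	vc[instance]++
}

// Merge folds in a clock received from another instance, keeping the
// maximum counter seen for each instance.
func (vc VectorClock) Merge(other VectorClock) {
	for id, n := range other {
		if n > vc[id] {
			vc[id] = n
		}
	}
}

// Before reports whether vc causally precedes other: vc is less than or
// equal to other everywhere and strictly less somewhere. If neither
// a.Before(b) nor b.Before(a) holds, the two events are concurrent.
func (vc VectorClock) Before(other VectorClock) bool {
	strictlyLess := false
	for id, n := range vc {
		if n > other[id] {
			return false
		}
		if n < other[id] {
			strictlyLess = true
		}
	}
	for id, n := range other {
		if _, ok := vc[id]; !ok && n > 0 {
			strictlyLess = true
		}
	}
	return strictlyLess
}
```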

all Olympus instances need to have the same set of information, but the order in which these are processed does not matter

Seems like this is where we need to do work in the multi-cloud Olympus deployment to ensure read consistency. Do I have it wrong? I think I'm missing something...

I just want to avoid this:

  1. client 1 at t0: vgo get github.com/arschles/[email protected] -> cache miss in Olympus
  2. client 1 at t1: vgo get github.com/arschles/[email protected] -> cache hit in Olympus
  3. client 2 at t2: vgo get github.com/arschles/[email protected] -> cache miss in Olympus

... can you help me understand how what you wrote above prevents that case?

You are right, I was prematurely optimistic after coffee.

I completely understand that phenomenon 😄

@michalpristas
Member

No no, you're not missing anything. This is where work needs to be done.

@arschles
Member Author

Alrighty 😄 - we will talk offline about the consistency problem.

@michalpristas
Member

I think we can close this, as it is mostly related to cross-Olympus consistency.

@arschles
Member Author

arschles commented Oct 11, 2018

@michalpristas 👍

Background for other readers: as of #772, we're not going to try and build a registry for the time being
