
Mutual exclusion for the /list and other shared endpoints for CDN/object storage drivers #761

Closed
arschles opened this issue Oct 8, 2018 · 5 comments
Labels: hosting (Work to do to improve/change how we host the services), storage (work to do on one or more of our storage systems)
Milestone: preview

Comments

arschles (Member) commented Oct 8, 2018

Is your feature request related to a problem? Please describe.

In the object storage systems we support (Azure CDN, GCS, Minio, etc.), a few stored entries represent the entire module across all of its versions. For example, the /list endpoint has to return every version of the requested module that Athens knows about.

When Athens fetches a new version of a module, it has to read the /list entry from the object storage system, update it, and write it back. So if someone runs go get m@v1 and go get m@v2 and Athens has to download both versions, there is a data race: Athens performs those fetches in parallel, so the two read-modify-write cycles can interleave. A sketch of the racy update is below.
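To make the race concrete, here is a minimal Go sketch of the unsynchronized read-modify-write. The Storage interface and appendVersion function are hypothetical stand-ins, not Athens' actual driver API.

```go
package storage

import (
	"context"
	"strings"
)

// Storage is a hypothetical blob-store interface, illustrative only.
type Storage interface {
	Read(ctx context.Context, key string) ([]byte, error)
	Write(ctx context.Context, key string, data []byte) error
}

// appendVersion does the unsynchronized read-modify-write described
// above. If two goroutines run it concurrently for m@v1 and m@v2, both
// read the same original list and the second Write clobbers the first:
// a classic lost update.
func appendVersion(ctx context.Context, s Storage, module, version string) error {
	key := module + "/@v/list"
	raw, err := s.Read(ctx, key)
	if err != nil {
		return err
	}
	list := strings.TrimSpace(string(raw)+"\n"+version) + "\n"
	return s.Write(ctx, key, []byte(list))
}
```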

Describe the solution you'd like

I think we should add locking to the affected storage implementations.

The rough design would be:

```
lock()
read /list
update /list
upload /list
unlock()
```
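A minimal sketch of that design in Go, assuming a hypothetical distributed Locker interface (whatever mechanism #760 settles on would provide the real one) and the illustrative Storage interface from the sketch above:

```go
package storage

import (
	"context"
	"strings"
)

// Storage is the same illustrative blob-store interface as in the
// earlier sketch.
type Storage interface {
	Read(ctx context.Context, key string) ([]byte, error)
	Write(ctx context.Context, key string, data []byte) error
}

// Locker is a hypothetical distributed lock; Lock blocks until the key
// is held and returns a function that releases it.
type Locker interface {
	Lock(ctx context.Context, key string) (unlock func(), err error)
}

// saveVersionLocked serializes concurrent updates to the /list blob by
// holding a per-module lock around the read-modify-write.
func saveVersionLocked(ctx context.Context, l Locker, s Storage, module, version string) error {
	unlock, err := l.Lock(ctx, module) // lock()
	if err != nil {
		return err
	}
	defer unlock() // unlock()

	key := module + "/@v/list"
	raw, err := s.Read(ctx, key) // read /list
	if err != nil {
		return err
	}
	list := strings.TrimSpace(string(raw)+"\n"+version) + "\n" // update /list
	return s.Write(ctx, key, []byte(list))                     // upload /list
}
```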

We could perhaps use the same underlying locking mechanism as #760.

Describe alternatives you've considered

We might want to consider using vendor-specific concurrency features instead. Many providers offer per-bucket or per-blob locking or versioning. I researched some of them for #50; from that OP (a sketch of the precondition approach follows the list):

  • Azure Blob Storage gives you versioned blobs and also lease/lock semantics for blobs
  • Google Cloud Storage gives you generational versioning and preconditions
  • Quick research on S3 indicates that objects have versions; more research is needed to determine whether the API includes preconditions on those versions
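As an example of the precondition approach, here is a minimal sketch using the cloud.google.com/go/storage client's GenerationMatch condition for optimistic concurrency. The function, bucket, and object layout are illustrative, not Athens code; it also assumes the list object already exists.

```go
package storage

import (
	"context"
	"fmt"
	"io"
	"strings"

	gcs "cloud.google.com/go/storage"
)

// appendVersionGCS reads the object's current generation, then makes
// the write conditional on that generation. If another writer updated
// the object in between, Close fails with a precondition error
// (HTTP 412) and the caller can re-read and retry.
func appendVersionGCS(ctx context.Context, client *gcs.Client, bucket, module, version string) error {
	obj := client.Bucket(bucket).Object(module + "/@v/list")

	attrs, err := obj.Attrs(ctx)
	if err != nil {
		return err
	}
	r, err := obj.NewReader(ctx)
	if err != nil {
		return err
	}
	raw, err := io.ReadAll(r)
	r.Close()
	if err != nil {
		return err
	}
	list := strings.TrimSpace(string(raw)+"\n"+version) + "\n"

	// The write only succeeds if the object is still at the generation
	// we read above.
	w := obj.If(gcs.Conditions{GenerationMatch: attrs.Generation}).NewWriter(ctx)
	if _, err := w.Write([]byte(list)); err != nil {
		w.Close()
		return err
	}
	if err := w.Close(); err != nil {
		return fmt.Errorf("conditional write failed (likely a concurrent update; retry): %w", err)
	}
	return nil
}
```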

Still, I strongly think we should prefer to do the locking with our own, unified mechanism.

Additional context

arschles added the storage and hosting labels on Oct 8, 2018
arschles added this to the preview milestone on Oct 8, 2018
marwan-at-work (Contributor) commented

Depending on the storage, I think there shouldn't be a data race if m@v1 and m@v2 were being stashed at the same time. For example, in blob storage a path is just an arbitrary key, and the fact that both versions' paths start with m/@v/ is irrelevant.

marpio (Member) commented Oct 11, 2018

@marwan-at-work @arschles please correct me if I'm wrong, but I think the idea here was that if we use a CDN on top of the blob storage, we have to write/append the new version to a list file on every Save so that go can use the CDN's /list endpoint.
I'm not sure this makes sense anymore, since we actually perform a merge on the /list endpoint (versions from the VCS + versions from storage) to get all current versions of a module; a rough sketch of that merge is below. If we only used the list file, we would lose that (just as if we only used what's in storage).
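A minimal sketch of that merge, assuming plain string version lists. The function name and the lexicographic sort are illustrative simplifications; real code would sort by semver.

```go
package storage

import "sort"

// mergeVersions unions the versions known to the VCS with the versions
// already in storage, deduplicated, as described above.
func mergeVersions(fromVCS, fromStorage []string) []string {
	seen := make(map[string]struct{})
	var merged []string
	for _, v := range append(fromVCS, fromStorage...) {
		if _, ok := seen[v]; ok {
			continue
		}
		seen[v] = struct{}{}
		merged = append(merged, v)
	}
	sort.Strings(merged)
	return merged
}
```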

arschles changed the title from "Protect the /list and other shared endpoints for CDN/object storage drivers" to "Mutual exclusion for the /list and other shared endpoints for CDN/object storage drivers" on Oct 11, 2018
arschles (Member, Author) commented Nov 7, 2018

@marwan-at-work @marpio I wrote this a while ago, when we planned to make the proxy redirect directly to a CDN. That would mean that on a cache miss the proxy would have to write directly to the CDN, and two processes could still race on the /list blob even if they only append.

We're not doing the redirect-to-CDN logic now, except to redirect to an upstream, right? If that's the case, I think we can close this.

marpio (Member) commented Nov 8, 2018

Yes, we redirect to upstream. I think we can close this as well.
/cc @michalpristas

arschles (Member, Author) commented Nov 8, 2018

Thanks everyone, I'll close this now then :)

arschles closed this as completed on Nov 8, 2018