Is your feature request related to a problem? Please describe.
In the object storage systems we support (Azure CDN, GCS, Minio, etc.), there are a few entries that represent the entire module across versions. For example, the `/list` endpoint has to show every version of the requested module that Athens knows about.
When Athens fetches a new version of a module, it has to read, update, and write the `/list` entry back to the object storage system. So, if someone does `go get m@v1` and `go get m@v2`, and Athens has to download both versions, there will be a data race, because it performs those `go get`s in parallel.
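The lost-update hazard described above can be shown deterministically with an in-memory stand-in for the list blob. All names here (`store`, `appendVersion`, `put`) are hypothetical; the real driver talks to Azure/GCS/Minio over their APIs:

```go
// In-memory stand-in for the object store's /list blob, used to show
// the lost-update problem when two fetches interleave their
// read-update-write cycles.
package main

import "fmt"

// store maps a blob key to its contents (the version list).
var store = map[string][]string{"m/@v/list": {}}

// appendVersion is the read-and-update half of the cycle Athens
// performs: read the list blob and add the new version. The write
// back happens later, via put.
func appendVersion(key, version string) []string {
	list := store[key] // read
	return append(list, version)
}

// put writes a list back, blindly overwriting whatever is there.
func put(key string, list []string) { store[key] = list }

func main() {
	// Two fetches interleave: both read the empty list before
	// either one writes.
	l1 := appendVersion("m/@v/list", "v1.0.0")
	l2 := appendVersion("m/@v/list", "v2.0.0")
	put("m/@v/list", l1)
	put("m/@v/list", l2) // overwrites l1: v1.0.0 is lost
	fmt.Println(store["m/@v/list"]) // [v2.0.0]
}
```

The second `put` wins and `v1.0.0` silently disappears from the list, which is exactly the race the parallel `go get`s can hit.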
Describe the solution you'd like
I think we should add locking to each of the object storage implementations (Azure CDN, GCS, Minio, etc.).
We could maybe use the same underlying locking mechanism as #760
Describe alternatives you've considered
We might want to consider using some vendor-specific concurrency features. Many of them provide per-bucket or blob locking or versioning. I researched some of them a bit for #50. From that OP:
- Azure Blob Storage gives you versioned blobs and also lock (lease) semantics for blobs (see here)
- Google Cloud Storage gives you generational versioning and preconditions (see here and here)
- Quick research on S3 indicates that objects have versions (enabled per bucket). More research is needed on whether the API includes preconditions on generations
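For illustration, the precondition style GCS offers amounts to a compare-and-swap loop over a blob's generation number. This is an in-memory stand-in with hypothetical names (`read`, `writeIfGeneration`), not the real GCS API:

```go
// Sketch: optimistic concurrency via generation preconditions.
// A write succeeds only if the blob's generation still matches what
// the writer read; otherwise the writer re-reads and retries.
package main

import "fmt"

type blob struct {
	generation int64
	versions   []string
}

var store = map[string]*blob{}

// read returns the current versions and generation (0 if absent).
func read(key string) ([]string, int64) {
	b, ok := store[key]
	if !ok {
		return nil, 0
	}
	return b.versions, b.generation
}

// writeIfGeneration mimics an ifGenerationMatch-style precondition:
// it refuses the write if another writer got there first.
func writeIfGeneration(key string, versions []string, expected int64) bool {
	cur := int64(0)
	if b, ok := store[key]; ok {
		cur = b.generation
	}
	if cur != expected {
		return false // precondition failed; caller must retry
	}
	store[key] = &blob{generation: cur + 1, versions: versions}
	return true
}

// appendVersion retries the read-modify-write until the precondition
// holds, so no concurrent append can be silently overwritten.
func appendVersion(key, version string) {
	for {
		vs, gen := read(key)
		if writeIfGeneration(key, append(vs, version), gen) {
			return
		}
	}
}

func main() {
	appendVersion("m/@v/list", "v1.0.0")
	appendVersion("m/@v/list", "v2.0.0")
	vs, _ := read("m/@v/list")
	fmt.Println(vs) // [v1.0.0 v2.0.0]
}
```

The trade-off versus a unified lock is that each storage backend exposes this differently (or not at all), which is why the thread leans toward one mechanism of our own.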
Still, I strongly think we should prefer our own, unified locking mechanism.
Additional context
arschles added the `storage` (work to do on one or more of our storage systems) and `hosting` (work to do to improve/change how we host the services) labels on Oct 8, 2018.
Depending on the storage, I think there shouldn't be a data race if m@v1 and m@v2 were being stashed at the same time. For example, in blob storage the path is just an arbitrary key, and the fact that both versions' paths start with `m/@v/` is irrelevant.
@marwan-at-work @arschles please correct me if I'm wrong, but I think the idea here was that, if we use a CDN on top of the blob storage, we have to write/append the new version to a list file on every Save, so that `go` can use the CDN's `/list` endpoint.
I'm not sure this makes sense anymore, since we actually perform a merge on the `/list` endpoint (versions from the VCS + versions from storage) to get all current versions of a module. If we only used the list file we would lose that (just as if we used only what's in storage).
arschles changed the title from "Protect the /list and other shared endpoints for CDN/object storage drivers" to "Mutual exclusion for the /list and other shared endpoints for CDN/object storage drivers" on Oct 11, 2018.
@marwan-at-work @marpio I wrote this a while ago, when we planned to make the proxy redirect directly to a CDN. That would mean that on a miss it would have to write directly to the CDN, and two processes could still race for the `/list` blob, even if they do the append.
We're not doing the redirect-to-CDN logic now, except to redirect to an upstream, right? If that's the case, I think we can close this.