
Single vs multi-resource locks #20

Closed
pwnall opened this issue Nov 17, 2017 · 5 comments

@pwnall

pwnall commented Nov 17, 2017

I'm writing this issue to document my thoughts on the decision to have one lock own multiple resources (vs a single one). TL;DR: this imposes a non-trivial and non-obvious tax on the future evolution of the API.

Pros for Multi-Resource

Before I go into the complexities, I'll outline the benefits of multi-resource locks:

First, having built-in multi-resource locks mitigates the risk of deadlock due to broken userland (JavaScript) implementations of the multi-resource lock paradigm.

In database circles, it's fairly well-known that multi-resource locks can be built on top of a single-resource locking primitive by establishing a total ordering relationship between resources, and always acquiring the resources in order. In our case, string sorting would be an obvious ordering relationship. More clearly:

async function acquireMultiple(resources) {
  // Acquire in a deterministic total order (here, string sort order) so
  // that two callers can never wait on each other's resources in a cycle.
  const sortedResources = Array.from(resources).sort();
  const locks = [];
  for (const resource of sortedResources)
    locks.push(await acquireSingle(resource));
  return locks;
}

While this is all good in theory, in practice:

  1. Application developers who don't have this particular bit of systems knowledge might implement a scheme that is prone to deadlocks.
  2. Application developers who do have the knowledge might still try to implement a smarter scheme and end up with deadlocks. Note that this risk is still present in a multi-resource locking API, if the conflict resolution is perceived as sub-optimal.

Second, as a general principle, more expressive power in an API gives future implementations more room for optimization. More specifically, assuming the specification becomes looser around resolving lock conflicts (which entails moving away from the current request queueing model), implementations might be able to make smarter decisions about which locks to grant when they see the whole scope of a multi-resource acquire call, instead of having to fulfill a sequence of single-resource acquire calls.

Pros for Single-Resource

The main pro for me is that single-resource locks are easier to implement, so we're more likely to produce a solid test suite, and browser vendors are more likely to end up with correct implementations sooner.

Two more subtle pros stem from the ways multi-resource locks complicate the evolution of the API's conflict resolution specification.

Better Conflict Resolution for S/X Locks

First, the current conflict resolution algorithm, based on request queueing, misses opportunities for concurrency. A relevant subset of the great example in the explainer follows (assume all locks are exclusive).

L1 = acquire [A, B]
L2 = acquire [A    ]
L3 = acquire [A, B]
L4 = acquire [    B]
release L1

Given the requests above, releasing L1 will currently result in L2 being granted. L3 cannot be granted because it conflicts with L2. L4 could be granted, but it is queued behind L3, so it is not granted. This is a lost opportunity for concurrency.
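To make the lost concurrency concrete, here is a minimal sketch of a strict request-queue grant policy as I read the current model (the class and its member names are made up for illustration, not spec text):

```javascript
class StrictQueueManager {
  constructor() {
    this.held = new Map();   // resource -> id of the request holding it
    this.queue = [];         // pending requests, FIFO
  }
  conflicts(resources) {
    return resources.some((r) => this.held.has(r));
  }
  grant(req) {
    for (const r of req.resources) this.held.set(r, req.id);
  }
  request(id, resources) {
    const req = { id, resources };
    if (this.queue.length === 0 && !this.conflicts(resources)) {
      this.grant(req);
      return true;   // granted immediately
    }
    this.queue.push(req);
    return false;    // queued
  }
  release(id) {
    for (const [r, holder] of this.held)
      if (holder === id) this.held.delete(r);
    // Strict FIFO: only grant from the head; stop at the first conflict.
    while (this.queue.length > 0 && !this.conflicts(this.queue[0].resources))
      this.grant(this.queue.shift());
  }
}

const m = new StrictQueueManager();
m.request('L1', ['A', 'B']);  // granted immediately
m.request('L2', ['A']);       // queued behind L1
m.request('L3', ['A', 'B']);  // queued
m.request('L4', ['B']);       // queued
m.release('L1');
// Only L2 is granted: L3 still conflicts on A and blocks the queue head,
// so L4 stays queued even though B is now free.
```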

Now, suppose we want to unlock that concurrency, but we still want the benefit of deterministic, fully-specified conflict resolution, so Web content doesn't accidentally end up depending on a particular implementation's behavior.

  1. An enticing alternative is to specify that releasing a lock should grant the maximum possible number of locks given the existing conflicts (and disambiguate between multiple optimal solutions by lexicographically sorting the sets of granted locks). In a multi-lock world, this is exponential in the number of locks under contention -- it is an instance of the set packing problem.

  2. Given the issue above, another enticing alternative is to specify that releasing a lock should iterate through the request queue in order, and grant each lock that can be granted. (This is a fairly straightforward greedy approximation to the set packing problem above.) While this isn't exponential anymore, resolving N requests over K resources can still take O(KN^2) time.
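A sketch of alternative 2 on the explainer example (exclusive locks only; the function and field names are made up): scanning the whole queue after L1's release grants both L2 and L4, leaving only L3 pending.

```javascript
// On release, scan the entire queue in order and greedily grant every
// request that no longer conflicts with a held resource.
function greedyRelease(held, queue, releasedId) {
  for (const [r, holder] of held)
    if (holder === releasedId) held.delete(r);
  const stillPending = [];
  for (const req of queue) {
    if (req.resources.some((r) => held.has(r))) {
      stillPending.push(req);                              // conflicts; keep queued
    } else {
      for (const r of req.resources) held.set(r, req.id);  // grant
    }
  }
  return stillPending;
}

// L1 holds [A, B]; L2, L3, L4 are queued, as in the example above.
const held = new Map([['A', 'L1'], ['B', 'L1']]);
let queue = [
  { id: 'L2', resources: ['A'] },
  { id: 'L3', resources: ['A', 'B'] },
  { id: 'L4', resources: ['B'] },
];
queue = greedyRelease(held, queue, 'L1');
// Now L2 holds A and L4 holds B; only L3 remains queued.
```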

For comparison, under the current lock conflict resolution, acquiring / releasing / aborting a lock can be implemented in O(number of resources held by the lock * log(number of pending locks)) amortized time, using the algorithm in #21. I don't have a proof that the logarithm is necessary, so there might be linear algorithms (running in O(number of resources held by the lock)). While this is fairly fast, implementing that algorithm is non-trivial.

Custom Lock Types

In #17, I propose supporting custom lock types, which could be helpful for high-concurrency database implementations. This is fairly simple to layer on top of a single-resource lock manager, but would become more complex in a multi-resource lock implementation.
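One plausible shape for that layering (not the actual #17 design; the mode names and matrix below are purely illustrative) is a per-resource holder list checked against a user-supplied compatibility matrix:

```javascript
// Hypothetical custom lock types on a single-resource manager: each
// resource tracks its holders' modes, and a compatibility matrix decides
// whether a new mode may join. Rows are held modes, columns requested modes.
const S = 'shared', IX = 'intent-exclusive', X = 'exclusive';
const compatible = {
  [S]:  { [S]: true,  [IX]: false, [X]: false },
  [IX]: { [S]: false, [IX]: true,  [X]: false },
  [X]:  { [S]: false, [IX]: false, [X]: false },
};

class TypedResource {
  constructor() { this.holders = []; }  // [{ id, mode }]
  tryAcquire(id, mode) {
    if (this.holders.every((h) => compatible[h.mode][mode])) {
      this.holders.push({ id, mode });
      return true;
    }
    return false;  // caller would queue the request instead
  }
  release(id) {
    this.holders = this.holders.filter((h) => h.id !== id);
  }
}

const res = new TypedResource();
res.tryAcquire('t1', IX);  // true: resource is free
res.tryAcquire('t2', IX);  // true: IX is compatible with IX
res.tryAcquire('t3', X);   // false: X conflicts with the IX holders
```

Extending this per-resource check to atomic multi-resource grants is where the complexity mentioned above comes in.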

@inexorabletash
Member

I confess I mostly put multiple resource scopes in so that you could "explain" IndexedDB in terms of Locks.

But given the demonstration that this can be done in user space, I'm no longer a strong advocate. Anyone want to argue in favor of multi-resource?

@pwnall
Author

pwnall commented Nov 21, 2017

I updated the description above to reflect the fact that I've failed to produce linear time algorithms for the current lock resolution method, as I hoped I would. Given the complexity of the algorithms in #21, I'm very inclined to think we should do single-resource locks.

@inexorabletash
Member

Going once? Going twice.... ?

@asutherland

I was excited when I saw the web-locks proposal, because it keeps @jakearchibald's very neat Cache Transactions proposal at w3c/ServiceWorker#823 from introducing this powerful functionality to the platform through a side door. I'm not sure I want to argue for multiple-resource locks, but I'd like to understand whether the concept of Cache transactions will be updated to exist in a web-locks world, given that its most recent proposal called for multi-resource locks, and if so, how...

Since the Cache API spec has done away with fetching/incumbent records, and addAll() is now basically just "perform a bunch of fetches and, if they all work out, perform a bunch of put()s together atomically", the Cache API might not need its own API surface for transactions.

However, there are some reasons to still want some type of explicit integration into the Cache API:

  • Cache keys are RequestInfo instances, which do not reduce cleanly to a single string thanks to the existence of the "Vary" header and edge cases related to query strings and fragments. That said, the Cache API transaction proposal did just reduce them to URLs for simplicity, and presumably any attempt at using opaque keys would do the same, ignoring Vary and stripping off query strings and fragments.
  • Cache instances have a more complicated life cycle than IDB databases, because Cache instances remain operational after being orphaned by a delete(cacheName) call, which simply removes them from the name-to-cache map. Compare with IDB deletion, which cannot proceed until all open connections have been closed. This makes IDB database names sufficiently distinct for locks, since a name can only ever refer to a single underlying database at a time, whereas cache names become potentially meaningless unless care is taken to hold shared locks on the name of the cache itself that circumscribe its lifetime, thereby approximating IDB's semantics here.

@pwnall
Author

pwnall commented Dec 11, 2017

@asutherland Assuming I understand Cache Transactions correctly, the scope of a transaction is either the entire cache or a list of URLs. I'd implement that on top of Locks as follows:

  1. Let the resource representing the entire cache be 'cache:' and the resource representing a cache URL be cache:${url}. Purpose: the resource for the whole cache should sort before any URL resource.
  2. When locking the entire cache, acquire an exclusive lock for the whole cache resource.
  3. When locking a list of URLs, call a slightly modified version of acquireMultiple in the OP to first get a shared lock on the whole cache resource, and then get exclusive locks on URL resources.

This sketch assumes a single cache. A real implementation would want something like ${cache_api_namespace}:${cache_id}:${url} where cache_api_namespace is unlikely to be used by a Web app for its own resources, and cache_id is a unique identifier for each Cache instance.

I don't see a way of implementing the semantics for a cache-wide lock + URL-scoped locks with a single acquire call in a multiple-resource Locks API. The implementation for locking a list of URLs would need to first issue an acquire call for the whole cache resource, and then issue another call for the URL resources. Assuming I'm right, implementing Cache transactions on top of Locks still requires understanding the deadlock-avoiding principles I mentioned in the OP, and doesn't get much value from a multiple-resource API.
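The three steps above can be sketched as follows, assuming a hypothetical single-resource `acquire(resource, mode)` primitive (a stand-in for the lock manager, not the real Web Locks API surface; the helper names are made up):

```javascript
// Stand-in for the single-resource lock manager: records what was
// requested so the scheme can be exercised without a real implementation.
async function acquire(resource, mode) {
  return { resource, mode };
}

// Step 2: locking the entire cache takes the bare 'cache:' resource
// exclusively. 'cache:' sorts before every 'cache:${url}' resource.
async function lockWholeCache() {
  return [await acquire('cache:', 'exclusive')];
}

// Step 3: locking a list of URLs takes a shared lock on the whole-cache
// resource first, then exclusive locks on the URL resources in sorted
// order (the deadlock-avoiding total order from the OP).
async function lockCacheUrls(urls) {
  const locks = [await acquire('cache:', 'shared')];
  for (const url of Array.from(urls).sort())
    locks.push(await acquire(`cache:${url}`, 'exclusive'));
  return locks;
}

lockCacheUrls(['/b', '/a']).then((locks) => {
  // Acquired in order: cache: (shared), cache:/a, cache:/b (exclusive).
});
```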

MXEBot pushed a commit to mirror/chromium that referenced this issue Dec 19, 2017
Based on discussion on the proposal, simplify to locks representing a
single resource. Multi-resource locks can be implemented in user-space,
with appropriate care taken to avoid deadlocks.

Discussion: w3c/web-locks#20
Change-Id: I23980f3cdaf403bfbb641a36ef30bd92d665cef3
Reviewed-on: https://chromium-review.googlesource.com/828286
Commit-Queue: Joshua Bell <[email protected]>
Reviewed-by: Daniel Cheng <[email protected]>
Reviewed-by: Victor Costan <[email protected]>
Cr-Commit-Position: refs/heads/master@{#524856}