This repository has been archived by the owner on Feb 4, 2020. It is now read-only.

adds lockfile based locking strategy #82

Open

wants to merge 5 commits into master
Conversation

akleber
Contributor

@akleber akleber commented May 23, 2016

This PR adds an additional lockfile-based locking mechanism. To enable it, "LockingStrategy" has to be set to "File" in the cache config file; a command line parameter to set the locking mechanism is not provided yet. In contrast to an earlier implementation that was removed some time ago, this implementation does not rely on a specific file being present or absent, but instead uses a win32 API to explicitly lock the file. Because the mechanism is based on a file handle of the actual lock file, the lock should be released even if the process dies unexpectedly.
This PR is still rough around the edges, but I wanted to provide a first look.
My reason for implementing this is that I want to share the cache among multiple build machines via an SMB share.
I ran some benchmarks (unfortunately not with the new test harness) on a 12-core machine with an SSD and a 1 Gbit network. All numbers are averages over 4 runs. The cache contains 1740 objects and is 1.5 GB in size.
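The handle-based locking described above could be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the PR uses a win32 API, and this shim falls back to `fcntl` on POSIX only so the idea can be demonstrated portably. All class and function names here are hypothetical.

```python
import os

if os.name == "nt":
    import msvcrt

    def lock_handle(f):
        # Lock one byte at the start of the file; Windows releases the
        # lock when the handle is closed, even if the process crashes.
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)

    def unlock_handle(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def lock_handle(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    def unlock_handle(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)

class CacheLock:
    """Lock whose lifetime is tied to an open file handle."""

    def __init__(self, path):
        self._path = path
        self._file = None

    def acquire(self):
        # 'a' creates the lock file if missing without truncating it.
        f = open(self._path, "a")
        try:
            lock_handle(f)
        except OSError:
            f.close()
            return False
        self._file = f
        return True

    def release(self):
        unlock_handle(self._file)
        self._file.close()
        self._file = None
```

The key property is that the lock is owned by the OS-level file handle, not by the file's existence, so an abnormal process exit cannot leave a stale lock behind.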

Lockfile, local cache, cold cache: 480s
Mutex, local cache, cold cache: 482s

Lockfile, local cache, warm cache: 270s
Mutex, local cache, warm cache: 272s

Lockfile, cache on smb, cold cache: 525s
Lockfile, cache on smb, warm cache: 361s

So it seems the performance of the lockfile compared to the mutex is not significantly different; I attribute the differences to measurement noise.
Placing the cache on an SMB share, however, carries a noticeable penalty.
I am going to test this implementation on a build cluster with a "real world" workload, see how it performs, and report back.

try:
    lockfile = open(self._lockfileName, 'a')
except OSError:
    lockfile = None
Owner

@frerich frerich May 23, 2016


I think this assignment is unneeded: _acquire is only called if self.lockfile == None holds, which is still True in case open raises an exception. I'd say you can just return here. Doing so would permit dropping the explicit else branch and thus lowering the indentation.
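The simplification suggested here could look like the following. This is a minimal hypothetical sketch built around the snippet's names (`_lockfileName`, `lockfile`); everything else about the class is assumed, not taken from the PR.

```python
class CacheLock:
    """Minimal sketch illustrating the early-return suggestion."""

    def __init__(self, lockfileName):
        self._lockfileName = lockfileName
        self.lockfile = None

    def _acquire(self):
        # _acquire is only ever called while self.lockfile is None, so
        # on failure we can simply return instead of assigning None.
        # That drops the explicit else branch and one indentation level.
        try:
            lockfile = open(self._lockfileName, 'a')
        except OSError:
            return
        self.lockfile = lockfile
```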

@frerich
Owner

frerich commented May 23, 2016

Great PR, it looks very good! The idea to not rely on the sheer existence of a file for the locking but rather to lock file regions (and then rely on those regions being released automatically when the file handle is closed) is really nice; that didn't occur to me. It nicely takes care of the main gripe we had with file-based locks (namely, not cleaning up in case of abnormal exits). I really only have a few stylistic concerns here.

I'd like to hear what you think about my comments, but in general I'm ready to merge this.

@webmaster128
Contributor

I guess this non-trivial change justifies an entry in the copyright paragraph, right? What do you think @frerich?

Does this file-based locking require any specific file system? Or can I do that on a FAT or FTP mount? Could you extend the README to tell people everything they need to know about file locking?

Regarding configuration: everything except the cache size is configured via environment variables, and my feeling is that it would be better to move the cache size there too. Since we're passing arguments through to cl, I think we should keep clcache's parameter parsing logic as minimal as possible. So for now I'd suggest configuring the locking method via an environment variable. Having everything in variables allows us to migrate later to a more general configuration approach as seen in ccache. What do you think, @frerich?

elif self._timeoutMs >= 0 and time.time() - start_time > self._timeoutMs/1000:
    raise ObjectCacheLockException("Timeout waiting for file lock")
else:
    poll_intervall = random.uniform(0.01, 0.1)
Contributor


camelCase; also, is this really an interval? I think pollDelay would be a better name, indicating that it stores the delay before the next poll – a single float number.
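Assembled into a complete loop, the timeout-and-poll logic under review might read as follows. This is a sketch, not the PR's code: `tryLock` is a stand-in callable, and the exception class is reduced to a bare definition.

```python
import random
import time

class ObjectCacheLockException(Exception):
    pass

def acquireWithTimeout(tryLock, timeoutMs):
    """Poll tryLock() until it succeeds or timeoutMs elapses.

    tryLock is a callable returning True on success. The randomized
    delay spreads out retries so that competing processes do not poll
    in lock-step. A negative timeoutMs means "wait forever".
    """
    startTime = time.time()
    while True:
        if tryLock():
            return
        if timeoutMs >= 0 and time.time() - startTime > timeoutMs / 1000:
            raise ObjectCacheLockException("Timeout waiting for file lock")
        # A single delay value before the next poll (cf. the naming
        # comment above: a delay, not an interval).
        time.sleep(random.uniform(0.01, 0.1))
```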

@frerich
Owner

frerich commented May 24, 2016

@webmaster128 Whether some configuration knob is done as part of the cache configuration or via environment variables is - to me - mostly a question of whether it's something you typically turn on and off for certain projects and/or builds (in which case I'd go for an environment variable), or whether it's a property of the cache itself, independent of any particular build (e.g. the maximum cache size), in which case I'd go for a cache configuration switch.

To me, @akleber 's decision seems perfectly sensible, but alas defective given the above-mentioned chicken-and-egg problem.

As for the copyright notice -- if @akleber doesn't mind getting some of the bl^H^Hfame, I'd be happy to add him to the list of contributors.

@webmaster128
Contributor

webmaster128 commented May 24, 2016

+1 for making this the default behavior (override via environment variable?) as soon as we have the file system requirements on the table.

@akleber
Contributor Author

akleber commented May 25, 2016

Thanks for the thorough review so far.

Regarding the config location: my thinking was that the type of locking is specific to the cache, so every process that wants to use a particular cache has to use the appropriate locking mechanism.
However, I absolutely see the chicken-and-egg problem.
I think changing the locking mechanism configuration to an environment variable is fine, insofar as whoever sets up a shared cache should also be able to properly configure the locking setting on the participating clcache processes.
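Selecting the locking strategy from an environment variable could be sketched as below. Note that the variable name CLCACHE_LOCKING_STRATEGY and the returned strategy labels are assumptions for illustration, not the PR's actual identifiers.

```python
import os

def chooseLockingStrategy(environ=os.environ):
    # Hypothetical sketch: read the locking strategy from the
    # environment, defaulting to the existing mutex-based locking.
    # The variable name and labels are assumed, not the PR's code.
    strategy = environ.get("CLCACHE_LOCKING_STRATEGY", "Mutex")
    if strategy == "File":
        return "FileLock"
    return "MutexLock"
```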

@frerich I am fine with being added to the list of contributors.

I am also adding the relevant information about the lockfile requirements and usage to the README.

* simplifies code
* locking strategy as env variable
* documentation
@webmaster128
Contributor

I was especially thinking of the copyright header in the clcache.py file. I think this should be extended with every non-trivial change to that file.

@akleber
Contributor Author

akleber commented Jun 3, 2016

I updated this PR with changes addressing the reviews. However, I also observed a race condition when using a network share for the cache, so please do NOT merge this PR yet. I am still investigating...

@@ -186,7 +186,7 @@ def acquire(self):
 elif time.time() - startTime > self._timeoutMs/1000:
     raise ObjectCacheLockException("Timeout waiting for file lock")
 else:
-    pollDelay = random.uniform(0.01, 0.1)
+    pollDelay = random.uniform(0.1, 1.0)
Owner


I'm curious: what does this fix? SMB connections don't like polling too often for some reason?

@frerich
Copy link
Owner

frerich commented Jun 17, 2016

@akleber ping? Any new insights regarding that race condition when using a network share? I saw that you bumped the retry delay when polling SMB shares; did that help with anything? Do you understand why?

@akleber
Contributor Author

akleber commented Jul 5, 2016

I currently do not have much time to work on this, but hopefully I will soon. Unfortunately I have not yet been able to fix the race condition when using the lockfile on an SMB share, and I do not yet understand what the problem might be. Using the lockfile on a local filesystem works fine, however, as far as I can see.

@frerich
Owner

frerich commented Jul 5, 2016

Thanks for the update, @akleber ! It's a pity that the cause of the race condition is still unknown, because non-local file systems were the prime reason for using a file-based locking mechanism in the first place. I guess it doesn't hurt to keep this PR around; I suspect it's "almost there".
