adds lockfile based locking strategy #82
base: master
Conversation
try:
    lockfile = open(self._lockfileName, 'a')
except OSError:
    lockfile = None
I think this assignment is unneeded: _acquire is only called if self.lockfile == None holds, which is still True in case open raises an exception. I'd say you can just return here. Doing so would permit dropping the explicit else branch and thus lowering the indentation.
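Rendered as code, the shape the reviewer suggests might look like this (a sketch; the actual locking call that follows is elided):

def _acquire(self):
    try:
        lockfile = open(self._lockfileName, 'a')
    except OSError:
        # self.lockfile is still None, so the caller will simply retry;
        # no assignment and no else branch needed.
        return
    # ... proceed to lock `lockfile` here, one indentation level shallower ...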
Great PR, it looks very good! The idea to not rely on the sheer existence of a file for the locking, but rather to lock file regions (and then rely on those regions being released automatically when the file handle is closed) is really nice; that didn't occur to me. It neatly takes care of the main gripe we had with file-based locks, namely not cleaning up after abnormal exits. I really only have a few stylistic concerns here. I'd like to hear what you think about my comments, but in general I'm ready to merge this.
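For readers unfamiliar with the technique: a minimal sketch of region-based locking, using the standard library's msvcrt.locking (Windows-only) as a stand-in for the Win32 call the PR uses; class and attribute names here are illustrative, not the PR's actual code. The key property is that the OS releases the locked region when the file handle goes away, even on abnormal process exit.

import msvcrt

class FileLockSketch(object):
    def __init__(self, lockfileName):
        self._lockfileName = lockfileName
        self._lockfile = None

    def tryAcquire(self):
        lockfile = open(self._lockfileName, 'a')
        lockfile.seek(0)  # msvcrt.locking locks from the current file position
        try:
            # Non-blocking attempt to lock the first byte; raises OSError
            # if another process already holds the region.
            msvcrt.locking(lockfile.fileno(), msvcrt.LK_NBLCK, 1)
        except OSError:
            lockfile.close()
            return False
        self._lockfile = lockfile
        return True

    def release(self):
        # Explicit unlock; the OS would also release the region when the
        # handle is closed, e.g. when the process dies unexpectedly.
        self._lockfile.seek(0)
        msvcrt.locking(self._lockfile.fileno(), msvcrt.LK_UNLCK, 1)
        self._lockfile.close()
        self._lockfile = None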
I guess this non-trivial change justifies an entry in the copyright paragraph, right? What do you think @frerich?

Does this file based locking require any specific file system? Or can I do that on a FAT or FTP mount? Could you extend the README to tell people everything they need to know about file locking?

Regarding configuration: everything except the cache size is configured via an environment variable, and my feeling is that it would be better to move the cache size there too. Since we're passing arguments through to cl, I think we should keep the clcache parameter parsing logic as minimal as possible. So for now I'd suggest configuring the locking method via an environment variable. Having everything in variables allows us to migrate to a more general configuration approach as seen in ccache. Frerich?
elif self._timeoutMs >= 0 and time.time() - start_time > self._timeoutMs/1000:
    raise ObjectCacheLockException("Timeout waiting for file lock")
else:
    poll_intervall = random.uniform(0.01, 0.1)
camelCase; and is this really an interval? I think pollDelay would be a better name, indicating that what is stored is the delay before the next poll – a single float number.
@webmaster128 Whether some configuration knob is done as part of the cache configuration or via environment variables is - to me - mostly a question of whether it's something you typically turn on and off for certain projects and/or builds (in which case I'd go for an environment variable), or whether it's a property of the cache itself, independent of any particular build (e.g. the maximum cache size), in which case I'd go for a cache configuration switch. To me, @akleber's decision seems perfectly reasonable.

As for the copyright notice -- if @akleber doesn't mind getting some of the bl^H^Hfame, I'd be happy to add him to the list of contributors.
+1 for making this the default behavior (override via environment variable?) as soon as we have the file system requirements on the table.
Thanks for the thorough review so far. Regarding the config location: my thinking was that the type of locking is specific to the cache, so every process that wants to use a particular cache has to use the appropriate locking mechanism. @frerich I am fine with being added to the list of contributors. I am also adding the relevant info about the lockfile requirements and usage to the README.
* simplifies code
* locking strategy as env variable (see the sketch below)
* documentation
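A sketch of what the environment-variable switch could look like; the variable name CLCACHE_LOCKINGSTRATEGY and the default value "Mutex" are assumptions, since the final names are not spelled out in this thread:

import os

def lockingStrategyFromEnvironment():
    # Hypothetical variable name; falls back to the existing mutex strategy.
    strategy = os.environ.get("CLCACHE_LOCKINGSTRATEGY", "Mutex")
    if strategy not in ("Mutex", "File"):
        raise ValueError("unknown locking strategy: {}".format(strategy))
    return strategy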
I was especially thinking of the copyright header in the clcache.py file. I think this should be extended with every non-trivial change to that file.
I updated this PR with changes addressing the reviews. However, I also observed a race condition when using a network share for the cache, so please do NOT merge this PR yet. I am still investigating...
@@ -186,7 +186,7 @@ def acquire(self):
     elif time.time() - startTime > self._timeoutMs/1000:
         raise ObjectCacheLockException("Timeout waiting for file lock")
     else:
-        pollDelay = random.uniform(0.01, 0.1)
+        pollDelay = random.uniform(0.1, 1.0)
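For context, the surrounding acquire loop reconstructed from the snippets quoted in this review (a sketch, not the PR's exact code; _tryAcquireLock is a hypothetical stand-in for the actual non-blocking lock attempt):

import random
import time

class ObjectCacheLockException(Exception):
    pass

class LockWithTimeoutSketch(object):
    def __init__(self, timeoutMs):
        self._timeoutMs = timeoutMs

    def _tryAcquireLock(self):
        raise NotImplementedError  # the actual Win32 lock attempt goes here

    def acquire(self):
        startTime = time.time()
        while not self._tryAcquireLock():
            if self._timeoutMs >= 0 and time.time() - startTime > self._timeoutMs/1000:
                raise ObjectCacheLockException("Timeout waiting for file lock")
            # Randomized delay so concurrent waiters do not retry in
            # lock-step; this revision widens the range from 0.01-0.1s
            # to 0.1-1.0s.
            time.sleep(random.uniform(0.1, 1.0))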
I'm curious: what does this fix? SMB connections don't like polling too often for some reason?
@akleber ping? Any new insights regarding that race condition when using a network share? I saw that you bumped the retry interval when polling SMB shares; did that help with anything? Do you understand why?
I currently do not have much time to work on this, but hopefully I will soon. Unfortunately, I have not yet been able to fix the race condition when using the lockfile on an SMB share, and I do not yet understand what the problem might be. Using the lockfile on a local filesystem works fine, however, as far as I could see.
Thanks for the update, @akleber! Pity that the cause of the race condition is still unknown, because non-local file systems were the prime reason for using a file-based locking mechanism in the first place. I guess it doesn't hurt to keep this PR around; I suspect it's "almost there".
This PR adds an additional lockfile based locking mechanism. To enable it, "LockingStrategy" has to be set to "File" in the cache config file. A command line parameter to set the locking mechanism is not yet provided. In contrast to an earlier implementation that was removed some time ago, this implementation does not rely on a specific file being present or not, but instead uses a Win32 API call to explicitly lock the file. The mechanism is based on a file handle of the actual lock file and should release the lock in case the process dies unexpectedly.
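To illustrate the configuration side, a hedged sketch of how the strategy could be read from the cache's config file. Only the "LockingStrategy"/"File" key and value come from this PR; the file name config.txt, the JSON format, and the "Mutex" default are assumptions:

import json
import os

def lockingStrategyFromConfig(cacheDirectory):
    configFile = os.path.join(cacheDirectory, "config.txt")  # assumed location
    try:
        with open(configFile, "r") as f:
            config = json.load(f)
    except (OSError, ValueError):
        return "Mutex"  # assumed default: the existing mutex-based strategy
    return config.get("LockingStrategy", "Mutex")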
This PR is still rough around the edges, but I wanted to provide a first look.
My reason for implementing this is that I want to share the cache with multiple build machines via an SMB share.
I ran some benchmarks (unfortunately not with the new test harness) on a 12 core machine with an SSD and a 1 Gbit network. All numbers are averages over 4 runs. The cache contains 1740 objects and is 1.5 GB in size.
Lockfile, local cache, cold cache: 480s
Mutex, local cache, cold cache: 482s
Lockfile, local cache, warm cache: 270s
Mutex, local cache, warm cache: 272s
Lockfile, cache on smb, cold cache: 525s
Lockfile, cache on smb, warm cache: 361s
So it seems that the performance of the lockfile is not significantly different from that of the mutex; I attribute the differences to measurement inaccuracies.
Placing the cache on an SMB share does, however, carry a noticeable penalty.
I am going to test this implementation on a build cluster with a "real world" workload, see how it performs, and report back.