-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significantly improves startup performance by asynchronously building… #203
Conversation
Excited for this PR :). Let me know when it's ready for a full review — I'll try to look in depth within a day or two. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big advancement for PINCache! I'm very excited to integrate this in my team's app.
I think the most important feedback in my comments relates to the spinlock / sleep behavior. I thought about it for a bit and think the suggestion should work pretty well!
PINDiskCacheSerializerBlock _serializer; | ||
PINDiskCacheDeserializerBlock _deserializer; | ||
|
||
PINDiskCacheKeyEncoderBlock _keyEncoder; | ||
PINDiskCacheKeyDecoderBlock _keyDecoder; | ||
} | ||
|
||
@property (assign, nonatomic) pthread_mutex_t mutex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this should be a solid improvement!
Source/PINDiskCache.m
Outdated
if (date && key) | ||
[_dates setObject:date forKey:key]; | ||
// Do not continue to hold the lock while processing files. | ||
[self lock]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment doesn't seem to match the code - should this be re-locking or unlocking?
Suggestion that would avoid the many lock / unlock cycles:
- Build up a separate NSMutableDictionary instance (easier if merged into a shared _cacheItemMetadata, but can be done with each _sizes, _dates and _knownKeys)
- No locking within the loop
- At the end of the loop, lock and merge into the new instances any keys and values that have been set on the _sizes / _dates / _knownKeys while loading from disk has been going on.
- Overwrite the instance variables with the new collections.
This would be the only way to avoid contention in the common case of a busy cache (e.g. many images loading at startup) while loading a large disk cache (e.g. a 200MB+ cache of 10-100KB images).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the comment is supposed to mean, do not hold the lock the entire time while processing files.
The real issue around this locking is not the in memory state of ivars but disk access. PINDiskCache manages access to the disk solely through the lock. Without it, we can't prevent another operation from overwriting our file while we're trying to get its attributes.
In the future future, I really want to investigate using NSFileCoordinators instead of locking to manage disk access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting! I wonder if we should use a separate lock for disk access and for protecting the shared state? There are probably a bunch of methods that don't need to block on file access...
I think this explains why I've seen more lock contention in this layer than I would expect from just protecting the shared state. The good news is that, unlike some hierarchical objects like nodes where multiple locks create deadlocks extremely easily, this class might be ideally structured for a clean multi-lock strategy.
_byteCount = byteCount; | ||
|
||
if (self->_byteLimit > 0 && self->_byteCount > self->_byteLimit) | ||
[self trimToSizeByDateAsync:self->_byteLimit completion:nil]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we may need a way to avoid scheduling a large number of trim operations while this is processing. It probably makes sense to have a _trimPending / scheduled variable, so that a series of setObject: calls during this time doesn't schedule a lot of them.
Since the operation queue is concurrent, it looks like there may be an issue with multiple trim operations starting in parallel too — this could be addressed by using a barrier block on the underlying queue, which GCD will ensure is run serially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@appleguy the trim's are actually coalesced thanks to @nguyenhuy's improvement in PINOperationQueue! We could also skip trimming until we have a known state.
Source/PINDiskCache.m
Outdated
@@ -974,7 +1005,7 @@ - (void)setObject:(id <NSCoding>)object forKey:(NSString *)key fileURL:(NSURL ** | |||
return; | |||
} | |||
|
|||
[self lock]; | |||
[self lockUntilWritable]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to block until the setup is done? If this happens, then won't the common path that triggers cache init (e.g. an image download starting from PINRemoteImage) will still block displaying the image until after the cache is fully read in?
I may be missing something, like perhaps PINRemoteImage performs the setObject: asynchronously and we won't block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lockUntilWritable only waits until the cache directory is confirmed to exist or is created if it doesn't. lockUntilKnownState (I think that's what I called it) guarantees everything is created, iterated and populated.
Source/PINDiskCache.m
Outdated
if (self->_ttlCache) { | ||
// We actually need to know the entire disk state if we're a TTL cache. | ||
[self unlock]; | ||
[self lockUntilDiskStateKnown]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that? Can't we just return nil for any objectForKey: before initialization is done, including for a TTL cache?
I think I recall there being a default TTL of 30 days. We should consider turning this off by default if it incurs a meaningful performance penalty, which blocking on cache init would be a pretty big cost if the developer hasn't specifically indicated they want TTL behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TTL is off by default. It was a feature added by a community member which enforces ttl (I.e. it won't return an object even if it has it if it exceeds the TTL). There are separate time limits are probably the default you're think of but they only relate to trimming, not a guarantee something won't be returned. To be frank I'm a bit regretful we merged in this feature, it's not really in the spirit of the framework and adds a lot of complexity.
Source/PINDiskCache.m
Outdated
[self unlock]; | ||
usleep(100); | ||
|
||
__unused int result = pthread_mutex_lock(&_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not call [self lock] here and in the method below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
Source/PINDiskCache.m
Outdated
// spinlock if the disk isn't writable | ||
while (_diskWritable == NO) { | ||
[self unlock]; | ||
usleep(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using this technique could cause some other problems, like a lot of threads to be spawned. We should be able to rely on the lock to be released in order to let the threads proceed.
How about a separate read-write mutex, just for cache init? At initialization of the class, the writer lock would be acquired by the code doing the read from disk. Any code calling -lockUntilWritable would first acquire the read lock (which could be grabbed by many threads at the same time, as long as the write lock isn't held). Secondly it would acquire the main [self lock].
This should allow all of them to wait on the initialization, without busying the main cache lock for operations that don't depend on _diskWritable.
As an alternative, I think there is a pthread version of the condition lock or a semaphore which can be signalled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I'll play around with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pthread condition might be the best option. I'm guessing performance isn't any better for read / write locks and it's far less indicative of what we're doing. Need to actually figure out how to do pthread conditions though, not a lot of great docs…
cdf4bd2
to
c673afc
Compare
…ge on an iPhone 6. Essentially just adds an in memory map of what is known to be on disk.
… known state on startup.
46704b2
to
f2ac652
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super duper good. Async all day!
Source/PINDiskCache.m
Outdated
attributes:nil | ||
error:&error]; | ||
PINDiskCacheError(error); | ||
created = success; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Let's assign directly to created
and remove success
if (_diskStateKnown == NO) { | ||
pthread_cond_wait(&_diskStateKnownCondition, &_mutex); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two alternate names might be lockForWriting
and lockAndWaitForKnownDiskState
or something. lockUntil
isn't quite on the money.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like those better.
includingPropertiesForKeys:keys | ||
options:NSDirectoryEnumerationSkipsHiddenFiles | ||
error:&error]; | ||
[self unlock]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation behind this lock? cacheURL
is immutable and I think it's safe to use NSFileManager from multiple threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that it's not strictly necessary and I think we should likely audit this behavior to further improve performance. However in it's current state, all access to the filesystem is done with the lock held and I want to keep that behavior intact until we decide on a cohesive replacement strategy.
@@ -859,17 +895,22 @@ - (id)objectForKeyedSubscript:(NSString *)key | |||
NSDate *now = [[NSDate alloc] init]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this diff, but I just noticed we can save some work by moving this date creation down.
Source/PINDiskCache.m
Outdated
@@ -830,7 +866,7 @@ - (void)enumerateObjectsWithBlockAsync:(PINDiskCacheFileURLEnumerationBlock)bloc | |||
- (void)synchronouslyLockFileAccessWhileExecutingBlock:(PINCacheBlock)block | |||
{ | |||
if (block) { | |||
[self lock]; | |||
[self lockUntilWritable]; | |||
block(self); | |||
[self unlock]; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also modify containsObjectForKey:
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
- (void)dealloc | ||
{ | ||
__unused int result = pthread_mutex_destroy(&_mutex); | ||
NSCAssert(result == 0, @"Failed to destroy lock in PINMemoryCache %p. Code: %d", (void *)self, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add pthread_cond_destroy
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I knew there had to be something like that (but couldn't find docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bingo!
… known state on startup.