Skip to content

Significantly improves startup performance by asynchronously building… #203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 4, 2017
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
- [fix] Add some sane limits to the disk cache: [#201]https://github.com/pinterest/PINCache/pull/201
- [new] Update enumeration methods to allow a stop flag to be flipped by caller: [#204](https://github.com/pinterest/PINCache/pull/204)
- [performance] Improves cache miss performance by ~2 orders of magnitude on device: [#202](https://github.com/pinterest/PINCache/pull/202)
- [performance] Significantly improve startup performance: [#203](https://github.com/pinterest/PINCache/pull/203)

## 3.0.1 -- Beta 5
- [fix] Respect small byteLimit settings by checking object size in setObject: [#198](https://github.com/pinterest/PINCache/pull/198)
Expand Down
172 changes: 118 additions & 54 deletions Source/PINDiskCache.m
Original file line number Diff line number Diff line change
Expand Up @@ -46,20 +46,22 @@ @interface PINDiskCacheMetadata : NSObject
@end

@interface PINDiskCache () {
NSConditionLock *_instanceLock;

PINDiskCacheSerializerBlock _serializer;
PINDiskCacheDeserializerBlock _deserializer;

PINDiskCacheKeyEncoderBlock _keyEncoder;
PINDiskCacheKeyDecoderBlock _keyDecoder;
}

@property (assign, nonatomic) pthread_mutex_t mutex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this should be a solid improvement!

@property (copy, nonatomic) NSString *name;
@property (assign) NSUInteger byteCount;
@property (strong, nonatomic) NSURL *cacheURL;
@property (strong, nonatomic) PINOperationQueue *operationQueue;
@property (strong, nonatomic) NSMutableDictionary <NSString *, PINDiskCacheMetadata *> *metadata;
@property (assign, nonatomic) pthread_cond_t diskWritableCondition;
@property (assign, nonatomic) BOOL diskWritable;
@property (assign, nonatomic) pthread_cond_t diskStateKnownCondition;
@property (assign, nonatomic) BOOL diskStateKnown;
@end

Expand All @@ -83,6 +85,12 @@ @implementation PINDiskCache

#pragma mark - Initialization -

- (void)dealloc
{
__unused int result = pthread_mutex_destroy(&_mutex);
NSCAssert(result == 0, @"Failed to destroy lock in PINMemoryCache %p. Code: %d", (void *)self, result);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add pthread_cond_destroy here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I knew there had to be something like that (but couldn't find docs)

}

- (instancetype)init
{
@throw [NSException exceptionWithName:@"Must initialize with a name" reason:@"PINDiskCache must be initialized with a name. Call initWithName: instead." userInfo:nil];
Expand Down Expand Up @@ -143,10 +151,12 @@ - (instancetype)initWithName:(NSString *)name
@"PINDiskCache must be initialized with a encoder AND decoder.");

if (self = [super init]) {
__unused int result = pthread_mutex_init(&_mutex, NULL);
NSAssert(result == 0, @"Failed to init lock in PINMemoryCache %@. Code: %d", self, result);

_name = [name copy];
_prefix = [prefix copy];
_operationQueue = operationQueue;
_instanceLock = [[NSConditionLock alloc] initWithCondition:PINDiskCacheConditionNotReady];
_willAddObjectBlock = nil;
_willRemoveObjectBlock = nil;
_willRemoveAllObjectsBlock = nil;
Expand Down Expand Up @@ -195,14 +205,16 @@ - (instancetype)initWithName:(NSString *)name
} else {
_keyDecoder = self.defaultKeyDecoder;
}

pthread_cond_init(&_diskWritableCondition, NULL);
pthread_cond_init(&_diskStateKnownCondition, NULL);

//we don't want to do anything without setting up the disk cache, but we also don't want to block init, it can take a while to initialize. This must *not* be done on _operationQueue because other operations added may hold the lock and fill up the queue.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
//should always be able to aquire the lock unless the below code is running.
[_instanceLock lockWhenCondition:PINDiskCacheConditionNotReady];
[self lock];
[self _locked_createCacheDirectory];
[self _locked_initializeDiskProperties];
[_instanceLock unlockWithCondition:PINDiskCacheConditionReady];
[self unlock];
[self initializeDiskProperties];
});
}
return self;
Expand Down Expand Up @@ -415,55 +427,78 @@ + (void)emptyTrash

- (BOOL)_locked_createCacheDirectory
{
if ([[NSFileManager defaultManager] fileExistsAtPath:[_cacheURL path]])
return NO;
BOOL created = NO;
if ([[NSFileManager defaultManager] fileExistsAtPath:[_cacheURL path]] == NO) {
NSError *error = nil;
BOOL success = [[NSFileManager defaultManager] createDirectoryAtURL:_cacheURL
withIntermediateDirectories:YES
attributes:nil
error:&error];
PINDiskCacheError(error);
created = success;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Let's assign directly to created and remove success

}

NSError *error = nil;
BOOL success = [[NSFileManager defaultManager] createDirectoryAtURL:_cacheURL
withIntermediateDirectories:YES
attributes:nil
error:&error];
PINDiskCacheError(error);


return success;
// while this may not be true if success is false, it's better than deadlocking later.
_diskWritable = YES;
pthread_cond_broadcast(&_diskWritableCondition);

return created;
}

- (void)_locked_initializeDiskProperties
- (void)initializeDiskProperties
{
NSUInteger byteCount = 0;
NSArray *keys = @[ NSURLContentModificationDateKey, NSURLTotalFileAllocatedSizeKey ];

NSError *error = nil;
NSArray *files = [[NSFileManager defaultManager] contentsOfDirectoryAtURL:_cacheURL
includingPropertiesForKeys:keys
options:NSDirectoryEnumerationSkipsHiddenFiles
error:&error];

[self lock];
NSArray *files = [[NSFileManager defaultManager] contentsOfDirectoryAtURL:_cacheURL
includingPropertiesForKeys:keys
options:NSDirectoryEnumerationSkipsHiddenFiles
error:&error];
[self unlock];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation behind this lock? cacheURL is immutable and I think it's safe to use NSFileManager from multiple threads.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that it's not strictly necessary and I think we should likely audit this behavior to further improve performance. However in it's current state, all access to the filesystem is done with the lock held and I want to keep that behavior intact until we decide on a cohesive replacement strategy.


PINDiskCacheError(error);

for (NSURL *fileURL in files) {
NSString *key = [self keyForEncodedFileURL:fileURL];

error = nil;
NSDictionary *dictionary = [fileURL resourceValuesForKeys:keys error:&error];
PINDiskCacheError(error);

_metadata[key] = [[PINDiskCacheMetadata alloc] init];
// Continually grab and release lock while processing files to avoid contention
[self lock];
Copy link
Contributor

@appleguy appleguy Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't seem to match the code - should this be re-locking or unlocking?

Suggestion that would avoid the many lock / unlock cycles:

  • Build up a separate NSMutableDictionary instance (easier if merged into a shared _cacheItemMetadata, but can be done with each _sizes, _dates and _knownKeys)
  • No locking within the loop
  • At the end of the loop, lock and merge into the new instances any keys and values that have been set on the _sizes / _dates / _knownKeys while loading from disk has been going on.
  • Overwrite the instance variables with the new collections.

This would be the only way to avoid contention in the common case of a busy cache (e.g. many images loading at startup) while loading a large disk cache (e.g. a 200MB+ cache of 10-100KB images).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the comment is supposed to mean, do not hold the lock the entire time while processing files.

The real issue around this locking is not the in memory state of ivars but disk access. PINDiskCache manages access to the disk solely through the lock. Without it, we can't prevent another operation from overwriting our file while we're trying to get its attributes.

In the future future, I really want to investigate using NSFileCoordinators instead of locking to manage disk access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting! I wonder if we should use a separate lock for disk access and for protecting the shared state? There are probably a bunch of methods that don't need to block on file access...

I think this explains why I've seen more lock contention in this layer than I would expect from just protecting the shared state. The good news is that, unlike some hierarchical objects like nodes where multiple locks create deadlocks extremely easily, this class might be ideally structured for a clean multi-lock strategy.

NSDictionary *dictionary = [fileURL resourceValuesForKeys:keys error:&error];
PINDiskCacheError(error);

if (_metadata[key] == nil) {
_metadata[key] = [[PINDiskCacheMetadata alloc] init];
}

NSDate *date = [dictionary objectForKey:NSURLContentModificationDateKey];
if (date && key)
_metadata[key].date = date;
NSDate *date = [dictionary objectForKey:NSURLContentModificationDateKey];
if (date && key)
_metadata[key].date = date;

NSNumber *fileSize = [dictionary objectForKey:NSURLTotalFileAllocatedSizeKey];
if (fileSize) {
_metadata[key].size = fileSize;
byteCount += [fileSize unsignedIntegerValue];
}
NSNumber *fileSize = [dictionary objectForKey:NSURLTotalFileAllocatedSizeKey];
if (fileSize) {
_metadata[key].size = fileSize;
byteCount += [fileSize unsignedIntegerValue];
}
[self unlock];
}

if (byteCount > 0)
_byteCount = byteCount;
[self lock];
if (byteCount > 0)
_byteCount = byteCount;

if (self->_byteLimit > 0 && self->_byteCount > self->_byteLimit)
[self trimToSizeByDateAsync:self->_byteLimit completion:nil];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we may need a way to avoid scheduling a large number of trim operations while this is processing. It probably makes sense to have a _trimPending / scheduled variable, so that a series of setObject: calls during this time doesn't schedule a lot of them.

Since the operation queue is concurrent, it looks like there may be an issue with multiple trim operations starting in parallel too — this could be addressed by using a barrier block on the underlying queue, which GCD will ensure is run serially.

Copy link
Collaborator Author

@garrettmoon garrettmoon Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@appleguy the trim's are actually coalesced thanks to @nguyenhuy's improvement in PINOperationQueue! We could also skip trimming until we have a known state.


_diskStateKnown = YES;
_diskStateKnown = YES;
pthread_cond_broadcast(&_diskStateKnownCondition);
[self unlock];
}

- (void)asynchronouslySetFileModificationDate:(NSDate *)date forURL:(NSURL *)fileURL
Expand All @@ -472,7 +507,7 @@ - (void)asynchronouslySetFileModificationDate:(NSDate *)date forURL:(NSURL *)fil
[self.operationQueue addOperation:^{
PINDiskCache *strongSelf = weakSelf;
if (strongSelf) {
[strongSelf lock];
[strongSelf lockUntilWritable];
[strongSelf _locked_setFileModificationDate:date forURL:fileURL];
[strongSelf unlock];
}
Expand Down Expand Up @@ -505,7 +540,8 @@ - (BOOL)removeFileAndExecuteBlocksForKey:(NSString *)key
{
NSURL *fileURL = [self encodedFileURLForKey:key];

[self lock];
// We only need to lock until writable at the top because once writable, always writable
[self lockUntilWritable];
if (!fileURL || ![[NSFileManager defaultManager] fileExistsAtPath:[fileURL path]]) {
[self unlock];
return NO;
Expand Down Expand Up @@ -546,7 +582,7 @@ - (BOOL)removeFileAndExecuteBlocksForKey:(NSString *)key

- (void)trimDiskToSize:(NSUInteger)trimByteCount
{
[self lock];
[self lockUntilWritable];
if (_byteCount > trimByteCount) {
NSArray *keysSortedBySize = [_metadata keysSortedByValueUsingComparator:^NSComparisonResult(PINDiskCacheMetadata * _Nonnull obj1, PINDiskCacheMetadata * _Nonnull obj2) {
return [obj1.size compare:obj2.size];
Expand All @@ -569,7 +605,7 @@ - (void)trimDiskToSize:(NSUInteger)trimByteCount

- (void)trimDiskToSizeByDate:(NSUInteger)trimByteCount
{
[self lock];
[self lockUntilWritable];
if (_byteCount > trimByteCount) {
NSArray *keysSortedByDate = [_metadata keysSortedByValueUsingComparator:^NSComparisonResult(PINDiskCacheMetadata * _Nonnull obj1, PINDiskCacheMetadata * _Nonnull obj2) {
return [obj1.date compare:obj2.date];
Expand All @@ -592,7 +628,7 @@ - (void)trimDiskToSizeByDate:(NSUInteger)trimByteCount

- (void)trimDiskToDate:(NSDate *)trimDate
{
[self lock];
[self lockUntilWritable];
NSArray *keysSortedByDate = [_metadata keysSortedByValueUsingComparator:^NSComparisonResult(PINDiskCacheMetadata * _Nonnull obj1, PINDiskCacheMetadata * _Nonnull obj2) {
return [obj1.date compare:obj2.date];
}];
Expand Down Expand Up @@ -650,7 +686,7 @@ - (void)lockFileAccessWhileExecutingBlockAsync:(PINCacheBlock)block

[self.operationQueue addOperation:^{
PINDiskCache *strongSelf = weakSelf;
[strongSelf lock];
[strongSelf lockUntilWritable];
block(strongSelf);
[strongSelf unlock];
} withPriority:PINOperationQueuePriorityLow];
Expand Down Expand Up @@ -693,7 +729,7 @@ - (void)fileURLForKeyAsync:(NSString *)key completion:(PINDiskCacheFileURLBlock)
PINDiskCache *strongSelf = weakSelf;
NSURL *fileURL = [strongSelf fileURLForKey:key];

[strongSelf lock];
[strongSelf lockUntilWritable];
block(key, fileURL);
[strongSelf unlock];
} withPriority:PINOperationQueuePriorityLow];
Expand Down Expand Up @@ -830,7 +866,7 @@ - (void)enumerateObjectsWithBlockAsync:(PINDiskCacheFileURLEnumerationBlock)bloc
- (void)synchronouslyLockFileAccessWhileExecutingBlock:(PINCacheBlock)block
{
if (block) {
[self lock];
[self lockUntilWritable];
block(self);
[self unlock];
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also modify containsObjectForKey:?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Expand Down Expand Up @@ -859,17 +895,22 @@ - (id)objectForKeyedSubscript:(NSString *)key
NSDate *now = [[NSDate alloc] init];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this diff, but I just noticed we can save some work by moving this date creation down.


[self lock];
BOOL isEmpty = _metadata.count == 0;
BOOL containsKey = _metadata[key] != nil;
BOOL containsKey = _metadata[key] != nil || _diskStateKnown == NO;
[self unlock];

if (!key || isEmpty || !containsKey)
if (!key || !containsKey)
return nil;

id <NSCoding> object = nil;
NSURL *fileURL = [self encodedFileURLForKey:key];

[self lock];
if (self->_ttlCache) {
// We actually need to know the entire disk state if we're a TTL cache.
[self unlock];
[self lockUntilDiskStateKnown];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that? Can't we just return nil for any objectForKey: before initialization is done, including for a TTL cache?

I think I recall there being a default TTL of 30 days. We should consider turning this off by default if it incurs a meaningful performance penalty, which blocking on cache init would be a pretty big cost if the developer hasn't specifically indicated they want TTL behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TTL is off by default. It was a feature added by a community member which enforces ttl (I.e. it won't return an object even if it has it if it exceeds the TTL). There are separate time limits are probably the default you're think of but they only relate to trimming, not a guarantee something won't be returned. To be frank I'm a bit regretful we merged in this feature, it's not really in the spirit of the framework and adds a lot of complexity.

}

if (!self->_ttlCache || self->_ageLimit <= 0 || fabs([_metadata[key].date timeIntervalSinceDate:now]) < self->_ageLimit) {
// If the cache should behave like a TTL cache, then only fetch the object if there's a valid ageLimit and the object is still alive

Expand Down Expand Up @@ -920,7 +961,7 @@ - (NSURL *)fileURLForKey:(NSString *)key updateFileModificationDate:(BOOL)update
NSDate *now = [[NSDate alloc] init];
NSURL *fileURL = [self encodedFileURLForKey:key];

[self lock];
[self lockUntilWritable];
if (fileURL.path && [[NSFileManager defaultManager] fileExistsAtPath:fileURL.path]) {
if (updateFileModificationDate) {
[self asynchronouslySetFileModificationDate:now forURL:fileURL];
Expand Down Expand Up @@ -979,7 +1020,7 @@ - (void)setObject:(id <NSCoding>)object forKey:(NSString *)key fileURL:(NSURL **
return;
}

[self lock];
[self lockUntilWritable];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to block until the setup is done? If this happens, then won't the common path that triggers cache init (e.g. an image download starting from PINRemoteImage) will still block displaying the image until after the cache is fully read in?

I may be missing something, like perhaps PINRemoteImage performs the setObject: asynchronously and we won't block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lockUntilWritable only waits until the cache directory is confirmed to exist or is created if it doesn't. lockUntilKnownState (I think that's what I called it) guarantees everything is created, iterated and populated.

PINCacheObjectBlock willAddObjectBlock = self->_willAddObjectBlock;
if (willAddObjectBlock) {
[self unlock];
Expand Down Expand Up @@ -1089,11 +1130,12 @@ - (void)trimToSizeByDate:(NSUInteger)trimByteCount

- (void)removeAllObjects
{
[self lock];
// We don't need to know the disk state since we're just going to remove everything.
[self lockUntilWritable];
PINCacheBlock willRemoveAllObjectsBlock = self->_willRemoveAllObjectsBlock;
if (willRemoveAllObjectsBlock) {
[self unlock];
willRemoveAllObjectsBlock(self);
willRemoveAllObjectsBlock(self);
[self lock];
}

Expand All @@ -1120,7 +1162,7 @@ - (void)enumerateObjectsWithBlock:(PINDiskCacheFileURLEnumerationBlock)block
if (!block)
return;

[self lock];
[self lockUntilDiskStateKnown];
NSDate *now = [NSDate date];

for (NSString *key in _metadata) {
Expand Down Expand Up @@ -1397,20 +1439,42 @@ - (void)setWritingProtectionOption:(NSDataWritingOptions)writingProtectionOption
NSDataWritingOptions option = NSDataWritingFileProtectionMask & writingProtectionOption;

[strongSelf lock];
strongSelf->_writingProtectionOption = option;
strongSelf->_writingProtectionOption = option;
[strongSelf unlock];
} withPriority:PINOperationQueuePriorityHigh];
}
#endif

- (void)lockUntilWritable
{
[self lock];

// spinlock if the disk isn't writable
if (_diskWritable == NO) {
pthread_cond_wait(&_diskWritableCondition, &_mutex);
}
}

- (void)lockUntilDiskStateKnown
{
[self lock];

// spinlock if the disk state isn't known
if (_diskStateKnown == NO) {
pthread_cond_wait(&_diskStateKnownCondition, &_mutex);
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two alternate names might be lockForWriting and lockAndWaitForKnownDiskState or something. lockUntil isn't quite on the money.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like those better.


- (void)lock
{
[_instanceLock lockWhenCondition:PINDiskCacheConditionReady];
__unused int result = pthread_mutex_lock(&_mutex);
NSAssert(result == 0, @"Failed to lock PINDiskCache %@. Code: %d", self, result);
}

- (void)unlock
{
[_instanceLock unlockWithCondition:PINDiskCacheConditionReady];
__unused int result = pthread_mutex_unlock(&_mutex);
NSAssert(result == 0, @"Failed to unlock PINDiskCache %@. Code: %d", self, result);
}

@end
Expand Down
Loading