[Prototype] Add cache size limit and eviction.#3104
Conversation
| /// </summary> | ||
| /// <param name="useSharedCache">Set to true to share the cache between all ClientApplication objects. The cache becomes static. <see cref="UseSharedCache"/> for a detailed description. </param> | ||
| /// <param name="sizeLimit">Token cache size limit in bytes. <see cref="SizeLimit"/> for a detailed description.</param> | ||
| public CacheOptions(bool useSharedCache, long sizeLimit) |
There was a problem hiding this comment.
Can we have default values here (as per any recommended practices)?
There was a problem hiding this comment.
I don't think we can provide a realistic default value; it would be based on the user's scenario.
There was a problem hiding this comment.
Yeah, that's a good question we're struggling with :). What would be a good default? ...
There was a problem hiding this comment.
If default varies widely based on the user scenarios, we cannot do much. But if it primarly leans towards a certain value, that can be used. But we do not have answer, then it is good as it is.
There was a problem hiding this comment.
Danny also touched on a point of should there bit a minimum. Currently if someone specifies small value (like 1), cache will be compacted on each operation.
| @@ -216,13 +234,25 @@ public void SetiOSKeychainSecurityGroup(string keychainSecurityGroup) | |||
| public virtual void Clear() | |||
| { | |||
| AccessTokenCacheDictionary.Clear(); | |||
There was a problem hiding this comment.
Threading issue: Between Clear and setting CacheSize to zero, something may get added to the dictionary and the two can go out of sync
There was a problem hiding this comment.
True, but we also don't lock on cache operations in general (based on eventual consistency principle). So this behavior can happen with other add/remove methods.
There was a problem hiding this comment.
Eventual consistency is not guaranteed here. For example:
- Thread 1 calls the AccessTokenCacheDictionary.Clear() method and is paused.
- Thread 2 adds an item and updates the _appCacheSize.
- Thread 1 resumes execution and sets the appCacheSize to 0 (Interlocked.Exchange(ref _appCacheSize, 0)).
- Now there is one item in the cache but the size is 0.
- The same issue applies to the TokenCache.CacheSize, and in this case the error can accumulate over multiple Clear() calls.
There was a problem hiding this comment.
@dannybtsai Do you suggest locking? Only when clearing or always when updating the cache size?
There was a problem hiding this comment.
It is hard to achieve consistency without locking as far as I can tell. And you need to always lock, not just when clearing cache otherwise it is the same problem.
| /// IMPORTANT: Monitor app health metrics (including memory usage) and cache performance (<see href="https://aka.ms/msal-net-token-cache-serialization"/>) | ||
| /// and adjust size limit accordingly. | ||
| /// </remarks> | ||
| public long? SizeLimit |
There was a problem hiding this comment.
Size limit applies only when cache serialization is disabled.
| private void Compact() | ||
| { | ||
| _logger.Always("[UserCache] Compacting cache."); | ||
| Clear(); |
There was a problem hiding this comment.
Clearing the cache because it's simpler and other forms of compaction become complex for the user cache. (For ex. removing only ATs will leave the user cache with non-AT tokens; trying to remove all tokens for a user doesn't work for OBO since partition keys are different; randomly removing token entries will leave orphaned/unassociated tokens.)
| /// <summary> | ||
| /// Static, used by both app and user caches to track approximate token cache size. | ||
| /// </summary> | ||
| internal static long CacheSize = 0L; |
There was a problem hiding this comment.
Cache size includes user+ app token cache. 2 instances of TokenCache class are created (1 with app cache accessors, 1 with user accessors). So moved this size property up a level so it can be accessed by both. However, because of this design, clearing the cache is only done for that respective cache, app or user.
| private readonly CacheOptions _tokenCacheAccessorOptions; | ||
|
|
||
| // Approximate size of cache item objects | ||
| private const long AccessTokenSizeInBytes = 4500; |
There was a problem hiding this comment.
My main concern is that a constant approximation will not work well:
- AAD allows for configuration of optional claims
- POP tokens are a bit bigger because they contain the POP key id
- AAD is adding support for SAML tokens (there is already an OBO flow for this). We won't be able to update this aproximation for all token types.
I think there are a few ways out of this:
- Allow customers to define this constant (this is what SAL does). But this is a pretty "obscure" scenario
- Use non-constant size approximation, i.e. we know the size of the actual token (it's a string) at which we add some metadata (fairly constant)
- Do not expose sizeLimit, expose countLimit
There was a problem hiding this comment.
@dannybtsai @henrik-me - can you please review this strategy?
There was a problem hiding this comment.
Is there a reason to use a constant value instead of the actual size (the #2 option)?
There was a problem hiding this comment.
- Hmm, interesting, I'd like to see the implementation in SAL. How would customers know that constant?
- Yea, metadata should be fairly constant. This does seem like a better approximation.
- Specific to the user cache, what would the count represent: only access tokens, all tokens? Because in both cases, the count would not represent the whole cache size accurately.
@dannybtsai
How would you measure the actual size of objects in memory? We could measure the strings size in cache objects like Bogdan mentioned below and in Microsoft.Identity.Web we serialize the whole cache, so take size that way. But in this case cache entry is an object.
Why Compact() clears the entire cache instead of reducing the cache by a certain percentage and still keeping some? #Closed Refers to: src/client/Microsoft.Identity.Client/PlatformsCommon/Shared/InMemoryPartitionedAppTokenCacheAccessor.cs:23 in 64f8718. [](commit_id = 64f8718, deletion_comment = False) |
Basically clearing full cache is simpler. With the user cache, there are multiple token types saves in different dictionaries, so clearing them in a sensible manner becomes complex. I documented this in a doc, I'll share with the team. |
| // Update cache size only if cache item is added, not updated | ||
| if (!AccessTokenCacheDictionary.TryGetValue(partitionKey, out var partition) || !partition.TryGetValue(itemKey, out _)) | ||
| { | ||
| Interlocked.Add(ref _appCacheSize, AccessTokenSizeInBytes); |
There was a problem hiding this comment.
Is it really necessary to use a variable to track the size? Isn't _appCacheSize == AccessTokenCacheDictionary.Count * AccessTokenSizeInBytes?
There seems to be some risk of not setting the size correctly even though Interlocked methods are atomic. And it is more complicated to set them in the Clear() method without locking.
There was a problem hiding this comment.
ConcurrentDictionary<string, ConcurrentDictionary<string, MsalAccessTokenCacheItem>> AccessTokenCacheDictionary
TokenCacheDictionaries are dictionary of dictionaries. with the outer key being a partition key and an inner key is an actual cache item key. So unfortunately count does not work.
There was a problem hiding this comment.
If AccessTokenSizeInBytes is added to the _appCacheSize every time a token is saved, isn't _appCacheSize == AccessTokenCacheDictionary.Count * AccessTokenSizeInBytes?
| if (!AccessTokenCacheDictionary.TryGetValue(partitionKey, out var partition) || !partition.TryGetValue(itemKey, out _)) | ||
| { | ||
| Interlocked.Add(ref _appCacheSize, AccessTokenSizeInBytes); | ||
| Interlocked.Add(ref TokenCache.CacheSize, AccessTokenSizeInBytes); |
There was a problem hiding this comment.
Hmm, I don't think so. There are two instances of TokenCache class created and then one instance has app token cache accessor and one has user accessor, but they don't know about each other. And I don't think CacheSize can be moved into a higher parent class.
|
Can we close this one @pmaytak ? |
Fixes #3020
Changes proposed in this request
Issues:
Testing
To be added.
Performance impact
Should be no impact - in add/remove accessor methods, there're a few O(1) dictionary calls.