[Prototype] Add cache size limit and eviction.#3104

Closed
pmaytak wants to merge 4 commits into master from pmaytak/cache-eviction-3020

pmaytak commented Jan 12, 2022
Contributor

Fixes #3020

Changes proposed in this request

  • Added a size limit option to the CacheOptions public API, which represents the cache size in bytes.
  • Added cache capacity properties to the user and app cache structures, and added a property that tracks the combined cache size to the parent-level TokenCache class.
  • These properties are updated during add and remove operations. A constant token size is used.
  • The capacity overflow check is done when adding access tokens.
  • If capacity is reached, the app or user cache is cleared, depending on which operation triggered the check.

Issues:

  • It is difficult to keep the current cache size in sync in multi-threaded environments. When clearing the cache, after it hits capacity, a lock is needed to make sure the current size counter is not updated by other cache methods, since it’s set to 0. Locks can cause performance issues, as in past cases. Current cache size tracker also cannot be calculated dynamically based on the count of dictionary elements, since the storage structure is not a simple dictionary, but a nested one.
  • With the current cache structure, there are app and user cache accessors, which are separate from each other. So, a cache size tracker is needed in a class above them, which increases the complexity of keeping track of the three capacity variables.

Testing
To be added.

Performance impact
Should be no impact - in add/remove accessor methods, there're a few O(1) dictionary calls.

/// </summary>
/// <param name="useSharedCache">Set to true to share the cache between all ClientApplication objects. The cache becomes static. <see cref="UseSharedCache"/> for a detailed description. </param>
/// <param name="sizeLimit">Token cache size limit in bytes. <see cref="SizeLimit"/> for a detailed description.</param>
public CacheOptions(bool useSharedCache, long sizeLimit)

Contributor

Can we have default values here (as per any recommended practices)?

Contributor Author

I don't think we can provide a realistic default value; it would be based on the user's scenario.

Member

Yeah, that's a good question we're struggling with :). What would be a good default? ...

Contributor

If the default varies widely based on user scenarios, we cannot do much. But if it primarily leans towards a certain value, that value can be used. If we do not have an answer, then it is good as it is.

Contributor Author

Danny also touched on the question of whether there should be a minimum. Currently, if someone specifies a small value (like 1), the cache will be compacted on each operation.

@@ -216,13 +234,25 @@ public void SetiOSKeychainSecurityGroup(string keychainSecurityGroup)
public virtual void Clear()
{
AccessTokenCacheDictionary.Clear();

Contributor

Threading issue: Between Clear and setting CacheSize to zero, something may get added to the dictionary and the two can go out of sync

Contributor Author

True, but we also don't lock on cache operations in general (based on the eventual consistency principle). So this behavior can also happen with other add/remove methods.


Eventual consistency is not guaranteed here. For example:

  1. Thread 1 calls the AccessTokenCacheDictionary.Clear() method and is paused.
  2. Thread 2 adds an item and updates the _appCacheSize.
  3. Thread 1 resumes execution and sets the _appCacheSize to 0 (Interlocked.Exchange(ref _appCacheSize, 0)).
  4. Now there is one item in the cache but the size is 0.
  5. The same issue applies to the TokenCache.CacheSize, and in this case the error can accumulate over multiple Clear() calls.
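
The interleaving in steps 1 to 4 can be replayed deterministically on a single thread. The following is only an illustrative sketch, not MSAL code: the `ClearRaceReplay` class, the flat dictionary, and the local `appCacheSize` counter are hypothetical stand-ins for the accessor's dictionary and `_appCacheSize`.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical replay of the race described above; not part of MSAL.
public static class ClearRaceReplay
{
    // Same constant estimate as in the PR.
    private const long AccessTokenSizeInBytes = 4500;

    // Replays the two-thread interleaving sequentially and returns
    // the final item count together with the tracked size.
    public static (int ItemCount, long TrackedSize) Run()
    {
        var dictionary = new ConcurrentDictionary<string, string>();
        long appCacheSize = 0;

        // Starting state: one cached item, counter in sync.
        dictionary["existing"] = "token";
        Interlocked.Add(ref appCacheSize, AccessTokenSizeInBytes);

        // Step 1: "thread 1" clears the dictionary and is then paused.
        dictionary.Clear();

        // Step 2: "thread 2" adds an item and updates the counter.
        dictionary["new"] = "token";
        Interlocked.Add(ref appCacheSize, AccessTokenSizeInBytes);

        // Step 3: "thread 1" resumes and resets the counter.
        Interlocked.Exchange(ref appCacheSize, 0);

        // Step 4: one item is cached, but the tracked size is 0.
        return (dictionary.Count, Interlocked.Read(ref appCacheSize));
    }
}
```

Each individual call is atomic, yet the pair "clear the dictionary, then reset the counter" is not, which is why the counter and the contents can diverge.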

Contributor Author

@dannybtsai Do you suggest locking? Only when clearing or always when updating the cache size?


It is hard to achieve consistency without locking, as far as I can tell. And you need to always lock, not just when clearing the cache; otherwise it is the same problem.
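
A minimal sketch of the "always lock" approach, assuming a flat dictionary and the constant per-item size from the PR. The `LockedSizeTrackingCache` class and its members are hypothetical and only illustrate the idea; they are not the MSAL accessors.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of guarding contents and size with one lock.
public sealed class LockedSizeTrackingCache
{
    // Constant per-item estimate, as used in the PR.
    private const long ItemSizeInBytes = 4500;

    private readonly object _gate = new object();
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();
    private long _sizeInBytes;

    public long SizeInBytes
    {
        get { lock (_gate) { return _sizeInBytes; } }
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }

    public void Save(string key, string token)
    {
        lock (_gate)
        {
            // Grow the size only when the key is new, not on update.
            if (!_items.ContainsKey(key))
            {
                _sizeInBytes += ItemSizeInBytes;
            }
            _items[key] = token;
        }
    }

    public bool Remove(string key)
    {
        lock (_gate)
        {
            if (!_items.Remove(key))
            {
                return false;
            }
            _sizeInBytes -= ItemSizeInBytes;
            return true;
        }
    }

    public void Clear()
    {
        lock (_gate)
        {
            // Contents and counter change together, so no other
            // operation can observe or modify one without the other.
            _items.Clear();
            _sizeInBytes = 0;
        }
    }
}
```

The trade-off is the one raised in the PR description: every operation now serializes on the same lock, which may hurt throughput under contention.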

pmaytak changed the title from "Initial. Add app cache size limit, check, and cache clearing." to "Add cache size limit and eviction." on Feb 11, 2022
pmaytak marked this pull request as ready for review on February 11, 2022 08:28

/// IMPORTANT: Monitor app health metrics (including memory usage) and cache performance (<see href="https://aka.ms/msal-net-token-cache-serialization"/>)
/// and adjust size limit accordingly.
/// </remarks>
public long? SizeLimit

Contributor Author

Size limit applies only when cache serialization is disabled.

private void Compact()
{
_logger.Always("[UserCache] Compacting cache.");
Clear();

Contributor Author

Clearing the cache because it's simpler, and other forms of compaction become complex for the user cache. (For example, removing only ATs will leave the user cache with non-AT tokens; trying to remove all tokens for a user doesn't work for OBO since partition keys are different; randomly removing token entries will leave orphaned/unassociated tokens.)

/// <summary>
/// Static, used by both app and user caches to track approximate token cache size.
/// </summary>
internal static long CacheSize = 0L;

pmaytak Feb 11, 2022
Contributor Author

Cache size includes the user + app token caches. Two instances of the TokenCache class are created (one with app cache accessors, one with user accessors), so I moved this size property up a level so that it can be accessed by both. However, because of this design, clearing the cache is only done for the respective cache, app or user.

private readonly CacheOptions _tokenCacheAccessorOptions;

// Approximate size of cache item objects
private const long AccessTokenSizeInBytes = 4500;

Member

My main concern is that a constant approximation will not work well:

  1. AAD allows for configuration of optional claims
  2. POP tokens are a bit bigger because they contain the POP key id
  3. AAD is adding support for SAML tokens (there is already an OBO flow for this). We won't be able to update this approximation for all token types.

I think there are a few ways out of this:

  1. Allow customers to define this constant (this is what SAL does). But this is a pretty "obscure" scenario
  2. Use a non-constant size approximation, i.e. we know the size of the actual token (it's a string), to which we add some metadata (fairly constant)
  3. Do not expose sizeLimit, expose countLimit
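
Option 2 could look roughly like the sketch below. It assumes the token secret is held as a .NET string (UTF-16, two bytes per character) and adds a fixed metadata allowance. The `TokenSizeEstimator` class and the 500-byte overhead are hypothetical values chosen for illustration, not measurements from MSAL.

```csharp
using System;

// Hypothetical size estimator for option 2; not part of MSAL.
public static class TokenSizeEstimator
{
    // Assumed fixed allowance for metadata (scopes, expiry, IDs,
    // object headers). The real figure would need to be measured.
    public const long MetadataOverheadInBytes = 500;

    // Estimates the in-memory size of a cache item from the length
    // of its token string plus the constant metadata allowance.
    public static long EstimateSizeInBytes(string secret)
    {
        if (secret == null)
        {
            throw new ArgumentNullException(nameof(secret));
        }

        // .NET strings store UTF-16 code units, so sizeof(char) is 2.
        return (long)secret.Length * sizeof(char) + MetadataOverheadInBytes;
    }
}
```

With this approach, larger tokens (optional claims, POP, SAML) would raise the tracked size automatically, at the cost of having to subtract the same per-item estimate on removal.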

Member

@dannybtsai @henrik-me - can you please review this strategy?

dannybtsai Feb 11, 2022

Is there a reason to use a constant value instead of the actual size (the #2 option)?

Contributor Author

@bgavrilMS

  1. Hmm, interesting, I'd like to see the implementation in SAL. How would customers know that constant?
  2. Yea, metadata should be fairly constant. This does seem like a better approximation.
  3. Specific to the user cache, what would the count represent: only access tokens, all tokens? Because in both cases, the count would not represent the whole cache size accurately.

@dannybtsai
How would you measure the actual size of objects in memory? We could measure the size of the strings in the cache objects, like Bogdan mentioned. In Microsoft.Identity.Web we serialize the whole cache, so we take the size that way. But in this case, a cache entry is an object.

dannybtsai commented Feb 11, 2022
internal class InMemoryPartitionedAppTokenCacheAccessor : ITokenCacheAccessor

Why does Compact() clear the entire cache instead of reducing it by a certain percentage and keeping the rest? #Closed

Refers to: src/client/Microsoft.Identity.Client/PlatformsCommon/Shared/InMemoryPartitionedAppTokenCacheAccessor.cs:23 in 64f8718.

pmaytak commented Feb 12, 2022
Contributor Author

Basically, clearing the full cache is simpler. With the user cache, there are multiple token types saved in different dictionaries, so clearing them in a sensible manner becomes complex. I documented this in a doc, which I'll share with the team.

// Update cache size only if cache item is added, not updated
if (!AccessTokenCacheDictionary.TryGetValue(partitionKey, out var partition) || !partition.TryGetValue(itemKey, out _))
{
Interlocked.Add(ref _appCacheSize, AccessTokenSizeInBytes);

dannybtsai Feb 15, 2022

_appCacheSize

Is it really necessary to use a variable to track the size? Isn't _appCacheSize == AccessTokenCacheDictionary.Count * AccessTokenSizeInBytes?

There seems to be some risk of not setting the size correctly even though Interlocked methods are atomic. And it is more complicated to set them in the Clear() method without locking.

Contributor Author

ConcurrentDictionary<string, ConcurrentDictionary<string, MsalAccessTokenCacheItem>> AccessTokenCacheDictionary
The token cache dictionaries are dictionaries of dictionaries, with the outer key being a partition key and the inner key being the actual cache item key. So unfortunately, Count does not work.


If AccessTokenSizeInBytes is added to the _appCacheSize every time a token is saved, isn't _appCacheSize == AccessTokenCacheDictionary.Count * AccessTokenSizeInBytes?
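
For the nested structure discussed in this thread, the outer `Count` gives the number of partitions, not the number of cached items. The sketch below shows one way an item count could be derived instead, by summing the inner counts. The `NestedCacheCounter` helper is hypothetical and is not part of the PR.

```csharp
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical helper for counting items in a partitioned cache.
public static class NestedCacheCounter
{
    // Sums the item counts of all partitions. The outer Values
    // property returns a snapshot of the partitions, but the inner
    // dictionaries can still change while they are being summed,
    // so the result is approximate under concurrent writes.
    public static long CountItems<T>(
        ConcurrentDictionary<string, ConcurrentDictionary<string, T>> partitions)
    {
        return partitions.Values.Sum(partition => (long)partition.Count);
    }
}
```

Multiplying this count by `AccessTokenSizeInBytes` would avoid a separate counter, but the call is O(number of partitions) and `ConcurrentDictionary.Count` acquires all of a dictionary's internal locks, so it is not the O(1) operation the tracked counter provides.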

if (!AccessTokenCacheDictionary.TryGetValue(partitionKey, out var partition) || !partition.TryGetValue(itemKey, out _))
{
Interlocked.Add(ref _appCacheSize, AccessTokenSizeInBytes);
Interlocked.Add(ref TokenCache.CacheSize, AccessTokenSizeInBytes);


TokenCache.CacheSize

Can this be calculated?

Contributor Author

Hmm, I don't think so. Two instances of the TokenCache class are created: one has the app token cache accessor and one has the user accessor, but they don't know about each other. And I don't think CacheSize can be moved into a higher parent class.

pmaytak marked this pull request as draft on February 23, 2022 07:23
pmaytak changed the title from "Add cache size limit and eviction." to "[Prototype] Add cache size limit and eviction." on Feb 23, 2022

bgavrilMS
Member

Can we close this one @pmaytak ?

bgavrilMS deleted the pmaytak/cache-eviction-3020 branch on August 11, 2022 17:43

Development

Successfully merging this pull request may close these issues.

[Feature Request] MSAL internal cache eviction

4 participants