
PoC: Compiler caches #18190

Draft · wants to merge 17 commits into base: main
Conversation

vzarytovskii (Member) commented Dec 30, 2024

This is a very first naive draft of a universal cache for compiler internals.

Currently it's very straightforward - have an underlying concurrent dictionary, evict on every insert.

Things it lacks:

  • I pretty much YOLO'd the eviction in about 30 minutes or so, and I don't expect it to be efficient.
  • Weakness handling.
  • Fair concurrent access (right now everything is race-y, which is not inherently bad).
  • Shortcuts for eviction.
  • Eviction tasks + cancellation (currently the default strategy is to block for each eviction).
  • It's not the most memory-efficient solution there is - pretty much the same efficiency as the ConcurrentDictionaries we use currently.
  • Probably something else I've forgotten.

It currently solves the only issue with using a plain ConcurrentDictionary - eviction. It can evict things, though not in the most efficient way possible.

Personally, I think that first it has to be at least as good as ConcurrentDictionary + eviction, so we can enable some caching for tooling as well as for the compiler.
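The "dumb straightforward" approach described above could look roughly like the following sketch. This is illustrative only, not the PR's actual code: the type name and members are hypothetical, and the eviction strategy (which entries go first) is exactly the open question.

```fsharp
open System.Collections.Concurrent

// Hypothetical sketch: a ConcurrentDictionary with a capacity check and
// eviction on every insert, as described above.
type NaiveCache<'Key, 'Value when 'Key: equality>(maximumCapacity: int, percentageToEvict: int) =
    let store = ConcurrentDictionary<'Key, 'Value>()

    // Evict a fixed share of entries once the capacity is reached.
    // Picking *which* entries (LRU, LFU, ...) is the part still missing.
    member _.TryEvict() =
        let toEvict = maximumCapacity * percentageToEvict / 100
        store.Keys
        |> Seq.truncate toEvict
        |> Seq.toList // snapshot the keys before mutating the dictionary
        |> List.iter (fun k -> store.TryRemove k |> ignore)

    member this.TryAdd(key, value) =
        if store.Count >= maximumCapacity then
            this.TryEvict() // blocking eviction on the inserting thread
        store.TryAdd(key, value)

    member _.TryGetValue(key) = store.TryGetValue key
```

Note that this blocks the inserting thread for each eviction pass, which is the default-strategy limitation called out in the list above.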

Contributor

❗ Release notes required

@vzarytovskii,

Caution

No release notes found for the changed paths (see table below).

Please make sure to add an entry with an informative description of the change, as well as a link to this pull request, issue and language suggestion if applicable. Release notes for this repository are based on the Keep A Changelog format.

The following format is recommended for this repository:

* <Informative description>. ([PR #XXXXX](https://github.com/dotnet/fsharp/pull/XXXXX))

See examples in the files listed in the table below, or in the full documentation at https://fsharp.github.io/fsharp-compiler-docs/release-notes/About.html.

If you believe that release notes are not necessary for this PR, please add the NO_RELEASE_NOTES label to the pull request.

You can open this PR in browser to add release notes: open in github.dev

| Change path  | Release notes path                                      | Description                                                   |
|--------------|---------------------------------------------------------|---------------------------------------------------------------|
| src/Compiler | docs/release-notes/.FSharp.Compiler.Service/9.0.200.md  | No release notes found or release notes format is not correct |

vzarytovskii (Member Author)

I have added a CWT-based cache as well (optional, off by default), but it really needs more testing to make sure nothing is leaking.

while not cts.Token.IsCancellationRequested do
    if this.GetEvictCount() > 0 then
        this.TryEvictItems()
    // do! Task.Delay(100, cts.Token)
Member

Exponential backoff based on number of evicted items?

Did I evict 0 - can afford a longer delay.
A lot of stuff evicted - try again shortly after.
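The suggested backoff could be sketched like this. Everything here is hypothetical: it assumes an eviction function that reports how many items it removed (the loop in the hunk above calls a unit-returning `TryEvictItems`), and the delay bounds are placeholders.

```fsharp
open System.Threading
open System.Threading.Tasks

// Illustrative backoff loop: double the delay when nothing was evicted,
// halve it when eviction is busy. `tryEvictItems` is assumed to return
// the number of items evicted on this pass.
let evictionLoop (tryEvictItems: unit -> int) (token: CancellationToken) =
    task {
        let mutable delayMs = 100
        while not token.IsCancellationRequested do
            let evicted = tryEvictItems ()
            delayMs <-
                if evicted = 0 then
                    min (delayMs * 2) 1000 // idle: back off, capped at 1s
                else
                    max (delayMs / 2) 10 // busy: retry sooner
            do! Task.Delay(delayMs, token)
    }
```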

Member Author

Yes, that was my idea as well

vzarytovskii (Member Author) commented Jan 6, 2025

So, I made it based on the cache utilization. It will fluctuate from 0 to 1 second. The worst case scenario would be if the cache is populated in bursts (and has its max size way off); then it will resize. Wondering if it's good enough for now? @0101 @T-Gro @KevinRansom thoughts?
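A utilization-based delay as described could be as simple as the following (a hypothetical formula, not the PR's actual code): an empty cache sleeps the full second, a full one re-checks immediately.

```fsharp
// Sketch: map cache utilization (0.0 .. 1.0) onto a 1000 ms .. 0 ms delay.
let utilizationDelayMs (count: int) (maximumCapacity: int) =
    let utilization = min (float count / float maximumCapacity) 1.0
    int ((1.0 - utilization) * 1000.0)

// utilizationDelayMs 0 100   -> 1000 (empty cache, long sleep)
// utilizationDelayMs 100 100 -> 0    (full cache, evict again immediately)
```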

vzarytovskii (Member Author) commented Jan 6, 2025

> I have added a CWT-based cache as well (optional, off by default), but it really needs more testing to make sure nothing is leaking.

FYI, I have removed it for the time being, since making it part of one Cache object has proven problematic: in many places WeakReference or CWT will imply the use of the not struct constraint, which we don't want in the generic cache. I will focus on the non-weak cache (pretty much a drop-in replacement for some existing uses of ConcurrentDictionary), and then add WeakCache separately, with key- and value-based weakness logic for places where we need to replace keys on a regular basis.
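To illustrate why the weak cache can't share the generic Cache type: a ConditionalWeakTable requires reference types, so a CWT-backed cache carries `not struct` constraints that the general-purpose cache must not have. A minimal hypothetical sketch (not the removed code):

```fsharp
open System.Runtime.CompilerServices

// Illustrative only: the `not struct` constraints below are forced by
// ConditionalWeakTable and are exactly what the generic Cache can't carry.
type WeakKeyCache<'Key, 'Value when 'Key: not struct and 'Value: not struct>() =
    let table = ConditionalWeakTable<'Key, 'Value>()

    member _.TryGet(key: 'Key) =
        match table.TryGetValue key with
        | true, value -> Some value
        | _ -> None

    member _.Set(key: 'Key, value: 'Value) =
        table.Remove key |> ignore // CWT.Add throws if the key already exists
        table.Add(key, value)
```

Entries are collectable once the key is no longer reachable, which is the "nothing is leaking" property that needs testing.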

let inline mkDelayedSeq (f: unit -> IEnumerable<'T>) =
    mkSeq (fun () -> f().GetEnumerator())

let inline sortBy ([<InlineIfLambda>] projection) (source: ConcurrentDictionary<_, _>) =
Member Author

Not really sure about InlineIfLambda here.

Comment on lines +10 to +15
// Default Seq.* functions have one issue - when doing `Seq.sortBy`, it will call `ToArray` on the collection,
// which is *not* calling `ConcurrentDictionary.ToArray`, but uses a custom one instead (treating it as `ICollection`).
// This leads to an exception when trying to evict without locking (The index is equal to or greater than the length of the array,
// or the number of elements in the dictionary is greater than the available space from index to the end of the destination array.)
// This is caused by insertions happening between reading the `Count` and doing the `CopyTo`.
// This solution introduces a custom `ConcurrentDictionary.sortBy` which will be calling a proper `CopyTo`, the one on the ConcurrentDictionary itself.
vzarytovskii (Member Author) commented Jan 6, 2025

This is somewhat important to note. There are almost no changes compared to how Seq.sortBy works; it just calls into ConcurrentDictionary.ToArray instead of the internal toArray (which results in calling Enumerable.ToArray, which is not thread-safe).
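A possible body for the `sortBy` declaration shown in the hunk above (a sketch consistent with the comment, not necessarily the PR's exact code):

```fsharp
open System.Collections.Concurrent

// Snapshot via ConcurrentDictionary.ToArray, which locks internally and
// copies a consistent view, instead of letting Seq.sortBy copy the
// dictionary as a plain ICollection (racy against concurrent inserts).
let inline sortBy ([<InlineIfLambda>] projection) (source: ConcurrentDictionary<'Key, 'Value>) =
    source.ToArray() |> Array.sortBy projection
```

The sort itself still runs over an immutable array snapshot, so only the copy step needed fixing.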

| true, _ -> eviction.Trigger(key)
| _ -> () // TODO: We probably want to count eviction misses as well?

// TODO: Shall this be a safer task, wrapping everything in try .. with, so it's not crashing silently?
Member Author

Another question here.

Contributor

If we have some solid tests to be pretty confident it won't crash and will keep doing what it should, then it's probably fine. Also, since it's executed with Task.Run, does task vs. backgroundTask make any difference?
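One possible shape for the "safer task" from the TODO above: swallow cancellation as the normal shutdown path, but surface any other exception instead of letting the background task die silently. This is a hypothetical sketch; the logging call is a placeholder.

```fsharp
open System
open System.Threading
open System.Threading.Tasks

// Illustrative: wrap the whole eviction loop in try .. with so failures
// are reported rather than silently killing the background task.
let startSafeEvictionTask (evictOnce: unit -> unit) (token: CancellationToken) =
    Task.Run(fun () ->
        task {
            try
                while not token.IsCancellationRequested do
                    evictOnce ()
                    do! Task.Delay(100, token)
            with
            | :? OperationCanceledException -> () // expected on shutdown
            | e -> eprintfn "eviction task crashed: %O" e // placeholder logging
        }
        :> Task)
```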


}

// TODO: Explore an eviction shortcut, some sort of list of keys to evict first, based on the strategy.
vzarytovskii (Member Author) commented Jan 6, 2025

This would be quite easy to do - maintain some structure as a shortcut (like a linked list), and move pointers to either end of it based on the strategy (most accessed to the end, for example), then chop off the beginning, since we don't care about ordering.

The question is whether it's worth allocating, maintaining and synchronising it.

Contributor

I would say it's worth it, especially if we want to put more items in the cache. Since eviction is super cheap, it can then be done on each access. Also, we then wouldn't need the CachedEntity type. At least for the eviction strategies based on recency.

I guess the only downside is that we'd have to maintain the lock to ensure thread safety. Probably wouldn't need a ConcurrentDictionary then also.

Anyway, this can always be done as a future improvement. Ideally if we have some benchmarks for it.
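The shortcut being discussed could be sketched as a recency-ordered linked list guarded by a lock: `Touch` moves a key to the back on each access, and eviction pops from the front. All names here are hypothetical; this is not code from the PR.

```fsharp
open System.Collections.Generic

// Illustrative LRU shortcut structure for recency-based eviction.
type LruIndex<'Key when 'Key: equality>() =
    let order = LinkedList<'Key>()
    let nodes = Dictionary<'Key, LinkedListNode<'Key>>()
    let sync = obj ()

    // Called on every cache access: move the key to the most-recent end.
    member _.Touch(key: 'Key) =
        lock sync (fun () ->
            match nodes.TryGetValue key with
            | true, node ->
                order.Remove node
                order.AddLast node
            | _ -> nodes.[key] <- order.AddLast key)

    // Called by eviction: chop from the least-recent end.
    member _.PopLeastRecent() =
        lock sync (fun () ->
            match order.First with
            | null -> None
            | first ->
                order.Remove first
                nodes.Remove first.Value |> ignore
                Some first.Value)
```

The lock is the downside mentioned above: every access now synchronises on the index, which is what a benchmark would need to justify.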

Member Author

Yeah, I'm not a huge fan of dealing with concurrency myself tbh, when CD exists and is done well. I'll see what can be done with backlog.

vzarytovskii (Member Author) commented Jan 13, 2025

This is more or less ready (baselines need updating, I will do it this week).

This, in my opinion, is good enough to replace ConcurrentDictionary<_,_> at least in the place where we use it to cache subsumptions. Once it's replaced, we can finally use it in tooling, which should make tooling 5x-10x faster in cases where the new (net7+) BCL is involved together with a bunch of interfaces and overloads (as was the case with the OpenTK library).

Approach here is pretty straightforward, please refer to comments.

cc @KevinRansom, @0101, @T-Gro PTAL, I would love to have it under preview for tooling in 9.0.300.

0101 (Contributor) left a comment

Looks good. Just needs some basic tests and plugging it into the place where it's needed.

type Cache<'Key, 'Value>(options: CacheOptions) as this =

    let capacity = options.MaximumCapacity + (options.MaximumCapacity * options.PercentageToEvict / 100)
    let cts = new CancellationTokenSource()
Contributor

Is this needed when it never gets cancelled?

Member Author

I added it in case there's ever a need for it to get cancelled (via dispose).
