
Proposal: 'cache' keyword, lazy initialization, and immutable value caching #681

Closed
jamesqo opened this issue Jun 13, 2017 · 14 comments

@jamesqo
Contributor

jamesqo commented Jun 13, 2017

Background

Lazy initialization

There isn't really a concise way to do lazy initialization in .NET currently. The three popular patterns are 1)

private T _foo;

public T Foo => _foo ?? (_foo = InitializeFoo());

or 2)

private readonly Lazy<T> _foo = new Lazy<T>(() => InitializeFoo());

public T Foo => _foo.Value;

or 3)

private T _foo;

public T Foo => LazyInitializer.EnsureInitialized(ref _foo, InitializeFoo);

In addition to being wordy, 2) is also quite expensive and (last I checked) involves several heap allocations. All three methods require you to declare backing fields.

Caching immutable values

(I got ahead of myself and wrote the Proposal part before the Background. It would be awkward to explain it here and then re-explain it below, so just keep reading.)

Proposal

Add a cache keyword to the language. cache will accept a single expression and evaluate to an expression. This latter expression, when evaluated, will:

  • Check a flag to see if the inner expression was run.

    • If it wasn't run, run it, cache it in a field, and update the flag to say it was run.
    • If it was run, return the value of the field.
  • Will the value be cached for the current object instance or the current class?

    • cache <expr> => current instance.
    • cache static <expr> => current class.

Examples

Lazy initialization

// These are instance methods/properties.
// For static methods/properties, 'cache static' would be used.

public int ExpensiveValue => cache ComputeExpensiveValue();

private int ComputeExpensiveValue() { ... }

ExpensiveValue uses => on purpose, since cache can be evaluated multiple times at little extra cost. { get; } = cache <expr> would work too, but it creates an unnecessary field.

Better immutability support

The payoff: less stashing values in private static readonly s_-prefixed fields at the top of your file, then scrolling all the way back down to where they're used.

public void Foo<T>(IEnumerable<T> source) { ... }

Foo(cache static ImmutableArray.Create(1, 3, 5));

(Note: I didn't just come up with the above example. I have an actual place in a project I'm currently working on where it would be really, really nice to use cache static. Ask for it in the comments and I'll post further details.)

This doesn't apply to just immutable collections, of course, or even just immutable types, but any place where the caller is known not to modify the parameter in question and a constant value is passed. For example, arrays are mutable, but cache is still perfect for use with string.Split:

// string.Split does not modify the contents of the array, perfectly safe to cache here
string.Split(cache static new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);

I seem to recall @NickCraver commenting how he had to cache the arrays for string.Split a lot. I couldn't find his comment, but I did find this.
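For comparison, a sketch of how this is typically written today without cache static (the class, field, and method names here are illustrative, not from the proposal):

```csharp
using System;

public static class Tokenizer
{
    // Today's pattern: hoist the separators into a static readonly field so
    // the array is allocated once, at the cost of moving it away from the call site.
    private static readonly char[] s_separators = { ',', ' ' };

    public static string[] Tokenize(string input) =>
        input.Split(s_separators, StringSplitOptions.RemoveEmptyEntries);
}
```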

Specification

var foo = cache <init-expr>;

// becomes

private bool $compilerPrefix_hasValue;
private T $compilerPrefix_value; // T = the type of <init-expr>

if (!$compilerPrefix_hasValue)
{
    $compilerPrefix_hasValue = true;
    foo = $compilerPrefix_value = <init-expr>;
}
else
{
    foo = $compilerPrefix_value;
}

cache static would be the same code, except the fields would be static.
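A sketch of the hand-written equivalent for the ExpensiveValue example above (the field names are illustrative, not actual compiler-generated names):

```csharp
public class Example
{
    // What 'public int ExpensiveValue => cache ComputeExpensiveValue();'
    // might lower to for an instance member.
    private bool _expensiveValueHasValue;
    private int _expensiveValueCache;

    public int ExpensiveValue
    {
        get
        {
            if (!_expensiveValueHasValue)
            {
                _expensiveValueCache = ComputeExpensiveValue();
                _expensiveValueHasValue = true;
            }
            return _expensiveValueCache;
        }
    }

    private int ComputeExpensiveValue() => 42; // stand-in for expensive work
}
```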

Notes

  • cache and cache static will not be thread-safe. It's possible the initializer expression could be called more than once. For the minority of cases where thread-safety is important, developers will be more prudent in adopting new language features and realize this. It's OK if we don't extend the benefits of this proposal to them, since they're the 5% case.

    • In addition, those devs likely want fine-grained control over their field accesses. More compiler magic would make their code harder to debug and miss out on some optimizations for their use case.
  • I decided to make instance-based caching the default with cache. Otherwise, writing

private int ExpensiveValue => cache ComputeExpensiveValue();

as above would cause ComputeExpensiveValue() to be run once for all instances, so whichever object has ExpensiveValue called first populates that field for all objects globally. I feel it would be too easy to fall into that trap if you had to write cache this or this.cache or something longer than cache.

  • We will be slightly decreasing perf when existing code that looks like
private object _foo;

public object Foo => _foo ?? (_foo = new object());

is replaced with cache new object(). The compiler doesn't know that the initialization expression doesn't compute to null, so we can't just take null to mean uninitialized. We have to introduce a separate bool field to determine that, which takes up more space.

  • However, we will still be able to use a single backing field for expressions that evaluate to non-nullable types, taking null to mean uninitialized.
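A sketch of that single-field optimization for a reference-type result, where null doubles as the "uninitialized" sentinel (names are illustrative):

```csharp
public class Example
{
    private string _messageCache;

    // No bool flag needed: null means "not yet initialized".
    // Caveat: if BuildMessage() could legitimately return null,
    // the initializer would re-run on every access.
    public string Message => _messageCache ?? (_messageCache = BuildMessage());

    private string BuildMessage() => "hello"; // stand-in for expensive work
}
```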
@yaakov-h
Member

Under existing cases there's also LazyInitializer.EnsureInitialized. I don't believe that has additional allocations, but it is wordy.

Also, your example specification above is multiple levels of not-thread-safe.

@jamesqo
Contributor Author

jamesqo commented Jun 14, 2017

@yaakov-h

Under existing cases there's also LazyInitializer.EnsureInitialized. I don't believe that has additional allocations, but it is wordy.

Thank you, I forgot to mention that. Updated the proposal. Using EnsureInitialized you still have to declare a backing field, though, which this proposal aims to eliminate.

Also, your example specification above is multiple levels of not-thread-safe.

Yes, I am aware of that (I pointed that out as the first thing in Notes). However, I think we should not add overhead for the 95% case where devs don't care about thread-safety. Devs who do care will be more prudent before they adopt this feature, and realize it's not thread-safe.

@artcfa and @dstarkowski, I noticed you both downvoted the proposal. Please feel free to comment here on why you think this is not a good idea, or what I can do to improve it. Your feedback would be greatly appreciated.

@jamesqo jamesqo changed the title Proposal: 'cache' keyword and lazy initialization Proposal: 'cache' keyword and lazy initialization/immutability support Jun 14, 2017
@jamesqo jamesqo changed the title Proposal: 'cache' keyword and lazy initialization/immutability support Proposal: 'cache' keyword and lazy initialization Jun 14, 2017
@jamesqo jamesqo changed the title Proposal: 'cache' keyword and lazy initialization Proposal: 'cache' keyword, lazy initialization, and immutable value caching Jun 14, 2017
@dstarkowski

@jamesqo

cache and cache static will not be thread-safe. It's possible the initializer expression could be called more than once. For the minority of cases where thread-safety is important, developers will be more prudent in adopting new language features and realize this. It's OK if we don't extend the benefits of this proposal to them, since they're the 5% case.

Where did this 5% come from? Can you back it with actual data or is it just your guess?

A static field is shared between requests in a web app, and even a single user can trigger multiple concurrent requests. It doesn't seem like such a borderline case to me.

@ghost

ghost commented Jun 14, 2017

@jamesqo The main reason I downvoted this proposal is that I don't think there needs to be language support for this.

Every feature is a liability and added baggage to long-term maintainability not only on the language dev side, but also for every single user of that language who will now be required to know its usage and implications - even if they don't use it themselves, they will have to deal with code that does.

Looking at it this way, I think there needs to be a really, really good reason to introduce a new language feature, which I personally don't see here. Please don't read my downvote as a review of the proposal itself (structure, clarity, depth, style, etc.) but as a note on whether I would appreciate if something along those lines existed.

@lloydjatkinson

lloydjatkinson commented Jun 14, 2017

However, I think we should not add overhead for the 95% case where devs don't care about thread-safety. Devs who do care will be more prudent before they adopt this feature, and realize it's not thread-safe.

I think you are slightly underestimating how many people value thread safety in our modern multi-threaded/multi-core/multi-CPU workloads.

I do not see the value of this over Lazy?

@Opiumtm

Opiumtm commented Jun 14, 2017

@lloydjatkinson

I do not see the value of this over Lazy ?

Lazy<T> is quite verbose and clumsy to work with.
You can't reference an instance method in an inline Lazy<T> initializer, so if the value-calculation function isn't static, you have to assign the Lazy<T> in the constructor.

Example:

public class A {
    private readonly int a;
    private readonly int b;

    public A(int a, int b)
    {
        this.a = a;
        this.b = b;
    }

    private int Calculate()
    {
        return a + b;
    }

    // Compilation error!
    private readonly Lazy<int> lc = new Lazy<int>(Calculate); // Can't do that! Calculate must be static for this to compile.
}

so, you should assign lazy value inside constructor:

public class A {
    private readonly int a;
    private readonly int b;

    public A(int a, int b)
    {
        this.a = a;
        this.b = b;
        this.lc = new Lazy<int>(Calculate);
    }

    private int Calculate()
    {
        return a + b;
    }

    private readonly Lazy<int> lc;
}

I think this feature (if implemented) should be syntactic sugar around Lazy<T> (so it will be inherently thread-safe).
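A sketch of what such Lazy<T>-based sugar might lower to, using the mode that guarantees the factory runs at most once (class and member names follow the example above; the Sum property is illustrative):

```csharp
using System;
using System.Threading;

public class A
{
    private readonly int a;
    private readonly int b;
    private readonly Lazy<int> lc;

    public A(int a, int b)
    {
        this.a = a;
        this.b = b;
        // ExecutionAndPublication guarantees Calculate runs at most once,
        // even under concurrent first access.
        this.lc = new Lazy<int>(Calculate, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    private int Calculate() => a + b;

    public int Sum => lc.Value;
}
```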

@svick
Contributor

svick commented Jun 14, 2017

What happens when the cache expression depends on a value that can be different for every evaluation of the expression?

For example:

int CachedFibonacci(int i) => cache Fibonacci(i);

Is this allowed? Will it cache the value from the first evaluation? If that's the behavior, wouldn't it be confusing (as in the above example)?

@bondsbw

bondsbw commented Jun 14, 2017

@svick I would expect the cache to hit if it had previously been evaluated with the same value for i, and miss otherwise. This feature is pretty useless otherwise.

@svick
Contributor

svick commented Jun 14, 2017

@bondsbw So, in such case, cache would result in a full-on memoization? That's quite different from the implementation suggested in the original proposal.
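Full memoization would need a per-argument lookup table rather than a single cached slot, roughly like this (a sketch; Fibonacci is the function from the example above):

```csharp
using System.Collections.Generic;

public class FibCache
{
    // Memoization: one cached result per distinct argument,
    // not a single value guarded by a bool flag.
    private readonly Dictionary<int, int> _cache = new Dictionary<int, int>();

    public int CachedFibonacci(int i)
    {
        if (!_cache.TryGetValue(i, out int result))
        {
            result = Fibonacci(i);
            _cache[i] = result;
        }
        return result;
    }

    private int Fibonacci(int i) => i < 2 ? i : Fibonacci(i - 1) + Fibonacci(i - 2);
}
```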

@quinmars

Here could the proposed field keyword be handy (#140). This would reduce option (3) to:

public T Foo => LazyInitializer.EnsureInitialized(ref field, InitializeFoo);

@jamesqo
Contributor Author

jamesqo commented Jun 14, 2017

@svick To answer your question, the expression would not be able to access any local variables (no closures allowed). With cache static it would only be permitted to access static members, just as if it were a static readonly field; with plain cache (instance-based caching) it could access static and instance members.

Anyway, given the mostly negative community feedback, perhaps this idea is not the best. (BTW: Despite what I said, I wasn't fundamentally opposed to making this proposal thread-safe, I just thought it would be adding unnecessary overhead.)

I think @quinmars' one-liner above of using LazyInitializer and the feature in #140 is quite elegant, so perhaps that is a better solution to lazy initialization. Closing this in favor of that proposal.

@mattwar
Contributor

mattwar commented Jun 14, 2017

I don't think cache is really meant to be a memoization of all possible outcomes. I think it was meant to be a stand-in for compute-this-the-first-time-through and store-it-in-a-field-I-don't-have-to-declare.

To be thread safe you need to be using a thread safe mechanism. The problem here is which one do you use.

If you use Lazy, you have to be okay with the additional allocation. This might be an issue in some cases, not in others.

You could use Interlocked.CompareExchange:

public ExpensiveValueType ExpensiveValue
{
    get
    {
        if (expensiveValue == default(ExpensiveValueType))
        {
            Interlocked.CompareExchange(ref expensiveValue, ComputeExpensiveValue(), default(ExpensiveValueType));
        }

        return expensiveValue;
    }
}

This is perfect if the type is a reference type, or a primitive supported by Interlocked, and you are more concerned with always returning the same instance (when it's a reference type), as you should be when using a property, but not so concerned if the computation sometimes happens more than once due to concurrency.

If you require that the expensive computation only ever happen once, then you need to have some kind of locking. So you could write this:

public ExpensiveValueType ExpensiveValue
{
    get
    {
        if (expensiveValue == default(ExpensiveValueType))
        {
            lock (this)
            {
                if (expensiveValue == default(ExpensiveValueType))
                {
                    expensiveValue = ComputeExpensiveValue();
                }
            }
        }

        return expensiveValue;
    }
}

This lock with a double check will work if the type is a reference type or an appropriately sized primitive (otherwise you'll likely get tearing and that would be bad.)

Otherwise, you could write it without the unguarded check:

public ExpensiveValueType ExpensiveValue
{
    get
    {
        lock (this)
        {
            if (expensiveValue == default(ExpensiveValueType))
            {
                expensiveValue = ComputeExpensiveValue();
            }
        }

        return expensiveValue;
    }
}

Simpler, but likely to cause scaling problems as you have lots of contentions around the lock.

Also, the lock(this) pattern is frowned upon, because your users can also interact with this system-supplied lock and cause you deadlocks. It's also really bad because it is re-entrant, which is okay in situations where you know exactly what is happening inside ComputeExpensiveValue, but if it's likely you are calling outside your codebase, even to call system libraries, then you could be asking for trouble if this is ever run on a UI thread.

You might be able to improve on the lock contention with a ReaderWriterLock or equivalent, because you only ever have one write, but this is hard to implement correctly: you'll have to acquire the read lock to check the value, and if it's default/null you'll need to release the read lock, attempt to acquire the write lock, and then check again, etc.

And you'll have to allocate this and store this lock somewhere. Is this going to be a per property per object instance lock? Are you going to share it with other cached items computed in the same object instance? Is it global?

Which one of these implementations is the compiler going to choose for you?

@yaakov-h
Member

@mattwar if we're going to initialize something lazily, I'd think that LazyInitializer.EnsureInitialized should be a perfect fit.

However, as @quinmars suggested, a field keyword would allow a single-liner use of the various overloads, and you could then select your initialization method, including the overload of LazyInitializer.EnsureInitialized that accepts object syncLock.
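A sketch of that overload in use (the type and member names here are illustrative, not from the thread):

```csharp
using System.Threading;

public class ExpensiveValueType { } // placeholder for the real payload type

public class Example
{
    private ExpensiveValueType _value;
    private bool _initialized;
    private object _lock; // EnsureInitialized allocates this lazily if left null

    // The (ref bool, ref object) overload takes the lock to guarantee the
    // factory runs at most once, and it works for value types too.
    public ExpensiveValueType Value =>
        LazyInitializer.EnsureInitialized(ref _value, ref _initialized, ref _lock, ComputeValue);

    private ExpensiveValueType ComputeValue() => new ExpensiveValueType();
}
```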

@mattwar
Contributor

mattwar commented Jun 15, 2017

@yaakov-h There is too much variability in the kind of caching you may want for your expensively computed value for the compiler to pick a single encoding of caching. LazyInitializer.EnsureInitialized might be a good fit if your data is a class, it's okay to compute it multiple times during concurrent initialization, and your initialization method is static. If it's not static, then you'll probably want to avoid all the delegate allocations every time you go to access the cached value. If you don't want the initialization method to be invoked more than once, then you'll need locking. The kind of locking that gets used will have different impacts on your code. How can the compiler choose this for you?
