
Records as a collection of features #3137

Closed
MadsTorgersen opened this issue Feb 1, 2020 · 66 comments

Comments

@MadsTorgersen
Contributor

MadsTorgersen commented Feb 1, 2020

Records as a collection of features

As we've been looking at adding a "records" feature to C#, it is evident that there are many different behaviors that you might want from such a feature. It is not obvious that they should all be available only when "bundled" together.

When we added LINQ in C# 3.0, it came in the form of many individual new features that were independently useful (lambda expressions, extension methods, expression trees, etc.), as well as a syntax for "bundling them together", namely query expressions.

The records feature set revolves around succinctly expressing the shape and behavior of data, which in many ways is ill-served by object-oriented defaults. Let's try to catalog the individual capabilities you might want to use independently, and suggest ways that each could be expressed as a separate language feature.

At the same time, just like query expressions, we want a way that these can all come together, and towards the end I make an attempt at that.

Running examples

We'll use two extremely simple running examples. One is a simple data class, and the other is a hierarchy of an abstract base class and a derived class (of which there would presumably be more). The former represents the simplest use case:

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

The latter example represents the use case that other languages use discriminated unions for: a family of data shapes united by a common type:

public abstract class Person
{
    public string Name { get; set; }
}
public class Student : Person
{
    public int ID { get; set; }
}

I've expressed them above in the simplest form allowed today. The simplicity leads to the following:

  • They have ordinary reference equality and make no attempt to compare property by property.
  • They have only the default nullary constructor.
  • They are mutable, and object initializers can be used to create them.
  • They have no validation logic.
  • They have no easy way of creating a new object from an existing one.
  • They have no deconstruction.

The following are common things you might want to achieve, independently or together, that are currently difficult, verbose or downright impossible:

  • Value-based equality, where two objects of the same type are considered equal if their pertinent properties are equal
  • Simpler constructor declarations that need not manually and explicitly
    • declare parameters for each of the properties,
    • assign the parameters to the properties, or
    • pass them to base constructors.
  • Object initializers for immutable properties
  • Validation logic that doesn't cause you to "fall off a cliff" and lose other terseness benefits
  • Non-destructive mutation creating a new object based on an existing one and a list of modifications to individual properties.
  • Automatic deconstructors created from constructors

Value-based equality

Hand-implementing value-based equality is hard, cumbersome and error-prone - especially when inheritance is involved.

With the original records proposal, we figured out how to augment the relatively straightforward automatic generation of equality for a given type with the structure needed for correct value equality across a hierarchy of types.

The difficulty with implementing value-based equality across a type hierarchy lies in ensuring symmetry - that the two values agree on the equality being applied. In order to ensure that, types with value-based equality (or any custom equality) that can participate in a hierarchy of mutually comparable objects must essentially agree on who implements their equality. They can do that by declaring a Type-valued virtual property EqualityContract in the root class of the hierarchy, with every derived type that alters equality overriding that to return its own type. Part of the equality implementation then is to compare EqualityContract as well as the individual data members:

using System;

public abstract class Person // Root of hierarchy with value equality
{
    public string Name { get; set; }
    protected virtual Type EqualityContract => typeof(Person);
    public override bool Equals(object other) =>
        other is Person that
        && object.Equals(this.EqualityContract, that.EqualityContract)
        && object.Equals(this.Name, that.Name);
    // Equal objects must produce equal hash codes, so GetHashCode is overridden too.
    public override int GetHashCode() => HashCode.Combine(EqualityContract, Name);
}

public class Student : Person // derived class
{
    public int ID { get; set; }
    protected override Type EqualityContract => typeof(Student);
    public override bool Equals(object other) =>
        base.Equals(other) // checks EqualityContract and Name
        && other is Student that
        && object.Equals(this.ID, that.ID);
    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), ID);
}
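To see what goes wrong without an EqualityContract, consider a naive hand-written version. The sketch below uses two hypothetical classes, Point2 and Point3, where the derived class adds a member and checks for its own type. Comparing in the two directions then gives different answers, which is exactly the symmetry violation the contract is meant to prevent.

```csharp
using System;

// Hypothetical classes illustrating the symmetry problem; there is no EqualityContract.
public class Point2
{
    public int X { get; set; }
    public int Y { get; set; }

    // Accepts any Point2, including derived instances, and compares only X and Y.
    public override bool Equals(object other) =>
        other is Point2 that && X == that.X && Y == that.Y;

    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public class Point3 : Point2
{
    public int Z { get; set; }

    // Requires the other object to be a Point3 as well, so that Z can be compared.
    public override bool Equals(object other) =>
        base.Equals(other) && other is Point3 that && Z == that.Z;

    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), Z);
}
```

Here new Point2 { X = 1, Y = 2 }.Equals(new Point3 { X = 1, Y = 2, Z = 3 }) is true, while the reverse comparison is false.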

In order to auto-generate such support for value-based equality we need

  • an indication that value-based equality is desired
  • a list (possibly empty) of properties (or fields) that participate in equality

Strawman: value members

Allow properties and fields to have a value modifier. If they do, equality-related members are declared and/or overridden to define equality in terms of those members, together with any inherited value equality, as specified by the records proposal.

public class Point
{
    public value int X { get; set; }
    public value int Y { get; set; }
}

Generates something like:

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
    
    protected virtual Type EqualityContract => typeof(Point);
    public override bool Equals(object? other) =>
        other is Point that
        && this.EqualityContract == that.EqualityContract
        && this.X == that.X
        && this.Y == that.Y;
    public override int GetHashCode() => ... X ... Y ... ;
}
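For concreteness, here is a hand-written version, in today's C#, of roughly what the generated Point could look like. The hash combination is an assumption: the text leaves GetHashCode elided, and this sketch uses System.HashCode.Combine over the contract and both value members.

```csharp
using System;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }

    protected virtual Type EqualityContract => typeof(Point);

    // Equal only if both sides agree on the contract and on every value member.
    public override bool Equals(object other) =>
        other is Point that
        && this.EqualityContract == that.EqualityContract
        && this.X == that.X
        && this.Y == that.Y;

    // Assumed hashing scheme: combine the contract with each value member,
    // so objects that compare equal always produce the same hash code.
    public override int GetHashCode() => HashCode.Combine(EqualityContract, X, Y);
}
```

Two separately constructed points with the same coordinates then compare equal, and one can be used to find the other in a HashSet<Point>.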

If value-based equality is inherited from a base class (i.e. it already has an EqualityContract), there'll be base calls to include that part of the equality computation:

public abstract class Person
{
    public value string Name { get; set; }
}
public class Student : Person
{
    public value int ID { get; set; }
}

Would turn into something like:

public abstract class Person
{
    public string Name { get; set; }
    
    protected virtual Type EqualityContract => typeof(Person);
    public override bool Equals(object? other) =>
        other is Person that
        && this.EqualityContract == that.EqualityContract
        && this.Name == that.Name;
    public override int GetHashCode() => ... Name ... ;
}
public class Student : Person
{
    public int ID { get; set; }
    protected override Type EqualityContract => typeof(Student);
    public override bool Equals(object? other) =>
        base.Equals(other)
        && other is Student that
        && this.ID == that.ID;
    public override int GetHashCode() => ... base.GetHashCode() ... ID ... ;
}

Open questions:

  • Generate IEquatable<T> implementation as well?
  • Generate equality operators == and !=?
  • What exactly to generate in equality check - which kinds of equality to use recursively?
  • How to indicate that value-based equality is desired if there are no value properties (e.g. in a base class)?
  • How to decide if a derived class that inherits value equality and adds no new members should override the EqualityContract or not?
  • What to do if value members on classes are mutable? Is this just fine, or should we warn that these will e.g. be lost in dictionaries if they mutate?
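The last question is easy to demonstrate. The sketch below uses a hypothetical MutablePoint with value equality over settable properties and a simple deterministic hash (an assumption chosen so the outcome is predictable). Mutating the object after it has been added to a HashSet<T> makes it unfindable, because the set still files it under its old hash code.

```csharp
using System.Collections.Generic;

// Hypothetical class: value equality over mutable properties.
public class MutablePoint
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object other) =>
        other is MutablePoint that && X == that.X && Y == that.Y;

    // Simple deterministic hash for the demo.
    public override int GetHashCode() => unchecked(X * 31 + Y);
}

public static class MutableKeyDemo
{
    // Adds a point to a set, mutates it, then asks whether the set still finds it.
    public static bool StillFoundAfterMutation()
    {
        var p = new MutablePoint { X = 1, Y = 2 };
        var set = new HashSet<MutablePoint> { p };
        p.Y = 3;                // changes the hash code while p is inside the set
        return set.Contains(p); // looks up the new hash code, not the stored one
    }
}
```

MutableKeyDemo.StillFoundAfterMutation() returns false even though the very same object is still in the set.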

Strawman: value types

The value modifier could be allowed on types as well as on members. Just like the presence of a value member this would cause the type to generate or override value equality.

This addresses a limitation of value members: you cannot get generated value equality without any (additional) value members, whether in the base or a derived type. For instance:

public abstract value class Entry
{
    ... // No value members
}

which generates:

public abstract class Entry
{
    ... // No value members
    protected virtual Type EqualityContract => typeof(Entry);
    public override bool Equals(object other) => 
        other is Entry that 
        && this.EqualityContract == that.EqualityContract;
    public override int GetHashCode() => ...;
}

This is a key "discriminated union" scenario, if we want to easily express value-based equality that works across a whole set of variants expressed as classes derived from an empty base class.

It also addresses being able to express whether a derived class with no new value members should override the EqualityContract or not:

public class GraduateStudent: Student { ... }       // Can be equal to other kinds of Student
public value class GraduateStudent: Student { ... } // Can only be equal to other GraduateStudents
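A hand-written sketch of the difference between the two GraduateStudent declarations above follows. Person and Student follow the earlier hand-written pattern with an assumed hash scheme; PlainGraduateStudent and ValueGraduateStudent are hypothetical stand-ins for the two alternatives.

```csharp
using System;

public abstract class Person
{
    public string Name { get; set; }
    protected virtual Type EqualityContract => typeof(Person);
    public override bool Equals(object other) =>
        other is Person that
        && EqualityContract == that.EqualityContract
        && Name == that.Name;
    public override int GetHashCode() => HashCode.Combine(EqualityContract, Name);
}

public class Student : Person
{
    public int ID { get; set; }
    protected override Type EqualityContract => typeof(Student);
    public override bool Equals(object other) =>
        base.Equals(other) && other is Student that && ID == that.ID;
    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), ID);
}

// Like "public class GraduateStudent : Student": inherits Student's contract,
// so it can be equal to a plain Student with the same Name and ID.
public class PlainGraduateStudent : Student { }

// Like "public value class GraduateStudent : Student": overrides the contract,
// so it can only be equal to other ValueGraduateStudent instances.
public class ValueGraduateStudent : Student
{
    protected override Type EqualityContract => typeof(ValueGraduateStudent);
}
```

Only the contract override differs between the two derived classes; neither adds members, yet they end up in different equality "families".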

Strawman: value type implies value members

In addition, we may want the value modifier on a type to imply that all public properties and fields participate in equality. It would be a nice shorthand:

public value class Point
{
    public int X { get; set; } // value is implied
    public int Y { get; set; } // value is implied
}

On the other hand, this would make it impossible to specify no-member value equality when a public property, e.g. a computed property, is present. In a derived type that could be achieved manually by simply overriding the EqualityContract property, but for the root of the hierarchy it is not quite that simple.

Removing construction boilerplate

Classes often have a lot of trivial boilerplate around declaring a member, declaring a corresponding constructor parameter, and then initializing the member. Before auto-properties there used to be even more, requiring both a property and a backing field, and that's still sometimes the case when auto-properties don't serve the needs.

To this end the records proposal has a primary constructor, where the class itself allows a parameter list, causing members to be automatically declared and initialized from those parameters.

Strawman: Direct constructor parameters

One way to eliminate some of the boilerplate would be to allow a constructor parameter list to directly mention members to be initialized instead of declaring a new parameter. A parameter would implicitly be declared of the same name and type as the member, and the initialization would happen at the beginning of the constructor body, in the order of appearance.

public class Point
{
    public int X { get; }
    public int Y { get; }
    
    public Point(X, Y) 
    { 
        // ... validation 
    }
}

Would generate:

public class Point
{
    public int X { get; }
    public int Y { get; }
    
    public Point(int X, int Y) 
    { 
        this.X = X;
        this.Y = Y;
        // ... validation 
    }
}

One problem is repeated initialization in the face of inheritance. We could have a more discerning rule such that both direct and inherited members are allowed, but we only initialize the direct ones, whereas the inherited ones are expected to be passed to base:

public abstract class Person
{
    public string Name { get; }
    public Person(Name) { } // Name is initialized
}
public class Student : Person
{
    public int ID { get; }
    public Student(Name, ID) : base(Name) { } // Only ID is directly initialized here
}

Repeated initialization could still occur if a constructor calls another one from the same class with this(...). We could consider warning on this (they would have to use an ordinary parameter in the calling constructor), or we could refine the rule to not generate initialization of a property whenever a parameter is passed to this(...) or base(...).
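Under the refined rule, the Person/Student example above might lower to ordinary constructors along the following lines. This is a sketch, since the exact lowering is not specified: each class assigns only its own members and forwards inherited ones to base(...).

```csharp
public abstract class Person
{
    public string Name { get; }

    // From "public Person(Name) { }": Name is a direct member, so it is assigned here.
    public Person(string Name)
    {
        this.Name = Name;
    }
}

public class Student : Person
{
    public int ID { get; }

    // From "public Student(Name, ID) : base(Name) { }": Name is inherited and
    // passed to base, so only ID is assigned in this constructor.
    public Student(string Name, int ID) : base(Name)
    {
        this.ID = ID;
    }
}
```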

Strawman: Primary constructors

We've often talked about allowing (and once almost shipped) primary constructors for all classes. The class name would be optionally followed by a constructor parameter list, and any base class could be followed by an argument list to call a base constructor with:

public abstract class Person(string name)
{
    public string Name { get; } = name;
}
public class Student(string name, int id) : Person(name)
{
    public int ID { get; } = id;
}

Other constructors in the body would have to chain, directly or indirectly, to the primary constructor through this(...). The parameters would be in scope for initializers. Possibly they'd also be available in the body of function members, and would automatically get captured into a private field if necessary.
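The capture case could lower roughly as follows. The Greeter class and its field name are hypothetical (a compiler would presumably use an unspeakable name); the point is only that a parameter used outside initializers needs somewhere to live after construction.

```csharp
// Hypothetical source: public class Greeter(string greeting)
//                      { public string Greet(string who) => $"{greeting}, {who}!"; }
public class Greeter
{
    // The parameter is used in a method body, so it is captured in a private field.
    private readonly string greeting;

    public Greeter(string greeting)
    {
        this.greeting = greeting;
    }

    public string Greet(string who) => $"{greeting}, {who}!";
}
```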

We could adopt special syntax to provide a constructor body for primary constructors, e.g.:

public abstract class Person(string name)
{
    public Person { if (name is null) throw new ArgumentNullException(nameof(name)); }
    
    public string Name { get; } = name;
}

Direct constructor parameters and primary constructors have great synergy:

public abstract class Person(Name)
{
    public Person { if (Name is null) throw new ArgumentNullException(nameof(Name)); }
    
    public string Name { get; }
}

Strawman: Primary constructor member declarations

For primary constructors we could allow a further shorthand of not just mentioning but directly declaring a property or field of the enclosing class in the primary constructor parameter list:

public abstract class Person(public string Name { get; });

If this is allowed, there are likely to be a lot of classes with empty bodies. We could allow such class declarations to end in ; instead of {}, as shown above.

Open questions:

  • Exact syntax of having member declarations in a comma-separated list (elide trailing ;s etc)?
  • Allow parameter modifiers on members in a constructor parameter list as well (params)?

Strawman: Primary constructor "inheritance"

One big source of boilerplate is the repetition required when the constructor of a derived class needs to take all the constructor parameters of the base class, only to directly pass them on to the base constructor.

We could introduce a syntax for automatically "inheriting" all the constructor parameters and passing them on to the base class. To do so would require

  • A syntax for opting into that.
  • A designated constructor in the base class to inherit from. In practice it seems that would be the primary constructor, which would then have to be identified as such in metadata so that the derived class can "know" which one to inherit from.

For example:

public abstract class Person(string name)
{
    public string Name { get; } = name;
}
public class Student(..., int id) : Person
{
    public int ID { get; } = id;
}

The ... means that the primary constructor parameters of the base class (string name in this case) are copied in with the same name, type and order, and that those parameters are implicitly passed on to the base:

public class Student(string name, int id) : Person(name)
{
    public int ID { get; } = id;
}

Improvements for object initializers

Object initializers are an alternative/supplement to constructors that avoids initialization code entirely at the declaration site, including constructors and their chaining, and provides flexibility at the call site (which properties, in which order). The big downside is that they require the properties/fields to be mutable!

It would be desirable to allow this even for immutable properties and fields. Unfortunately, we cannot just start allowing object initializers on existing readonly fields and getter-only auto-properties, because that would bypass any constructor-based validation logic that authors were relying on. It would have to be enabled only for fields and properties that opt in.

Strawman: Init-only properties

We could introduce a new init accessor to properties, which is mutually exclusive with set. It works in the same way as a set accessor, except that it can only be used in the initialization of the object - including from an object initializer (or with-expression - more about those later).

public class Point
{
    public int X { get; init; }
    public int Y { get; init; }
}

var p = new Point { X = 5, Y = 3 }; // OK
p.Y = 7; // ERROR, Y is not settable

Also, we could consider requiring that an init-only auto-property that is neither initialized by the constructor nor has an initializer must be initialized upon construction.

var p = new Point { X = 5 }; // ERROR, Y must be initialized
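C# 9 later shipped an init accessor along these lines. Assuming a C# 9 (or later) compiler targeting .NET 5 or later, the sketch below compiles, and the commented-out assignment is the one the compiler rejects. It does not cover the "must be initialized" check from the second example; InitDemo is a hypothetical helper.

```csharp
public class Point
{
    // init: settable in object initializers (and constructors), read-only afterwards.
    public int X { get; init; }
    public int Y { get; init; }
}

public static class InitDemo
{
    public static Point Create()
    {
        var p = new Point { X = 5, Y = 3 }; // OK: assignment during initialization
        // p.Y = 7;                         // would not compile: Y is init-only
        return p;
    }
}
```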

Open questions:

  • An implementation strategy cannot just erase to setters, since previous/other compilers would not respect the immutability.

Strawman: validation accessors for auto-properties

One common way to "fall off the cliff" with auto-properties is to need validation (or other) logic in the setter. We could extend auto-properties so that a set { ... } body can be provided. What still identifies it as an auto-property is that the get; accessor is empty.

A setter body in an auto-property doesn't have to - and doesn't get to - assign to the backing field, which remains anonymous. It is assigned automatically before (or after?) the specified setter body runs. The value contextual keyword is available as in all setters, and can be used for side effects (such as raising exceptions).

    public string Name { get; set { if (value is null) throw new ArgumentNullException(nameof(Name)); } }

This would of course blend well with init-only properties, where validation logic could be placed directly in the init accessor of an auto-property.
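For comparison, this is the cliff in today's C#: adding validation to the setter means giving up the auto-property and declaring the backing field by hand. NamedThing is a hypothetical class; under the proposal the field would stay anonymous and be assigned automatically.

```csharp
using System;

public class NamedThing
{
    // Explicit backing field that the proposed syntax would make unnecessary.
    private string name;

    public string Name
    {
        get => name;
        set
        {
            if (value is null) throw new ArgumentNullException(nameof(Name));
            name = value; // the assignment the proposal would generate for us
        }
    }
}
```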

Strawman: object initializers for direct constructor parameters

When a connection between a constructor parameter and a property is specified (e.g. through direct constructor parameters), we could let a caller initialize the property through an object initializer, but have it mapped to a constructor parameter. For a class with direct constructor parameters:

public class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(X, Y){ }
}

We could have:

var p = new Point { X = 3, Y = 5 };

Translate to:

var p = new Point(3, 5);

Non-destructive mutation and data classes

With immutable objects it is common to want to produce a new object from an old one, with just a few properties changed. There's an attractive object-initializer-like syntax we could use for that:

var p2 = p1 with { X = 4 };

The idea is that p2 is an exact copy of p1 - including its runtime type and the value of properties that are not statically known at this point in the code - except for the changes specified in the object initializer part of the expression.

The question is what that means. There are two seemingly competing approaches. One overall question is whether "withing" is something that needs to be opted into. Can anyone copy any object by saying o with {}? If you need to opt in, what does that look like?

Strawman: withers through copy-and-update

For classes that rely on object initializers for property initialization and validation the desired behavior would be:

  1. Copy the object exactly, through MemberwiseClone (can be done without opt-in but is it "safe"?) or some required virtual clone method on the class.
  2. Overwrite properties according to the object initializer, only allowing the ones with a set or init accessor to be changed.

Validation happens per member, as the property setters are called.
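A minimal hand-written sketch of copy-and-update follows, assuming mutable setters (an init accessor would need compiler support to be assignable at this point) and a hypothetical virtual Clone method built on object.MemberwiseClone. Because MemberwiseClone copies the runtime type, a Student statically typed as Person stays a Student, with its ID intact. WithDemo.WithName is likewise hypothetical.

```csharp
public class Person
{
    public string Name { get; set; }

    // Hypothetical opt-in clone hook; MemberwiseClone makes a shallow copy
    // that preserves the object's runtime type.
    public virtual Person Clone() => (Person)MemberwiseClone();
}

public class Student : Person
{
    public int ID { get; set; }
}

public static class WithDemo
{
    // Roughly what "p with { Name = newName }" could lower to under this strategy.
    public static Person WithName(Person p, string newName)
    {
        var copy = p.Clone();  // step 1: exact copy, including runtime type
        copy.Name = newName;   // step 2: overwrite the listed properties
        return copy;
    }
}
```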

Strawman: withers through virtual factories

For classes that rely on constructors for property initialization and validation the desired behavior would be:

  1. Call a virtual factory method on the statically known class of the object to be copied, passing in a value for all the statically known properties: the one from the object initializer if provided, its existing value otherwise
  2. The virtual method is overridden in the runtime type of the object to call that type's constructor with all the arguments passed to it, and the existing values of all other properties on that type.

Validation happens again on all values passed through, even the ones that weren't changed, since the constructor that's ultimately called can't tell the difference.

For this kind of wither, several things would need to be in place for the compiler to know what to generate, both for the implementation of the virtual factory (let's call it With) and for the call to it:

  • Whether a wither is desired. There needs to be some sort of opt-in, as withers aren't generated on existing type declarations today.
  • Which constructor the wither implementation should call. Essentially, one constructor needs to be anointed the "main" one; in practice this probably means requiring a primary constructor. That changes the meaning of primary constructors from just a shorthand to a special designation, which in turn means that it is less OK to "fall off the cliff" where a primary constructor cannot express the desired constructor semantics.
  • Which property or field corresponds to each parameter of the primary constructor. Since the generated With method needs to generate a constructor call, it needs to be able to collect the arguments to it from a) the properties assigned in the with expression and b) the existing property values in the source object. In practice it seems all the primary constructor parameters need to be property parameters.

I'm going to tentatively "burn" the data modifier for the purpose of designating that a wither is desired. Later I'm going to hang more off of that.

public data class Point(X, Y)
{
    public int X { get; }
    public int Y { get; }
}
var p2 = p1 with { Y = 2 };

Would generate:

public class Point(X, Y)
{
    public int X { get; }
    public int Y { get; }

    public virtual Point With(int X, int Y) => new Point(X, Y);
}
var p2 = p1.With(p1.X, 2);

Data classes inheriting other data classes are required to override the wither of the base class:

public data class Person(Name)
{
    public string Name { get; }
}
public data class Student(..., ID) : Person
{
    public int ID { get; }
}

Generates:

public class Person(Name)
{
    public string Name { get; }
    public virtual Person With(string Name) => new Person(Name);
}
public class Student(..., ID) : Person
{
    public int ID { get; }
    
    public sealed override Person With(string Name) => With(Name, this.ID);
    public virtual Student With(string Name, int ID) => new Student(Name, ID);
}
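Written out by hand in today's C#, with ordinary constructors standing in for the primary constructors, the generated withers could look like the sketch below. Calling With through a Person reference dispatches to Student's sealed override, which carries the existing ID along, so the result is still a Student.

```csharp
public class Person
{
    public string Name { get; }
    public Person(string Name) => this.Name = Name;

    // Virtual factory: one parameter per primary constructor parameter.
    public virtual Person With(string Name) => new Person(Name);
}

public class Student : Person
{
    public int ID { get; }
    public Student(string Name, int ID) : base(Name) => this.ID = ID;

    // Overrides the base wither, filling in this object's own ID.
    public sealed override Person With(string Name) => With(Name, this.ID);

    // New wither covering the full Student constructor.
    public virtual Student With(string Name, int ID) => new Student(Name, ID);
}
```

Under this scheme, p1 with { Name = "Bob" } on a Person-typed variable would become p1.With("Bob").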

Strawman: Auto-generated deconstructors

For "positional" data types, in particular small ones, it is often convenient to have a positional deconstructor that is the "inverse" of the primary constructor.

In order to auto-generate a deconstructor the compiler would need to know:

  • Whether a deconstructor should be generated.
  • Which constructor to "mirror". In practice there needs to be a primary constructor.
  • Which property or field corresponds to each parameter of the primary constructor

Those are exactly the same requirements as for withers, when implemented as virtual factories! It seems reasonable that deconstructors are controlled by the same opt-in as withers. In that case, all data classes would generate a wither as well as a deconstructor:

public data class Point(X, Y)
{
    public int X { get; }
    public int Y { get; }
}

Would generate:

public class Point(X, Y)
{
    public int X { get; }
    public int Y { get; }

    public virtual Point With(int X, int Y) => new Point(X, Y);
    public void Deconstruct(out int X, out int Y) => (X, Y) = (this.X, this.Y);
}
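A hand-written equivalent in today's C#, again with an ordinary constructor standing in for the primary constructor, shows the deconstructor mirroring the constructor: the same parameters in the same order, as out parameters.

```csharp
public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int X, int Y)
    {
        this.X = X;
        this.Y = Y;
    }

    public virtual Point With(int X, int Y) => new Point(X, Y);

    // Inverse of the constructor: enables "var (x, y) = point;".
    public void Deconstruct(out int X, out int Y) => (X, Y) = (this.X, this.Y);
}
```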

Strawman: Abbreviated data members

As proposed the data keyword requires that all primary constructor parameters map to a property or field, so they may be of the form X (referencing a declared member X) or public int X { get; } (if we allow members to be declared directly as constructor parameters), but we would not allow ordinary constructor parameters int X.

This means that the syntax int X is "free" to be used otherwise in data class primary constructor parameters. We could make it a shorthand for declaring a public getter-only property on the class, just as proposed in the records proposal:

public data class Point(int X, int Y);

Generates the same as above - in particular the members

    public int X { get; }
    public int Y { get; }

For explicit member declarations, we could also consider co-opting the "default" meaning of int X; to generate public init-only properties. This would allow a similar shorthand for data members that aren't part of the primary constructor, and thus for data classes that are less "positional" and more "nominal":

public data class Point { int X; int Y; }

This would be shorthand for the class declaration:

public data class Point
{
    public int X { get; init; }
    public int Y { get; init; }
}

Strawman: Implied inherited constructors

In derived data classes we could not only require the primary constructor to inherit its base members with ..., but simply make it so: letting you (or making you) leave out the ... and concatenating the constructor parameters by default. So:

public data class Student(int ID) : Person;

Means the same as

public data class Student(..., int ID) : Person;

Strawman: data classes as value classes

Across all of the above, there are two main new "kinds" of classes proposed: value classes, which automatically support value-based equality over a set of members, and data classes, which automatically support non-destructive mutation and deconstruction. Both are brought into play with a modifier (value and data respectively), and while both operate over a set of members, it is not obvious that those would necessarily be the same members.

How can we most seamlessly and naturally combine them?

It does not seem appetizing to suggest that you need to use both the value and data modifiers if you want both sets of functionality. After all, both features are useful primarily in scenarios where data is immutable (otherwise value-based equality is "dangerous" when combined with e.g. dictionaries, and non-destructive mutation is really unnecessary when you have old-fashioned destructive mutation!), and it is going to be very common to want to apply them both.

Would it be reasonable to say that all value classes are data classes? Probably not. Data classes come with a lot of restrictions that don't seem warranted for value classes. You can certainly imagine classes that just want to add value equality, without being forced into primary constructors, mapping between constructor parameters and members, etc. Also, you may not want to allow your objects to be copied! Turning that off would not be easy, as you couldn't just provide your own implementation of something to take precedence over a generated one - the "something" is a With method that you don't want to have even exist!

Would it be reasonable to say that all data classes are value classes? Probably. The whole notion of non-destructive mutation sort of implies that object identity doesn't really matter, and that multiple physical objects can represent the same "value" at different times. If you really do want to keep reference equality by default, we could let you explicitly (and easily) implement it yourself, and let that implementation prevent one from being generated.

So the proposal is that data on a class also implies value equality. On which members, though?

I'd propose that this is somewhat left to the user, in the following way:

  1. For any explicit member declarations, the value keyword would need to be manually applied.
  2. For abbreviated data members (in the primary constructor or the class body), value is implied.

Thus:

public data class Point(int X, int Y);    // `X` and `Y` are abbreviated and participate in equality
public data class Point { int X; int Y; } // `X` and `Y` are abbreviated and participate in equality
public data class Point
{
    public value int X { get; init; }     // value means `X` participates in equality
    public int Y { get; init; }           // `Y` does not ask to participate in equality
}

This does leave a small wrinkle, similar to one we saw with value classes above: What if I want a derived data class with no new members that is "equality compatible" with its base class (doesn't override EqualityContract)? For value classes we solved it by whether the value modifier was on the type, but that's a no go here.

My best proposal is to allow the (...) or () empty primary constructor to be omitted, and for that to mean that value equality isn't overridden. Not the most obvious syntactic hint, but right there for the taking:

public data class GraduateStudent : Student;

This declaration overrides the With method, but not the EqualityContract.

Conclusion

The above many sub-proposals together span most or all of what we've talked about records doing. In the end, data classes become the combined feature that lets you get (nearly) the same brevity as the all-in-one proposals, while many aspects are still factored out to be usable independently, most notably value equality.

There are many details to iron out but I think this paints a fairly promising picture of how full-blown records with unprecedented inheritance resilience can be achieved in a gradual, "cliff-less" fashion.

LDM notes:

@MadsTorgersen MadsTorgersen added this to the 9.0 candidate milestone Feb 1, 2020
@MadsTorgersen MadsTorgersen self-assigned this Feb 1, 2020
@HaloFour
Contributor

HaloFour commented Feb 1, 2020

There's a lot to love here, especially that complex data structures can be achieved through lots of little features that can be combined together in interesting ways.

One notable version that appears to be missing would be the feature that enables "case classes", or very abbreviated data carriers, such as data class Upc(string Value); where positional construction/deconstruction, equality and public readonly members are all generated by the compiler. These are a common construct in Scala and the basis for ADTs in Scala 3.0. I know that DUs are on the list of things that the C# team are interested in so it would be nice to see how these data carriers and enums could come together to solve similar problems.

Also, I'm still dubious on potential designs around "init-only" properties. Don't get me wrong, I like the idea of extending object initialization, but the designs I've seen so far all seem to involve trickery such as exposing but trying to hide mutator methods. I'd hope that whatever solution could fit into the existing ecosystem and downlevel compilers.

@qrli

qrli commented Feb 1, 2020

For direct constructor:

1st, to make it less repetitive and more identifiable from other members, could you consider some alternative syntax like:

public class Point
{
    public int X { get; }
    public int Y { get; }
    
    public new(X, Y) 
    { 
        // ... validation 
    }
}

2nd, there could be times when both property names and other parameters are needed, e.g.:

public class Point
{
    public int X { get; }
    public int Y { get; }
    
    public new(X, Y, bool skipValidation = false) 
    { 
        if (skipValidation) return;
        // ... validation 
    }
}

3rd, the direct constructor can also solve the common dependency injection case, if we declare the dependency as a public property. If it also allowed non-public properties and fields, that would be perfect. E.g.

public class ProductController: Controller
{
	public ProductService Products { get; }
	private readonly ILogger<ProductController> logger;
	
	public ProductController(
		Products, // will work as in OP's proposal
		logger, // if non-public can be allowed
		IOptions<MyOption> options) // some parameter which will be read once.
	{
	}
}

For primary constructor:

public abstract class Person(public string Name { get; });

While gaining this ability, I'd hope to avoid having multiple ways to do exactly the same thing without enough difference between them, which is a major issue with C++. The terse form class Person(string Name) is appreciable, but with a property declaration inside, it feels better to be a normal property declaration.

@orthoxerox

I have a few reservations about the features suggested here. ... in particular seems rather inconvenient to read post factum, especially when the base class is defined not a few lines above, but in a whole different file.

@canton7

canton7 commented Feb 1, 2020

For with, I'd like to suggest taking a leaf out of Rust's book and using something like the following syntax:

var p2 = new Point { X = 1, ..p1 };

To someone unfamiliar with with syntax, I think this more obviously indicates that a new object is being constructed: they can get as far as "This is creating a new Point with X = 1", which puts them in a good place to guess the rest. With the proposed with keyword, it's not even clear that a new object is being created.

I think it would also be clearer to someone skimming code: we're used to looking for new to indicate construction.

The case of cloning an object is also a bit clearer with this syntax IMO:

var p2 = p1 with {};
var p2 = new Point { ..p1 };

@qrli

qrli commented Feb 1, 2020

@canton7 JavaScript also has a spread operator that does basically the same. But in a strongly typed language, it has the drawback that you have to know and write the exact type, which is typically expected to be the same as p1's. But the exact type may not always be known at compile time.

@TonyValenti

My concern is that changing the order of your properties could break the with call, since it boils down to a method call. I think it would be better if members were set on a mutable, anonymous struct and that struct was passed in as a single parameter to the constructor.

@Clockwork-Muse

... in my admittedly somewhat limited experience, record-like data structures in languages tend to be non-inheritable, because value equality breaks immediately in the face of inheritance. Which means you'd want some sort of transform-and-compare on some specific set of properties. I assume that shapes would help in this area as well.

@quinmars

quinmars commented Feb 1, 2020

There are many interesting points and I'm looking forward to the records feature. What I don't like is the proposed "validation accessors for auto-properties". The syntax is very limited. What if you want to assign an empty string in the given example instead of throwing? How do you get the previous/old value? I prefer proposal #140 here. The given example could become:

public string Name { get; set => field = value ?? throw new ArgumentNullException(nameof(Name)); }

And with proposal #2145 we might simply write:

public string Name! { get; set; }
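For comparison, here is a sketch of what the #140-style setter has to look like without a `field` keyword: the backing field is declared by hand (the `Person` class is only an illustrative container). This is the pattern the proposals above aim to shorten; in exchange for the extra declaration, the setter can read the old value or substitute a default instead of throwing.

```csharp
using System;

// Illustrative class: the validating setter from the example above, written by hand.
public class Person
{
    // Explicit backing field; a `field` keyword would make this declaration unnecessary.
    private string _name = "";

    public string Name
    {
        get => _name;
        // Throw on null; the previous value is still in _name at this point if needed.
        set => _name = value ?? throw new ArgumentNullException(nameof(Name));
    }
}
```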

@svick
Contributor

svick commented Feb 2, 2020

I don't quite understand the need for EqualityContract. In my opinion, two objects of different runtime types should not compare as equal, even if they both have the same members. If there are some special cases where this is desirable, those can be implemented manually, but the common case shouldn't be made more complicated just to make those special cases easier.


Direct constructor parameters and primary constructors have great synergy:

public abstract class Person(Name)
...

I'm not sure this synergy is great. Code that is placed in such a prominent place (the first line of a class declaration) should stand on its own; it should not depend on something inside the body of the class. Otherwise, it makes understanding the code harder, not easier.

@Joe4evr
Contributor

Joe4evr commented Feb 2, 2020

@svick Making two objects of different runtime types not compare as equal is the purpose of the EqualityContract, making the common case easier, not the special cases.

@svick
Contributor

svick commented Feb 2, 2020

@Joe4evr But you don't need EqualityContract for that, GetType() would suffice. Also, the OP explicitly considers situations where different runtime types would compare as equal, e.g.:

How to decide if a derived class that inherits value equality and adds no new members should override the EqualityContract or not?

@canton7

canton7 commented Feb 2, 2020

Just because no one's mentioned it yet: the proposed EqualityContract breaks the Liskov Substitution Principle (as does GetType, or indeed any approach which maintains symmetry).

Value equality among inherited types is fundamentally thorny (and often best avoided altogether, or at least carefully considered), and I'm not convinced it's something that the language itself should be getting involved with.

(This doesn't apply to value equality among DU members, of course)

@Richiban

Richiban commented Feb 2, 2020

The spread operator is perfect for JavaScript/TypeScript because of the lack of typing or the structural typing, respectively. In those braces you can spread almost any object.

In C#, however, you would only be able to spread an object of the same type as the target, which means it only really makes sense to spread a single source object.

"With" syntax makes more sense in a nominal typing environment.

@MgSam

MgSam commented Feb 3, 2020

  • "Direct constructor parameters" seems like an inferior version of TypeScript's parameter properties, which are much more terse and also clearer. It's especially unclear in the context of primary constructors, where the parameter list comes before the property declaration.

  • Making init properties required defeats one of the main benefits of initialization syntax - that not every property is required. In the "real world", initialization is often used to configure classes with tons of optional properties that would be painful to build constructors for.

  • My favorite part of this proposal that I haven't seen suggested before (maybe I missed it?) is the ... syntax to declare all the base constructor parameters and pass them back up to the base constructor. It's rather understated here, but the need to do so is a huge syntax burden in C# and other OOP languages, and also a very painful example of the brittle base class problem (add a parameter to the base class constructor and you break every inheritor). This solution solves that long-standing problem. Regardless of the outcome of the records proposals, I think it should be its own proposal and be considered on its merits.

  • I don't understand the need for EqualityContract. The benefit for all this additional complexity was not well explained in the proposal.

  • I agree with @quinmars that the proposed auto property validation syntax doesn't really give you very much when what people really want is access to the hidden field. There have been tons of issues over the years opened asking for this; a little puzzling to see it not even mentioned as a consideration.

@orthoxerox

@MgSam

Init properties are there to reuse the initializer syntax. If you have more than, say, four properties, constructors become cumbersome.

EqualityContract lets you ensure you always compare two values of the same runtime type. Otherwise you could compare an instance of the base class on the left with an instance of the derived class on the right using the base class logic.
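To make the asymmetry concrete, here is a sketch with hand-written classes (all names are hypothetical, and EqualityContract is a plain virtual property modeled on the proposal, not a library API). The naive base class accepts any instance of its type, including a derived one, while the derived class also compares Color, so `a.Equals(b)` and `b.Equals(a)` disagree. Requiring both sides to report the same EqualityContract makes the two directions agree.

```csharp
#nullable enable
using System;

// Naive value equality: accepts any NaivePoint, including derived instances.
public class NaivePoint
{
    public int X { get; }
    public int Y { get; }
    public NaivePoint(int x, int y) => (X, Y) = (x, y);

    public override bool Equals(object? obj) =>
        obj is NaivePoint p && p.X == X && p.Y == Y;
    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public class NaiveColoredPoint : NaivePoint
{
    public string Color { get; }
    public NaiveColoredPoint(int x, int y, string color) : base(x, y) => Color = color;

    // Only accepts other NaiveColoredPoints, so the base/derived comparison is one-sided.
    public override bool Equals(object? obj) =>
        obj is NaiveColoredPoint p && base.Equals(p) && p.Color == Color;
    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), Color);
}

// Same shapes, but both sides must report the same EqualityContract first.
public class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) => (X, Y) = (x, y);

    protected virtual Type EqualityContract => typeof(Point);

    public override bool Equals(object? obj) =>
        obj is Point p && p.EqualityContract == EqualityContract && p.X == X && p.Y == Y;
    public override int GetHashCode() => HashCode.Combine(EqualityContract, X, Y);
}

public class ColoredPoint : Point
{
    public string Color { get; }
    public ColoredPoint(int x, int y, string color) : base(x, y) => Color = color;

    protected override Type EqualityContract => typeof(ColoredPoint);

    public override bool Equals(object? obj) =>
        base.Equals(obj) && obj is ColoredPoint p && p.Color == Color;
    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), Color);
}
```

With the naive pair, a point equals a colored point but not the other way round; with the contract check, both directions return false.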

@yaakov-h
Member

yaakov-h commented Feb 3, 2020

What benefit does EqualityContract offer over emitting something like && other.GetType() == typeof(ThisClassRightHere)?

@CyrusNajmabadi
Member

It allows the subclasses to decide the contract. For example, I may decide in my case that my equality contract is such that I have:

class Base { } sealed class Derived1 : Base { } sealed class Derived2 : Base { }

And I want Derived1/Derived2 to be equatable (because they represent the same value-oriented data, perhaps with different impls for efficiency).

I explicitly do not want a pregenerated check that the types must be equal. Instead, I want to say that I can compare the types as long as they agree on the equality contract.
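A sketch of this scenario, again with a hand-written EqualityContract and hypothetical names: CentsMoney and SplitMoney store an amount differently, but neither overrides the contract, so both inherit `typeof(Money)` and compare by the logical value. A subclass that wanted to opt out would override EqualityContract with its own type.

```csharp
#nullable enable
using System;

public abstract class Money
{
    // Both derived types report this same contract, so they may compare equal.
    protected virtual Type EqualityContract => typeof(Money);

    // The logical value that equality is defined over.
    public abstract long TotalCents { get; }

    public override bool Equals(object? obj) =>
        obj is Money m && m.EqualityContract == EqualityContract && m.TotalCents == TotalCents;

    public override int GetHashCode() => HashCode.Combine(EqualityContract, TotalCents);
}

// Stores the amount as a single number of cents.
public sealed class CentsMoney : Money
{
    private readonly long _cents;
    public CentsMoney(long cents) => _cents = cents;
    public override long TotalCents => _cents;
}

// Stores dollars and cents separately: a different representation of the same value.
public sealed class SplitMoney : Money
{
    private readonly long _dollars;
    private readonly int _cents;
    public SplitMoney(long dollars, int cents) => (_dollars, _cents) = (dollars, cents);
    public override long TotalCents => _dollars * 100 + _cents;
}
```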

@dsaf

dsaf commented Feb 3, 2020

Will single-value primitive type wrappers carry on having a performance overhead or will you be able to optimise them?

public data class PersonName(Value)
{
    public string Value { get; }
}

#259
#410
#1170
#1695

@YairHalberstadt
Contributor

Would all of these features apply to structs? What differences would there be between structs and classes?

@svick
Contributor

svick commented Feb 3, 2020

@CyrusNajmabadi

And I want Derived1/Derived2 to be equatable (because they represent the same value-oriented data, perhaps with different impls for efficiency).

I explicitly do not want a pregenerated check that the types must be equal. Instead, I want to say that I can compare the types as long as they agree on the equality contract.

To me, that seems to be a niche use case, and I think it's not a good enough reason to make the feature more complicated for everyone.

@ValentinLazar

ValentinLazar commented Feb 3, 2020

Would using data and value as reserved keywords cause breaking changes to existing programs? Those seem like pretty common names of variables, so just wondering what impact this would have.

@HaloFour
Contributor

HaloFour commented Feb 3, 2020

@ValentinLazar

They'd be contextual keywords in that they only act like keywords when used as modifiers, where they're not currently legal. You could still have identifiers data and value, and value would still be the implicit parameter name for property setters.

@MadsTorgersen
Contributor Author

@HaloFour :

One notable version that appears to be missing would be the feature that enables "case classes", or very abbreviated data carriers, such as data class Upc(string Value); where positional construction/deconstruction, equality and public readonly members are all generated by the compiler.

The intention is that this works. data implies withers, deconstructors and value equality, and allows string Value to imply a public getter-only property.
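As a rough sketch of what `data class Upc(string Value);` is described as implying, here it is written out by hand in existing C#: a public getter-only property, positional construction, a deconstructor, a wither, and value equality. The shape of the `With` method is an assumption; the write-up does not pin down how withers are exposed.

```csharp
#nullable enable
using System;

// Hand-written approximation of `data class Upc(string Value);`.
public class Upc : IEquatable<Upc>
{
    // Public getter-only property implied by `string Value`.
    public string Value { get; }

    // Positional construction.
    public Upc(string value) => Value = value;

    // Positional deconstruction.
    public void Deconstruct(out string value) => value = Value;

    // Wither (hypothetical shape): copy, replacing only the members that are supplied.
    public Upc With(string? value = null) => new Upc(value ?? Value);

    // Value equality: same runtime type and equal members.
    public bool Equals(Upc? other) =>
        other is not null && GetType() == other.GetType() && Value == other.Value;

    public override bool Equals(object? obj) => Equals(obj as Upc);
    public override int GetHashCode() => Value.GetHashCode();

    public static bool operator ==(Upc? left, Upc? right) =>
        left is null ? right is null : left.Equals(right);
    public static bool operator !=(Upc? left, Upc? right) => !(left == right);
}
```

For reference, `public record Upc(string Value);` in C# 9 and later generates much of this, although its property is init-only rather than getter-only and `with` goes through a compiler-generated clone method instead of a `With` method.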

@MadsTorgersen
Contributor Author

@qrli:

public new(X, Y)

I guess this would be an orthogonal proposal to allow constructors to be specified with the new keyword instead of the type name. I don't know that it's worth it, really, but it wouldn't help or hinder any of the proposals in my write-up. Once you start using primary constructors, it wouldn't make a difference.

@MadsTorgersen
Contributor Author

@qrli:

there could be times both a property name and some other parameters are needed. e.g.
[...]
public new(X, Y, bool skipValidation = false)

Other than your proposed new syntax, my write-up is intending to allow that. Direct constructor parameters can be mixed freely with ordinary constructor parameters, as in:

public Point(X, Y, bool skipValidation = false)
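A sketch of what `public Point(X, Y, bool skipValidation = false)` would presumably stand for in today's C#: X and Y become parameters that are assigned to the properties of the same name before the body runs, while skipValidation stays an ordinary parameter. The validation rule (non-negative coordinates) is made up for the example.

```csharp
using System;

public class Point
{
    public int X { get; }
    public int Y { get; }

    // Hand-written equivalent of `public Point(X, Y, bool skipValidation = false)`.
    public Point(int x, int y, bool skipValidation = false)
    {
        // Direct constructor parameters: assigned straight to their properties.
        X = x;
        Y = y;

        if (skipValidation) return;

        // Hypothetical validation rule, for illustration only.
        if (X < 0 || Y < 0)
            throw new ArgumentOutOfRangeException(X < 0 ? nameof(x) : nameof(y));
    }
}
```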

@YairHalberstadt
Contributor

Should you be allowed to use value/key on an interface to define value equality on the interface members?

For example:

value interface IPoint
{
     public int X { get; }
     public int Y { get; }
}

public data class CartesianPoint(int X, int Y) : IPoint;

public data class PolarPoint(int Radius, int Angle) : IPoint
{
    public int X => ...
    public int Y => ...
}
...
new PolarPoint(0, 0).Equals(new CartesianPoint(0, 0))

@MrJul

MrJul commented Feb 4, 2020

Strawman: Abbreviated data members

[...]

For explicit member declarations, we could also consider co-opting the "default" meaning of int X; to generate public init-only properties. This would allow a similar shorthand for data members that aren't part of the primary constructor, and thus for data classes that are less "positional" and more "nominal":

public data class Point { int X; int Y; }

This would be shorthand for the class declaration:

public data class Point
{
    public int X { get; init; }
    public int Y { get; init; }
}

I personally don't like that adding the data keyword to the class changes the declaration from a private read-write field to a public init-only property: IMHO the containing type shouldn't completely change what a declaration means. (It's not a problem with the primary constructor since it's a completely new syntax form.)

Plus, it makes it impossible to add a private field to the class without having to rewrite it entirely using the old syntax.
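The expanded form quoted above relies on `init` accessors, which did ship in C# 9. A sketch of how such members behave, assuming the type is declared as a C# 9 `record` (the proposal's `data` modifier is not real syntax): X and Y can be set in any order in an object initializer or a `with` expression, but assigning them after construction (`p.X = 3;`) is a compile-time error.

```csharp
// The expansion of `public data class Point { int X; int Y; }`, declared as a
// C# 9 record so that value equality and `with` come along as well.
public record Point
{
    // init accessor: settable in an object initializer or `with`, read-only afterwards.
    public int X { get; init; }
    public int Y { get; init; }
}
```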

@Joe4evr
Contributor

Joe4evr commented Feb 4, 2020

Plus, it makes it impossible to add a private field to the class without having to rewrite it entirely using the old syntax.

Says who?

@MadsTorgersen
Contributor Author

@YairHalberstadt:

Should you be allowed to use value/key on an interface to define value equality on the interface members?

Maybe? Not sure what it would mean exactly.

@MadsTorgersen
Contributor Author

@MrJul:

I personally don't like that adding the data keyword to the class changes the declaration from a private read-write field to a public init-only property: imho the containing type shouldn't completely change what a declaration means. (It's not a problem with the primary constructor since it's a completely new syntax form.)

I get that criticism. At the same time, I want the abbreviation, and I can't think of a better way to trigger it.

@MadsTorgersen
Contributor Author

@MrJul:

Plus, it makes it impossible to add a private field to the class without having to rewrite it entirely using the old syntax.

Just add a private keyword and you're no longer in abbreviation land.

@MadsTorgersen
Contributor Author

@Richiban:

@orthoxerox

One could argue that if new Point(1, 2).Equals(new Point(1, 2)) then new Point(1, 2).Equals(new ColoredPoint(1, 2, Color.Red)) must return true as well

Whole loada nope from me on this one.

I don't think @orthoxerox is arguing that we should embrace this, and neither would I. I believe the LSP requires that you can substitute a subtype for a supertype and the code would still work, not that the behavior would be the same. Otherwise method overriding would pretty much be banned.

@theunrepentantgeek

theunrepentantgeek commented Feb 9, 2020

Reading through the LDM notes for Jan 29, 2020, I was surprised to see only two approaches considered for "wither" implementation; were other approaches previously considered and rejected, or is there still scope for new ideas to be introduced?

FWIW, the idea that popped into my head was to only allow withers to be used on types that have a copy constructor, and to solve (as a prerequisite) initialization syntax for immutable types.

To reuse the same running example, a copy constructor for Point would be this:

public class Point
{
    public Point(Point original)
    {
        X = original.X;
        Y = original.Y;
    }
	
    // elided
}

For record types, this is trivially generated; for custom types, authors can easily opt-in to support of withers by writing one themselves.

Using a copy constructor avoids the dangers of using MemberwiseClone() (e.g. having two instances share an internal List<T>) as well as performance issues.

Subsequent modification of the clone could take either of two different paths. If the property is writable, it can be directly set. If not, the same technique as introduced for initialization expressions of get-only properties could be reused.

To illustrate, for a mutable Point class:

var p1 = new Point(2, 4);
var p2 = p1 with { Y = 14 };

would generate

var p1 = new Point(2, 4);
var p2 = new Point(p1);
p2.Y = 14;

Does this approach have a fatal flaw that I'm not seeing?
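A sketch of the proposed lowering, with the copy constructor written by hand and the compiler's output written as an ordinary method (WitherLowering and WithY14 are illustrative names, not part of any proposal). The update is applied to the copy, so p1 keeps its original values. For what it's worth, the records that shipped in C# 9 also build `with` on a compiler-generated copy constructor, reached through a hidden clone method.

```csharp
// Mutable Point with a hand-written copy constructor, as in the suggestion above.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }

    public Point(int x, int y) => (X, Y) = (x, y);

    // Copy constructor: copies each member; the author decides how deep the copy goes.
    public Point(Point original)
    {
        X = original.X;
        Y = original.Y;
    }
}

// Hand-written stand-in for the code the compiler would emit for `p1 with { Y = 14 }`.
public static class WitherLowering
{
    public static Point WithY14(Point p1)
    {
        var p2 = new Point(p1); // 1. copy the original
        p2.Y = 14;              // 2. update the copy, never p1
        return p2;
    }
}
```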

@alrz
Contributor

alrz commented Feb 11, 2020

withers through copy-and-update

Is there going to be a way to use withers through update only, i.e. mutating an existing value and returning it?

Save((db.Get() ?? new()) with { Property = newValue });

Even though immutable with would be desirable in a lot of cases, there are places where we still need mutable records; obvious examples are database entities or ASP.NET options.

@theunrepentantgeek

Is there going to be a way to use withers through update only?

As I've read it so far, the whole point of withers is to allow easy creation of a near clone of an existing object without modification of the original, regardless of whether the object is immutable or mutable.

Making the syntax return a different object if immutable, but the same object if mutable, would not only undermine half the motivating scenarios for the feature, but would be extremely confusing and likely a source of many subtle bugs.
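For reference, the records that shipped in C# 9 behave as described here: a `with` expression on a record always produces a new instance, even when the record's properties are settable. A minimal sketch with a hypothetical `Options` record:

```csharp
// Mutable record: `with` still copies rather than updating in place.
public record Options
{
    public string Host { get; set; } = "localhost";
    public int Port { get; set; } = 80;
}
```

Mutating in place remains possible, but it has to be spelled as an ordinary assignment (`original.Port = 8080;`), which keeps the two intents visibly different at the call site.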

@alrz
Contributor

alrz commented Feb 11, 2020

@theunrepentantgeek

If you define your own With method, there's nothing to stop you from returning the same object. So there's no guarantee that you get a new object since it's all pattern-based and completely customizable.

What I'm saying is that the behavior largely depends on the target object. Only if it's a proper record can you be sure that it'll return a new object, so I don't mind the difference if I'm using with with a POCO.

@DavidArno

@alrz,
To my mind, x = y with { variant } implies a contract that y will be left unmodified. So whilst a custom With method could be written that does modify y and return it, this would be a blatant PoLA violation, i.e. it would be Bad Code™

@DavidArno

I've come late to this thread and I've skipped the comments, so apologies if I'm repeating the thoughts of others.

Whilst I understand where Mads is coming from with his cascade of features, I found myself a little lost toward the end. I'm hoping that I'll be able to write a record in one of two forms:

data struct Point(int X, int Y);
data class Point(int X, int Y);

And for both, I get:

  • A readonly struct and a sealed class with getter-only properties, respectively.
  • Value-based equality, including ==, so that:
var a = new Point(1, 2);
var b = new Point(1, 2);
a == b && a.Equals(b)

with the caveat that == can only be implemented if all the properties of the record also implement == (much like the way tuple equality works).

  • Support for var a = new Point(1, 2) and var a = new Point { X = 1, Y = 2 }.
  • Built-in deconstruct, (x, y) = new Point(1, 2).
  • Built-in wither support, var y = new Point(1, 2) with { Y = 1 }.
  • Ability to implicitly cast a tuple of (int, int) to a Point. In addition, for a record of just one parameter, that implicit cast should be for the type of that parameter and work both ways (i.e. like a one-element tuple deconstruct).
  • The ability to override any of the default compiler-generated behaviour through the body of the record:
data class Email(string Address)
{
    Email(string address) 
    {
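        // IsValidEmail is a placeholder for whatever validity check is intended
        if (!IsValidEmail(address)) throw new ArgumentException("Invalid email address.", nameof(address));
        Address = address;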
    }
}

Other people may see benefit to all the other features discussed, but they worry me. It risks turning a simple idea (a compact way of declaring a value/domain object) into something that takes a very long time to implement and that may therefore not make it in time for v9...
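Most of this list can be approximated with the record types C# eventually shipped (record classes in C# 9, record structs in C# 10). In the sketch below, the tuple conversion is a hand-written user-defined operator rather than something the compiler generates, and the email check is a placeholder assumption. One item the sketch does not give you is `new Point { X = 1, Y = 2 }`, since a positional record has no parameterless constructor.

```csharp
#nullable enable
using System;

// Sealed positional record: X and Y properties, value equality (including ==),
// Deconstruct and `with` support are all compiler-generated.
public sealed record Point(int X, int Y)
{
    // Hand-written: implicit conversion from an (int, int) tuple.
    public static implicit operator Point((int X, int Y) t) => new(t.X, t.Y);
}

// Readonly record struct (C# 10): the struct counterpart with the same shape.
public readonly record struct PointStruct(int X, int Y);

// A positional record cannot declare its own constructor with the primary
// constructor's signature, so validation goes into a redeclared property instead.
public sealed record Email(string Address)
{
    public string Address { get; init; } = IsValid(Address)
        ? Address
        : throw new ArgumentException("Invalid email address.", nameof(Address));

    // Placeholder check only; real email validation is more involved.
    private static bool IsValid(string address) => address.Contains('@');
}
```

Because `Address` keeps an `init` accessor, `email with { Address = "..." }` bypasses the check; a stricter design would validate in the accessor itself.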

@thargol1

Can records also be used for structs? I would like to do more semantic programming, so instead of int personId I would like to use PersonId personId;

So would this be possible:

public data struct PersonId(int Value);

and then would it be possible to add 'features' without coding them explicitly:

public data struct PersonId(int Value) : IEquatable<PersonId>, IComparable<PersonId>;
// No implementation of IEquatable and IComparable is specified so a default is generated.

Features could be a set of well-known interfaces like IEquatable, IComparable, ICloneable, INotifyPropertyChanged and others.


and if a record contains only one member, could we somehow force conversion operators to be generated:

public data struct PersonId(int Value) : implicit operator int;
// implicit conversion from and to 'int' is generated

and would it be possible to enforce validation:

public data struct Email(string Value) where ValidatorTools.IsEmail(Value);

(I'd prefer this would be implemented using a static TryParse method and not with exceptions.)


and would I be able to combine it:

public data struct PersonId(int Value) where Value >= 100000 && Value <= 999999 : IEquatable<PersonId>, IComparable<PersonId>;

and in the end would this code work:

public data struct PersonId(int Value) : implicit operator int;
public data struct StudentId(int Value) : implicit operator int;

PersonId p = 23;
StudentId s = 14;

if (p == s) {} // generates compile error.

s = p; // generates compile error

Sorry for the long comment... I can explain myself better with examples than with words.
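Much of this can be sketched with C# 10 record structs, which generate value equality and `==`. The conversions, the `TryCreate` method and the six-digit range check are hand-written assumptions. The conversion to int is deliberately explicit: as far as I can tell, if both types had implicit conversions to int, `p == s` would still compile by falling back to the built-in int equality. With it explicit, `p == s` and `s = p` should both be rejected at compile time.

```csharp
using System;

// Strongly typed ID: equality and == are generated by `record struct` (C# 10).
public readonly record struct PersonId
{
    public int Value { get; }

    public PersonId(int value)
    {
        if (!IsValid(value)) throw new ArgumentOutOfRangeException(nameof(value));
        Value = value;
    }

    // Assumed rule: six-digit identifiers only.
    public static bool IsValid(int value) => value >= 100000 && value <= 999999;

    // Non-throwing alternative to the constructor, in the spirit of TryParse.
    public static bool TryCreate(int value, out PersonId id)
    {
        id = IsValid(value) ? new PersonId(value) : default;
        return IsValid(value);
    }

    // int -> PersonId is implicit; PersonId -> int is deliberately explicit.
    public static implicit operator PersonId(int value) => new(value);
    public static explicit operator int(PersonId id) => id.Value;
}

// Same pattern without validation, using a positional record struct.
public readonly record struct StudentId(int Value)
{
    public static implicit operator StudentId(int value) => new(value);
    public static explicit operator int(StudentId id) => id.Value;
}
```

Note that a struct cannot stop `default(PersonId)` from producing a Value of 0, so the range check only covers values that go through the constructor.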

@Clockwork-Muse

@thargol1

and than would be it be possible to add 'features' without coding them explicitly:

This is essentially what Rust does with "derivable" traits (interfaces). Note that for things like equality/comparison, you would not be able to do that with an inheritable class tree (Rust's structs don't have inheritance).

I actually really want a good chunk of that stuff, although alas, I don't think contract syntax is coming anytime soon. But making "wrappers" simpler would go a long way to helping remove primitive obsession.


@alrz

Even though immutable with would be desirable in a lot of cases, there are places where we still need mutable records; obvious examples are database entities or ASP.NET options.

It's not obvious to me that either of those places needs mutable types.
For a database entity, you're not modifying the actual record directly, but sending something through an interface (most of the time/memory cost is going to be in serialization and transmission, not manipulation). Even an immutable clone of a proxy object should still be able to call some sort of Save() method - although personally I'd probably just be doing repository.UpdateOrAdd(modifiedObject).

@qrli

qrli commented Feb 12, 2020

@alrz Your case can be as simple as:

Upsert(o => o.Property = newValue);

@dsaf

dsaf commented Feb 17, 2020

Will Enumerable.SequenceEqual be used for collection member equality?
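For reference, the records that shipped in C# 9 compare each field with `EqualityComparer<T>.Default`, so a list-valued member is compared by reference. A sketch of opting into element-wise comparison with `Enumerable.SequenceEqual` by writing `Equals` by hand; the `Tags` record is hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Tags(string Owner, IReadOnlyList<string> Items)
{
    // Replaces the generated member-wise comparison: Items are compared element by element.
    public bool Equals(Tags? other) =>
        other is not null && Owner == other.Owner && Items.SequenceEqual(other.Items);

    // Must stay consistent with Equals, so the elements are hashed too.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Owner);
        foreach (var item in Items) hash.Add(item);
        return hash.ToHashCode();
    }
}
```

Because the record is sealed, the hand-written `Equals(Tags?)` can be non-virtual, and the generated `==` and `Equals(object)` pick it up.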

@Igorbek

Igorbek commented Apr 5, 2020

I was reading the Jan 29 LDM notes, and there was a concern about the backwards compatibility of withers. I want to suggest a way to deal with that which is a very common approach when binary compatibility is important, and it seems to me it might be a great fit as a separate, general feature.
So the idea is, instead of a parameter for each property, to use a mirror struct that has the same fields or properties as the original type and is used to pass properties to withers.

Record case:

public data class Point(int X, int Y);
var p2 = p1 with { X = 2 };

generates

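public class Point
{
  // members generated from the positional declaration
  public int X { get; }
  public int Y { get; }
  public Point(int x, int y) => (X, Y) = (x, y);

  // struct for withers (public, since it appears in the public With signature)
  public struct WithParameters
  {
    public int _x, _y;
    public bool _x_provided, _y_provided;
    public int X { set { _x = value; _x_provided = true; } }
    public int Y { set { _y = value; _y_provided = true; } }
  }

  // each member falls back to its current value unless the caller provided one
  public virtual Point With(WithParameters p) => new Point(p._x_provided ? p._x : X, p._y_provided ? p._y : Y);
}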

var p2 = p1.With(new Point.WithParameters { X = 2 });

So it:

  • is binary compatible when new fields are added
  • can be implemented for any type
  • is statically constructible at the use site
  • in general, allows hiding/adding/renaming names used in withers
  • with a new special resolution based on the fields used, allows multiple withers (p with { X, Y } or p with { R, Rho })
  • still allows inheritance
  • of course, when auto-generated for records, should have the correct modifiers where they make sense (readonly, private)

Some possible alternatives/enhancements:

  • the struct may have a constructor with a single parameter, the original object; that way it can initialize fields when 'provided' flags aren't needed
  • the struct can be returned instead (var temp = p.With(); temp.X = 1; var p2 = temp.Build();), but in this case we're limited to one overload
  • 'provided' flags may be passed explicitly (p.With(new Point.WithParameters { X = 1, XProvided = true })), but that would require a magic name (alternatively, use a tuple (bool provided, T value))

And the feature can be coded manually independently from record types:

class Data
{
  public int Id { get; }
  public string Name { get; }
  public byte[] ExpensiveData { get; }

  public class WithState // I can use a class instead, or a different name (public, since With exposes it)
  {
    public string Name; // just a field
    public byte[]? ExpensiveData; // I can vary how I define emptiness

    public WithState(Data d)
    {
      Name = d.Name;
    }
  }

  public Data With(WithState w) => w.ExpensiveData == null ? new Data(GetNextId(Id), w.Name) : new Data(GetNextId(Id), w.Name, w.ExpensiveData);
}

data with { ExpensiveData = new byte[1] } // cannot use Id
// translates to (WithState's only constructor takes the original object)
data.With(new Data.WithState(data) { ExpensiveData = new byte[1] })

@CyrusNajmabadi
Member

So the idea is instead of parameters for each property to use a mirror struct that has all the same fields or properties than the original one and used to pass properties for withers.

This was discussed in a later LDM:

https://github.com/dotnet/csharplang/blob/master/meetings/2020/LDM-2020-03-23.md
https://github.com/dotnet/csharplang/blob/master/meetings/2020/LDM-2020-03-30.md

@Igorbek

Igorbek commented Apr 5, 2020

So the idea is instead of parameters for each property to use a mirror struct that has all the same fields or properties than the original one and used to pass properties for withers.

This was discussed in a later LDM:

https://github.com/dotnet/csharplang/blob/master/meetings/2020/LDM-2020-03-23.md
https://github.com/dotnet/csharplang/blob/master/meetings/2020/LDM-2020-03-30.md

Ah, that's almost exactly what I suggested. Thank you. It's sad that they think init-only has advantages over builders.

@jcouv jcouv removed this from the 9.0 candidate milestone Nov 11, 2020
@jcouv
Member

jcouv commented Nov 11, 2020

Closing as the C# 9 records feature is now tracked by #39

@jcouv jcouv closed this as completed Nov 11, 2020