
Default Type Describer

kevin-montrose edited this page Apr 10, 2021 · 4 revisions

Introduction

An instance of DefaultTypeDescriber is used by Cesil's default Options to enumerate members, create type instances, perform dynamic conversions, and otherwise describe types during read and write operations.

The DefaultTypeDescriber tries to do "what's expected" for a .NET (de)serializer, and will often fit your needs. It has numerous extension points for when the default behavior is almost, but not quite, what is needed. For radically different needs, Cesil also provides the ManualTypeDescriber and SurrogateTypeDescriber classes, and the option of directly implementing ITypeDescriber.

Accessing DefaultTypeDescriber

While you can create new instances of DefaultTypeDescriber using its parameter-less constructor, there is also a pre-allocated shared instance on the TypeDescribers static class. The shared instance is used by Options.Default and Options.DynamicDefault.

Serializing With Static Types

Controlling static serialization is done with the EnumerateMembersToSerialize(TypeInfo) method. Additional extension points are documented below, but all behavior can also be customized by overriding this method directly.

Choosing Members

The DefaultTypeDescriber serializes all:

This behavior can be customized by overriding ShouldSerialize(TypeInfo, PropertyInfo) and/or ShouldSerialize(TypeInfo, FieldInfo).
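As a sketch of this extension point, the subclass below keeps the default member selection but also skips properties whose names start with "Internal". `SkipInternalDescriber` and the prefix rule are hypothetical, and the code assumes `ShouldSerialize(TypeInfo, PropertyInfo)` is declared `protected virtual` and returns `bool`; check the Cesil API reference before relying on that signature.

```csharp
using System;
using System.Reflection;
using Cesil;

// Hypothetical describer: default member selection, minus "Internal*" properties.
// Assumption: DefaultTypeDescriber declares ShouldSerialize(TypeInfo, PropertyInfo)
// as a protected virtual method returning bool.
public sealed class SkipInternalDescriber : DefaultTypeDescriber
{
    protected override bool ShouldSerialize(TypeInfo forType, PropertyInfo property)
    {
        if (property.Name.StartsWith("Internal", StringComparison.Ordinal))
        {
            return false;
        }

        // Defer to the default rules for every other property.
        return base.ShouldSerialize(forType, property);
    }
}
```

An instance of such a subclass would then be supplied through Options in place of the shared default instance.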

Reading Members

The DefaultTypeDescriber reads properties with a Getter backed by their get method, and fields with a Getter backed directly by the field.

This behavior can be customized by overriding GetGetter(TypeInfo, PropertyInfo) and/or GetGetter(TypeInfo, FieldInfo).

Naming Columns

Each member is given its source name, unless it is decorated with a DataMemberAttribute with a non-null Name.

This behavior can be customized by overriding GetSerializationName(TypeInfo, PropertyInfo) and/or GetSerializationName(TypeInfo, FieldInfo).
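The naming rule can be modeled with plain reflection. This is a standalone illustration of the rule, not Cesil's code; `Customer` and `NamingSketch.ColumnName` are hypothetical names.

```csharp
using System.Reflection;
using System.Runtime.Serialization;

// Example row type: Id keeps its source name, Name is renamed.
public class Customer
{
    public int Id { get; set; }

    [DataMember(Name = "customer_name")]
    public string Name { get; set; } = "";
}

public static class NamingSketch
{
    // Mirrors the documented rule: use DataMemberAttribute.Name when it is
    // non-null, otherwise fall back to the member's own name.
    public static string ColumnName(MemberInfo member)
    {
        DataMemberAttribute? dataMember = member.GetCustomAttribute<DataMemberAttribute>();
        return dataMember?.Name ?? member.Name;
    }
}
```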

Per-Row Conditional Serialization

For any property chosen for serialization, if the row being serialized has a method named ShouldSerializeXXX (where XXX is the name of the property), that method will be invoked for each row that is serialized. The method does not need to be public; non-public ShouldSerializeXXX methods are discovered as well.

The ShouldSerializeXXX method must return a bool - if it returns false the property is not included in the row being serialized.

If the ShouldSerializeXXX method is an instance method, it cannot take any parameters. If it is a static method, it may take 1 parameter that is of a type assignable from the row type.

By default, fields will never have a ShouldSerializeXXX method.

This behavior can be customized by overriding GetShouldSerialize(TypeInfo, PropertyInfo) and/or GetShouldSerialize(TypeInfo, FieldInfo).
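The row type below shows both accepted shapes of a ShouldSerializeXXX method, plus a small reflection helper that finds and invokes them following the naming convention described above. `Order` is an example type, and `ShouldSerializeSketch` is a hypothetical illustration, not how Cesil discovers these methods internally.

```csharp
#nullable enable
using System;
using System.Reflection;

public class Order
{
    public decimal Discount { get; set; }
    public string? Note { get; set; }

    // Instance form: no parameters, returns bool. Being private does not prevent discovery.
    private bool ShouldSerializeDiscount() => Discount != 0m;

    // Static form: one parameter whose type is assignable from the row type.
    public static bool ShouldSerializeNote(Order row) => row.Note != null;
}

public static class ShouldSerializeSketch
{
    private const BindingFlags AnyMethod =
        BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static;

    // Looks up ShouldSerialize + property name, public or not, instance or static.
    public static MethodInfo? Find(Type rowType, string propertyName) =>
        rowType.GetMethod("ShouldSerialize" + propertyName, AnyMethod);

    // Static methods receive the row as their argument; instance methods are called on it.
    public static bool Invoke(MethodInfo method, object row) =>
        method.IsStatic
            ? (bool)method.Invoke(null, new[] { row })!
            : (bool)method.Invoke(row, null)!;
}
```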

Formatting

The default Formatter, obtained via calling Formatter.GetDefault(TypeInfo) with FieldInfo.FieldType or PropertyInfo.PropertyType as appropriate, is used for all selected members.

This behavior can be customized by overriding GetFormatter(TypeInfo, PropertyInfo) and/or GetFormatter(TypeInfo, FieldInfo).

Ordering

If a member is decorated with [DataMemberAttribute] with an Order other than -1, that order is respected.

If there is a mix of explicit and unspecified (null) orders, members with explicit orders are moved to the front.

If no orders are specified, members will be ordered such that properties are serialized first and then fields. Order within those sections is not specified.

This behavior can be customized by overriding GetOrder(TypeInfo, PropertyInfo) and/or GetOrder(TypeInfo, FieldInfo).
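A standalone sketch of these ordering rules follows. It assumes explicit orders are sorted ascending and that, in the mixed case, unordered members still place properties before fields; the text above only states the latter for the case where no orders are given. `Shipment` and `OrderingSketch` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

public class Shipment
{
    public string Carrier { get; set; } = "";

    [DataMember(Order = 2)]
    public int Weight { get; set; }

    [DataMember(Order = 1)]
    public string Tracking { get; set; } = "";

    public string Notes = "";
}

public static class OrderingSketch
{
    // Explicit orders (anything other than DataMember's default of -1) come first,
    // sorted by Order; the rest follow with properties before fields.
    public static List<string> ColumnOrder(Type type)
    {
        var members = type.GetProperties(BindingFlags.Public | BindingFlags.Instance).Cast<MemberInfo>()
            .Concat(type.GetFields(BindingFlags.Public | BindingFlags.Instance).Cast<MemberInfo>());

        return members
            .Select(m => (Member: m, Order: m.GetCustomAttribute<DataMemberAttribute>()?.Order ?? -1))
            .OrderBy(x => x.Order == -1 ? 1 : 0)
            .ThenBy(x => x.Order)
            .ThenBy(x => x.Member is PropertyInfo ? 0 : 1)
            .Select(x => x.Member.Name)
            .ToList();
    }
}
```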

Emitting Default Values

By default, any selected member's value will always be written.

If a member has been decorated with [DataMemberAttribute] and has EmitDefaultValue set to false, then the member's default value will not be written.

The default value for any reference type is null, and for a value type is the zero-initialized value: 0 for numeric types, false for bool, and (for user defined structs) a struct having all its fields set to their default values.

This behavior can be customized by overriding GetEmitDefaultValue(TypeInfo, PropertyInfo) and/or GetEmitDefaultValue(TypeInfo, FieldInfo).
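The row type below opts one member out of writing its default value, and `DefaultValueSketch.IsDefault` is a hypothetical helper expressing the definition of "default value" given above using reflection. Neither is Cesil code.

```csharp
#nullable enable
using System;
using System.Runtime.Serialization;

public class Reading
{
    public double Value { get; set; }

    // With EmitDefaultValue = false, a null Unit would not be written.
    [DataMember(EmitDefaultValue = false)]
    public string? Unit { get; set; }
}

public static class DefaultValueSketch
{
    // Reference types: default is null.
    // Value types: compare against a freshly zero-initialized instance
    // (for structs, ValueType.Equals compares field by field).
    public static bool IsDefault(object? value, Type type) =>
        type.IsValueType
            ? object.Equals(value, Activator.CreateInstance(type))
            : value is null;
}
```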

Deserializing With Static Types

Controlling static deserialization is done with the EnumerateMembersToDeserialize(TypeInfo) method. Additional extension points are documented below, but all behavior can also be customized by overriding this method directly.

Providing Instances

When deserializing, instances of the row type to populate must be provided. DefaultTypeDescriber will use the parameter-less constructor of the row type by default.

This behavior can be customized by overriding GetInstanceProvider(TypeInfo).

Choosing Members

The DefaultTypeDescriber deserializes all:

  • public instance properties, or fields and properties with the DataMemberAttribute
    • with setters that take a single parameter
    • whose type has a default Parser
    • unless decorated with IgnoreDataMember

This behavior can be customized by overriding ShouldDeserialize(TypeInfo, PropertyInfo) and/or ShouldDeserialize(TypeInfo, FieldInfo).
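A reflection-only sketch of these selection rules. It simplifies in two ways: the "has a default Parser" check is omitted because it needs Cesil, and a property counts as public here when its setter is public. `Person` and `DeserializeSelectionSketch` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

public class Person
{
    // Public instance property with a setter: selected.
    public string Name { get; set; } = "";

    // No setter: not selected.
    public int NameLength => Name.Length;

    // Explicitly ignored: not selected.
    [IgnoreDataMember]
    public string Secret { get; set; } = "";

    // Non-public field opted in with DataMember: selected.
    [DataMember]
    private string nickname = "";

    // Public field without DataMember: not selected.
    public string Email = "";
}

public static class DeserializeSelectionSketch
{
    private const BindingFlags AnyInstance =
        BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance;

    public static string[] SelectedMembers(Type type)
    {
        var properties = type.GetProperties(AnyInstance)
            .Where(p => !p.IsDefined(typeof(IgnoreDataMemberAttribute), true))
            // Setter must take a single parameter (this also excludes indexers).
            .Where(p => p.SetMethod != null && p.SetMethod.GetParameters().Length == 1)
            // Public (judged by the setter) or opted in with DataMember.
            .Where(p => p.SetMethod!.IsPublic || p.IsDefined(typeof(DataMemberAttribute), true))
            .Select(p => p.Name);

        var fields = type.GetFields(AnyInstance)
            .Where(f => f.IsDefined(typeof(DataMemberAttribute), true))
            .Where(f => !f.IsDefined(typeof(IgnoreDataMemberAttribute), true))
            .Select(f => f.Name);

        return properties.Concat(fields).OrderBy(n => n, StringComparer.Ordinal).ToArray();
    }
}
```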

Setting Members

For selected members, fields will be set directly and properties will be set via their set method.

This behavior can be customized by overriding GetSetter(TypeInfo, PropertyInfo), GetSetter(TypeInfo, FieldInfo), or GetSetter(TypeInfo).

Parsing

The default Parser, obtained via calling Parser.GetDefault(TypeInfo) with FieldInfo.FieldType or PropertyInfo.PropertyType as appropriate, is used for all selected members.

This behavior can be customized by overriding GetParser(TypeInfo, PropertyInfo), GetParser(TypeInfo, FieldInfo), or GetParser(TypeInfo).

Ordering

Ordering is always calculated by the DefaultTypeDescriber, but will be discarded if there's an explicit header row during deserialization.

Otherwise, ordering logic is identical to (and shared with) serialization logic.

Required Members

If a member is decorated with a DataMemberAttribute with IsRequired set to true, then the member is required during deserialization.

If a required member's cell is empty, an exception will be thrown rather than leaving the member set to its default value.

This behavior can be customized by overriding GetIsRequired(TypeInfo, PropertyInfo) and/or GetIsRequired(TypeInfo, FieldInfo).
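A sketch of how IsRequired changes the handling of an empty cell. `Invoice` and `RequiredSketch` are hypothetical, only string members are handled, and InvalidOperationException is chosen for illustration; it is not necessarily the exception Cesil throws.

```csharp
#nullable enable
using System;
using System.Reflection;
using System.Runtime.Serialization;

public class Invoice
{
    [DataMember(IsRequired = true)]
    public string Number { get; set; } = "";

    public string? Memo { get; set; }
}

public static class RequiredSketch
{
    public static bool IsRequired(PropertyInfo p) =>
        p.GetCustomAttribute<DataMemberAttribute>()?.IsRequired ?? false;

    // Hypothetical per-cell step for string members.
    public static void Assign(object row, PropertyInfo p, string cell)
    {
        if (cell.Length == 0)
        {
            if (IsRequired(p))
            {
                throw new InvalidOperationException($"Cell for required member {p.Name} was empty");
            }

            return; // optional member: leave it at its default value
        }

        p.SetValue(row, cell);
    }
}
```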

Per-Row Member Resets

For any property chosen for deserialization, if the row being deserialized has a method named ResetXXX (where XXX is the name of the property), that method will be invoked prior to the member being assigned.

The ResetXXX method must return void.

If the ResetXXX method is an instance method, it cannot take any parameters. If it is a static method, it may take 1 parameter that is of a type assignable from the row type.

By default, fields will never have a ResetXXX method.

This behavior can be customized by overriding GetReset(TypeInfo, PropertyInfo) and/or GetReset(TypeInfo, FieldInfo).
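The row type below shows both accepted ResetXXX shapes, and the hypothetical `ResetSketch.AssignWithReset` illustrates the documented sequence: reset first, then assign. Calls are recorded in a log so the order is visible; this is not Cesil's implementation.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Reflection;

public class Ticket
{
    // Records calls so the reset/assign sequence can be observed.
    public List<string> Log { get; } = new List<string>();

    private string _title = "";

    public string Title
    {
        get => _title;
        set { Log.Add("set Title"); _title = value; }
    }

    public int Priority { get; set; }

    // Instance form: returns void and takes no parameters.
    public void ResetTitle() => Log.Add("reset Title");

    // Static form: returns void and takes one parameter assignable from the row type.
    public static void ResetPriority(Ticket row) => row.Log.Add("reset Priority");
}

public static class ResetSketch
{
    private const BindingFlags AnyMethod =
        BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static;

    // Invoke ResetXXX if the row type has one, then assign the value.
    public static void AssignWithReset(object row, string propertyName, object value)
    {
        var type = row.GetType();
        MethodInfo? reset = type.GetMethod("Reset" + propertyName, AnyMethod);

        if (reset != null)
        {
            if (reset.IsStatic)
            {
                reset.Invoke(null, new[] { row });
            }
            else
            {
                reset.Invoke(row, null);
            }
        }

        type.GetProperty(propertyName)!.SetValue(row, value);
    }
}
```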

Serializing With dynamic

Controlling dynamic serialization is done with the GetCellsForDynamicRow(in WriteContext, dynamic) method. Additional extension points are documented below, but all behavior can also be customized by overriding this method directly.

Kinds Of Dynamic Values

There are, broadly, two "kinds" of dynamic values:

  1. Those types participating in the Dynamic Language Runtime (DLR)
  2. "Normal" .NET types, whose type is simply unknown at compile time

The kind of value will influence exact behavior, as documented below. Generally speaking, using static serialization where possible will yield better performance and clearer code.

Enumerating Members

For DLR types (kind #1 above), DynamicMetaObject.GetDynamicMemberNames() will be used to enumerate members unless a type is "well known", in which case a faster, but logically equivalent, alternative will be used.

For all other types (kind #2 above), EnumerateMembersToSerialize will be used to discover members.
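The two kinds can be told apart at runtime by checking for `IDynamicMetaObjectProvider`, and for a DLR value the member names come from `DynamicMetaObject.GetDynamicMemberNames()`. The sketch below uses `ExpandoObject` as the DLR type; `DynamicKindsSketch` is a hypothetical name, and Cesil's faster paths for well known types are not modeled.

```csharp
using System.Collections.Generic;
using System.Dynamic;
using System.Linq.Expressions;

public static class DynamicKindsSketch
{
    // Kind #1 values participate in the DLR; kind #2 values are ordinary .NET objects.
    public static bool IsDlrValue(object value) => value is IDynamicMetaObjectProvider;

    // For a DLR value, ask its DynamicMetaObject for the member names.
    // The parameter expression is only needed to construct the meta object.
    public static IEnumerable<string> DlrMemberNames(IDynamicMetaObjectProvider value)
    {
        DynamicMetaObject meta = value.GetMetaObject(Expression.Parameter(typeof(object), "row"));
        return meta.GetDynamicMemberNames();
    }

    // Builds a sample DLR row whose members only exist at runtime.
    public static ExpandoObject MakeDlrRow()
    {
        dynamic row = new ExpandoObject();
        row.Id = 1;
        row.Name = "Alice";
        return row;
    }
}
```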

Filtering Members

By default, all members are included unless an error is encountered fetching a value or determining a formatter.

This behavior can be customized by overriding ShouldIncludeCell(string, in WriteContext, dynamic).

Formatting

The default Formatter, obtained via calling Formatter.GetDefault(TypeInfo) with the runtime type of the retrieved value, is used for each cell.

This behavior can be customized by overriding GetFormatter(TypeInfo, string, in WriteContext, dynamic).

Ordering & Naming

Ordering and naming are implicit in the IEnumerable<DynamicCellValue> returned by GetCellsForDynamicRow(in WriteContext, dynamic).

To customize that behavior, override that method directly.

Deserializing With dynamic

When working with dynamic deserialization, no customization is done at "read time." Instead, all customization happens at conversion time.

Converting Rows

The DefaultTypeDescriber supports converting dynamic rows to:

  • ValueTuple
    • Any arity is supported
  • Tuple
    • Any arity is supported
  • IEnumerable<T>
    • Assuming all cells can be converted to T
  • IEnumerable
  • Any custom type with a constructor of the same arity as the row
    • Assumes the cells in the row are in the same order as the constructor parameters, and can be converted to the parameter types
  • Any custom type with a zero-parameter constructor
    • Assumes that all the cells in the row are in named columns
    • Any properties (public, non-public, static, or instance) that share a name with a column will be assigned the matching cell value
  • record types, assuming the row's cells can be converted to invoke its default constructor
    • Other public properties on a record will also be set if possible, but are treated as optional

This behavior can be customized by overriding GetCellsForDynamicRow(WriteContext, dynamic, Span<DynamicCellValue>).
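A sketch of the "constructor of the same arity as the row" rule, with cells represented as strings in column order. `Convert.ChangeType` stands in for Cesil's Parsers, and `Point` and `ArityConversionSketch.FromCells` are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }
}

public static class ArityConversionSketch
{
    // Pick a public constructor whose parameter count equals the number of cells,
    // then convert each cell, in order, to the matching parameter type.
    public static T FromCells<T>(string[] cells)
    {
        var ctor = typeof(T).GetConstructors()
            .FirstOrDefault(c => c.GetParameters().Length == cells.Length)
            ?? throw new InvalidOperationException(
                $"No constructor of {typeof(T).Name} takes {cells.Length} parameters");

        var args = ctor.GetParameters()
            .Select((p, i) => Convert.ChangeType(cells[i], p.ParameterType, CultureInfo.InvariantCulture))
            .ToArray();

        return (T)ctor.Invoke(args);
    }
}
```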

Converting Cells

When converting individual cells, DefaultTypeDescriber will use a Parser backed by:

This behavior can be customized by overriding GetDynamicCellParserFor(in ReadContext, IEnumerable<ColumnIdentifier>, TypeInfo).

Special Handling Of Single Column Rows

For convenience, the DefaultTypeDescriber special cases reading and writing rows of certain well known types to elide the need for a wrapper row type.

For example, if you have a CSV where each row has a single int column, rather than doing something like:

class IntWrapper
{
   public int Column { get; set; }
}

// ...

CesilUtils.Enumerate<IntWrapper>(/* some reader */);  // CesilUtils defaults to the DefaultTypeDescriber

you can instead do the following:

CesilUtils.Enumerate<int>(/* some reader */);  // CesilUtils defaults to the DefaultTypeDescriber

Without a wrapper type, column names become implicit: the DefaultTypeDescriber will assign a name based on the row type. The name chosen will be either the value of TypeInfo.Name, or that value prefixed with "Nullable" if the type is a Nullable<T> (using the underlying type's name). For example, string becomes "String" and int? becomes "NullableInt32".
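The implicit naming rule is small enough to express directly. `ImplicitNameSketch.ColumnNameFor` is a hypothetical helper that reproduces the examples above; it is not Cesil's code.

```csharp
using System;

public static class ImplicitNameSketch
{
    // The type's Name, or "Nullable" + the underlying type's Name for Nullable<T>.
    public static string ColumnNameFor(Type rowType)
    {
        Type underlying = Nullable.GetUnderlyingType(rowType);
        return underlying == null ? rowType.Name : "Nullable" + underlying.Name;
    }
}
```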

A type is considered well known if it has a Default Parser or Default Formatter.

This behavior is a property of the DefaultTypeDescriber, not Cesil in general. If you use a custom ITypeDescriber, you will have to implement this behavior yourself.

Caches

Each DefaultTypeDescriber instance maintains a number of internal caches; these caches are used to speed up member enumeration and dynamic operations.

Under certain rare use cases, these caches may grow without bound. For these cases, the ClearCache() method is provided.

When more precise control of caching behavior is desired, it is recommended that clients switch to their own ITypeDescriber implementation. A middle ground is also available: when subclassed, the DefaultTypeDescriber will disable most caching behavior, allowing a client to provide their own in its place.
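Because a subclass gives up most of the built-in caching, a subclass that still wants caching has to supply its own, and since a describer can be called from many threads at once (see Extension Guidance), that cache must be thread-safe. A minimal sketch, assuming a per-type cache is what's needed; `PerTypeCache` is hypothetical and not part of Cesil.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical per-type cache for use inside a DefaultTypeDescriber subclass.
public sealed class PerTypeCache<TValue>
{
    private readonly ConcurrentDictionary<Type, TValue> _entries = new ConcurrentDictionary<Type, TValue>();
    private readonly Func<Type, TValue> _compute;

    public PerTypeCache(Func<Type, TValue> compute) => _compute = compute;

    // Safe to call from many threads. Under contention _compute may run more
    // than once for the same type (only one result is kept), so it should be
    // free of side effects.
    public TValue Get(Type type) => _entries.GetOrAdd(type, _compute);

    // Lets the owner drop all entries, similar in spirit to ClearCache().
    public void Clear() => _entries.Clear();
}
```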

Extension Guidance

When extending DefaultTypeDescriber remember that it can be accessed from many different threads simultaneously, including threads that are far removed from the act of deserialization in cases involving dynamic.

Generally, assume that: