Implementing Custom ColumnTypes

Sif's columnar abstraction includes two broad categories of ColumnType:

  1. ColumnTypes can store any type of data, including custom types defined by client code.
  2. FixedWidthColumnTypes can also store any type of data, but additionally guarantee that every value of the FixedWidthColumnType serializes to a buffer with a known, fixed Size(). (Note: the deserialized value need not be fixed in size.)

FixedWidthColumnTypes provide substantially more reliable guarantees about runtime memory usage, because values of these types are deserialized only when they are accessed and are serialized again immediately afterwards. Implement and use these types only when serialization and deserialization are cheap enough that their repeated cost is an acceptable trade for highly predictable runtime memory usage.

All other ColumnTypes, which may be variable-width (though they do not need to be), are deserialized and serialized at most once per Stage, and only if a Task in that Stage accesses or modifies a column with that ColumnType. If a column is only accessed but not modified, re-serialization is skipped.

Sif supports, and encourages, the definition of custom fixed and variable-width ColumnTypes which adhere to the base GenericColumnType interface, and one of the two following interfaces (GenericVariableWidthColumnType or GenericFixedWidthColumnType):

// GenericColumnType specifies typed and untyped versions of its
// methods. This interface makes for cleaner internal code for
// Sif, but is also intended to help future-proof the implementation
// of ColumnTypes (in the event that Go ever considers adding
// covariance or generic method support to its generics implementation).
// In general, untyped methods should just wrap their typed counterparts.
type GenericColumnType[T any] interface {
	// Produces a string representation of a value of this Type
	ToStringT(v T) string // Typed version
	ToString(v interface{}) string // Untyped version

	// Defines how this type is deserialized
	// The untyped version should generally just call the typed version
	DeserializeT([]byte) (T, error)	// Typed version
	Deserialize([]byte) (interface{}, error) // Untyped version
}

// GenericVariableWidthColumnType describes how variable-length data should be serialized
type GenericVariableWidthColumnType[T any] interface {
	GenericColumnType[T]
	// Defines how this type is serialized
	// The untyped version should type check before calling the typed version.
	SerializeT(v T) ([]byte, error) // Typed version
	Serialize(v interface{}) ([]byte, error) // Untyped version
}

// GenericFixedWidthColumnType adds a size guarantee via the Size() method
type GenericFixedWidthColumnType[T any] interface {
	GenericColumnType[T]
	// returns size in bytes of a serialized value of this ColumnType
	Size() int
	// Supports serialization to a reusable buffer of guaranteed Size()
	SerializeFixedT(v T, dest []byte) error
	SerializeFixed(v interface{}, dest []byte) error
}

Most of the effort in defining a GenericColumnType lies in its Serialize and Deserialize methods. Any serialization approach may be used, as long as the end result is a []byte.
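
As an illustration of the point that any serialization approach works, here is a minimal sketch of a variable-width ColumnType that stores a []string and serializes it with encoding/json. The tagsType name and the choice of JSON are assumptions made for this example, and the interfaces are re-declared locally (copied from the definitions above) so that the sketch compiles without importing Sif; real client code would use Sif's own interface definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Local copies of the interfaces shown above, so that this sketch
// compiles on its own. Real code would use Sif's definitions instead.
type GenericColumnType[T any] interface {
	ToStringT(v T) string
	ToString(v interface{}) string
	DeserializeT([]byte) (T, error)
	Deserialize([]byte) (interface{}, error)
}

type GenericVariableWidthColumnType[T any] interface {
	GenericColumnType[T]
	SerializeT(v T) ([]byte, error)
	Serialize(v interface{}) ([]byte, error)
}

// tagsType is a hypothetical variable-width ColumnType storing a []string.
type tagsType struct{}

// Compile-time check that tagsType satisfies the interface.
var _ GenericVariableWidthColumnType[[]string] = (*tagsType)(nil)

func (t *tagsType) ToStringT(v []string) string { return strings.Join(v, ",") }

// ToString type checks, then delegates to the typed version.
func (t *tagsType) ToString(v interface{}) string {
	tags, ok := v.([]string)
	if !ok {
		return fmt.Sprintf("<invalid %T>", v)
	}
	return t.ToStringT(tags)
}

// SerializeT encodes the slice as JSON; any encoding yielding []byte would do.
func (t *tagsType) SerializeT(v []string) ([]byte, error) { return json.Marshal(v) }

// Serialize type checks, then delegates to the typed version.
func (t *tagsType) Serialize(v interface{}) ([]byte, error) {
	tags, ok := v.([]string)
	if !ok {
		return nil, fmt.Errorf("expected []string, got %T", v)
	}
	return t.SerializeT(tags)
}

func (t *tagsType) DeserializeT(b []byte) ([]string, error) {
	var tags []string
	err := json.Unmarshal(b, &tags)
	return tags, err
}

func (t *tagsType) Deserialize(b []byte) (interface{}, error) {
	return t.DeserializeT(b)
}

func main() {
	ct := &tagsType{}
	buf, err := ct.SerializeT([]string{"a", "b"})
	if err != nil {
		panic(err)
	}
	tags, err := ct.DeserializeT(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(ct.ToStringT(tags)) // prints "a,b"
}
```

JSON is used here only for brevity. Because variable-width values are serialized at most once per Stage, a slower but convenient encoding can be a reasonable choice for this category of ColumnType.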

Defining a Custom ColumnType

For reference implementations of various ColumnTypes, check out the coltype package.

In addition to implementing the GenericColumnType[T] interface, client code should also supply a factory function which returns a GenericColumnAccessor[T], allowing values from columns of this type to be accessed in a typed fashion:

// ...ToStringT, SerializeT, DeserializeT, etc.

// Bool returns a GenericColumnAccessor for a FixedWidthColumnType which stores a boolean value
func Bool(colName string) sif.GenericColumnAccessor[bool] {
	return sif.CreateColumnAccessor[bool](&boolType{}, colName)
}
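
For a sense of what the elided methods might look like, the following is a minimal sketch of a one-byte, fixed-width boolean type. It is not the implementation from the coltype package: the method bodies, the one-byte encoding, and the error messages are assumptions, and the interfaces are re-declared locally (copied from the definitions above) so that the sketch compiles without importing Sif.

```go
package main

import (
	"fmt"
	"strconv"
)

// Local copies of the interfaces shown earlier, so that this sketch
// compiles on its own. Real code would use Sif's definitions instead.
type GenericColumnType[T any] interface {
	ToStringT(v T) string
	ToString(v interface{}) string
	DeserializeT([]byte) (T, error)
	Deserialize([]byte) (interface{}, error)
}

type GenericFixedWidthColumnType[T any] interface {
	GenericColumnType[T]
	Size() int
	SerializeFixedT(v T, dest []byte) error
	SerializeFixed(v interface{}, dest []byte) error
}

// boolType is a hypothetical fixed-width ColumnType storing one bool per byte.
type boolType struct{}

// Compile-time check that boolType satisfies the interface.
var _ GenericFixedWidthColumnType[bool] = (*boolType)(nil)

// Size reports that every serialized value occupies exactly one byte.
func (b *boolType) Size() int { return 1 }

func (b *boolType) ToStringT(v bool) string { return strconv.FormatBool(v) }

func (b *boolType) ToString(v interface{}) string {
	bv, ok := v.(bool)
	if !ok {
		return fmt.Sprintf("<invalid %T>", v)
	}
	return b.ToStringT(bv)
}

// SerializeFixedT writes into a caller-supplied, reusable buffer.
func (b *boolType) SerializeFixedT(v bool, dest []byte) error {
	if len(dest) < b.Size() {
		return fmt.Errorf("buffer too small: need %d byte(s), got %d", b.Size(), len(dest))
	}
	if v {
		dest[0] = 1
	} else {
		dest[0] = 0
	}
	return nil
}

// SerializeFixed type checks, then delegates to the typed version.
func (b *boolType) SerializeFixed(v interface{}, dest []byte) error {
	bv, ok := v.(bool)
	if !ok {
		return fmt.Errorf("expected bool, got %T", v)
	}
	return b.SerializeFixedT(bv, dest)
}

func (b *boolType) DeserializeT(ser []byte) (bool, error) {
	if len(ser) < b.Size() {
		return false, fmt.Errorf("buffer too small: need %d byte(s), got %d", b.Size(), len(ser))
	}
	return ser[0] != 0, nil
}

func (b *boolType) Deserialize(ser []byte) (interface{}, error) {
	return b.DeserializeT(ser)
}

func main() {
	ct := &boolType{}
	buf := make([]byte, ct.Size()) // reusable buffer of guaranteed Size()
	if err := ct.SerializeFixedT(true, buf); err != nil {
		panic(err)
	}
	v, err := ct.DeserializeT(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(ct.ToStringT(v)) // prints "true"
}
```

Both directions here are single-byte operations, which fits the earlier guidance that fixed-width types suit values whose serialization and deserialization are cheap.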

Using a ColumnType

Using one's own ColumnType (fixed or variable-width) is as simple as calling its factory function to produce a GenericColumnAccessor[T], and then including that accessor in a Schema:

myBoolColumn := Bool("my_column")
schema, err := schema.CreateSchema(myBoolColumn)

Values may be stored in and retrieved from Rows using the GenericColumnAccessor[T]:

ops.Map(func(row sif.Row) error {
	if myBoolColumn.IsNil(row) {
		return nil
	}
	aBool, err := myBoolColumn.From(row)
	if err != nil {
		return err
	}
	return myBoolColumn.To(row, aBool)
})