Sif's columnar abstraction includes two broad categories of ColumnType:
- ColumnTypes can store any type of data, including custom types defined by client code.
- FixedWidthColumnTypes can also store any type of data, but provide the additional guarantee that all values of the FixedWidthColumnType serialize to a buffer with a known, fixed Size(). (Note: the deserialized value need not be fixed in size.)
FixedWidthColumnTypes provide substantially more reliable guarantees about runtime memory usage, as values of this type are only deserialized when they are accessed, and are immediately serialized again. These types should only be implemented and used when serialization and deserialization are cheap enough that repeating them on every access is an acceptable trade for highly predictable runtime memory usage.
All other ColumnTypes, which may be variable-width (though they do not need to be), are deserialized and serialized at most once per Stage, and only if a Task in that Stage accesses or modifies a column with that ColumnType. If a column is only accessed, but not modified, re-serialization is skipped.
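The "at most once per Stage" behaviour described above can be pictured with a small sketch. This is not Sif's internal implementation: lazyCell, codec and stringCodec are hypothetical names invented for this illustration, and the decodes/encodes counters exist only to make the behaviour observable. The idea is that a value is decoded on first access, cached, and re-encoded at the end of the stage only if it was modified.

```go
package main

// codec is a hypothetical stand-in for a variable-width ColumnType:
// it only knows how to turn a value into bytes and back.
type codec[T any] struct {
	serialize   func(T) ([]byte, error)
	deserialize func([]byte) (T, error)
}

// stringCodec stores strings as their raw bytes.
func stringCodec() codec[string] {
	return codec[string]{
		serialize:   func(s string) ([]byte, error) { return []byte(s), nil },
		deserialize: func(b []byte) (string, error) { return string(b), nil },
	}
}

// lazyCell is a hypothetical illustration (not a Sif type) of a column
// value that is decoded at most once and re-encoded only if it changed.
type lazyCell[T any] struct {
	raw     []byte // serialized form
	value   T      // cached deserialized form
	decoded bool   // true once raw has been deserialized
	dirty   bool   // true once value has been modified
	c       codec[T]

	decodes, encodes int // counters, for demonstration only
}

// Get deserializes on first access and returns the cached value afterwards.
func (l *lazyCell[T]) Get() (T, error) {
	if !l.decoded {
		v, err := l.c.deserialize(l.raw)
		if err != nil {
			var zero T
			return zero, err
		}
		l.value, l.decoded = v, true
		l.decodes++
	}
	return l.value, nil
}

// Set replaces the cached value and marks the cell as modified.
func (l *lazyCell[T]) Set(v T) {
	l.value, l.decoded, l.dirty = v, true, true
}

// Flush models the end of a stage: it re-serializes only modified cells.
func (l *lazyCell[T]) Flush() error {
	if !l.dirty {
		return nil // read-only access: serialization is skipped
	}
	raw, err := l.c.serialize(l.value)
	if err != nil {
		return err
	}
	l.raw, l.dirty = raw, false
	l.encodes++
	return nil
}
```

Calling Get twice on the same cell decodes once; calling Flush without a preceding Set performs no serialization at all. A fixed-width type would instead decode on every access and write straight back into its fixed-size buffer, trading CPU time for a bounded memory footprint.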
Sif supports, and encourages, the definition of custom fixed and variable-width ColumnTypes which adhere to the base GenericColumnType interface, and one of the two following interfaces (GenericVariableWidthColumnType or GenericFixedWidthColumnType):
// GenericColumnType specifies typed and untyped versions of its
// methods. This interface makes for cleaner internal code for
// Sif, but is also intended to help future-proof the implementation
// of ColumnTypes (in the event that Golang ever considers adding
// covariance or generic method support to their generics implementation).
// In general, untyped methods should just wrap their typed counterparts.
type GenericColumnType[T any] interface {
	// Produces a string representation of a value of this Type
	ToStringT(v T) string          // Typed version
	ToString(v interface{}) string // Untyped version
	// Defines how this type is deserialized
	// The untyped version should generally just call the typed version
	DeserializeT([]byte) (T, error)          // Typed version
	Deserialize([]byte) (interface{}, error) // Untyped version
}

// GenericVariableWidthColumnType describes how variable-length data should be serialized
type GenericVariableWidthColumnType[T any] interface {
	GenericColumnType[T]
	// Defines how this type is serialized
	// The untyped version should type check before calling the typed version.
	SerializeT(v T) ([]byte, error)          // Typed version
	Serialize(v interface{}) ([]byte, error) // Untyped version
}

// GenericFixedWidthColumnType adds a size guarantee via the Size() method
type GenericFixedWidthColumnType[T any] interface {
	GenericColumnType[T]
	// Returns the size in bytes of a serialized value of this ColumnType
	Size() int
	// Supports serialization to a reusable buffer of guaranteed Size()
	SerializeFixedT(v T, dest []byte) error          // Typed version
	SerializeFixed(v interface{}, dest []byte) error // Untyped version
}
The primary effort of defining a GenericColumnType is in defining Serialize and Deserialize methods. Any serialization approach may be used, as long as the end result is a []byte.
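To make "any serialization approach" concrete, here is a minimal sketch of a variable-width type that stores a list of strings as JSON. The tagsType name and the choice of JSON are assumptions made for this illustration and are not part of Sif; the two interfaces are copied locally from the definitions in this document only so that the example compiles on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Local copies of the interfaces from this document, so the sketch is self-contained.
type GenericColumnType[T any] interface {
	ToStringT(v T) string
	ToString(v interface{}) string
	DeserializeT([]byte) (T, error)
	Deserialize([]byte) (interface{}, error)
}

type GenericVariableWidthColumnType[T any] interface {
	GenericColumnType[T]
	SerializeT(v T) ([]byte, error)
	Serialize(v interface{}) ([]byte, error)
}

// tagsType is a hypothetical variable-width ColumnType for []string values.
type tagsType struct{}

func (t *tagsType) ToStringT(v []string) string {
	return "[" + strings.Join(v, ", ") + "]"
}

// ToString type-checks, then delegates to the typed version.
func (t *tagsType) ToString(v interface{}) string {
	tags, ok := v.([]string)
	if !ok {
		return fmt.Sprintf("<invalid %T>", v)
	}
	return t.ToStringT(tags)
}

func (t *tagsType) DeserializeT(b []byte) ([]string, error) {
	var tags []string
	err := json.Unmarshal(b, &tags)
	return tags, err
}

// Deserialize simply wraps the typed version.
func (t *tagsType) Deserialize(b []byte) (interface{}, error) {
	return t.DeserializeT(b)
}

func (t *tagsType) SerializeT(v []string) ([]byte, error) {
	return json.Marshal(v)
}

// Serialize type-checks before calling the typed version.
func (t *tagsType) Serialize(v interface{}) ([]byte, error) {
	tags, ok := v.([]string)
	if !ok {
		return nil, fmt.Errorf("tagsType: expected []string, got %T", v)
	}
	return t.SerializeT(tags)
}

// Compile-time check that tagsType satisfies the interface.
var _ GenericVariableWidthColumnType[[]string] = (*tagsType)(nil)
```

JSON is used here only because it is in the standard library; a more compact binary encoding would satisfy the same interface.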
For reference implementations of various ColumnTypes, check out the coltype package.
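As a rough illustration of the fixed-width interface (written for this document, not taken from the coltype package), the sketch below stores an int32 in exactly four little-endian bytes. The int32Type name and the byte order are assumptions; the interfaces are again copied locally so the example compiles on its own.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// Local copies of the interfaces from this document, so the sketch is self-contained.
type GenericColumnType[T any] interface {
	ToStringT(v T) string
	ToString(v interface{}) string
	DeserializeT([]byte) (T, error)
	Deserialize([]byte) (interface{}, error)
}

type GenericFixedWidthColumnType[T any] interface {
	GenericColumnType[T]
	Size() int
	SerializeFixedT(v T, dest []byte) error
	SerializeFixed(v interface{}, dest []byte) error
}

// int32Type is a hypothetical fixed-width ColumnType for int32 values.
type int32Type struct{}

// Size reports the fixed serialized width: 4 bytes.
func (t *int32Type) Size() int { return 4 }

func (t *int32Type) ToStringT(v int32) string {
	return strconv.FormatInt(int64(v), 10)
}

func (t *int32Type) ToString(v interface{}) string {
	i, ok := v.(int32)
	if !ok {
		return fmt.Sprintf("<invalid %T>", v)
	}
	return t.ToStringT(i)
}

func (t *int32Type) DeserializeT(b []byte) (int32, error) {
	if len(b) != t.Size() {
		return 0, fmt.Errorf("int32Type: expected %d bytes, got %d", t.Size(), len(b))
	}
	return int32(binary.LittleEndian.Uint32(b)), nil
}

func (t *int32Type) Deserialize(b []byte) (interface{}, error) {
	return t.DeserializeT(b)
}

// SerializeFixedT writes into a caller-supplied, reusable buffer of Size() bytes.
func (t *int32Type) SerializeFixedT(v int32, dest []byte) error {
	if len(dest) != t.Size() {
		return fmt.Errorf("int32Type: dest must be %d bytes, got %d", t.Size(), len(dest))
	}
	binary.LittleEndian.PutUint32(dest, uint32(v))
	return nil
}

// SerializeFixed type-checks before calling the typed version.
func (t *int32Type) SerializeFixed(v interface{}, dest []byte) error {
	i, ok := v.(int32)
	if !ok {
		return fmt.Errorf("int32Type: expected int32, got %T", v)
	}
	return t.SerializeFixedT(i, dest)
}

// Compile-time check that int32Type satisfies the interface.
var _ GenericFixedWidthColumnType[int32] = (*int32Type)(nil)
```

Because SerializeFixedT writes into an existing buffer rather than allocating a new one, the memory used per value stays constant regardless of how often the value is read or written.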
It is worth mentioning that, in addition to implementing the GenericColumnType[T] interface, client code should also supply a factory function which returns a GenericColumnAccessor[T], allowing values from columns of this type to be accessed in a typed fashion:
// ...ToStringT, SerializeT, DeserializeT, etc.

// Bool returns a GenericColumnAccessor for a FixedWidthColumnType which stores a boolean value
func Bool(colName string) sif.GenericColumnAccessor[bool] {
	return sif.CreateColumnAccessor[bool](&boolType{}, colName)
}
Using one's own ColumnType (fixed or variable-width) is as simple as using the factory function to produce a GenericColumnAccessor[T], and then including it in a Schema:
myBoolColumn := Bool("my_column")
schema, err := schema.CreateSchema(myBoolColumn)
Values may be stored in and retrieved from Rows using the GenericColumnAccessor[T]:
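ops.Map(func(row sif.Row) error {
	// Skip rows where the column has no value
	if myBoolColumn.IsNil(row) {
		return nil
	}
	// Read the typed value out of the row
	aBool, err := myBoolColumn.From(row)
	if err != nil {
		return err
	}
	// Write the typed value back into the row
	return myBoolColumn.To(row, aBool)
})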