diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto index 17e92a3c5..43e7896ea 100644 --- a/proto/substrait/algebra.proto +++ b/proto/substrait/algebra.proto @@ -340,9 +340,10 @@ message ExchangeRel { // // This is for use at the root of a `Rel` tree. message RelRoot { - // A relation + // A relation. Rel input = 1; - // Field names in depth-first order + // Field names in depth-first order. If the relation doesn't return anything + // (zero outputs, for example DdlRel), this should be empty. repeated string names = 2; } @@ -361,6 +362,9 @@ message Rel { ExtensionMultiRel extension_multi = 10; ExtensionLeafRel extension_leaf = 11; CrossRel cross = 12; + WriteRel write = 13; + DdlRel ddl = 14; + ReferenceRel reference = 15; } } @@ -436,12 +440,13 @@ message WriteRel { } // The schema of the table (must align with Rel input (e.g., number of leaf fields must match)) + // FIXME: I don't really have a problem with it being here because Substrait defines all kinds of things redundantly already, but why explicitly specify the whole schema here but not for RelRoot? It just specifies the names, which is enough, because the types can be derived from the input relation. NamedStruct table_schema = 3; // The type of operation to perform WriteOp op = 4; - // The relation that determines the tuples to add/remove/modify + // The relation that determines the records to add/remove/modify // the schema must match with table_schema. Default values must be explicitly stated // in a ProjectRel at the top of the input. The match must also // occur in case of DELETE to ensure multi-engine plans are unequivocal. @@ -452,29 +457,41 @@ message WriteRel { enum WriteOp { WRITE_OP_UNSPECIFIED = 0; - // The insert of new tuples in a table + // The insert of new records in a table WRITE_OP_INSERT = 1; - // The removal of tuples from a table + // The removal of records from a table + // FIXME: how are the records to be deleted identified? It's stated that schema(input) must match the table schema, so is it equality? Does that mean that it's impossible to differentiate between different records with the same contents, i.e. it's not possible to match positionally? WRITE_OP_DELETE = 2; - // The modification of existing tuples within a table + // The modification of existing records within a table + // FIXME: likewise for WRITE_OP_DELETE, how are the records that are to be updated identified? WRITE_OP_UPDATE = 3; - // The Creation of a new table, and the insert of new tuples in the table + // The Creation of a new table, and the insert of new records in the table WRITE_OP_CTAS = 4; } enum OutputMode { OUTPUT_MODE_UNSPECIFIED = 0; - // return no tuples at all + // return no records at all OUTPUT_MODE_NO_OUTPUT = 1; // this mode makes the operator return all the tuple INSERTED/DELETED/UPDATED by the operator. // The operator returns the AFTER-image of any change. This can be further manipulated by operators upstreams - // (e.g., retunring the typical "count of modified tuples"). + // (e.g., retunring the typical "count of modified records"). // For scenarios in which the BEFORE image is required, the user must implement a spool (via references to // subplans in the body of the Rel input) and return those with anounter PlanRel.relations. - OUTPUT_MODE_MODIFIED_TUPLES = 2; + // FIXME: if I'm interpreting "before-image", "after-image", and that "spool" mechanism correctly, I'm not sure how those things help. For example, in an update operation, I'd expect the operator to return the previous values of the replaced records, because that's the information that would otherwise be thrown away. Likewise for a delete, it should return the rows that are deleted, not the emptiness that I would call the "after-image". + OUTPUT_MODE_MODIFIED_RECORDS = 2; } } +// This rel is used to create references, +// in case we refer to a RelRoot field names will be ignored +message ReferenceRel { + // Zero-based index into the Plan.relations list. Must be less than the index + // of the relation tree that this reference appears in, in order to avoid + // cycles and forward references. + int32 rel_reference = 1; +} + // The argument of a function message FunctionArgument { oneof arg_type { @@ -1163,10 +1180,4 @@ message AggregateFunction { // Use only distinct values in the aggregation calculation. AGGREGATION_INVOCATION_DISTINCT = 2; } - - // This rel is used to create references, - // in case we refer to a RelRoot field names will be ignored - message ReferenceRel { - int32 subtree_ordinal = 1; - } } diff --git a/proto/substrait/plan.proto b/proto/substrait/plan.proto index 8614e08ef..de5a2ca0b 100644 --- a/proto/substrait/plan.proto +++ b/proto/substrait/plan.proto @@ -14,9 +14,13 @@ option java_package = "io.substrait.proto"; // Either a relation or root relation message PlanRel { oneof rel_type { - // Any relation (used for references and CTEs) + // A relation with an output that should be referred to by one or more + // relation references in subsequent relation trees, but for which the + // result is not logically returned by the plan. Rel rel = 1; - // The root of a relation tree + // The root of a relation tree. The result is logically returned by the + // plan using the enclosed field names. The result may also be used by + // relation references in subsequent relation trees. RelRoot root = 2; } } diff --git a/site/docs/relations/basics.md b/site/docs/relations/basics.md index a68db2801..200c29fdc 100644 --- a/site/docs/relations/basics.md +++ b/site/docs/relations/basics.md @@ -1,6 +1,12 @@ # Basics -Substrait is designed to allow a user to construct an arbitrarily complex data transformation plan. The plan is composed of one or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them into zero or more output transformations. Substrait defines a core set of transformations, but users are also able to extend the operations with their own specialized operations. +Substrait is designed to allow a user to construct an arbitrarily complex data transformation plan. The plan is composed of one or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them, typically returning a single result set. Substrait defines a core set of transformations, but users are also able to extend the operations with their own specialized operations. + +At the plan root, Substrait defines a list of relation trees. These trees can either be roots (`RelRoot root`) or subtrees (`Rel rel`). Functionally speaking, executing a plan involves computing the result sets for each of these trees by depth-first traversal of each relation tree, in the order in which they are defined. Relation references can be used to refer to the result set of a previously evaluated subtree. After all relations have been evaluated, the result set of each *root* relation that actually returns a result set is returned to the user, and any intermediate result sets produced by subtrees are discarded. Please note that this is only a functional description; in practice, the relations will likely be evaluated out of order and in parallel for performance. + +A common question is how Substrait deals with column and field names, aliases, name conflicts at the output of a join relation, and so on. The answer is that it doesn't: it avoids these problems entirely by referring to fields only by their positional index in the schema internally. However, at the periphery, i.e. read relations, write relations, and for the data sets returned by root relations, field names are attached for interoperability with systems that do match based on names. Names are bound by depth-first traversal of the schema, assigning a name to every struct field and column encountered. + +## Common properties Each relational operation is composed of several properties. Common properties for relational operations include the following: @@ -10,8 +16,6 @@ Each relational operation is composed of several properties. Common properties f | Hints | A set of optionally provided, optionally consumed information about an operation that better informs execution. These might include estimated number of input and output records, estimated record size, likely filter reduction, estimated dictionary size, etc. These can also include implementation specific pieces of execution information. | Physical | | Constraint | A set of runtime constraints around the operation, limiting its consumption based on real-world resources (CPU, memory) as well as virtual resources like number of records produced, the largest record size, etc. | Physical | - - ## Relational Signatures In functions, function signatures are declared externally to the use of those signatures (function bindings). In the case of relational operations, signatures are declared directly in the specification. This is due to the speed of change and number of total operations. Relational operations in the specification are expected to be <100 for several years with additions being infrequent. On the other hand, there is an expectation of both a much larger number of functions (1,000s) and a much higher velocity of additions. @@ -23,6 +27,8 @@ Each relational operation must declare the following: * Does the operator produce an output (by specification, we limit relational operations to a single output at this time) * What is the schema and field ordering of an output (see emit below)? +Relations may return zero or one result set. If they do not return a result, the relation behaves like a relation that always yields zero rows of a schema with zero fields. + ### Emit: Output Ordering A relational operation uses field references to access specific fields of the input stream. Field references are always ordinal based on the order of the incoming streams. Each relational operation must declare the order of its output data. To simplify things, each relational operation can be in one of two modes: diff --git a/site/docs/relations/logical_relations.md b/site/docs/relations/logical_relations.md index 4ac7244af..dbe60cf41 100644 --- a/site/docs/relations/logical_relations.md +++ b/site/docs/relations/logical_relations.md @@ -323,7 +323,7 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B | Signature | Value | | -------------------- |---------------------------------------| -| Inputs | 1 | +| Inputs | 0 | | Outputs | 1 | | Property Maintenance | Maintains all properties of the input | | Direct Output Order | Maintains order | @@ -333,7 +333,7 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B | Property | Description | Required | |-----------------------------|--------------------------------------------------------------------------------| --------------------------- | -| Referred Rel | A zero-indexed positional reference to a `Rel` defined within the same `Plan`. | Required | +| Referred Rel | A zero-indexed positional reference to a `Rel` defined within the same `Plan`. The index must be less than the index of the relation tree that the reference appears in; put differently, you can only refer to trees that have already been declared. This avoids cyclic dependencies and forward references. | Required | === "ReferenceRel Message" @@ -343,14 +343,14 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B ## Write Operator -The write operator is an operator that consumes one output and writes it to storage. This can range from writing to a Parquet file, to INSERT/DELETE/UPDATE in a database. +The write operator is an operator that consumes one output and writes it to storage. Currently, only named tables and extensions are supported, but the intention is to also support writing files in the future. -| Signature | Value | -| -------------------- |---------------------------------------------------------| -| Inputs | 1 | -| Outputs | 1 | -| Property Maintenance | Output depends on OutputMode (none, or modified tuples) | -| Direct Output Order | Unchanged from input | +| Signature | Value | +| -------------------- |----------------------------------------------------------| +| Inputs | 1 | +| Outputs | 1 | +| Property Maintenance | Output depends on OutputMode (none, or modified records) | +| Direct Output Order | Unchanged from input | ### Write Properties @@ -360,8 +360,8 @@ The write operator is an operator that consumes one output and writes it to stor | Write Type | Definition of which object we are operating on (e.g., a fully-qualified table name). | Required | | CTAS Schema | The names of all the columns and their type for a CREATE TABLE AS. | Required only for CTAS | | Write Operator | Which type of operation we are performing (INSERT/DELETE/UPDATE/CTAS). | Required | -| Rel Input | The Rel representing which tuples we will be operating on (e.g., VALUES for an INSERT, or which tuples to DELETE, or tuples and after-image of their values for UPDATE). | Required | -| Output Mode | For views that modify a DB it is important to control, which tuples to "return". Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_TUPLES, that can be further manipulated by layering more rels ontop of this WriteRel (e.g., to "count how many tuples were updated"). This also allows to return the after-image of the change. To return before-image (or both) one can use the reference mechanisms and have multiple return values. | Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER | +| Rel Input | The Rel representing which records we will be operating on (e.g., VALUES for an INSERT, or which records to DELETE, or records and after-image of their values for UPDATE). | Required | +| Output Mode | For views that modify a DB it is important to control, which records to "return". Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_RECORDS, that can be further manipulated by layering more rels ontop of this WriteRel (e.g., to "count how many records were updated"). This also allows to return the after-image of the change. To return before-image (or both) one can use the relation reference mechanisms to use the relation subtree passed to the write input more than once. | Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER | ### Write Definition Types