Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 70 additions & 0 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -364,6 +364,76 @@ message Rel {
}
}

// A base table for writing. The list of string is used to represent namespacing (e.g., mydb.mytable).
// This assumes shared catalog between systems exchanging a message.
// it also includes a base schema, and default values
message NamedTableWrite {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the rationale for the fields in this message? I thought the DdlRel.query field already defines the schema except for the names; this schema-and-names pattern is also seen in RelRoot adding names to its Rel input's schema. I'm also not sure why the default values are not implicit in the query.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note that my DB background is mostly as a user, so I'm not familiar with some of the terminology.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons:

  1. for a CREATE TABLE we would not have a input and only specify the schema (which must include defaults, which are missing in Rel)
  2. I missed to reuse this field in WriteRel, where we could support INSERT that specify only a subset of the fields.

The alternative could be expecting input to include constants for all defaults in the "right places" (which I think is what you suggest). That could work, but feels less natural as most systems assume they can specify only a subset of fields in the INSERT, and if I am doing a CREATE TABLE feels more natural to feel-out the defaults list than to create a query of constants.

repeated string names = 1;
// The columns that will be modified (representing after-image of a schema change)
NamedStruct base_schema = 2;
// The default values for the columns (representing after-image of a schema change)
repeated Expression.Literal defaults = 3;

substrait.extensions.AdvancedExtension advanced_extension = 10;
}

message DdlRel {
// Definition of which type of scan operation is to be performed
oneof write_type {
NamedTableWrite named_table = 1;
ReadRel.ExtensionTable extension_table = 2;
}

//TODO add constraints/indexes/etc..?

// The type of operation to perform
DdlOp op = 3;

// The body of the CREATE VIEW / CTAS
Rel input = 4;

enum DdlOp {
DDL_OP_UNSPECIFIED = 0;
DDL_OP_CREATE_TABLE = 1;
DDL_OP_DROP_TABLE = 2;
DDL_OP_ALTER_TABLE = 3;

DDL_OP_CREATE_VIEW = 4;
DDL_OP_CREATE_OR_REPLACE = 5;
DDL_OP_DROP_VIEW = 6;
}
}

// The operator that modifies the schema/content of a database
message WriteRel {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#239 has the goal of multiple outputs from the Substrait plan, and in particular it distinguishes between the table it writes out to the table it passes through. Could this goal be met here? What table does WriteRel here pass to a relation that has it in its Rel input field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the Rel output, however that is typically used to report on which tuples have been modified (often simply counting them and returning that as an aggregate, or reporting before/after images for an update, etc.. lots of semantics exist, hence allowing for an arbitrary query on it).

I think it "might" work, but I would suggest you take a look at it once we merge this, and evolve it as needed. I would probably think of a multi-output as a spool followed by simple writers, but this is just me.

// Definition of which type of write operation is to be performed
oneof write_type {
NamedTableWrite named_table = 1;
ReadRel.LocalFiles local_files = 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think ReadRel.LocalFiles works here. For one thing, the start and length field are meaningless for writing, and likely so are future specific read options (e.g., for ParquetReadOptions. If the desire is to reuse fields, common fields (for reading and writing) should be refactored into a common place, such as CommonRel.LocalFiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for this based on some previous comment to avoid breaking changes where possible. Moving LocalFiles outside of ReadRel will be a breaking change.

ReadRel.ExtensionTable extension_table = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be refactored to a common place (for reading and writing) too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for this based on some previous comment to avoid breaking changes where possible. Moving LocalFiles outside of ReadRel will be a breaking change.

}

// The type of operation to perform
WriteOp op = 4;

// The relation that determines the tuples to add/remove/modify
// remaining columns are left as-is or to default (for INSERT).
Rel input = 5;

// The rel that specifies the output of the operator. It allows references
// to DELETED.<col_name> or INSERTED.<col_name>. It can also compute simply count of
// affected tuples (per common behavior of most RDBMS). Defaults to no output.
Rel output = 6;

enum WriteOp {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for merge operations, which define the written result as a function of both the query result's row and the one already stored? For example, such a merge operation could append to a string column, or increment a numeric column, or even used for distributed statistical aggregation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that does exist and I think eventually we should cover it, but looking at the potential richness of semantics (e.g., see example for SQLServer) it does look a bit complex, so I was leaving this for a later patch (as we are already taking on a few things here).

WRITE_OP_UNSPECIFIED = 0;
WRITE_OP_INSERT = 1;
WRITE_OP_INSERT_OR_REPLACE = 2;
WRITE_OP_DELETE = 3;
WRITE_OP_UPDATE = 4;
}
}

// The argument of a function
message FunctionArgument {
oneof arg_type {
Expand Down
38 changes: 31 additions & 7 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -300,17 +300,16 @@ The aggregate operation groups input data on one or more sets of grouping keys,
%%% proto.algebra.AggregateRel %%%
```


## Write Operator

The write operator is an operator that consumes one output and writes it to storage. A simple example would be writing Parquet files. It is expected that many types of writes will be added over time.

| Signature | Value |
| -------------------- | --------------- |
| Inputs | 1 |
| Outputs | 0 |
| Property Maintenance | N/A (no output) |
| Direct Output Order | N/A (no output) |
| Signature | Value |
| -------------------- |----------------------|
| Inputs | 1 |
| Outputs | 1 |
| Property Maintenance | N/A (no output) |
| Direct Output Order | Unchanged from input |

### Write Properties

Expand All @@ -326,6 +325,13 @@ The write operator is an operator that consumes one output and writes it to stor

Write definition types are built by the community and added to the specification. This is a portion of specification that is expected to grow rapidly.


=== "WriteRel Message"

```proto
%%% proto.algebra.WriteRel %%%
```

#### Virtual Table

| Property | Description | Required |
Expand All @@ -343,7 +349,25 @@ Write definition types are built by the community and added to the specification
| Format | Enumeration of available formats. Only current option is PARQUET. | Required |


## DDL Operator

The operator that defines modifications of a database schema (CREATE/DROP/ALTER for TABLE and VIEWS).

| Signature | Value |
| -------------------- |-----------------|
| Inputs | 1 |
| Outputs | 0 |
| Property Maintenance | N/A (no output) |
| Direct Output Order | N/A |

=== "DdlRel Message"

```proto
%%% proto.algebra.DdlRel %%%
```

## Discussion Points

* How to handle correlated operations?