-
Notifications
You must be signed in to change notification settings - Fork 192
feat: add initial support for DDL and data modification operator (#128) #236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
e930a7a
e5c5c6c
a3a0096
4a624b5
5ff4306
be055bb
f2acc99
6bddcf5
c9469ee
7f5d120
e6179cf
8a5ff26
187375a
baf2337
dfe98a9
2b56fe8
5bdf49d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -364,6 +364,76 @@ message Rel { | |
| } | ||
| } | ||
|
|
||
| // A base table for writing. The list of string is used to represent namespacing (e.g., mydb.mytable). | ||
| // This assumes shared catalog between systems exchanging a message. | ||
| // it also includes a base schema, and default values | ||
| message NamedTableWrite { | ||
| repeated string names = 1; | ||
| // The columns that will be modified (representing after-image of a schema change) | ||
| NamedStruct base_schema = 2; | ||
| // The default values for the columns (representing after-image of a schema change) | ||
| repeated Expression.Literal defaults = 3; | ||
|
|
||
| substrait.extensions.AdvancedExtension advanced_extension = 10; | ||
| } | ||
|
|
||
| message DdlRel { | ||
| // Definition of which type of scan operation is to be performed | ||
| oneof write_type { | ||
| NamedTableWrite named_table = 1; | ||
| ReadRel.ExtensionTable extension_table = 2; | ||
| } | ||
|
|
||
| //TODO add constraints/indexes/etc..? | ||
|
|
||
| // The type of operation to perform | ||
| DdlOp op = 3; | ||
|
|
||
| // The body of the CREATE VIEW / CTAS | ||
| Rel input = 4; | ||
|
|
||
| enum DdlOp { | ||
| DDL_OP_UNSPECIFIED = 0; | ||
| DDL_OP_CREATE_TABLE = 1; | ||
| DDL_OP_DROP_TABLE = 2; | ||
| DDL_OP_ALTER_TABLE = 3; | ||
|
|
||
| DDL_OP_CREATE_VIEW = 4; | ||
| DDL_OP_CREATE_OR_REPLACE = 5; | ||
| DDL_OP_DROP_VIEW = 6; | ||
| } | ||
| } | ||
|
|
||
| // The operator that modifies the schema/content of a database | ||
| message WriteRel { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. #239 has the goal of multiple outputs from the Substrait plan, and in particular it distinguishes between the table it writes out to the table it passes through. Could this goal be met here? What table does
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would be the I think it "might" work, but I would suggest you take a look at it once we merge this, and evolve it as needed. I would probably think of a multi-output as a spool followed by simple writers, but this is just me. |
||
| // Definition of which type of write operation is to be performed | ||
| oneof write_type { | ||
| NamedTableWrite named_table = 1; | ||
| ReadRel.LocalFiles local_files = 2; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was going for this based on some previous comment to avoid breaking changes where possible. Moving |
||
| ReadRel.ExtensionTable extension_table = 3; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be refactored to a common place (for reading and writing) too.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was going for this based on some previous comment to avoid breaking changes where possible. Moving |
||
| } | ||
|
|
||
| // The type of operation to perform | ||
| WriteOp op = 4; | ||
|
|
||
| // The relation that determines the tuples to add/remove/modify | ||
| // remaining columns are left as-is or to default (for INSERT). | ||
| Rel input = 5; | ||
|
|
||
| // The rel that specifies the output of the operator. It allows references | ||
| // to DELETED.<col_name> or INSERTED.<col_name>. It can also compute simply count of | ||
| // affected tuples (per common behavior of most RDBMS). Defaults to no output. | ||
| Rel output = 6; | ||
|
|
||
| enum WriteOp { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a need for merge operations, which define the written result as a function of both the query result's row and the one already stored? For example, such a merge operation could append to a string column, or increment a numeric column, or even used for distributed statistical aggregation.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that does exist and I think eventually we should cover it, but looking at the potential richness of semantics (e.g., see example for SQLServer) it does look a bit complex, so I was leaving this for a later patch (as we are already taking on a few things here). |
||
| WRITE_OP_UNSPECIFIED = 0; | ||
| WRITE_OP_INSERT = 1; | ||
| WRITE_OP_INSERT_OR_REPLACE = 2; | ||
| WRITE_OP_DELETE = 3; | ||
| WRITE_OP_UPDATE = 4; | ||
| } | ||
| } | ||
|
|
||
| // The argument of a function | ||
| message FunctionArgument { | ||
| oneof arg_type { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain the rationale for the fields in this message? I thought the
DdlRel.queryfield already defines the schema except for the names; this schema-and-names pattern is also seen inRelRootadding names to itsRel input's schema. I'm also not sure why the default values are not implicit in the query.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should note that my DB background is mostly as a user, so I'm not familiar with some of the terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two reasons:
CREATE TABLEwe would not have ainputand only specify the schema (which must include defaults, which are missing inRel)WriteRel, where we could supportINSERTthat specify only a subset of the fields.The alternative could be expecting
inputto include constants for all defaults in the "right places" (which I think is what you suggest). That could work, but feels less natural as most systems assume they can specify only a subset of fields in theINSERT, and if I am doing aCREATE TABLEfeels more natural to feel-out thedefaultslist than to create a query of constants.