Skip to content

Comments

feat: add initial support for DDL and data modification operator (#128)#236

Closed
curino wants to merge 17 commits intosubstrait-io:mainfrom
curino:modops_2
Closed

feat: add initial support for DDL and data modification operator (#128)#236
curino wants to merge 17 commits intosubstrait-io:mainfrom
curino:modops_2

Conversation

@curino
Copy link
Contributor

@curino curino commented Jun 28, 2022

No description provided.

@curino
Copy link
Contributor Author

curino commented Jun 28, 2022

For this I had to move Literal out of Expression, as it was needed for the defaults.

@jvanstraten
Copy link
Contributor

You can just use Expression.Literal for the type instead. Moving it out would be an unnecessary breaking change for code generated from the protos.

@curino
Copy link
Contributor Author

curino commented Jun 28, 2022

You can just use Expression.Literal for the type instead. Moving it out would be an unnecessary breaking change for code generated from the protos.

Good point.

@jacques-n
Copy link
Contributor

I think this is might be closer to what you have in mind...

Generally looks good. I wouldn't include pure DDL operations in the WriteRel, they feel different (create/drop view, create table, drop table, alter table). Basically, there seem to be two main types of writes: writes that take a set of records as input (e.g. CTAS) and writes that don't (CREATE TABLE). Keeping the second separate feels better than combining here. In general, I have a little question of whether we need to even support DDL at the Substrait level. It feels very database specific (and not requiring any of the other parts of Substrait).

I also don't really understand what the output Rel is. I'd expect something more like:

OutputPattern {

oneof pattern {
  None none = 1;
  RecordCount record_count = 2;
  PartitionList partition_list = 3;
  StreamList stream_list = 4;
}

  message None {}

  message RecordCount{}

  message PartitionList {
    Type.Struct structure = 1;
  }

  message StreamList {
    Type.Struct structure = 1;
  }

}

That's just a rough sketch.

@curino
Copy link
Contributor Author

curino commented Jun 28, 2022

@jacques-n, I am ok to move the DDL into a separate type of operator ``DDLOp` or whatnot. The only issue is splitting CTAS and CREATE TABLE, that feels a bit odd.

I think Substrait providing a minimum coverage of DDL is essential for backends to be able to depend solely on Substrait as their interface. Talking with folks at SIGMOD the pitch that was resonating very well was "You build the backend, be compatible with Substrait and don't worry about a thing"... if they don't get DDL coverage there the dream is half fullfilled. Similar needs arise for tools in governance/provenance like the Atlas integration that Ashvin is working on, or for other tools that need visibility of the whole workload for example for view/index selection tools (I know other folks would like to use Substrait for that). Bottomline, a minimum support for DDL allows to consider Substrait a complete replacement for the entire parsing/interpretation stack in the system. A partial coverage makes the value prop much less appealing (as folks would need to integrate with other existing parser, and then Substrait is just another dependency to add).

As for the output, what you have is fine (a bit low-level), but if you look at the OUTPUT clause in SQL Server or the RETURNING clause in Oracle allow you to specify a richer output (e.g, projections of only some columns or even expression over them, like COUNT(INSERTED.* - DELETED.*) or what-have-you). Having another Rel there is a compact way to allow that (again need to double-check everything, but that is the spirit).

@westonpace
Copy link
Member

@jacques-n, I am ok to move the DDL into a separate type of operator ``DDLOp` or whatnot. The only issue is splitting CTAS and CREATE TABLE, that feels a bit odd.

I think I agree with @jacques-n here but maybe I'm coming at things from a mechanical / implementation perspective. Coming from the analytics world, the table metadata is often stored completely separate from the table data. For example, hive, or iceberg. The DDL work is then carried out by a completely different system.

I think Substrait providing a minimum coverage of DDL is essential for backends to be able to depend solely on Substrait as their interface.

On the read side of things we don't have any API right now for reading table metadata so it seems a bit asymmetric. Perhaps there could be layers to the API? Arrow/Acero, for instance, isn't going to have support for any DDL writing or reading, so having it in a separate layer would make it easier to simply not implement.

}

// The operator that modifies the schema/content of a database
message WriteRel {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the analogue / opposite of a "named table read". In Arrow we also have a write rel but it is the analogue / opposite of a "local files read". We take in options like "destination directory", "file format", "partitioning", etc.

It would be nice to structure things in a way that the ReadRel mirrors the WriteRel which, in this case, I think would mean introducing a WriteType and having this content (table, base_schema, defaults) be inside a NamedTableWrite message or something similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonpace

Regarding WriteType I like, let me and @jcamachor try.

As for DDL, I am not worried about the "read" path, as this is handled by most (all?) RDBMS by simple using SELECT on a special DB that is about metadata of the system (usually called information_schema), so things would look like:

SELECT table_name, table_schema, table_type
FROM information_schema.tables

(And there are other tables for views, indexes, constraints, columns, etc..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we tried to implement both the split suggest by @jacques-n and the NamedTableWrite suggested by @westonpace.

I am still very convinced that DDL would be very very nice to have (and seems to add minimal complexity).

Copy link
Contributor Author

@curino curino Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like this version, I can propagate to website etc. We will also likely do in a separate patch the isthmus changes needed to test all this out.

Once again, we have been co-developing this with @jcamachor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually called information_schema

If one of the goals of Substrait is to try to remedy the current situation where each system has its own flavor of SQL, rather than just being binary SQL, I don't think that should be the way to go. It also doesn't extend to metadata queries for file data sources.

I'm trying not to get too involved in these additions because with my background I feel like an outsider looking in, but as that outsider, generally I feel like lots of things are currently being specified based on how SQL has always pragmatically done things, rather than based on what makes sense. For example, why should something like a drop table or insert into return a table? Virtually every other language has a statement-vs-expression-like concept to allow for things that don't return anything. Elsewhere I joked about SELECT * FROM (DROP TABLE *) to masquerade a malicious plan as something immutable; I don't know if SQL allows this but if these DDL statements are modeled as regular 1-input 1-output relations I see no reason why Substrait wouldn't. I personally liked the original proposal that extended the PlanRel oneof a lot more than using WriteRel for this reason. Just my two cents I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we could "extend to metadata queries for file data sources" by introducing an operator that extract metadata from a file as a relation, and then allow the standard SELECT * FROM information_schema.tables type tricks, which I find very pleasant in RDBMSs as you get to use all the tools and language expressivity you have for data also for metadata.

drop table (and CREATE/ALTER for table and VIEWS) do not return a rel (in the latest patch), but INSERT/DELETE/UPDATE do. Why does SQL support that I am not sure, but poking around I have seen folks in blogs/discussions do interesting things to compactly execute a statement and verify properties about what happened in the DBMS, avoiding odd workaround like selecting into temporary tables, executing and then dropping temp tables, etc. Intuitively it helps keep the language as "close" as possible, which I think is very convenient in many ways (I think @jacques-n hints at that below).

@jacques-n
Copy link
Contributor

Couple of comments here:

The reason I think CREATE, ALTER, DROP should be distinct from DELETE, UPDATE, INSERT: one takes an input tree and one does not. In some ways, this is akin to the Arrow Flight distinction between "actions" and get/put.

I'm fine with DDL but I agree @jvanstraten that it should be less SQL specific. I also suggest we start with the big ones: DROP a table, change a table schema and create a table. For the schema change, we should think about the operation int the context of Iceberg and how we would express it here.

WRT to the top-level concept some more thoughts... I'm really mixed on introducing this at that level. Composability of relations is a very powerful thing, even if those relations are writes. I've definitely built multiple systems modeled this way and when you think about complex write paths like Apache Iceberg, treating parquet write as a transformer (transforming many records into pointers to parquet files) that then hands those off to another writer (transforms parquet pointers into manifest pointers) handing off to another writer (transforms manifest pointers into manifest list pointers), this makes things much cleaner from an implementation & scaling perspective. I also agree that there are times where that composability is unnecessary. Introducing these concepts as two discrete things feels excessive and thus I was inclined to only keeping as a rel (rel's feel "cheaper" in the spec than introducing top concepts with special semantics at other layers).

I think the returning concept needs more work. Returning could return operation data, a subset of the fields in the write rel input tree, or something else. I don't actually understand the suggestion to use rel in this position as I don't know how the returning rel would refer to the input tree (which it seems like it needs to).

@curino
Copy link
Contributor Author

curino commented Jun 30, 2022

I love the discussion going on here!

I can go either way on the top-level vs not argument. @jacques-n proposal (everything is a Rel) makes things very compact, which I like, though separating things at the top level did feel easier to follow for a new user looking at it.

As for CREATE/ALTER/DROP I am fine to keep it separate as it allows us to at least remove the output field. However given the CTAS and VIEWs we still need a Rel there. The semantics I have in mind for schema changes is simply that we define the full after-image of the schema with the NamedTableWrite. I can use some help in understanding how to represent this correctly for Iceberg/files setup as I am less familiar with those. My gut feeling is that whether you represent things as tables or not, Substrait is inherently relational so it is ok to think about the world as tables. Systems can implement those as appropriate (including No-ops if some operations don't apply to their backend data storage mechanism).

The way I was thinking about the returning semantics was for the output rel to "build upon" the query rel, as follows:

  1. For INSERT it can return the raw tuple inserted, or any projections / aggregate / etc. over them
  2. For DELETE it can return the before-image of the tuples being removed, or any projection/aggregation/etc. over them)
  3. For UPDATE the assumption of building upon the query is a bit limiting, as we can only return after-images (as that is the semantics for the query that can produces the after image of the change (with the semantics of that being installed in-place by the engine). In SQL Serve and other engines one is also allowed to output values from the before-image (or a mix/expression of both). I think it is ok to have a limited support for this scenarios. Alternatively we can argue that output builds straight from the raw tables/files and so you can produce whatever output you want, but then making sure the query and the output are aligned correctly is a tricky business (the validator would have to do some serious introspection in the queries to prove that).

One more thought around the "closeness" of Rel is that we could skip the output Rel and have a single boolean to say return tuples or not, because we can then wrap it in another Rel and shape the returning relation as much as we like. This would simply the message a bit (but has the same limitation for UPDATE returning semantics as in (3) above).

@rtpsw
Copy link

rtpsw commented Jul 4, 2022

I posted #239 and got pointed here. I could probably provide some feedback.

}

// The operator that modifies the schema/content of a database
message WriteRel {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#239 has the goal of multiple outputs from the Substrait plan, and in particular it distinguishes between the table it writes out to the table it passes through. Could this goal be met here? What table does WriteRel here pass to a relation that has it in its Rel input field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the Rel output, however that is typically used to report on which tuples have been modified (often simply counting them and returning that as an aggregate, or reporting before/after images for an update, etc.. lots of semantics exist, hence allowing for an arbitrary query on it).

I think it "might" work, but I would suggest you take a look at it once we merge this, and evolve it as needed. I would probably think of a multi-output as a spool followed by simple writers, but this is just me.

// Definition of which type of write operation is to be performed
oneof write_type {
NamedTableWrite named_table = 1;
ReadRel.LocalFiles local_files = 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think ReadRel.LocalFiles works here. For one thing, the start and length field are meaningless for writing, and likely so are future specific read options (e.g., for ParquetReadOptions. If the desire is to reuse fields, common fields (for reading and writing) should be refactored into a common place, such as CommonRel.LocalFiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for this based on some previous comment to avoid breaking changes where possible. Moving LocalFiles outside of ReadRel will be a breaking change.

oneof write_type {
NamedTableWrite named_table = 1;
ReadRel.LocalFiles local_files = 2;
ReadRel.ExtensionTable extension_table = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be refactored to a common place (for reading and writing) too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for this based on some previous comment to avoid breaking changes where possible. Moving LocalFiles outside of ReadRel will be a breaking change.

// affected tuples (per common behavior of most RDBMS). Defaults to no output.
Rel output = 6;

enum WriteOp {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for merge operations, which define the written result as a function of both the query result's row and the one already stored? For example, such a merge operation could append to a string column, or increment a numeric column, or even used for distributed statistical aggregation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that does exist and I think eventually we should cover it, but looking at the potential richness of semantics (e.g., see example for SQLServer) it does look a bit complex, so I was leaving this for a later patch (as we are already taking on a few things here).

// A base table for writing. The list of string is used to represent namespacing (e.g., mydb.mytable).
// This assumes shared catalog between systems exchanging a message.
// it also includes a base schema, and default types
message NamedTableWrite {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the rationale for the fields in this message? I thought the DdlRel.query field already defines the schema except for the names; this schema-and-names pattern is also seen in RelRoot adding names to its Rel input's schema. I'm also not sure why the default values are not implicit in the query.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note that my DB background is mostly as a user, so I'm not familiar with some of the terminology.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons:

  1. for a CREATE TABLE we would not have a input and only specify the schema (which must include defaults, which are missing in Rel)
  2. I missed to reuse this field in WriteRel, where we could support INSERT that specify only a subset of the fields.

The alternative could be expecting input to include constants for all defaults in the "right places" (which I think is what you suggest). That could work, but feels less natural as most systems assume they can specify only a subset of fields in the INSERT, and if I am doing a CREATE TABLE feels more natural to feel-out the defaults list than to create a query of constants.

@curino
Copy link
Contributor Author

curino commented Jul 20, 2022

I created a new PR (#252) to fix the Lint commits issues (tried the other way, but always something going wrong, so cut a new branch and moved the patch).

@jacques-n
Copy link
Contributor

This is superseded by #252

@curino
Copy link
Contributor Author

curino commented Jul 28, 2022

Folks, can you please comment (or approve ;-)) #252 ?

@jacques-n
Copy link
Contributor

Superseded by #252

@jacques-n jacques-n closed this Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants