feat(proto): add write relation to support multiple outputs #239
rtpsw wants to merge 1 commit into substrait-io:main from
|
cc @icexelloss |
|
#236 is focused on write rel. Let's get that merged first. Would be good to get your feedback/thoughts on the direction there. |
@jacques-n, thanks for pointing me to this PR. I'll post there. |
|
@rtpsw Can you remind me why we need this again? |
See the issue description. Although it's not on our critical path, it has been useful in my local work, and I expect it would be useful more generally. It's possible this issue would get merged into #236. |
|
Sorry for the late response. I think #284 + #252 should conceptually work. @jacques-n, could you explain how the use case I'm targeting here would map to Substrait messages, in particular using |
|
I'm not sure, but isn't the current idea that a plan should only have one root relation? Nothing in the protos prevents multiple root relations and I don't think the website says anything about it, but if that's the case then that technically disallows the use case you put forward, and a minor spec change would still be required. Personally I don't see why this should be disallowed; if a consumer doesn't want to support multiple roots then it can just reject a plan with multiple in it. Otherwise, a WriteRel that passes all data through is not necessary, because you could write the hypothetical
The actual new relations are not part of the oneof in Rel yet; that's a bug that's being addressed in #288. Also, WriteRel as currently defined isn't actually the complementary operation of ReadRel. You mention "check-point and debug output," so I imagine you're also interested in file sinks. Those fundamentally don't work at the moment, because WriteRel is more of a ReadModifyWriteRel or UpdateRel than a strict write relation, and file output was conveniently left out as a result. |
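To make the "a consumer can just reject a plan with multiple roots" point concrete, here is a minimal sketch of such a check. It assumes a plan is a list of relation entries, each holding either a root relation or a plain relation; the `Plan` and `PlanRel` dataclasses below are hypothetical stand-ins for illustration, not the generated Substrait protobuf classes, and `UnsupportedPlanError` and `check_single_root` are made-up names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanRel:
    # Hypothetical stand-in for a plan entry: exactly one of `root` or `rel`
    # is expected to be set. Strings are used in place of real relation trees.
    root: Optional[str] = None
    rel: Optional[str] = None


@dataclass
class Plan:
    # Hypothetical stand-in for a plan: an ordered list of relation entries.
    relations: List[PlanRel] = field(default_factory=list)


class UnsupportedPlanError(Exception):
    """Raised when a consumer declines a plan it does not support."""


def count_roots(plan: Plan) -> int:
    # Count the entries that carry a root relation.
    return sum(1 for entry in plan.relations if entry.root is not None)


def check_single_root(plan: Plan) -> None:
    # A consumer that only supports one output rejects anything with more.
    n = count_roots(plan)
    if n > 1:
        raise UnsupportedPlanError(
            f"plan has {n} root relations; this consumer supports at most one"
        )


if __name__ == "__main__":
    # One main output plus one extra (e.g. debug) output: two roots.
    plan = Plan([PlanRel(root="main_output"), PlanRel(root="debug_output")])
    try:
        check_single_root(plan)
    except UnsupportedPlanError as err:
        print(f"rejected: {err}")
```

A consumer that does support multiple outputs would simply skip this check, which is why allowing multiple roots in the spec need not burden consumers that do not.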
|
We should distinguish the use case, which is additional outputs for check-point and debug purposes, from the draft proposal in this PR. While I'm no longer requesting the current PR, due to the other PRs described above, I'm still interested in the use case (though not at a high priority). I don't think the use case is handled until a way to get check-point and debug output is spelled out. So my question stands, and #238 should be left open.
I agree a solution for check-point and debug purposes could be based on files. However, I don't think the file has to be a sink. The update or read-modify-write operation fits a random-access/re-writable (relational) file. |
See #238