Arrow Streaming Analytics #285

Open
Max-Meldrum opened this issue Nov 3, 2021 · 1 comment
Labels
Discussion, domain: arrow (Anything related to Arrow)

Comments

@Max-Meldrum
Member

We are currently using Arrow mainly for converting Protobuf state into queryable Arrow data. I think it would be interesting to see how we could use Arrow for streaming operations and how this could be exposed in the API (e.g., DataFrames?).

As of now, the only thing we have is ArrowWindow, which exposes (Arc<Schema>, Vec<RecordBatch>) per window trigger.

Relevant references:

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/trill-vldb2015.pdf
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables/

@Max-Meldrum added the Discussion and domain: arrow labels on Nov 3, 2021
@Max-Meldrum
Member Author

Maybe we could define an ArrowStream type which would support a number of transformations.
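
To make the idea a bit more concrete, here is a rough sketch of what such a type might look like. Everything in it is an assumption for discussion purposes: `Schema` and `RecordBatch` are simplified std-only stand-ins (not the types from the `arrow` crate), and `ArrowStream`, `filter`, `select`, and `collect` are hypothetical names, not an existing API. The only thing taken from the current code is the output shape, (Arc<Schema>, Vec<RecordBatch>), which matches what ArrowWindow exposes today.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow schema: only column names (hypothetical, not arrow's Schema).
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

/// Stand-in for an Arrow RecordBatch: equal-length i64 columns (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub schema: Arc<Schema>,
    pub columns: Vec<Vec<i64>>,
}

impl RecordBatch {
    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}

/// Hypothetical ArrowStream: a lazy sequence of batches plus the schema they share.
pub struct ArrowStream<I: Iterator<Item = RecordBatch>> {
    schema: Arc<Schema>,
    batches: I,
}

impl<I: Iterator<Item = RecordBatch>> ArrowStream<I> {
    pub fn new(schema: Arc<Schema>, batches: I) -> Self {
        ArrowStream { schema, batches }
    }

    /// Keep only the rows where `pred` holds for the value in column `col`.
    pub fn filter<F>(self, col: usize, pred: F) -> ArrowStream<impl Iterator<Item = RecordBatch>>
    where
        F: Fn(i64) -> bool,
    {
        let batches = self.batches.map(move |batch| {
            // Build a row mask from the predicate column, then apply it to every column.
            let keep: Vec<bool> = batch.columns[col].iter().map(|v| pred(*v)).collect();
            let columns: Vec<Vec<i64>> = batch
                .columns
                .iter()
                .map(|c| {
                    c.iter()
                        .zip(&keep)
                        .filter(|(_, k)| **k)
                        .map(|(v, _)| *v)
                        .collect::<Vec<i64>>()
                })
                .collect();
            RecordBatch { schema: batch.schema.clone(), columns }
        });
        ArrowStream { schema: self.schema, batches }
    }

    /// Project the columns at the given indices, producing a narrower schema.
    pub fn select(self, cols: Vec<usize>) -> ArrowStream<impl Iterator<Item = RecordBatch>> {
        let schema = Arc::new(Schema {
            fields: cols.iter().map(|&i| self.schema.fields[i].clone()).collect(),
        });
        let out_schema = schema.clone();
        let batches = self.batches.map(move |batch| RecordBatch {
            schema: out_schema.clone(),
            columns: cols.iter().map(|&i| batch.columns[i].clone()).collect(),
        });
        ArrowStream { schema, batches }
    }

    /// Drain the stream into the same shape ArrowWindow exposes per trigger.
    pub fn collect(self) -> (Arc<Schema>, Vec<RecordBatch>) {
        (self.schema, self.batches.collect())
    }
}
```

With this sketch, something like `ArrowStream::new(schema, batches.into_iter()).filter(1, |v| v >= 20).select(vec![0]).collect()` would chain transformations lazily and only materialize batches at the end. A real version would presumably wrap the `arrow` crate's `RecordBatch` and its compute kernels instead of plain `Vec<i64>` columns.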
