[WEBSITE] DataFusion 16.0.0 blog post #294
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
See also:
It is a work in progress, but I think it is now coherent enough to gather some community input
It would also be great to add a section to this document about planned feature work

## Community Growth

The three months since [our last update](https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0/) again saw significant growth in the DataFusion.
It would be great if someone could help clean this section up and clearly explain the growth of the community; There is a wonderful story there to tell
closing the gap quickly. Performance highlights from the last three
months:

* XX% Faster Sorting and Merging using the new [Row Format](https://arrow.apache.org/blog/2022/11/07/multi-column-sorts-in-arrow-rust-part-1/)
@tustvold do you have any suggestions about what numbers to use here?
* Basic filter selectivity analysis (#3868)

In the coming few months, we plan to work on:
Feedback from the rest of the community would be great
- Implement current_date scalar function (#4022)
- Compressed CSV/JSON support (#3642)

The community has also been investing in sqllogic based tests to help keep DataFusion's quality high with less work (TODO add some more detail / links)
@xudong963 I wonder if you have any thoughts on how to word this better

# Substrait

TODO motivating introduction of substrait and why this is interesting
@andygrove perhaps you can help with content for the substrait area
Substrait isn't going to make it into 16, and maybe this should be a separate post? I started a google doc https://docs.google.com/document/d/1vK0AyDBhIibmKZ2scGN3jBypBvqMPuztbBzZ1eh0dKM/edit?usp=sharing
Well, maybe there is a chance it makes it in. Assuming the vote passes tomorrow, we could get the first PR merged 🤔
DataFusion has basic python bindings which has the potential to expand datafusion to more end users a major missing piece are the python bindings

# python bindings and growing the community and ecosystem
I tried to work in a mention of the python bindings and encouraging a champion for them to step forward, however it felt more like it should be a separate post 🤔 -- @andygrove what do you think about a post describing the python bindings, why they are cool, and trying to find people to help drive that project?
I can't figure out how to work this in so I think we should write another post -- maybe on a different site
I took the content / notes I had and put them in a google doc: https://docs.google.com/document/d/1zNfK8pIOqgHURX2lHK0JhSKTaCH3t23tYDKpGLWdFRY/edit
Thanks. I agree. I can work on the Python post
Co-authored-by: Andy Grove <[email protected]>

Growth of new systems based on as the engine in [many open source and commercial projects](https://github.com/apache/arrow-datafusion#known-uses) and was one of the early open source projects to provide this capability.

Several new databases built on datafusion (synnada.ai, greptimedb, probably others)
Here is what I am aware of:
Databases: greptimedb (new), IOx (GA)
Data platform: Synnada (new)
Use case: Backend for PRQL (relatively new?)
Thanks -- added in ffe2e0a. Still needs polish
I will start contributing to this tomorrow
Ok I think this one is now ready for some more review -- it is plausibly ready to publish
Read from start to finish. Apart from the small preposition comment I left inline, it looks great. Looking forward to seeing this published.
…ite into alamb/datafusion_update_16
I plan to merge this tomorrow unless there are any other comments
Closes apache/datafusion#4804
This blog post highlights some improvements and features in DataFusion over the last 3 releases 😅
Rendered: https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/