[WEBSITE] DataFusion 16.0.0 blog post #294

Merged
merged 13 commits into from
Jan 19, 2023

Conversation

Contributor

@alamb alamb commented Jan 7, 2023

Closes apache/datafusion#4804

This blog post highlights some improvements and features in DataFusion over the last 3 releases 😅

Rendered: https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/

@alamb alamb changed the title [WEBSITE] DataFusion 16.0.0 blog post [WEBSITE] DataFusion 16.0.0 blog post (WIP) Jan 7, 2023

github-actions bot commented Jan 7, 2023

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

Contributor Author

alamb commented Jan 7, 2023

It is a work in progress, but I think it is now coherent enough to gather some community input

Contributor Author

@alamb alamb left a comment

It would also be great to add a section to this document about planned feature work


## Community Growth

The three months since [our last update](https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0/) again saw significant growth in the DataFusion community.
Contributor Author

It would be great if someone could help clean this section up and clearly explain the growth of the community; there is a wonderful story there to tell

closing the gap quickly. Performance highlights from the last three
months:

* XX% Faster Sorting and Merging using the new [Row Format](https://arrow.apache.org/blog/2022/11/07/multi-column-sorts-in-arrow-rust-part-1/)
Contributor Author

@tustvold do you have any suggestions about what numbers to use here?

* Basic filter selectivity analysis (#3868)


In the coming few months, we plan work on:
Contributor Author

Feedback from the rest of the community would be great

_posts/2023-01-07-datafusion-16.0.0.md (two outdated, resolved review threads)
- Implement current_date scalar function (#4022)
- Compressed CSV/JSON support (#3642)

The community has also been investing in sqllogic based tests to help keep DataFusion's quality high with less work (TODO add some more detail / links)
Contributor Author

@xudong963 I wonder if you have any thoughts on how to word this better


# Substrait

TODO motivating introduction of substrait and why this is interesting
Contributor Author

@andygrove perhaps you can help with content for the substrait area

Member

Substrait isn't going to make it into 16, and maybe this should be a separate post? I started a google doc https://docs.google.com/document/d/1vK0AyDBhIibmKZ2scGN3jBypBvqMPuztbBzZ1eh0dKM/edit?usp=sharing

Member

Well, maybe there is a chance it makes it in. Assuming the vote passes tomorrow, we could get the first PR merged 🤔

DataFusion has basic python bindings which has the potential to expand datafusion to more end users a major missing piece are the python bindings


# python bindings and growing the community and ecosystem
Contributor Author

I tried to work in a mention of the python bindings and encouraging a champion for them to step forward, however it felt more like it should be a separate post 🤔 -- @andygrove what do you think about a post describing the python bindings, why they are cool, and trying to find people to help drive that project?

Contributor Author

I can't figure out how to work this in so I think we should write another post -- maybe on a different site

I took the content / notes I had and put them in a google doc: https://docs.google.com/document/d/1zNfK8pIOqgHURX2lHK0JhSKTaCH3t23tYDKpGLWdFRY/edit

Member

Thanks. I agree. I can work on the Python post


Growth of new systems based on as the engine in [many open source and commercial projects](https://github.com/apache/arrow-datafusion#known-uses) and was one of the early open source projects to provide this capability.

Several new databases built on datafusion (synnada.ai, greptimedb, probably others)

Here is what I am aware of:

Databases: greptimedb (new), IOx (GA)
Data platform: Synnada (new)
Use case: Backend for PRQL (relatively new?)

Contributor Author

Thanks -- added in ffe2e0a. Still needs polish

@andygrove
Member

I will start contributing to this tomorrow

@alamb alamb changed the title [WEBSITE] DataFusion 16.0.0 blog post (WIP) [WEBSITE] DataFusion 16.0.0 blog post Jan 10, 2023
Contributor Author

alamb commented Jan 10, 2023

Ok I think this one is now ready for some more review -- it is plausibly ready to publish

@ozankabak ozankabak left a comment

Read from start to finish. Apart from the small preposition comment I left inline, it looks great. Looking forward to seeing this published.

Contributor Author

alamb commented Jan 17, 2023

I plan to merge this tomorrow unless there are any other comments

@alamb alamb merged commit 768be07 into apache:master Jan 19, 2023
@alamb alamb deleted the alamb/datafusion_update_16 branch January 19, 2023 12:38
Successfully merging this pull request may close these issues.

Blog post about datafusion 16 release
4 participants