Add examples from TPC-H #666
These examples are looking really nice @timsaucer. Don't feel that you have to wait until all of them are implemented before we start merging into main. We could do this in stages if you like.
Thanks for the feedback. I am seeing a few differences between a couple of the results I'm getting and what's in the answers file, so I want to resolve those before merging. I also want to add a note to the readme pointing out which examples demonstrate which features, so people can find things easily. At the rate I'm going, I'll probably have the last 10 done before midweek.
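When computed results differ slightly from an expected-answers file, one possible cause is floating-point rounding, so a comparison with a numeric tolerance can help separate real bugs from rounding noise. The sketch below is a minimal illustration using only the standard library; it assumes each expected row is stored as one pipe-delimited line, and the helpers `parse_field`, `parse_answer_line`, and `rows_match` are hypothetical, not part of the examples in this PR. The tolerances are arbitrary choices.

```python
import math


def parse_field(text):
    """Convert one field to float (if it has a '.'), else int, else leave it as str."""
    text = text.strip()
    try:
        return float(text) if "." in text else int(text)
    except ValueError:
        return text


def parse_answer_line(line):
    """Split one pipe-delimited expected row into typed fields (format is an assumption)."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1].strip() == "":
        fields = fields[:-1]  # tolerate a trailing delimiter
    return [parse_field(f) for f in fields]


def rows_match(actual, expected, abs_tol=0.01, rel_tol=1e-9):
    """Compare two rows field by field, allowing small numeric differences."""
    if len(actual) != len(expected):
        return False
    for a, e in zip(actual, expected):
        if isinstance(a, (int, float)) and isinstance(e, (int, float)):
            if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
        elif a != e:
            return False
    return True


# Illustrative values only, not real TPC-H answers.
expected = parse_answer_line("A|F|100.00|2500.50|\n")
actual = ["A", "F", 100.0, 2500.504]
print(rows_match(actual, expected))
```

Here `abs_tol=0.01` treats differences smaller than one cent as equal; a stricter check would compare exact decimal values instead.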
Only got through part of it so far, but awesome work.
I've added to the main readme in the examples folder, so I think this PR is good to go pending review.
("C_ADDRESS", pyarrow.string()),
("C_NATIONKEY", pyarrow.int32()),
("C_PHONE", pyarrow.string()),
("C_ACCTBAL", pyarrow.float32()),
nit: not important for example usage, but the numeric fields should be decimal, not float.
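To illustrate the reviewer's point, here is a small sketch using only Python's standard library. It simulates a `pyarrow.float32()` column by round-tripping values through IEEE 754 single precision with `struct`, on the assumption that float32 storage behaves that way; the helper `as_float32` is hypothetical and not part of this PR. In a pyarrow schema, the decimal alternative would be a type such as `pyarrow.decimal128(15, 2)`, where the precision and scale are an assumption chosen to hold monetary values with cents.

```python
import struct
from decimal import Decimal


def as_float32(value: float) -> float:
    """Round-trip a value through IEEE 754 single precision, as a float32 column would store it."""
    return struct.unpack("f", struct.pack("f", value))[0]


# An account balance with cents, similar to a C_ACCTBAL value.
# A decimal schema field might instead be ("C_ACCTBAL", pyarrow.decimal128(15, 2));
# that precision and scale are an assumption, not taken from this PR.
balance_text = "9999.99"

stored_as_float32 = as_float32(float(balance_text))
stored_as_decimal = Decimal(balance_text)

print(stored_as_float32)  # nearest float32 value, slightly off from 9999.99
print(stored_as_decimal)  # exactly 9999.99

# Repeated addition drifts in binary floating point but stays exact in Decimal.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.1")] * 10)
print(float_total, decimal_total)
```

Exact decimal storage keeps cent values exact, which matters when aggregated results are compared against published answers.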
Incredible work @timsaucer. This will be super helpful for new users learning how to use the DataFrame API.
Which issue does this PR close?
This PR does not close #440, but it helps address one part of it.
Rationale for this change
One of the difficulties new DataFusion users face is finding helpful examples. This PR adds a series of examples based on the queries of the TPC-H benchmark. These queries are well known, and we already have a script to generate data for users to work with. By adding these examples, we give new users both data to work with and a series of examples showing DataFusion in operation.
What changes are included in this PR?
This PR makes one change to the generator script, updating its Docker image location.
All other changes are within the examples folder.
Are there any user-facing changes?
No user-facing changes.