DataFusion weekly project plan (Andrew Lamb) - Jan 8, 2024 #8786

Closed
7 tasks
alamb opened this issue Jan 8, 2024 · 13 comments

@alamb

alamb commented Jan 8, 2024

Follow-on to the Jan 1, 2024 plan

Boilerplate Overview

The idea of this ticket is to make my plans for DataFusion visible, largely for my own personal organizational needs, but also to:

  1. Try some different ways to communicate / coordinate in the community
  2. Help provide an interesting summary of what is happening in DataFusion this week

It would be great if anyone else who has plans like this for DataFusion could try to make them visible somehow as well 🙏 (feel free to copy / modify the format)

My (personal) plans for this week

Project Queue (list of future projects)

Projects I plan to prioritize reviewing / helping

Algorithm for (my) prioritizing PR reviews

Note that there are many committers who can and do review and merge PRs, so these are not the priorities of the project as a whole, just the approximate algorithm I am using.

Priority:

  1. Bug fixes (where something is just incorrect), especially regressions (where it used to work and now does not)
  2. Improvements directly related to features needed for InfluxDB (my employer)
  3. Documentation and test improvements (I view these as very strategically important)
  4. PRs that I think are strategically important
  5. Other new features / additions to functionality

The current strategically important projects in my head are:

Thus, if you are interested in contributing to DataFusion and want a fast turnaround time, I recommend looking into bug fixes, test improvements, documentation, etc.

If you propose adding new functionality, the review cycle will likely be longer. You can shorten it by looking at the comments on other recent PRs and following the same model (e.g. ensure there are sqllogictest tests, CI passes, documentation is included, etc.).

@matthewmturner

matthewmturner commented Jan 8, 2024

@alamb FYI I am away for a few days and then will get back into planning performance work.

Let me know if there's any area in particular you would like to tackle next.

@matthewmturner

I had in mind checking for more places to replace results with boolean and then seeing if I could extract some of the optimization ideas from @zeodotr on #7698. Or are you keen to make the DFSchema cheaper to copy?

@alamb

alamb commented Jan 8, 2024

> I had in mind checking for more places to replace results with boolean and then seeing if I could extract so

I think that is the best idea. I would personally suggest focusing on the planning benchmarks and trying to improve whatever they show as the tall pole.

@matthewgapp

@alamb just bumping the recursive CTE PR :) #7581

I fixed an issue that was raised and brought it up to date with main.

@alamb

alamb commented Jan 9, 2024

> @alamb just bumping the recursive CTE PR :) #7581
>
> I fixed an issue that was raised and brought it up to date with main.

Thank you @matthewgapp -- I keep hoping someone else with an interest in recursive CTEs can help with that review too. I'll try to take a look later today.

@maruschin

@alamb hi, could you suggest an area to direct my attention to? I spend a lot of time looking for an issue to start working on.

@alamb

alamb commented Jan 11, 2024

> @alamb hi, could you suggest an area to direct my attention to? I spend a lot of time looking for an issue to start working on.

Thanks @maruschin !

If you have time, looking into the bug report here #8819 would be super helpful.

Debugging this report #8702 would also be helpful (as would, more generally, making the datafusion-cli output generation code simpler).

There is also support for partitioned writing, which seems like it would be great to implement: #8493

@alamb

alamb commented Jan 14, 2024

Done in #8864

alamb closed this as completed Jan 14, 2024
alamb reopened this Jan 14, 2024
alamb closed this as completed Jan 14, 2024