Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSSION]: move sqlparser to Apache (DataFusion) governance #1294

Closed
alamb opened this issue May 30, 2024 · 41 comments
Closed

[DISCUSSION]: move sqlparser to Apache (DataFusion) governance #1294

alamb opened this issue May 30, 2024 · 41 comments

Comments

@alamb
Copy link
Contributor

alamb commented May 30, 2024

(disclaimer: I am biased being the one who merges sqlparser prs and also am the Apache DataFusion PMC chair)

Problem Statement

sqlparser seems to have become the defacto sql parsing library in Rust (5.5M downloads at the time of this writing) 🎉

However the sqlparser-rs project doesn't have sufficient maintainer capacity. I (@alamb) do enough to keep it from going entirely dormant, but that is really not sufficient for a healthy project.

Here are the specific problems:

  1. Having contributors wait weeks for review feedback is a bad experience for everyone involved and for that I apologize.
  2. There is not enough capacity to drive large projects (e.g. token locations) forward

Challenges with current governance structure (or lack thereof)

  1. There is no clear way to add additional maintainers
  2. Some employers (for example Apple) only permit contributions to explicitly vetted projects with clear governance (e.g. ASF)

Past discussions:

When DataFusion was part of the Apache Arrow project, we didn't have the correct space to bring SQL parser at that time

Now that DataFusion is its own top level project (with @andygrove and myself on the PMC) there is a natural space to do thos

Specific Proposal:

  1. Move the sqlparser-rs code (and commit history) into the Apache DataFusion project and under its Governance. This would require an IP clearance process to run and would take time.
  2. Move sqlparser-rs repository to apache/datafusion-sqlparser
  3. Archive this repository, and leave links to apache/datafuson-sqlparser
  4. Continue to release sqlparser versions approximately monthly.

Benefits of ASF governance;

  1. More people can approve/merge PRs (committers to DataFusion)
  2. Clear governance structure (rather than sqlparser-rs today which seems to be mostly me)
  3. Clear path to add additional maintenaners (e.g. committers)

Drawbacks

  1. There is a danger that sqlparser becomes "captured" by DataFusion and only accepts features needed for DataFusion
  2. There is additional overhead to the ASF process (releases, in particular, take additional non trivial overhead)

There is plenty of experience with the ASF release process in DataFusion so I don't think that is a major hurdle. I also think DataFusion in general and sqlparser in particular has a long history of accepting features that benefit all users not just maintainers, so I am not worried about this either (but I am of course biased)

cc @Dandandan @tobyhede @andygrove @maxcountryman @nickolay

@andygrove
Copy link
Member

Thank you for restarting this conversation @alamb. I am also biased as the original author of sqlparser-rs and DataFusion, but I do think the project will have more success under ASF governance, so I am in favor of this proposal.

@lovasoa
Copy link
Contributor

lovasoa commented May 30, 2024

Hello! I'm not against it, and would love to be added as a maintainer under the asf governance, if we need some balance to avoid too much bias towards datafusion.

sqlparser-rs is a crucial component of SQLPage, and I'll keep making pull request either way.

@vaibhawvipul
Copy link

I am in support of this.

@alamb
Copy link
Contributor Author

alamb commented May 30, 2024

Hello! I'm not against it, and would love to be added as a maintainer under the asf governance, if we need some balance to avoid too much bias towards datafusion.

Yes, thank you @lovasoa -- I would expect to discuss adding committers to DataFusion who focused on sqlparser-rs (as we have committers focused on another subproject, comet, and we had committers focused on datafusion when it was part of arrow)

@jmhain
Copy link
Contributor

jmhain commented May 30, 2024

I just realized I never actually responded to #1243. So: I'm happy to participate as a maintainer of sqlparser-rs regardless of where it ends up. I have a slight preference for keeping it independent but the arguments in favor of moving it under DataFusion seem reasonable.

@maxcountryman
Copy link
Contributor

My preference would be to keep the project independent but I'm not able to commit much time to it and it's admittedly a preference mostly based on the idea of how things might appear were it to be moved under the DataFusion project.

@nickolay
Copy link
Contributor

I like how @maxcountryman put it, but I believe the decision must be done by those maintaining the project.

I will take the opportunity to express my gratitude to @alamb for keeping the project going for as long as he has. Thanks for doing this, Andrew!

@tobyhede
Copy link
Contributor

You have my full support.
I am personally so thankful for the endless hours you have put into sqlparser @alamb.

I would love to be involved in any way I can.

@cisaacson
Copy link

cisaacson commented Jun 3, 2024

@alamb I think this is a very good idea. One question though: If a fork is created now, it is just of the sqlparser-rs project. If you go this way would it mean creating a fork of the entire DataFusion project? That could be OK but I would like to know what would be involved to make custom changes to sqlparser (or other components for that matter). Since the entire DataFusion project changes fairly frequently it may be some work to keep up with those changes.

In looking into this further I see the workspace is defined in the datafusion/Cargo.toml workspace. So it looks like someone will need a fork of the whole DataFusion project to make something work in any case. This should be manageable. Unless advised otherwise that's what I will do.

Regardless I am fully supportive of the idea of moving sqlparser-rs under the DataFusion umbrella so it is a full-fledged Apache project.

@alamb
Copy link
Contributor Author

alamb commented Jun 4, 2024

If you go this way would it mean creating a fork of the entire DataFusion project?

No I would expect that sqlparser-rs remains in its own repository (just the github organization would be different)

@alamb
Copy link
Contributor Author

alamb commented Jul 9, 2024

@dharanad
Copy link

I am in favor of this proposal

@alamb
Copy link
Contributor Author

alamb commented Jul 29, 2024

We have started the ip clearance process: https://docs.google.com/document/d/1XPp1zG8yfkCfjusQH7_qr8FMBDDEQ5psR7HhZ1LuVC8/edit#heading=h.nlygv2z3fdg2

Mailing list thread: https://lists.apache.org/thread/1s1qv6gxy0ltzrqq9skxt7lomlq9w4gj

@lewiszlw
Copy link
Member

I think moving sqlparser-rs into datafusion repo will get more visibility and seems reasonable. Not very sure separate repo can substantially improve the current situation. But we can do this in followup action.

@alamb
Copy link
Contributor Author

alamb commented Sep 4, 2024

@iffyio @lovasoa and @jmhain -- we are trying to move this process along and part of doing so is to get documentation in place for the Apache Software Foundation that all the code in the repo is ok from an ip clearance perspective.

As the most active contributors recently, would you be willing to submit an "Individual Contributor Licence Agreement" (ICLA )to the ASF? The document can be found here: https://www.apache.org/licenses/contributor-agreements.html

This document re-affirms that you have donated all code under the Apache Software License, and is the first step in becoming a committer on the project.

There is more details in @andygrove 's doc here:
https://docs.google.com/document/d/1XPp1zG8yfkCfjusQH7_qr8FMBDDEQ5psR7HhZ1LuVC8/edit

@tobyhede
Copy link
Contributor

tobyhede commented Sep 6, 2024

@alamb On it.

@tobyhede
Copy link
Contributor

tobyhede commented Sep 9, 2024

@alamb I've signed the ICLA

@andygrove
Copy link
Member

@iffyio has submitted an ICLA

@alamb
Copy link
Contributor Author

alamb commented Sep 10, 2024

@andygrove what are the next steps here ? Is there anything i can do to help the ip clearance form (e.g. make ticket and @ mention the various sqlparser cotributors)? Help submit the ip clearance form?

@andygrove
Copy link
Member

@alamb The only blocker has been waiting for the ICLAs to be filed. The next step is converting this doc to XML and creating a PR (IIRC). Out of the original active contributors, only @lovasoa is unable to complete an ICLA at this time (they are too busy, which perhaps means they are no longer an active contributor?). I don't know if there are new active contributors since this process started. I will need to check.

@git-hulk
Copy link
Member

@alamb @andygrove I have submitted iCLA to ASF for another project(Apache Kvrocks), so it should be good to count me(apache id: hulk) as recorded if requested iCLA for all contributors.

@lovasoa
Copy link
Contributor

lovasoa commented Sep 11, 2024

I sent my completed ICLA :)

@jmhain
Copy link
Contributor

jmhain commented Sep 15, 2024

Just sent my completed ICLA as well.

@andygrove
Copy link
Member

Thanks, everyone. I will start on the next step of this process in the next day or two.

@alamb
Copy link
Contributor Author

alamb commented Sep 17, 2024

Relevant mailing list discussion: https://lists.apache.org/thread/vv279vtm6q7cfnbgwx4dlm7yj9j93ly4

New repository is here: https://github.com/apache/datafusion-sqlparser-rs

My planned next step (if no one beats me to it) is to create a PR with the code (and all the commits) from sqlarser-rs into the new repository:

Some other things (I will file tickets to orgnaize this):

  1. Migrate existing open issues over somehow
  2. Update readme
  3. Update release process
  4. Add a note in the existing repositories README with pointer to the new repo
  5. Potentially update the crates.io ownership information
  6. Work on getting sqlparser contributors commit rights

@lovasoa
Copy link
Contributor

lovasoa commented Sep 18, 2024

Why not migrating the existing repository instead of creating a new one ? This would keep the issues, pull requests, etc...

@alamb
Copy link
Contributor Author

alamb commented Sep 19, 2024

🤔 that is a good idea. The trick is to figure out if the apache (destination organization) can accept such transfers. I will open a ticket to find out

@git-hulk
Copy link
Member

🤔 that is a good idea. The trick is to figure out if the apache (destination organization) can accept such transfers. I will open a ticket to find out

Yes, it can. I did migrate another project into ASF: https://issues.apache.org/jira/browse/INFRA-25024, hope this would help.

@alamb
Copy link
Contributor Author

alamb commented Sep 19, 2024

Thanks @git-hulk and @lovasoa -- I have filed https://issues.apache.org/jira/browse/INFRA-26136 to request the transfer.

@andygrove
Copy link
Member

andygrove commented Sep 19, 2024

Here is the XML form for the IP clearance process:

https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/datafusion-sqlparser.xml

As part of this process I would like to remind all contributors that they are responsible for ensuring that a Corporate CLA is recorded if such is required to authorize their contributions under their individual CLA

@andygrove
Copy link
Member

HTML form: https://incubator.apache.org/ip-clearance/datafusion-sqlparser.html

We need to do a couple more things before we can start the IP clearance vote:

  • Add ASF headers to source code in this repo
  • Provide a snapshot of the code (and checksum) for review. I propose that we create a GitHub release and that automatically makes an archive available

@andygrove
Copy link
Member

Don't we need to complete the IP clearance process before moving the code into the new repo?

@alamb
Copy link
Contributor Author

alamb commented Sep 19, 2024

Don't we need to complete the IP clearance process before moving the code into the new repo?

We certainly need to complete the IP clearance process before making a new release. I am not sure about moving into the apache github organization

I will create a PR later today to add Apache License headers to the files in sqlparser-rs

@alamb
Copy link
Contributor Author

alamb commented Sep 19, 2024

Filed #1436 to track migration issues

@alamb
Copy link
Contributor Author

alamb commented Sep 20, 2024

I will create a PR later today to add Apache License headers to the files in sqlparser-rs

The PR is here: #1437

@andygrove
Copy link
Member

The IP clearance vote has passed. Thanks to everyone who helped with this process!

https://lists.apache.org/thread/4qhyk23d0g01g3ccsffsxygwd6vj9b8k

@andygrove
Copy link
Member

I see that the repo has now moved 🎉

@alamb
Copy link
Contributor Author

alamb commented Sep 30, 2024

I plan to review / update #1436 tomorrow with next steps

@alamb
Copy link
Contributor Author

alamb commented Oct 2, 2024

Now that the repo has moved and the project has been accepted, let's track the transition work in #1436. Gere is the proposed plan going forward:

#1436 (comment)

Comments more than welcome

@mobuchowski
Copy link
Contributor

@alamb as maintainer of other OSS project that uses sqlparser-rs (OpenLineage) just wanted to thank you for the continued maintanence, support and merging a few of our PRs :)

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2024

@mobuchowski -- thank you for the kind words

The last few months @iffyio has really been the key maintainer. 🙏 Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests