ARROW-10331: [Rust] [DataFusion] Re-organize DataFusion errors #8481
andygrove
left a comment
This looks great. Thanks @jorgecarleitao
@kszucs, would it be possible to avoid force-pushing to PRs after a release (even if master is force-pushed)? IMO it is very confusing to the contributor (committer or not).
If a PR has a base commit that is one of the master commits that is part of the rebased master (i.e. some commit between the date the RC was cut and the date of the release), then the PR is completely borked after the master-rebase. We've seen this happen many times in the past. We could perhaps come up with some more intelligent script that only rebases PRs that are based on a commit hash that no longer exists in master.
We are already force-pushing to master on every release, which goes against best practices of an open-source project. AFAIK, in open source, there is a strong expectation that PRs are managed by individual contributors, and committers of the project only request contributors to make changes to them, or kindly ask before pushing (not force-pushing) directly to the PR. We are inverting all expectations and force-pushing to PRs(!!). Furthermore, those force-pushes actually break the PRs (as you can see above). IMO this drives any reasonable contributor to be pissed off at the team for what they just did:
I suggest that we:
In general, it is the contributor's responsibility to keep the PRs in a "ready to merge" state, rebasing them to master as master changes. IMO a force-push to master corresponds to a change in master, and thus it is the contributor's responsibility to rebase their PRs against the new master.
I see your points. Keeping the main branch flat is important, and the commits after the release tag should have the right version numbers since git is not available in all scenarios, hence the rebase after a release. We could certainly let the contributors rebase their own branches, but in certain cases the contributors may be confused about what to do after seeing many unrelated commits in their pull requests, so additional maintainer roundtrips might be required to ask/advise for pull request rebases. From this perspective I find it rather useful, although I'm also against force pushes in general. I suggest you bring it up on the mailing list so the topic can reach a broader set of developers and maintainers. I'm sure that we can come up with a better solution once there is a consensus.
I agree with raising the matter on the mailing list -- this is a project governance issue and so needs to be discussed there. We did not arrive at the current practices idly and there are pros/cons to whatever approach is taken.
This PR:

- renames `ExecutionError` to `DataFusionError`
- renames `DataFusionError::ParserError` to `DataFusionError::SQL`
- renames `DataFusionError::InternalError` to `DataFusionError::Internal`
- renames `DataFusionError::ExecutionError` to `DataFusionError::Execution`
- adds `DataFusionError::Plan`, which is used during planning
- removes `DataFusionError::InvalidColumn`, which was not being used
- removes `DataFusionError::General`, replacing its uses with the appropriate errors
- changes `DataFusionError::Internal` to incentivize users to file a bug report when it happens

The design behind this PR is that the error variants should correspond to what happened:

- `Internal`: one of DataFusion's internal invariants was violated (e.g. downcast failed) => file a bug report
- `Plan`: planning was incorrect
- `NotImplemented`: something is not implemented and we know about it. Ideally, we should have an associated JIRA issue
- `Execution`: an error during execution. We should avoid raising these, but sometimes it is impossible.
- `IoError`: stuff related to reading and writing

I went through every error that we return in `DataFusion` and verified that it is assigned correctly to one of these variants.

I am a bit uncertain about the `ParquetError` and `ArrowError`. IMO `ArrowError` should be mapped to `DataFusionError::Execution`, as it only happens during execution, and `ParquetError` should be mapped to an `IoError`.

I also think that we should split `NotImplemented` in two: `NotImplemented` and `NotSupported`, as e.g. `float16` is something that we will likely never support, while "modulus" is just not implemented yet.
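To make the variant design above concrete, here is a minimal sketch of what such an error enum could look like. It is not the code in this PR: the `String` payloads, the message wording, the stand-in `ArrowError` and `ParquetError` structs, and the `read_batch` helper are all assumptions made so the example is self-contained. The two `From` impls follow the mapping floated above (`ArrowError` to `Execution`, `ParquetError` to `IoError`), which is a suggestion rather than a settled decision.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Stand-in for Arrow's error type (assumption: reduced to a message).
#[derive(Debug)]
pub struct ArrowError(pub String);

/// Stand-in for Parquet's error type (assumption: reduced to a message).
#[derive(Debug)]
pub struct ParquetError(pub String);

/// One variant per "what happened". Payload types are illustrative.
#[derive(Debug)]
pub enum DataFusionError {
    /// The SQL text could not be parsed.
    SQL(String),
    /// Planning was incorrect.
    Plan(String),
    /// A known gap in functionality.
    NotImplemented(String),
    /// An internal invariant was violated, i.e. a bug.
    Internal(String),
    /// Something failed while executing a plan.
    Execution(String),
    /// Reading or writing failed.
    IoError(io::Error),
}

impl fmt::Display for DataFusionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataFusionError::SQL(msg) => write!(f, "SQL error: {}", msg),
            DataFusionError::Plan(msg) => write!(f, "Error during planning: {}", msg),
            DataFusionError::NotImplemented(msg) => {
                write!(f, "This feature is not implemented: {}", msg)
            }
            // Hypothetical wording that nudges users towards filing a bug report.
            DataFusionError::Internal(msg) => write!(
                f,
                "Internal error: {}. This is likely a bug; please file a bug report",
                msg
            ),
            DataFusionError::Execution(msg) => write!(f, "Execution error: {}", msg),
            DataFusionError::IoError(e) => write!(f, "IO error: {}", e),
        }
    }
}

impl Error for DataFusionError {}

impl From<io::Error> for DataFusionError {
    fn from(e: io::Error) -> Self {
        DataFusionError::IoError(e)
    }
}

// Suggested mapping: Arrow errors only surface during execution.
impl From<ArrowError> for DataFusionError {
    fn from(e: ArrowError) -> Self {
        DataFusionError::Execution(e.0)
    }
}

// Suggested mapping: Parquet errors are treated as IO failures.
impl From<ParquetError> for DataFusionError {
    fn from(e: ParquetError) -> Self {
        DataFusionError::IoError(io::Error::new(io::ErrorKind::Other, e.0))
    }
}

/// Hypothetical helper showing how `?` picks up the `From` conversion.
pub fn read_batch(fail: bool) -> Result<usize, DataFusionError> {
    let rows: Result<usize, ArrowError> = if fail {
        Err(ArrowError("bad cast".to_string()))
    } else {
        Ok(3)
    };
    Ok(rows?)
}
```

With this shape, callers can match on the variant to decide whether to report a bug (`Internal`), fix their query (`SQL`, `Plan`), or retry (`IoError`). A `NotSupported` variant, if the split proposed above were adopted, would be added the same way as `NotImplemented`.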