Skip to content

Conversation

@jorgecarleitao
Copy link
Member

This PR:

  • Renamed ExecutionError to DataFusionError
  • Renamed DataFusionError::ParserError to DataFusionError::SQL
  • Renamed DataFusionError::InternalError to DataFusionError::Internal
  • Renamed DataFusionError::ExecutionError to DataFusionError::Execution
  • Adds a new error variant DataFusionError::Plan that is used during planning
  • Removes DataFusionError::InvalidColumn that was not being used.
  • Removes DataFusionError::General, replacing them by the appropriate errors
  • Improves the message of DataFusionError::Internal to incentivize users to file a bug report when it happens
  • Extended the documentation of every variant

The design behind this PR is that the error variants should correspond to what happened:

  • Internal: a Datafusion's internal invariant was violated (e.g. downcast failed) => file a bug report
  • Plan: planning was incorrect
  • NotImplemented: something is not implemented and we know about it. Ideally, we should have an associated JIRA issue
  • Execution: an error during execution. We should avoid raising these, but sometimes it is impossible.
  • IoError: stuff related with reading and writing

I went through every error that we return in DataFusion and verified that it is assigned correctly to one of these variants.

I am a bit uncertain about the ParquetError and ArrowError. IMO ArrowError should be mapped to DataFusionError::Execution, as it only happens during execution, and ParquetError should be mapped to an IoError.

I also think that we should split NotImplemented in two: NotImplemented and NotSupported as e.g. float16 is something that we will likely never support, while "modulus" is just not implemented yet.

@github-actions
Copy link

@nevi-me nevi-me requested a review from andygrove October 17, 2020 18:47
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks @jorgecarleitao

@jorgecarleitao
Copy link
Member Author

@kszucs, would it be possible to avoid force-pushing to PRs after a release (even if master is force-pushed)? IMO it is very confusing to the contributor (committer or not).

@wesm
Copy link
Member

wesm commented Oct 20, 2020

If a PR has a base commit that is one of the master commits that is part of the rebased master (i.e. some commit between the date the RC was cut and the date of the release), then the PR is completely borked after the master-rebase. We've seen this happen many times in the past. We could perhaps come up with some more intelligent script that only rebases PRs that are based on a commit hash that no longer exists in master

@jorgecarleitao
Copy link
Member Author

We are already force-pushing to master on every release, which goes against best practices of an open-source project.

AFAIK, in open source, there is a strong expectation that PRs are managed by individual contributors, and committers of the project only request contributors to make changes to them, or kindly ask before pushing (not force-pushing) directly to the PR. We are inverting all expectations and force-pushing to PRs(!!). Furthermore, those force-pushes actually break the PRs (as you can see above). IMO this drives any reasonable contributor to be pissed off at the team for what they just did:

  • force-pushing to master
  • force-pushing to their PRs
  • breaking their PRs's CI
  • no prior notice or request of any of the above

I suggest that we:

  1. stop force-pushing to PRs
  2. stop pushing to PRs without the contributor's explicit consent
  3. document in the contributing guidelines that master is force-pushed on every release, and the steps that folks need to take to bring their PRs to the latest master.

In general, it is the contributor's responsibility to keep the PRs in a "ready to merge" state, rebasing them to master as master changes. IMO a force-push to master corresponds to a change in master, and thus it is the contributor's responsibility to rebase their PRs against the new master.

@kszucs
Copy link
Member

kszucs commented Oct 20, 2020

I see your points.

Keeping the main branch flat is important and the commits after the release tag should have the right version numbers since git is not available in all scenarios, hence the rebase after a release.

We could certainly let the contributors to rebase their own branches, but in certain cases the contributors may be confused what to do after seeing many unrelated commits in their pull requests so additional maintainer roundtrips might be required to ask/advise for pull request rebases. From this perspective I rather find it useful although I'm also against force pushes in general.

I suggest you to bring it up on the mailing list so the topic can reach a broader set of developers and maintainers. I'm sure that we can come up with a better solution after there is a consensus.

@wesm
Copy link
Member

wesm commented Oct 20, 2020

I agree with raising the matter on the mailing list -- this is a project governance issue and so needs to be discussed there. We did not arise at the current practices idly and there are pros/cons to whatever approach is taken.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants