Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Possibility to strip datafusion error name #10186

Merged
merged 6 commits into from
Apr 27, 2024
Merged

Conversation

comphead
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Currently Datafusion wraps the original error message and includes Datafusion name on top of it.
If the original message is My Query Error the user after unwrapping the DataFusionError will receive wrapped message
Execution error: My Query Error or Error during planning: My Query Error, etc. Sometimes it is required to fetch the original message without Datafusion error name.

What changes are included in this PR?

DataFusionError impl expanded and includes the public method strip_error_name which unwraps error into original message

Are these changes tested?

Are there any user-facing changes?

@comphead
Copy link
Contributor Author

The implementation might not be super elegant though, please share your thoughts and opininions

@comphead
Copy link
Contributor Author

@andygrove cc

assert_eq!(res.strip_error_name(), "Err");

// Test only top level stripped
let res: Result<(), DataFusionError> = plan_err!("Err");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are two scenarios that you are testing, should they go in two different test cases ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @edmondop I'd say the second scenario is variation of first one. Not sure if every variation requires its own test tbh

/// Error has a format `{error_name}{`error`}
/// The method strips the `error_name` from level and outputs 'error``
pub fn strip_error_name(self) -> String {
self.to_string().replace(self.get_error_name(), "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if we could avoid including the error message in the first place rather than removing it later. There is the risk of accidentally removing text that was not added by impl Display for DataFusionError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this in the first place.

  • We can fix the formatter so it will skip the error name for some format options, that doesn't work with to_string() which has default formatter.

  • we can tweak impl Display with cargo feature and return name as blank if the feature enabled and that looks little bit overcompicated.

  • we can refer to env variable and depending on the value define the error name but this might be expensive.

  • we can probably introduce a DF config param?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure exactly what the usecase is, but if seems like if the only need is:

Sometimes it is required to fetch the original message without Datafusion error name.

Rather than trying to fix the message after it has been created, how about we add a function that just provides the underlying message? Something like

impl DataFusionError {
  /// retrieve the underlying error message for this error, without the
  /// type prefix. 
  ///
  /// For example, if err.to_string() returns `Error during planning: My Query Error` 
  /// this function would return only `My Query Error`
  fn message(&self) -> &str { .. }
}

And then the impl could be

impl Display for DataFusionError {
    fn fmt(&self, f: &mut Formatter) -> std::fmt::Result {
        match *self {
            DataFusionError::ArrowError(ref desc, ref backtrace) => {
                let backtrace = backtrace.clone().unwrap_or("".to_owned());
                write!(f, "Arrow error: {}{backtrace}", self.message())
            }
            #[cfg(feature = "parquet")]
            DataFusionError::ParquetError(ref desc) => {
                write!(f, "Parquet error: {}", message)
            }
...
}

Maybe fn message(&self) would have to return Cow<str> or something to avoid copying in the external cases....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the simplest solution.

Our use case (Comet) is that we are using thiserror and we are delegating to DataFusionError for the error string and it currently uses the Display trait.

@comphead I don't know enough about thiserror yet to know if/how we could call the new method proposed here. Do you know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it works even without thiserror magic, we can call .message() method in From<> error conversion

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but agree that it would be better to rename error_name to error_prefix

@andygrove andygrove merged commit 513443d into apache:main Apr 27, 2024
23 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants