Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compare result based on semantics #93

Open
ZENOTME opened this issue Oct 20, 2022 · 14 comments
Open

compare result based on semantics #93

ZENOTME opened this issue Oct 20, 2022 · 14 comments

Comments

@ZENOTME
Copy link
Contributor

ZENOTME commented Oct 20, 2022

Background

I'm trying to introduce INTERVAL type in Postgres-extend engine and find a problem caused by the way we test result.

In sqllogictest::AsyncDB, our run interface requires to return a string to compare with the expect result.
async fn run(&mut self, sql: &str) -> Result<String, Self::Error>

But there are some case in which the String is different but the semantics is the same.
Such asinterval '30 days'and interval ' 720:00:00', there are different string but the same semantics.

For some type(such as interval), there are different string format to express a same thing in same semantics. So do we need to add a way to compare the 'semantics' rather than compare the 'string'. I try to think a way to fix it but I don't wether it's worth or necessary. Because this case only exist in 'Interval', 'timstamptz' and other time-related type. So we can also declare the format must be equal to result from psql.

Maybe we can fix it by...

make the result can be multi-format. Such as:

async fn run(&mut self, sql: &str) -> Result<Ans, Self::Error>
enum Ans{
  str(String),
  Interval(...)
}
@skyzh
Copy link
Member

skyzh commented Oct 20, 2022

Why not format the timestamp types into strings?

@skyzh
Copy link
Member

skyzh commented Oct 20, 2022

I'd suggest implementing a custom result type and a custom comparator in sqllogictest. It's impossible to cover all data types in this framework.

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Oct 20, 2022

Why not format the timestamp types into strings?

I'm not sure what you means. A timestamp type can be format into different string with the same semantics.

I'd suggest implementing a custom result type and a custom comparator in sqllogictest. It's impossible to cover all data types in this framework.

Sounds like a good way to solve this. I can try it.

@skyzh
Copy link
Member

skyzh commented Oct 20, 2022

I'm not sure what you means. A timestamp type can be format into different string with the same semantics.

You can format it to a fixed format (e.g., represented by hours, or represented by days, not sure how we can do this) before comparing.

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Oct 20, 2022

I'm not sure what you means. A timestamp type can be format into different string with the same semantics.

You can format it to a fixed format (e.g., represented by hours, or represented by days, not sure how we can do this) before comparing.

So does it also mean '*.slt' file should use the fixed format?

@skyzh
Copy link
Member

skyzh commented Oct 20, 2022

Yes. All should use days / hours, probably, if we only want to work on strings instead of interpreting the actual values.

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Oct 20, 2022

Yes. All should use days / hours, probably, if we only want to work on strings instead of interpreting the actual values.

If we can unify the '*.slt' file, this way seems to be a most simple way to solve this.

I will:

  1. unify the '*.slt' file, it can let us introduce new type(interval,timestamptz) quickly.
  2. try to implement custom result type later, it seems a good way for us to support new type in the future.

@xxchan
Copy link
Member

xxchan commented Oct 22, 2022

To summarize, when a type has more than one representations for the same value, an engine can convert it to a canonial form. The limitation is that the test cases should also be written in canonical form, which will also make some engines imcompatible, e.g., postgres and postgres-extended. A better way is to let the engine provide a custom comparator, but cmp(&str,&str) is hard to do. It can be better to let the engine provide a result type and the comparator can be cmp(&ResT, &str). It's slightly better but still need to do some parsing. 🤔

The behavior can be tweaked by using different engines. e.g., if some users do want to test the results AS-IS, they can just use a non-canonical engine.

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Oct 23, 2022

It's slightly better but still need to do some parsing. The behavior can be tweaked by using different engines. e.g., if some users do want to test the results AS-IS, they can just use a non-canonical engine.

After introducing the custom comparator, we don't need the canonical form. We can directly make postgres-extended to be a non-canonical engine.

@xxchan
Copy link
Member

xxchan commented Oct 23, 2022

After introducing the custom comparator, pg-extended is still kind of canonical.

Assume A1 has canonical form A.

// before
expected actual pg    pg-extended
A1       A1     pass  fail
A        A1     fail  pass

// after
expected actual pg    pg-extended
A1       A1     pass  pass (change here)
A        A1     fail  pass

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Oct 23, 2022

Why not change all
A A1
to
A1 A1
for the type have introduced the custom comparator.

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Oct 28, 2022

pub trait CustomResult:ToString {
    fn cmp(&self,str:&str)->bool;
}

impl CustomResult for String {
    fn cmp(&self,str:&str)->bool{
        self == str
    }
}
pub type BoxResult = Box<dyn CustomResult>;

I try to create a CustomResult, but I find a this will caused we can't use unstable_sort() in runner. The reason is we can't compare the trait object.
So I thinking about two solution:

  1. reimplement a runner for CustomResult that can't support raw_sort, user should responsible to use 'order by'.
  2. I think the function of raw sort is to compare the result in ignoring order. We can just compare one row of object to all row of string. Such as:
output:
  [
     1: [object1 , object2 , object3]
  ]
expect:
  [
     1: [string1, string2, string3]
     2: [string4,string5, string6]
  ]
for output_row : output{
  for expect_row : expect{
     if compare(output_row,expect_row).is_success {
       ...
       break;
     }
    ...
  }
}

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Nov 8, 2022

For sqllogictest, I think the binary format isn't appropriate for us. So I try to modify the rust-postgres to request text format result in extended query protocol and use it in extended-engine locally. It can run all the e2e test in risingwave. I try to push this modification to rust-postgres but there are no reaction now. So I think whether we can maintain a downstream version of rust-postgres first.

@melgenek
Copy link
Contributor

Hi there!
I wanted to share one more use case for the problem that you're solving here.
Datafusion uses sqllogictest-rs to run some .slt tests to check compatibility between Datafusion and Postgres.

For now, we went the "canonical form" way. The string representation is what is compared, and both Datafusion and Postgres have to produce the same string representations.
In order to achieve the same text representation, we defined a set of typed value to string conversions for:

  • booleans
  • strings ((empty) and values)
  • decimals. Postgres and Datafusion sometimes use different underlying number types, which causes precision errors. So the conversions produce "canonical" rounded values.

The drawback of this approach is that I had to rewrite the postgres-extended to use these "canonical" conversions.
This approach also wouldn't work if the Postgres implementation is not based on types.

The current cross-engine cross-type logic is basically

typed result 1 -> canonical string
                                      -> compare strings
typed result 2 -> canonical string

Regarding the @ZENOTME's idea of custom comparators: the problem with them is that each engine implementation would need to know how to parse an opaque "expected result" string in order to compare it. It's a quite fragile approach because an engine cannot predict what users write in the outputs.

Additionally, the errors are now quite clear, because there are +/- showing the diffs between the expected and actual results. One would need to define both a comparator and a string representation in order to show errors.

The comparator logic also arguably seems more complicated to implement and maintain:

expected string -> parse to typed value -> compare to typed result 1
expected string -> parse to typed value -> compare to typed result 2

Defining "canonical" type representations seems like a big task. But, in general, the "canonical value" approach seems to be somewhat easier both to understand and implement for multiple engines, and multiple representations per type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants