Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add sqllogictests crate #7134

Merged
merged 10 commits into from
Aug 2, 2023
Merged

Conversation

tshauck
Copy link
Contributor

@tshauck tshauck commented Jul 29, 2023

Hi! Opening this PR a bit early to see if it's going in there right direction. This is from the issue here #7133 and the discussion here #7032.

The gist is there's a sqllogictest folder that adds a crate for datafusion-sqllogictest. The main export is DataFusionTestRunner that implements the sqllogictest-rs AsyncDB. See below for an example of it being used in a branch of one of my projects.

My main question is how to handle the existing code that's internal to core and the ancillary postgres engine code... should the former be replaced with this and the latter stay, or should the latter come along into this new crate? Certainly open to direction here and anything else big or small.

Thanks!

Which issue does this PR close?

Closes #7133

Rationale for this change

This change makes it possible for crates that use datafusion to use its sqllogictest engine. E.g. presuming this is merged, the end user could do the following...

async fn run_tests() -> Result<(), DataFusionError> {
    let test_options = TestOptions::default();

    // Iterate through the test files and run the tests.
    let test_files = std::fs::read_dir(&test_options.test_dir).unwrap();

    let exon_context = Arc::new(SessionContext::new_exon());

    for test_file in test_files {
        let test_file = test_file.unwrap();

        // if the file doesn't end with an slt extension skip it
        if test_file.path().extension().unwrap() != "slt" {
            continue;
        }

        let mut runner = sqllogictest::Runner::new(|| async {
            Ok(DataFusionTestRunner::new(exon_context.clone()))
        });

        runner.run_file_async(test_file.path()).await.unwrap();
    }

    Ok(())
}

#[tokio::main]
pub async fn main() -> Result<(), DataFusionError> {
    run_tests().await
}

This is on a branch of the exon library: https://github.com/wheretrue/exon/blob/bfb985c89f228e8ce86dbfd96e2a6e0698e780dc/exon/tests/sqllogictests/src/main.rs

What changes are included in this PR?

A new crate is added that exposes DataFusionTestRunner which is used in the example above.

I haven't yet updated the existing test code to use this crate, but presumably would want to? I'm also not sure where the postgres code would go in that scenario. Should it also go in this crate or stay where it is?

Are these changes tested?

I've only done manual testing thus far, but will add some unittests presuming this is ok direction wise.

Are there any user-facing changes?

A new crate would be available.

@alamb
Copy link
Contributor

alamb commented Jul 30, 2023

Thank you @tshauck -- I will review this more carefully tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tshauck -- this looks amazing! I very much think this is the right approach.

I wonder if maybe we could just pull pull the entire datafusion runner into this new crate entirely (and run it via cargo run --bin sqllogictest 🤔 ) We could leave a backwards compatible stub in datafusion/core/tests/sqllogictest.rs for a while

cc @melgenek in case you would like to comment

I haven't yet updated the existing test code to use this crate, but presumably would want to?

Yes, I think that would be valuable so that the implementations do not drift

I'm also not sure where the postgres code would go in that scenario. Should it also go in this crate or stay where it is?

If possible, I would recommend putting it into the new datafusion-sqllogictest crate too. The downside of this approach would be more dependencies for the posgresql libraries for some users who don't want it. Maybe we could put it behind a feature flag if that is a problem

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 1, 2023
@tshauck
Copy link
Contributor Author

tshauck commented Aug 1, 2023

Great, thanks for the feedback. I've done a bit of that refactoring in 4f1507d (move postgres into new crate behind feature, have core use the new crate, etc).

There's a couple of test errors that I don't understand yet, so not quite ready.

@tshauck
Copy link
Contributor Author

tshauck commented Aug 1, 2023

Hi -- requesting re-review of this please. I got the test issue sorted (an overzealous name change on my part) so the sqllogictests seem to pass.

The general tests cargo test fail with one error, that I'm not sure is related to this PR or my machine:

---- scalar::tests::size_of_scalar stdout ----
thread 'scalar::tests::size_of_scalar' panicked at 'assertion failed: `(left == right)`
  left: `64`,
 right: `48`', datafusion/common/src/scalar.rs:4645:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Happy to dive in if it does look to be related -- thanks!

@tshauck tshauck requested a review from alamb August 1, 2023 13:55
@alamb
Copy link
Contributor

alamb commented Aug 1, 2023

🤔 I think the CI failure https://github.com/apache/arrow-datafusion/actions/runs/5727724777/job/15532456889?pr=7134 is due to this file not having the apache license header:

NOT APPROVED: datafusion/sqllogictest/src/engines/datafusion_engine/mod.rs (./datafusion/sqllogictest/src/engines/datafusion_engine/mod.rs): false

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tshauck -- I think once this PR passes CI it will be ready to merge. I had a few comments about adding some more documentation / breadcrumbs round the code to make it easier to understand in the future but I think those are nice to haves

I also pull this branch locally and verified my dev workflow is unaffected

cc @melgenek and @xxchan

@@ -0,0 +1,7 @@
mod error;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file needs the apache license header (just copy/paste from another crate)

Also if you could add some high level crate comment like

/// DataFusion driver for sqllogictest

or something I think that would help future readers

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow on it might make sense to pull the datafusion sqllogictest runner binary into this new crate too (to break the dev dependency on datafusion-core) but I'll contemplate in the future

@alamb
Copy link
Contributor

alamb commented Aug 2, 2023

Hi @tshauck -- I took the liberty of fixing the doctest CI failure

https://github.com/apache/arrow-datafusion/actions/runs/5734076157/job/15548843494?pr=7134

And merging main into this PR -- I am hoping to get it merged soon

@tshauck
Copy link
Contributor Author

tshauck commented Aug 2, 2023

@alamb Thanks! Sorry for the back and forth... I think that bit was a copy-paste so was a bit confused.

@alamb
Copy link
Contributor

alamb commented Aug 2, 2023

@alamb Thanks! Sorry for the back and forth... I think that bit was a copy-paste so was a bit confused.

No worries -- thank you for sticking with this @tshauck

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

External use of sqllogictest machinery
2 participants