Skip to content

Conversation

@andylam-db
Copy link
Contributor

@andylam-db andylam-db commented Nov 29, 2023

What changes were proposed in this pull request?

Create CrossDbmsQueryTestSuite, which extends SQLQueryTestHelper and DockerIntegrationSuite. CrossDbmsQueryTestSuite is a trait class that allows testing golden files against other DBMS.

PostgreSQLQueryTestSuite is an implementation of CrossDbmsQueryTestSuite. For starters, sql files in the subquery sql-tests are automatically opted into this test. In this PR, all files except for exists-having.sql are ignored, otherwise this PR would have 10K+ line changes (I would like to do that in the next PR, if possible). I had to change the syntax for view creation in exists-having.sql slightly, and this is reflected in the analyzer-results file, but crucially, the query output (in the results file) are not changed.

Note that this will not be applicable to many of the current sql tests we have due to:

  • Incompatible SQL syntax between spark sql and postgres.
  • Incompatible data types.
  • Difference in numerical precision with doubles.
  • Missing functions in either system.
  • Test files with specific configs, such as ANSI, count bug etc.

Why are the changes needed?

For correctness checking of our SQLQueryTestSuites, we want to run SQLQueryTestSuites with Postgres as a reference DBMS. This can be easily extensible to other DBMS.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This is a test-related PR, does not affect system behaviors.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 29, 2023
@andylam-db andylam-db marked this pull request as ready for review November 30, 2023 00:46
@andylam-db andylam-db changed the title Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS [SPARK-46179] Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS Nov 30, 2023
@andylam-db andylam-db changed the title [SPARK-46179] Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS [SQL][SPARK-46179] Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS Nov 30, 2023
@HyukjinKwon HyukjinKwon changed the title [SQL][SPARK-46179] Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS Nov 30, 2023
Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this, it will be useful to increase test coverage!

@andylam-db
Copy link
Contributor Author

Then shall we always let Spark generate golden files? What we need to do is to let other DBMS run the query and compare results with golden files (exclude the schema part).

The reason why we wanted to use DBMS to generate golden files is because we didn't want the CI/CD/developer to have to install some version of postgres/other DBMS just to run tests.

@cloud-fan
Copy link
Contributor

because we didn't want the CI/CD/developer to have to install some version of postgres/other DBMS just to run tests.

Isn't it better to trust CI/CD to install the proper version of DBMS and run the tests, instead of humans? AFAIK we are already doing it to run tests for JDBC data sources. We already have postgres tests in https://github.com/apache/spark/blob/master/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala

@andylam-db
Copy link
Contributor Author

andylam-db commented Jan 3, 2024

We are already doing it to run tests for JDBC data sources

@cloud-fan I didn't know about those tests! I made the changes to add this as an integration test instead, as you suggested. The two tests (SQLQueryTestSuite and PostgresIntegrationSuite) are a bit disconnected (reasons below). But I think they can be improved in the future. I would like to push this PR as is, if possible.

  1. It's maybe a bit weird that a test in spark/connector uses/affects test files in spark/core.
  2. There's some code duplication to read golden files, but I think not too much.

@andylam-db andylam-db changed the title [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which allows generating golden files with other DBMS, starting off with Postgres [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which runs other DBMS against golden files with other DBMS, starting off with Postgres Jan 3, 2024
@andylam-db andylam-db changed the title [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which runs other DBMS against golden files with other DBMS, starting off with Postgres [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which runs other DBMS against golden files with other DBMS, starting with Postgres Jan 3, 2024
@andylam-db andylam-db requested a review from cloud-fan January 4, 2024 00:12
}
}

/** A test case. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these are just code move around, no actual changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

@cloud-fan
Copy link
Contributor

It's maybe a bit weird that a test in spark/connector uses/affects test files in spark/core.

I think it's OK. We have JDBC tests in spark/connector that depend on tests in sql/core

There's some code duplication to read golden files, but I think not too much.

Is there a way to refactor it?

@andylam-db
Copy link
Contributor Author

andylam-db commented Jan 4, 2024

Is there a way to refactor it?

I tried to move as much code as possible to the trait SQLQueryHelper. Duplicated code is 20+ lines, I don't think it's an issue.

@andylam-db
Copy link
Contributor Author

@cloud-fan Can we merge?

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f9ca519 Jan 5, 2024
vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jan 5, 2024
…S against golden files with other DBMS, starting with Postgres

### What changes were proposed in this pull request?

Create `CrossDbmsQueryTestSuite`, which extends `SQLQueryTestHelper` and `DockerIntegrationSuite`. `CrossDbmsQueryTestSuite` is a trait class that allows testing golden files against other DBMS.

`PostgreSQLQueryTestSuite` is an implementation of `CrossDbmsQueryTestSuite`. For starters, sql files in the subquery sql-tests are automatically opted into this test. In this PR, all files except for `exists-having.sql` are ignored, otherwise this PR would have 10K+ line changes (I would like to do that in the next PR, if possible). I had to change the syntax for view creation in `exists-having.sql` slightly, and this is reflected in the `analyzer-results` file, but crucially, the query output (in the `results` file) are not changed.

Note that this will not be applicable to many of the current sql tests we have due to:
- Incompatible SQL syntax between spark sql and postgres.
- Incompatible data types.
- Difference in numerical precision with doubles.
- Missing functions in either system.
- Test files with specific configs, such as ANSI, count bug etc.

### Why are the changes needed?

For correctness checking of our SQLQueryTestSuites, we want to run SQLQueryTestSuites with Postgres as a reference DBMS. This can be easily extensible to other DBMS.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This is a test-related PR, does not affect system behaviors.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44084 from andylam-db/crossdbms.

Authored-by: Andy Lam <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@andylam-db andylam-db changed the title [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which runs other DBMS against golden files with other DBMS, starting with Postgres [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which runs other DBMS against golden files, starting with Postgres Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants