Make SchemaProvider::table async #4607

Merged
merged 9 commits into from
Jan 5, 2023

@tustvold tustvold commented Dec 13, 2022

Which issue does this PR close?

Closes #3777.

Rationale for this change

What changes are included in this PR?

Makes SchemaProvider::table async.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels Dec 13, 2022
@@ -35,7 +37,7 @@ pub trait SchemaProvider: Sync + Send {
     fn table_names(&self) -> Vec<String>;

     /// Retrieves a specific table from the schema by name, provided it exists.
-    fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>>;
+    async fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>>;
Contributor Author

And this is the change to support async catalogs
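
To illustrate what an async `table` lookup makes possible, here is a minimal, dependency-free sketch. Only the shape of the `table` signature comes from the diff above. `TableProvider` is reduced to a stand-in trait, `RemoteSchema`, `RemoteTable`, `fetch`, and `block_on` are hypothetical names invented for illustration, and the boxed future is written by hand; real code would more likely rely on a helper such as the `async_trait` crate and a full executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Stand-in for a table provider (hypothetical, reduced to a name).
pub trait TableProvider: Send + Sync {
    fn name(&self) -> String;
}

pub struct RemoteTable(pub String);

impl TableProvider for RemoteTable {
    fn name(&self) -> String {
        self.0.clone()
    }
}

pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Simplified trait: `table` returns a future instead of a ready value.
pub trait SchemaProvider: Sync + Send {
    fn table_names(&self) -> Vec<String>;
    fn table<'a>(&'a self, name: &'a str) -> BoxFuture<'a, Option<Arc<dyn TableProvider>>>;
}

/// Hypothetical schema whose tables live in an external catalog.
pub struct RemoteSchema {
    known: Vec<String>,
}

impl RemoteSchema {
    pub fn new(known: &[&str]) -> Self {
        Self { known: known.iter().map(|n| n.to_string()).collect() }
    }

    /// Pretend remote call; a real catalog would await network I/O here.
    async fn fetch(&self, name: &str) -> Option<Arc<dyn TableProvider>> {
        if !self.known.iter().any(|n| n.as_str() == name) {
            return None;
        }
        let table: Arc<dyn TableProvider> = Arc::new(RemoteTable(name.to_string()));
        Some(table)
    }
}

impl SchemaProvider for RemoteSchema {
    fn table_names(&self) -> Vec<String> {
        self.known.clone()
    }

    fn table<'a>(&'a self, name: &'a str) -> BoxFuture<'a, Option<Arc<dyn TableProvider>>> {
        // The lookup can now await instead of blocking the calling thread.
        Box::pin(self.fetch(name))
    }
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny single-future executor so the sketch runs without a runtime crate.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

With this sketch, `block_on(schema.table("t1"))` resolves a table through an `Arc<dyn SchemaProvider>`, and a name that the catalog does not know resolves to `None`.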

@tustvold
Contributor Author

This is temporarily on hold pending the work I am doing on cleaning up the config, see #4617. I will come back to this once that is complete.

@tustvold tustvold marked this pull request as ready for review January 3, 2023 21:50
@tustvold
Contributor Author

tustvold commented Jan 3, 2023

I think this is now ready for review. It would be amazing if this could make this week's release, as this has been a frequently requested feature.

@tustvold tustvold changed the title from "POC: Async catalog" to "Make SchemaProvider::table async" Jan 3, 2023
/// Creates a [`LogicalPlan`] from the provided SQL string
///
/// See [`SessionContext::sql`] for a higher-level interface that also handles DDL
pub async fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan> {
Contributor Author

This is specifically broken out into a separate function so that databases that do not want to handle DDL, such as IOx, don't have to re-implement this functionality. FYI @alamb
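
The split described in this comment can be sketched as follows. This is an illustrative toy under stated assumptions, not DataFusion's implementation: `Session`, the two-variant `LogicalPlan`, the string-prefix "parser", `plan_read_only`, and `block_on` are all hypothetical. It only shows the layering: a planning-only entry point, a higher-level `sql` that also applies DDL, and a host that reuses the planner while refusing DDL.

```rust
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Toy plan type (hypothetical): one query variant, one DDL variant.
#[derive(Debug, PartialEq)]
pub enum LogicalPlan {
    Query(String),
    CreateTable(String),
}

pub struct Session {
    tables: Mutex<Vec<String>>,
}

impl Session {
    pub fn new() -> Self {
        Self { tables: Mutex::new(Vec::new()) }
    }

    pub fn table_count(&self) -> usize {
        self.tables.lock().unwrap().len()
    }

    /// Planning only: never mutates the session's catalog.
    pub async fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan, String> {
        let sql = sql.trim();
        if let Some(name) = sql.strip_prefix("CREATE TABLE ") {
            Ok(LogicalPlan::CreateTable(name.to_string()))
        } else if sql.starts_with("SELECT") {
            Ok(LogicalPlan::Query(sql.to_string()))
        } else {
            Err(format!("unsupported SQL: {}", sql))
        }
    }

    /// Higher-level entry point: plans, then applies DDL as a side effect.
    pub async fn sql(&self, sql: &str) -> Result<LogicalPlan, String> {
        let plan = self.create_logical_plan(sql).await?;
        if let LogicalPlan::CreateTable(name) = &plan {
            self.tables.lock().unwrap().push(name.clone());
        }
        Ok(plan)
    }
}

/// A host database that refuses DDL reuses the planner but bypasses `sql`.
pub async fn plan_read_only(session: &Session, sql: &str) -> Result<LogicalPlan, String> {
    match session.create_logical_plan(sql).await? {
        LogicalPlan::CreateTable(_) => Err("DDL is not supported".to_string()),
        plan => Ok(plan),
    }
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny single-future executor so the sketch runs without a runtime crate.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

Because planning has no side effects in this sketch, the read-only host can reject a `CREATE TABLE` statement after planning it without having touched the catalog.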

}

// Always include information_schema if available
if self.config.information_schema() {
Contributor Author

#4606 means there isn't a cost to doing this

@alamb
Contributor

alamb commented Jan 3, 2023

Will review this tomorrow. Note the CI is failing

ControlFlow::Continue(())
}

fn pre_visit_statement(
Contributor Author

This is a bit gross, but I hope to split the query planning apart from the other types of query, which will clean this up.

@waynexia
Member

waynexia commented Jan 4, 2023

Makes SchemaProvider::table async.

This change makes sense to me. It's a good solution to only change one interface, I like it!

The only thing left is asynchronous register things, but on the one hand it's less important than getting and deregistering, and on the other hand there are still ways to work around it (like update-on-getting). So I'm +1 for providing this change as-is. Thanks for making this @tustvold 🥳
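
One possible reading of the "update-on-getting" workaround is sketched below: no registration call exists, and a table is loaded and recorded the first time it is looked up. This is an assumption about what the comment means, and every name here (`LazySchema`, `Table`, `load`, `loads`, `block_on`) is hypothetical; the async trait plumbing is omitted to keep the focus on the lookup path.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Stand-in table type (hypothetical).
pub struct Table(pub String);

/// Schema with no `register` call: tables appear when first looked up.
pub struct LazySchema {
    cache: Mutex<HashMap<String, Arc<Table>>>,
    loads: AtomicUsize,
}

impl LazySchema {
    pub fn new() -> Self {
        Self { cache: Mutex::new(HashMap::new()), loads: AtomicUsize::new(0) }
    }

    /// Pretend remote load; only names starting with "remote_" exist.
    async fn load(&self, name: &str) -> Option<Arc<Table>> {
        self.loads.fetch_add(1, Ordering::SeqCst);
        if name.starts_with("remote_") {
            Some(Arc::new(Table(name.to_string())))
        } else {
            None
        }
    }

    /// Lookup that records the table as a side effect of the first hit.
    pub async fn table(&self, name: &str) -> Option<Arc<Table>> {
        // The lock guard is released before any await point.
        let cached = self.cache.lock().unwrap().get(name).cloned();
        if let Some(table) = cached {
            return Some(table);
        }
        let table = self.load(name).await?;
        self.cache.lock().unwrap().insert(name.to_string(), Arc::clone(&table));
        Some(table)
    }

    /// Names recorded so far.
    pub fn table_names(&self) -> Vec<String> {
        self.cache.lock().unwrap().keys().cloned().collect()
    }

    /// Number of remote loads attempted so far.
    pub fn loads(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny single-future executor so the sketch runs without a runtime crate.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

In this sketch a second lookup of the same name is served from the cache, so the simulated remote load runs once per table; misses are not cached and trigger a load each time.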

@tustvold
Contributor Author

tustvold commented Jan 4, 2023

The only thing left is asynchronous register things

We can easily add this as a follow-up. As the catalog is largely a detail of SessionContext, I don't imagine it causing any major issues.

@alamb alamb left a comment

Other than putting back unsupported_sql_returns_error, I think this PR looks good to me. My other comments are just minor suggestions.

@tustvold tustvold merged commit fad77a4 into apache:master Jan 5, 2023
@ursabot

ursabot commented Jan 5, 2023

Benchmark runs are scheduled for baseline = 087ac09 and contender = fad77a4. fad77a4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

An asynchronous version of CatalogList/CatalogProvider/SchemaProvider
4 participants