feature: Support EXPLAIN COPY #7291

Merged
merged 9 commits into from
Aug 21, 2023

Conversation

alamb
Contributor

@alamb alamb commented Aug 15, 2023

Which issue does this PR close?

Part of #6539

Rationale for this change

It is important to be able to see the plans that are generated for COPY statements.

What changes are included in this PR?

  1. Add support for EXPLAIN plans

Are these changes tested?

Yes

Are there any user-facing changes?

EXPLAIN and its variants now work for COPY statements.

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 15, 2023
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Aug 15, 2023
@alamb alamb changed the title Alamb/explain copy Support EXPLAIN COPY Aug 15, 2023
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) and removed logical-expr Logical plan and expressions optimizer Optimizer rules labels Aug 16, 2023
@alamb alamb marked this pull request as ready for review August 16, 2023 17:01
@@ -44,6 +44,35 @@ fn parse_file_type(s: &str) -> Result<String, ParserError> {
Ok(s.to_uppercase())
}

/// DataFusion specific EXPLAIN (needed so we can EXPLAIN datafusion
Contributor Author

This is the key change -- we need a DataFusion-specific EXPLAIN statement

Contributor

Is there a reason to keep these datafusion specific statements in general vs. moving this upstream to the sqlparser crate? This distinction was difficult for me to understand when first diving into how statements are parsed.

Contributor Author

I think the high-level rationale is that sqlparser-rs intends to parse SQL for one or more existing "standard" SQL dialects (like MySQL or Postgres). However, these statements (COPY and CREATE EXTERNAL TABLE) are DataFusion-specific extensions. I will make a PR with some docs to try to clarify this rationale.

Contributor Author

PR with some docs: #7318
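
To make the rationale in this thread concrete, here is a minimal sketch of a wrapper statement type that carries both upstream statements and project-specific extensions. Every name in it (`UpstreamStatement`, `ExtStatement`, `parse`, `parse_words`) is a hypothetical stand-in for illustration; none of this is the real sqlparser-rs or DataFusion API, and the toy parser only recognizes the few shapes needed here.

```rust
/// Stand-in for a statement produced by an upstream, dialect-oriented parser.
/// Hypothetical toy type, not `sqlparser::ast::Statement`.
#[derive(Debug, Clone, PartialEq)]
pub enum UpstreamStatement {
    Query(String),
}

/// Stand-in for a wrapper enum: upstream statements plus local extensions.
/// Hypothetical toy type, not DataFusion's actual statement enum.
#[derive(Debug, Clone, PartialEq)]
pub enum ExtStatement {
    /// Anything the upstream parser already understands.
    Upstream(UpstreamStatement),
    /// Extension statement unknown to the upstream parser.
    CopyTo { source: String, target: String },
    /// EXPLAIN defined on the wrapper, so it can wrap extension statements too.
    Explain { statement: Box<ExtStatement> },
}

/// Toy entry point: split on whitespace and dispatch on the leading keyword.
pub fn parse(sql: &str) -> Result<ExtStatement, String> {
    let words: Vec<&str> = sql.split_whitespace().collect();
    parse_words(&words)
}

fn parse_words(words: &[&str]) -> Result<ExtStatement, String> {
    match words {
        [] => Err("empty statement".to_string()),
        // EXPLAIN <statement>: recurse so the inner statement may be an extension.
        [kw, rest @ ..] if kw.eq_ignore_ascii_case("EXPLAIN") => {
            Ok(ExtStatement::Explain {
                statement: Box::new(parse_words(rest)?),
            })
        }
        // COPY <source> TO '<target>' (options are omitted in this sketch).
        [kw, source, to, target]
            if kw.eq_ignore_ascii_case("COPY") && to.eq_ignore_ascii_case("TO") =>
        {
            Ok(ExtStatement::CopyTo {
                source: source.to_string(),
                target: target.trim_matches('\'').to_string(),
            })
        }
        // Everything else is handed to the "upstream" parser unchanged.
        _ => Ok(ExtStatement::Upstream(UpstreamStatement::Query(
            words.join(" "),
        ))),
    }
}
```

Because `Explain` lives on the wrapper type in this sketch, it can contain a `CopyTo`, which an EXPLAIN defined only on the upstream type could not express.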

EXPLAIN COPY source_table to 'test_files/scratch/table' (format parquet, per_thread_output true)
----
Contributor Author

Here is an example of the explain plan working

@@ -101,13 +101,17 @@ set datafusion.explain.physical_plan_only = false


## explain nested
-statement error Explain must be root of the plan
+query error DataFusion error: Error during planning: Nested explain not supported
Contributor Author

I think these errors are a little clearer

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Aug 16, 2023
datafusion/expr/src/logical_plan/plan.rs (outdated, resolved)
@@ -74,7 +103,7 @@ pub struct CopyToStatement {
/// The URL to where the data is heading
pub target: String,
/// Target specific options
-pub options: HashMap<String, Value>,
+pub options: Vec<(String, Value)>,
Contributor Author

I also changed the parser to preserve the order of the options so the output is consistent (rather than whatever order the HashMap decided). Without that, CI failed because the output order on Windows differed from the order on Linux -- see this example
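
The ordering point can be sketched in isolation. Rust's standard `HashMap` does not guarantee any iteration order, whereas a `Vec` of pairs iterates in insertion order, so text rendered from it is stable across runs and platforms. The helpers below (`parse_options`, `render_options`) are hypothetical and use plain `String` values instead of the parser's `Value` type; they are an illustration of the idea, not the code from this PR.

```rust
/// Parse "key value, key value" pairs, keeping them in the order written.
/// Hypothetical helper; malformed pairs are skipped for brevity.
pub fn parse_options(input: &str) -> Vec<(String, String)> {
    input
        .split(',')
        .filter_map(|pair| {
            let mut parts = pair.split_whitespace();
            match (parts.next(), parts.next()) {
                (Some(key), Some(value)) => Some((key.to_string(), value.to_string())),
                _ => None,
            }
        })
        .collect()
}

/// Render options exactly in stored order, so the text is deterministic.
pub fn render_options(options: &[(String, String)]) -> String {
    options
        .iter()
        .map(|(key, value)| format!("{key} {value}"))
        .collect::<Vec<_>>()
        .join(", ")
}
```

With a `HashMap`, the same rendering could emit the two options in either order, which is the kind of difference that makes expected-output tests flaky.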

@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Aug 17, 2023
Contributor

@metesynnada metesynnada left a comment

Looks good to me! Converting the hashmap into a vector was a clever move.

datafusion/sql/src/parser.rs (outdated, resolved)
-let plan = self.sql_statement_to_plan(statement)?;
+let plan = self.statement_to_plan(statement)?;
if matches!(plan, LogicalPlan::Explain(_)) {
return plan_err!("Nested explain not supported");
Contributor

Maybe the error could be "Nested EXPLAINs are not supported".

Contributor Author

👍 in 5c46949
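
For readers following the check in the diff above, here is a self-contained sketch of the same guard on a toy plan type. `ToyPlan` and `explain` are hypothetical names, not DataFusion's `LogicalPlan` API, and the error text uses the wording suggested in this thread; the exact message adopted in the follow-up commit is an assumption.

```rust
/// Toy logical plan with just enough variants to show the guard.
/// Hypothetical type, not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyPlan {
    Scan(String),
    Explain(Box<ToyPlan>),
}

/// Wrap `inner` in an Explain node, rejecting an inner plan that is already
/// an Explain. The message wording is assumed from the review suggestion.
pub fn explain(inner: ToyPlan) -> Result<ToyPlan, String> {
    if matches!(inner, ToyPlan::Explain(_)) {
        return Err("Nested EXPLAINs are not supported".to_string());
    }
    Ok(ToyPlan::Explain(Box::new(inner)))
}
```

The guard inspects only the top of the inner plan, mirroring the `matches!` call in the diff.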

@jackwener jackwener changed the title Support EXPLAIN COPY feature: Support EXPLAIN COPY Aug 19, 2023
Member

@jackwener jackwener left a comment

Great job😍.

In the next PR we can add it to the docs.

https://arrow.apache.org/datafusion/user-guide/sql/explain.html

Comment on lines +718 to +722
fn explain_to_plan(
&self,
verbose: bool,
analyze: bool,
-statement: Statement,
+statement: DFStatement,
Member

Can we use ExplainStatement as param?

Contributor Author

There are two places this is called: one from the DataFusion ExplainStatement (introduced in this PR) and one from the SQL parser's Explain. So I thought it best to support both and simply pass through the parameters that are actually used.
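
A rough sketch of that design choice: two different statement types each have their own EXPLAIN form, and both forward the same three values to one shared helper instead of the helper depending on either struct. All types and functions here (`SqlStmt`, `DfStmt`, `plan_sql`, `plan_df`, and this toy `explain_to_plan` returning a `String`) are hypothetical stand-ins, not the actual DataFusion signatures.

```rust
/// Stand-in for statements from the generic SQL parser (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum SqlStmt {
    Query(String),
    Explain { verbose: bool, analyze: bool, statement: Box<SqlStmt> },
}

/// Stand-in for the extended statement type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum DfStmt {
    Sql(SqlStmt),
    Copy(String),
    Explain { verbose: bool, analyze: bool, statement: Box<DfStmt> },
}

/// Shared helper: takes only the values it uses, so both callers can reach it.
fn explain_to_plan(verbose: bool, analyze: bool, statement: DfStmt) -> String {
    format!(
        "Explain(verbose={verbose}, analyze={analyze}) <- {}",
        plan_df(statement)
    )
}

/// Caller 1: planning the extended statement type.
pub fn plan_df(stmt: DfStmt) -> String {
    match stmt {
        DfStmt::Sql(inner) => plan_sql(inner),
        DfStmt::Copy(target) => format!("CopyTo: {target}"),
        DfStmt::Explain { verbose, analyze, statement } => {
            explain_to_plan(verbose, analyze, *statement)
        }
    }
}

/// Caller 2: planning the generic SQL statement type.
pub fn plan_sql(stmt: SqlStmt) -> String {
    match stmt {
        SqlStmt::Query(text) => format!("Query: {text}"),
        SqlStmt::Explain { verbose, analyze, statement } => {
            // Lift the inner SQL statement into the extended type first.
            explain_to_plan(verbose, analyze, DfStmt::Sql(*statement))
        }
    }
}
```

Had the helper accepted one specific EXPLAIN struct, the other caller would need to build that struct just to make the call.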

@alamb alamb merged commit 6aa423b into apache:main Aug 21, 2023
22 checks passed
Labels
core Core DataFusion crate sql SQL Planner sqllogictest SQL Logic Tests (.slt)
4 participants