-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Support EXPLAIN COPY
#7291
Conversation
fc3318d
to
108b78e
Compare
@@ -44,6 +44,35 @@ fn parse_file_type(s: &str) -> Result<String, ParserError> { | |||
Ok(s.to_uppercase()) | |||
} | |||
|
|||
/// DataFusion specific EXPLAIN (needed so we can EXPLAIN datafusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key thing -- we need a datafusion specific EXPLAIN
statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to keep these datafusion specific statements in general vs. moving this upstream to the sqlparser crate? This distinction was difficult for me to understand when first diving into how statements are parsed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the high level rationale is that sqlparser-rs intends to parse sql for one or more existing "standard" sql dialects (like MySQL or Postgres). However, these statements (COPY
and CREATE EXTERNAL TABLE
are DataFusion specific extensions). I will make a PR with some docs to to try and clarify this rationale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR with some docs: #7318
EXPLAIN COPY source_table to 'test_files/scratch/table' (format parquet, per_thread_output true) | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example of the explain plan working
@@ -101,13 +101,17 @@ set datafusion.explain.physical_plan_only = false | |||
|
|||
|
|||
## explain nested | |||
statement error Explain must be root of the plan | |||
query error DataFusion error: Error during planning: Nested explain not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these errors are a little more clear
@@ -74,7 +103,7 @@ pub struct CopyToStatement { | |||
/// The URL to where the data is heading | |||
pub target: String, | |||
/// Target specific options | |||
pub options: HashMap<String, Value>, | |||
pub options: Vec<(String, Value)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also changed the parser to preserve the order of the options so the output is consistent (rather than whatever order the hashmap decided). Without that the CI failed because the output order was different on windows than it was on Linux -- see this example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Converting the hashmap into a vector was a clever move.
datafusion/sql/src/statement.rs
Outdated
let plan = self.sql_statement_to_plan(statement)?; | ||
let plan = self.statement_to_plan(statement)?; | ||
if matches!(plan, LogicalPlan::Explain(_)) { | ||
return plan_err!("Nested explain not supported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the error can be Nested EXPLAINs are not supported.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 in 5c46949
Co-authored-by: Metehan Yıldırım <[email protected]>
…n into alamb/explain_copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job😍.
Next PR we can add it into doc.
https://arrow.apache.org/datafusion/user-guide/sql/explain.html
fn explain_to_plan( | ||
&self, | ||
verbose: bool, | ||
analyze: bool, | ||
statement: Statement, | ||
statement: DFStatement, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ExplainStatement
as param?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are two places this is called -- one from the DataFusion ExplainStatement
(introduced in this PR) and one from the SQL parser Explain
. So I thought it best to support them both and simply pass the parameters on through that are actually used
Which issue does this PR close?
Part of #6539
Rationale for this change
It is important to see what plans are coming out for copy
What changes are included in this PR?
EXPLAIN
plansAre these changes tested?
Yes
Are there any user-facing changes?
EXPLAIN
etc works now forCOPY
statements