[SPARK-23653][SQL] Show sql statement in spark SQL UI #20803
Changes from all commits: 9d2098d, 4712379, 6f8bc0d, 92293c6, 89e8e74, df98d83
```diff
@@ -146,6 +146,8 @@ class SparkSession private(
     }
   }

+  lazy private val substitutor = new VariableSubstitution(sessionState.conf)
+
   /**
    * A wrapped version of this session in the form of a [[SQLContext]], for backward compatibility.
    *
@@ -635,7 +637,8 @@ class SparkSession private(
    * @since 2.0.0
    */
   def sql(sqlText: String): DataFrame = {
-    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
+    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText),
+      substitutor.substitute(sqlText))
```
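For context on what `substitutor.substitute(sqlText)` contributes here: Spark's `VariableSubstitution` expands `${...}` references in the raw SQL text using session configuration before the text is recorded. Below is a much-simplified, hypothetical stand-in in plain Scala to illustrate the idea; it is not Spark's actual implementation (which resolves variables from `SQLConf` and supports prefixed forms such as `${hiveconf:...}`), and `SimpleSubstitutor` is an invented name:

```scala
import scala.util.matching.Regex

// Hypothetical, simplified stand-in for Spark's VariableSubstitution:
// replaces ${name} references in a SQL string with values from a map,
// leaving unknown references untouched.
object SimpleSubstitutor {
  private val VarPattern: Regex = """\$\{([A-Za-z0-9_.:]+)\}""".r

  def substitute(sql: String, vars: Map[String, String]): String =
    VarPattern.replaceAllIn(sql, m =>
      // quoteReplacement stops '$' in values being parsed as group references
      Regex.quoteReplacement(vars.getOrElse(m.group(1), m.matched)))
}

val out = SimpleSubstitutor.substitute(
  "SELECT * FROM t WHERE dt = '${dt}'", Map("dt" -> "2018-03-13"))
// out == "SELECT * FROM t WHERE dt = '2018-03-13'"
```

The substituted string, rather than the raw template, is what makes a recorded statement readable in the UI or event log.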
|
Member
Hi, @LantaoJin.
Member
You may want to refactor this PR into
Member
BTW, in general, the initial SQL text easily becomes meaningless when other operations are added. In your example, the following case shows a misleading and wrong SQL statement instead of the real executed SQL plan:

```scala
val df = spark.sql("xxxxx")
df.filter(...).collect() // shows sql text "xxxxx"
```

As another example, please try the following; it will show you:

```scala
scala> spark.sql("select a,b from t1").select("a").show
+---+
|  a|
+---+
|  1|
+---+
```
Contributor (Author)
Yes. We know this, so the current implementation, which binds the SQL text to the DataFrame, is not good.
```diff
   }

   /**
```
Member
What's the exact rule you defined to decide whether or not we should propagate the sql text?
Member
And how does the SQL shell execute commands like SELECT * FROM ...? Does it display all the rows, or add a LIMIT before displaying? Generally we should not propagate SQL text: a new DataFrame usually means the plan has changed, so the SQL text is not accurate anymore.
Contributor (Author)
Thanks for your review. I agree with this comment. Before the discussion, let me reproduce the scenario our company met. Team A developed a framework to submit applications with SQL sentences in a file.
In the biz.sql, there are many sql sentences like
There is no case like

```scala
val df = spark.sql("xxxxx")
spark.range(10).collect()
df.filter(..).count()
```

Team B (Platform) needs to capture the real SQL sentences executed across the whole cluster, as the SQL files from Team A contain many variables. A better way is recording the real SQL sentence in the EventLog.

Ok, back to the discussion. The original purpose is to display the SQL sentence the user inputs. `spark.range(10).collect()` isn't a SQL sentence the user inputs, and neither is `df.filter(..).count()`. Only "xxxxx" is. So I have two proposals and a further thought.

Furthermore, what about opening another ticket to add a command option `--sqlfile biz.sql` to the spark-submit command? biz.sql must be a file consisting of SQL sentences. Based on this implementation, not only client mode but also cluster mode could use pure SQL. What do you think? @cloud-fan
Member
How does `com.ebay.SQLFramework` process the sql file? Just call `spark.sql(xxxx).show`, or other stuff?
Contributor (Author)
Your speculation is almost right. First call `val df = spark.sql(...)`, then separate the SQL text with pattern matching into three types: count, limit, and other. If count, invoke `df.showString(2, 20)`; if limit, just invoke `df.limit(1).foreach`; the last type, other, does nothing.
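The dispatch described above can be sketched as follows. This is a hypothetical reconstruction of the framework's logic, not its real code; in particular, the classification rules (simple substring and regex checks) are illustrative assumptions:

```scala
// Hypothetical sketch of the SQL-framework dispatch described above.
// Classification rules are illustrative assumptions, not the real code.
sealed trait SqlKind
case object CountQuery extends SqlKind
case object LimitQuery extends SqlKind
case object OtherQuery extends SqlKind

def classify(sqlText: String): SqlKind = {
  val lower = sqlText.toLowerCase
  if (lower.contains("count(")) CountQuery
  else if (lower.matches("""(?s).*\blimit\s+\d+.*""")) LimitQuery
  else OtherQuery
}

// With a SparkSession in scope, dispatch would look roughly like this
// (df.showString is package-private in Spark, shown here as in the comment):
//
//   val df = spark.sql(sqlText)
//   classify(sqlText) match {
//     case CountQuery => df.showString(2, 20)          // render first rows
//     case LimitQuery => df.limit(1).foreach(_ => ())  // force execution
//     case OtherQuery => ()                            // no action
//   }
```

The point of such a split is that only queries meant to produce visible output are eagerly executed, while everything else is left as-is.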