Skip to content

Conversation

@nblintao
Copy link
Contributor

@nblintao nblintao commented Jul 12, 2016

What changes were proposed in this pull request?

For the web UI, this PR adds the raw SQL query text for each query in the new column "SQL Text" in the SQL Tab. To achieve this, it also passes the SQL text from the parser to the execution UI data, through the logical plan.

If the query has no related query text, show a blank in that field. If there is no query in the table has a query text, not show the column.

If the query text is long (i.e. longer than 140 chars), show the short abstract and display the full version on demand by the link "+more".

If the query text is too long (i.e. longer than 1000 chars), it will be truncated after parsing. This is to avoid downing browsers or web UI servers by running out of memory.

Also, a tooltip is added on the header "SQL Text", saying Shows 140 characters by default. Click "+more" to see more. Long texts are truncated to 1000 characters. Left blank when the query was not issued by SQL..

How was this patch tested?

Tested by trying multiple SQL queries.

A screenshot of the SQL Tab:
sqltextwithtooltip
(Query 3 is identical to Query 2, but its SQL text is shown in detail by clicking "+more")

@SparkQA
Copy link

SparkQA commented Jul 12, 2016

Test build #62180 has finished for PR 14158 at commit 626f3f7.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to insert a "-" rather than leave this blank in the case it's not available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked and found out that the common practice in the Spark Web UI is to use a blank instead of a hyphen. I'll fix. Thanks!

@ajbozarth
Copy link
Member

Checked it out and gave it a test run and it looks good, just one issue. When the user only uses the dataframes api it will never show any SQL Text (only "-"), should we check if there is no SQL Text to display and not show the column in that case?

@nblintao
Copy link
Contributor Author

Thanks, @ajbozarth. That's a great point. When no execution in a table has SQL text, this column shouldn't be shown. I'll fix this.

@nblintao
Copy link
Contributor Author

In the commit above, I made three adjustments:

  1. If the query has no related query text, show a blank in that field.
  2. If there is no query in the table has a query text, not show the column.
  3. If the query text is too long (i.e. longer than 140 chars), show the short abstract and display the full version on demand.

A snapshot of the SQL Tab:
longsqltext
(The detailed SQL text for Query 4 shown after clicking "+more")

The following need to be discussed and implemented (suggested by @yhuai):

  1. Add a ToolTip on "SQL Text", saying "Blank if the query is not issued by SQL" (Please help check this sentence if you are a native English speaker).
  2. For now, we could show the abstract of the string at the front end. But we still need to truncate the query texts which are toooooo long (maybe longer than 1000 bytes?). This is to avoid downing browsers or web UI servers by running out of memory.

@SparkQA
Copy link

SparkQA commented Jul 13, 2016

Test build #62207 has finished for PR 14158 at commit beba227.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about a button to copy full query text into clipboard?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? The text can be easily highlighted and copied.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. for very long query that wouldn't fit as per this comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres a "+more" button that expands to show the full query

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I understand you now, if we end up truncating the text at 1000 chars as proposed then yes this may be a useful feature, but if the text is available to copy wouldn't the size issue still exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean that it could be too long for clipboard? I don't think that would be an issue...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @felixcheung. When I am looking at the discussions again today, I find I am more clear about your suggestions. But actually, truncating very long queries is not because they are not visually pleasant to be displayed on the page, but it may take too many resources to store and transfer such a long unlimited string. So the queries longer than 1000 chars are truncated after parsing, which will never be recovered. So the idea of clipboard may not be used to resolve it.

But still, I believe the clipboard is a good idea. I do think we could extend the limit to 10k chars instead of 1k if we add a clipboard later.

@ajbozarth
Copy link
Member

Checked out your changes and I like the updates, plus I think the column looks better on the end. Addressing the questions you raised: i agree we should truncate long queries (1000 chars sounds like a decent number), we should make it clear they were truncated though (maybe with a "..."?) I also think a tool tip would be good. Maybe something like: "Left blank when the query was not issued by SQL"?

@nblintao
Copy link
Contributor Author

@ajbozarth @felixcheung Thanks for viewing and discussing! I plan to implement the two updates I mentioned above. The idea of clipboard sounds awesome, but I think we might need more discussion about this new function.

@nblintao
Copy link
Contributor Author

Updated by truncating long texts and adding a tooltip.
The detail description and the screenshot at #14158 (comment) is also updated.

@SparkQA
Copy link

SparkQA commented Jul 15, 2016

Test build #62364 has finished for PR 14158 at commit 41c2daa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajbozarth
Copy link
Member

I'll try to take a look at this tomorrow

@ajbozarth
Copy link
Member

The tooltip looks odd centered on the column instead of the title text but I believe that's how the whole Web UI works, other than that nit LGTM

@ajbozarth
Copy link
Member

@srowen @JoshRosen what do you think of @nblintao 's pr? Some of my team really likes the addition and would love to see it merged

@ajbozarth
Copy link
Member

@nblintao Could you rebase? I'd like to try and see if we can get this reviewed and merged.

@markhamstra
Copy link
Contributor

Please be certain that this works well even for very large queries. They are not commonplace, but I know that Spark SQL does sometimes get asked to handle SQL queries that are hundreds of lines long.

@ajbozarth
Copy link
Member

Thanks @markhamstra that was already addressed earlier in review

@markhamstra
Copy link
Contributor

It's great if it is already addressed, but I just didn't see anything explicit in the discussion or examples that showed any query of the magnitude that I am talking about. Machine-generated queries can be staggeringly large (e.g. I've seen queries that were so long and complicated that they took Spark SQL more than 9 minutes just to parse), and this UI enhancement must be prepared to handle those.

@nblintao
Copy link
Contributor Author

nblintao commented Oct 13, 2016

Thanks, @ajbozarth. Sure, I'd love to rebase it and get it merged. But I am really busy with my mid-terms and interviews recently. I'll take a look this week and fix it if it's easy to do.

I really appreciate your concern about my pull requests.

@nblintao
Copy link
Contributor Author

@markhamstra Thanks. Here is my related explanation at the top of this PR.

If the query text is too long (i.e. longer than 1000 chars), it will be truncated. This is to avoid downing browsers or web UI servers by running out of memory.

Is this what you mean?

@markhamstra
Copy link
Contributor

@nblintao I really haven't reviewed these changes closely enough to have a specific complaint or concern in mind, but I'm more concerned about what happens when you ask to see "more" when that "more" could be several tens of thousands of characters, not just the few thousand that many would consider to be a very long SQL query.

@nblintao
Copy link
Contributor Author

@markhamstra Queries longer than 1000 chars are truncated to 1000 chars after the parsing. "+more" will only expand 140 chars to at most 1000 chars.

@markhamstra
Copy link
Contributor

@nblintao Got it; thanks. There may be distinct queries that will be entirely the same within the first 1000 characters, but that's just the nature of working with these very large queries -- there are lots of things that make them difficult, but it sounds like you've made sure not to make things difficult for more normal queries as a consequence, so that's all good.

@nblintao
Copy link
Contributor Author

@ajbozarth I finally have a chance to rebase it in the winter break. Could you please have a look? Thanks!

@SparkQA
Copy link

SparkQA commented Dec 29, 2016

Test build #70700 has finished for PR 14158 at commit 83cbb58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nblintao
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 29, 2016

Test build #70704 has finished for PR 14158 at commit 83cbb58.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@ajbozarth ajbozarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments, but from a UI side this looks ok to me. @gatorsmile What do you think of this from the SQL side? Also @srowen and @vanzin second opinion on the UI side?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't need this, scala can handle line wrapping for Node return values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? Given the reasoning behind the max length I don't see a reason to include extra lines of code to include the suffix as part of the max length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either way is okay. Here, I am considering keeping the text displayed (including suffix) less than 1000 chars.

@HyukjinKwon
Copy link
Member

Hi @nblintao, is it still active?

@nblintao
Copy link
Contributor Author

@HyukjinKwon Thanks for reminding. I am finals this week. I will fix this after 15th.

@nblintao
Copy link
Contributor Author

I have just rebased. @ajbozarth @HyukjinKwon @gatorsmile @srowen @vanzin

def refresh(): Unit = children.foreach(_.refresh())

// Record the original sql text in the top logical plan for checking in the web UI.
var sqlText: Option[String] = None
Copy link
Member

@gatorsmile gatorsmile May 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using var for this PR should be avoided.

}
}
logicalPlan.sqlText = Some(truncatedSqlText)
logicalPlan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution in this PR looks intrusive to me. If we really want to store the original sql text, we can add it into the QueryExecution. The value can be initialized when we build the QueryExecution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. I agree that QueryExecution is a better place for original SQL text. I have updated my code accordingly. Could you please have a look?

@SparkQA
Copy link

SparkQA commented May 30, 2017

Test build #77510 has finished for PR 14158 at commit 69180bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 30, 2017

Test build #77518 has finished for PR 14158 at commit 114401a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 30, 2017

Test build #77523 has started for PR 14158 at commit ac96aaa.

@jiangxb1987
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jun 20, 2017

Test build #78291 has started for PR 14158 at commit ac96aaa.

@shaneknapp
Copy link
Contributor

i'll retrigger this once jenkins is back up.

@shaneknapp
Copy link
Contributor

test this please

@SparkQA
Copy link

SparkQA commented Jun 20, 2017

Test build #78299 has finished for PR 14158 at commit ac96aaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class QueryExecution(

@jiangxb1987
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jun 21, 2017

Test build #78332 has finished for PR 14158 at commit ac96aaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class QueryExecution(

@HyukjinKwon
Copy link
Member

gentle ping @nblintao, is this PR active? If so, I guess the test failure should be fixed if related.

@nblintao
Copy link
Contributor Author

@HyukjinKwon It's still active. I'll fix it when I'm available. Thanks.

@HyukjinKwon
Copy link
Member

hey @nblintao, do you maybe happened to have some time to continue this one?

@nblintao
Copy link
Contributor Author

@HyukjinKwon Sorry for the delay. I'm busy looking for jobs these days. I'll try my best to fix it in October. Thank you for reminding me!

@jiangxb1987
Copy link
Contributor

I'm going to close this PR because it goes stale, please feel free to reopen it or open another PR if anyone have more thoughts on this issue.

@asfgit asfgit closed this in ed1478c Nov 7, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#11494
Closes apache#14158
Closes apache#16803
Closes apache#16864
Closes apache#17455
Closes apache#17936
Closes apache#19377

Added:
Closes apache#19380
Closes apache#18642
Closes apache#18377
Closes apache#19632

Added:
Closes apache#14471
Closes apache#17402
Closes apache#17953
Closes apache#18607

Also cc srowen vanzin HyukjinKwon gatorsmile cloud-fan to see if you have other PRs to close.

Author: Xingbo Jiang <[email protected]>

Closes apache#19669 from jiangxb1987/stale-prs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants