[SPARK-20151][SQL] Account for partition pruning in scan metadataTime metrics #17476
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -72,4 +72,14 @@ trait FileIndex { | |
|
|
||
| /** Schema of the partitioning columns, or the empty schema if the table is not partitioned. */ | ||
| def partitionSchema: StructType | ||
|
|
||
| /** | ||
| * Returns an optional metadata operation time, in nanoseconds, for listing files. | ||
| * | ||
| * We do file listing in query optimization (in order to get the proper statistics) and we want | ||
| * to account for file listing time in physical execution (as metrics). To do that, we save the | ||
| * file listing time in some implementations and physical execution calls it in this method | ||
| * to update the metrics. | ||
| */ | ||
| def metadataOpsTimeNs: Option[Long] = None | ||
|
Contributor
I think it's hard to define the semantics of this method in general. How about we only put this method in
Contributor
Author
I thought about that, but there is no API-level guarantee that we'd get PrunedInMemoryFileIndex after partition pruning. That is just a current implementation detail. I'd rather have something that's more clearly specified in the API.
||
| } | ||
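For readers following the thread, here is a minimal, self-contained sketch of the pattern the doc comment describes: an index records how long its listing took, and the execution side reads that value through `metadataOpsTimeNs` to update a metric. It does not depend on Spark. `FileIndexSketch`, `TimedListingIndex`, `TimingMetric`, and `recordMetadataTime` are hypothetical names invented for illustration, not the classes touched by this PR.

```scala
object MetadataTimeSketch {

  // Simplified stand-in for the FileIndex trait; only the new method is modeled.
  trait FileIndexSketch {
    /** Optional time, in nanoseconds, spent on metadata operations such as file listing. */
    def metadataOpsTimeNs: Option[Long] = None
  }

  // Hypothetical implementation: times its own listing once, at construction,
  // and keeps the elapsed time so it can be reported later.
  final class TimedListingIndex(listFiles: () => Seq[String]) extends FileIndexSketch {
    private val (files, elapsedNs) = {
      val start = System.nanoTime()
      val result = listFiles()
      (result, System.nanoTime() - start)
    }

    def allFiles: Seq[String] = files

    override def metadataOpsTimeNs: Option[Long] = Some(elapsedNs)
  }

  // Hypothetical accumulator standing in for an execution-side metric.
  final class TimingMetric {
    private var totalNs = 0L
    def add(ns: Long): Unit = totalNs += ns
    def value: Long = totalNs
  }

  // Execution side: add the saved listing time to the metric, if the index has one.
  def recordMetadataTime(index: FileIndexSketch, metric: TimingMetric): Unit =
    index.metadataOpsTimeNs.foreach(metric.add)
}
```

An index that does not override the method reports `None`, so `recordMetadataTime` leaves the metric untouched. That matches the default in the diff, where implementations that do not track listing time simply contribute nothing.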
Add param doc, as it's not immediately obvious what a user is supposed to supply here.
I'd say something like "time it took to obtain the partitionSpec from the Hive metastore", but maybe that's too specific.
It actually includes more than that. We do file listing as part of that ...