[SPARK-25561][SQL] Implement a new config to control partition pruning fallback (if partition push-down to Hive fails) #22614
Changes from all commits: dddffca, 2ad9cf4, cb0577b, f42bbec, 544b2ad, 01e2123
```diff
@@ -746,34 +746,43 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
+        val shouldFallback = SQLConf.get.metastorePartitionPruningFallback
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
           // when filtering on a non-string partition column when the hive config key
           // hive.metastore.try.direct.sql is false
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
-            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
-              "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
-            // HiveShim clients are expected to handle a superset of the requested partitions
-            getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              tryDirectSql =>
-            throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
-              "metadata by filter from Hive. You can set the Spark configuration setting " +
-              s"${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to false to work around this " +
-              "problem, however this will result in degraded performance. Please report a bug: " +
-              "https://issues.apache.org/jira/browse/SPARK", ex)
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
```

Review thread on the new catch clause:
Member
We should not blindly call getAllPartitions. This will be super slow. We should do some retries; it depends on the errors we get.

Member
ping @srinathshankar @ericl

Author
@gatorsmile From the HMS side, the error is always the same MetaException, and there is no way to tell a direct SQL error apart from a "not supported" error (unfortunately!). How do you propose we address this?

Contributor
Also, it's not blindly calling that API, right? It was already being called before when direct SQL was disabled. In the other case, it was just throwing an exception. So now, instead of erroring out, it will work, just more slowly than expected, unless there's some retry at a higher layer that I'm not aware of.

Member
cc @sameeragarwal @tejasapatil Could you share what FB does for the retry?

Contributor
@gatorsmile: Sorry for the late reply. We had seen issues with this in the past and resorted to exponential backoff with retries. Fetching all the partitions is going to be bad in a prod setting; even if it makes it through, the underlying problem, if left unnoticed, is bad for system health.
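For illustration only, the exponential-backoff idea described above could look roughly like the sketch below. None of these names come from this PR; `withExponentialBackoff`, `maxRetries`, and `baseDelayMs` are hypothetical, and the commented usage stands in for the reflective `getPartitionsByFilterMethod.invoke(...)` call in `HiveShim`.

```scala
// Hypothetical helper, not part of this PR: retry a metastore call with exponential backoff.
def withExponentialBackoff[T](maxRetries: Int = 3, baseDelayMs: Long = 1000L)(op: => T): T = {
  var attempt = 0
  var result: Option[T] = None
  while (result.isEmpty) {
    try {
      result = Some(op)
    } catch {
      case _: Exception if attempt < maxRetries =>
        // Wait 1s, 2s, 4s, ... between attempts; once the retry budget is exhausted,
        // the exception propagates to the caller, which can then decide whether to
        // fall back to getAllPartitions or fail the query.
        Thread.sleep(baseDelayMs * (1L << attempt))
        attempt += 1
    }
  }
  result.get
}

// Hypothetical usage around the pruned-partition lookup:
// val parts = withExponentialBackoff() {
//   getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]
// }
```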
Author
Could you review the newer changes I have made? Basically, yes, I agree that fetching all partitions is going to be bad, so we'll leave it up to the user. They can disable fetching all the partitions by setting "spark.sql.hive.metastorePartitionPruning.fallback.enabled" to false; in that case, we'll never retry. If it is set to "true", then we'll retry. As simple as that. I don't completely understand "exponential backoff with retries": do you do this at the HMS level or at the query level? If HMS filter pushdown fails once, does that mean it will succeed in the future? Maybe a future improvement would be to replace the boolean fallback-enabled/fallback-disabled switch with multiple levels for trying the fallback with timing, etc. Thoughts, @tejasapatil?
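As a usage sketch only (the key name is taken from the comment above and is not final until the PR settles), toggling the proposed flag from a running `SparkSession` might look like this:

```scala
// Assumes `spark` is an existing SparkSession (e.g. in spark-shell).
// Disable the fallback so a failed getPartitionsByFilter call fails the query
// instead of silently fetching all partition metadata:
spark.conf.set("spark.sql.hive.metastorePartitionPruning.fallback.enabled", "false")

// Or allow the (slow) fallback to fetching all partitions:
spark.conf.set("spark.sql.hive.metastorePartitionPruning.fallback.enabled", "true")
```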
Contributor
@kmanamcheri: Let's do this:
What do you think?
The diff continues with the body of the new catch clause:

```diff
+            if (shouldFallback) {
+              val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
+              // We should get this config value from the metaStore. otherwise hit SPARK-18681.
+              // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by
+              // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
+              val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
+                tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
+              if (!tryDirectSql) {
+                logWarning("Caught Hive MetaException attempting to get partition metadata by " +
+                  "filter from Hive. Falling back to fetching all partition metadata, which will " +
+                  "degrade performance. Modifying your Hive metastore configuration to set " +
+                  s"${tryDirectSqlConfVar.varname} to true may resolve this problem.")
+              } else {
+                logWarning("Caught Hive MetaException attempting to get partition metadata " +
+                  "by filter from Hive. Hive metastore's direct SQL feature has been enabled, " +
+                  "but it is an optimistic optimization and not guaranteed to work. Falling back " +
+                  "to fetching all partition metadata, which will degrade performance (for the " +
+                  "current query). If you see this error consistently, you can set the Spark " +
+                  s"configuration setting ${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to " +
+                  "false as a work around, however this will result in degraded performance.")
+              }
+              // HiveShim clients are expected to handle a superset of the requested partitions
+              getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
+            } else {
+              // Fallback mode has been disabled. Rethrow exception.
+              throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
+                "metadata from Hive. Fallback mechanism is not enabled. You can set " +
+                s"${SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED} to true to fetch " +
+                "all partition metadata as a fallback mechanism, however this may result in " +
+                "degraded performance.", ex)
+            }
         }
       }
```
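For context, a flag like the `HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED` entry referenced in the diff would typically be declared in `SQLConf` with the config-builder pattern sketched below. This uses the key name from the discussion above; the doc text and default value are assumptions, not the PR's exact definition.

```scala
// Sketch only: the usual SQLConf declaration pattern for a boolean flag.
// The doc string and default are illustrative assumptions.
val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED =
  buildConf("spark.sql.hive.metastorePartitionPruning.fallback.enabled")
    .doc("When true, fall back to fetching all partition metadata if partition predicate " +
      "push-down to the Hive metastore fails. When false, the failure is rethrown and the " +
      "query fails.")
    .booleanConf
    .createWithDefault(true)
```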
Ping @wangyum, too.