[SPARK-25561][SQL] Implement a new config to control partition pruning fallback (if partition push-down to Hive fails) #22614

kmanamcheri · 2018-10-02T18:28:06Z

What changes were proposed in this pull request?

When using partition filter pushdown to Hive Metastore, the pushdown might fail (depending on the type of partition and if direct SQL mode is enabled in HMS). This change implements a new config option in Spark SQL to decide what to do when a partition filter pushdown fails. The fallback behavior would be to fetch all partitions instead. This might degrade performance depending on the total number of partitions and warnings are added as appropriate.

How was this patch tested?

New unit tests were added to confirm behavior of the new config. All Unit tests on the Spark SQL component were run successfully.

kmanamcheri · 2018-10-02T18:41:17Z

@mallman @cloud-fan @ericl @rezasafi

vanzin · 2018-10-02T18:47:18Z

ok to test

SparkQA · 2018-10-02T21:23:46Z

Test build #96868 has finished for PR 22614 at commit dddffca.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

vanzin · 2018-10-02T21:36:39Z

Looks ok to me based on discussion in the bug. Will leave here to see if others have any comments.

gatorsmile · 2018-10-02T21:43:59Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

        } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>


We should not blindly call getAllPartitions. This will be super slow. We should do some retries. It depends on the errors we got.

ping @srinathshankar @ericl

@gatorsmile From HMS side, the error is always the same "MetaException" and there is no way to tell apart a direct SQL error from an error of "not supported" (unfortunately!). How do you propose we address this?

Also, it's not blindly calling that API right? It was already being called before if direct sql was disabled. In the other case, it was just throwing an exception. So now instead of erroring out it will work, just more slowly than expected.

Unless there's some retry at a higher layer that I'm not aware of.

cc @sameeragarwal @tejasapatil Could you share what FB does for the retry?

@gatorsmile : Sorry for late reply. We had seen issues with this in past and resorted to do exponential backoff with retries. Fetching all the partitions is going to be bad in a prod setting.... even if it makes it through, the underlying problem if left un-noticed is bad for the system health.

Could you review the newer changes I have done? Basically, yes, I agree that fetching all partitions is going to be bad and hence we'll leave it up to the user. They can disable fetching all the partitions by setting "spark.sql.hive.metastorePartitionPruning.fallback.enabled" to false. In that case, we'll never retry. If it is set to "true", then we'll retry. As simple as that.

I don't completely understand "exponential backoff with retries". Do you do this at the HMS level? or at the query level? If HMS filter pushdown fails once, does it mean it will succeed in the future? Maybe this is a future improvement to this where instead of a boolean "fallback-enabled" or "fallback-disabled", we can have multiple levels for trying the fallback with timing etc. Thoughts @tejasapatil

@kmanamcheri : Lets do this:

We should prefer doing getPartitionsByFilterMethod(). If it fails, we retry with increasing delay across retries.

If retries are exhausted, we could fetch all the partitions of the table. Some people might not want this so lets control this using a conf flag. For those who don't want it, the query could fail at this point.

What do you think ?

dongjoon-hyun · 2018-10-02T22:38:11Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean


Ping @wangyum, too.

dongjoon-hyun · 2018-10-02T22:45:43Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")


@kmanamcheri . Could you show different and more correct warning messages based on tryDirectSql value here?

Good idea. Yes that would be better. I'll add that.

To be honest, I do not think we should issue a warning message and call getAllPartitions. When the number of partitions is huge, getAllPartitions will be super super slow.

sometimes failing is better than spending several hours to get all partitions. Shall we add a config to switch the behavior?

I think the original warning message is more accurate. Direct sql mode isn't just about performance. It's also about enhanced capability, e.g. supporting filtering on non-string type columns. As the original comment states, setting the direct sql config value to true may resolve a problem around metastore-side partition filtering.

Hive has a config "hive.metastore.limit.partition.request" that can limit number of partitions that can be requested from HMS. So I think there is no need for a new config on the Spark side.
Also since direct sql is a best effort approach just failing when direct sql is enabled is not good.

@mallman I haven't tried using that config option. If I am understanding the documentation for HIVE_MANAGE_FILESOURCE_PARTITIONS correctly, if I set that value to false, partitions will not be stored in HMS. That sounds like it is addressing a different issue, no?

If that's the suggested way to deal with non-supported partition filters, then this code should always fail if getPartitionsByFilter fails, no? Why even have a fallback (as we do currently)? SPARK-17992 seems to say that Spark should handle certain cases of partition pushdown failures (such as HMS ORM mode). My argument is that the case should be expanded to include even if hive.metastore.try.direct.sql is enabled to be true.

@dongjoon-hyun @mallman I have updated the log messages to be more descriptive and helpful for the user to indicate what they should try doing. Does that help?

One option if we want to get all fancy is to add a configurable timeout in the fallback case - assuming it's possible to cancel an ongoing call (run in a separate thread + interrupt maybe?).

My main concern with the fallback, really, isn't the slowness, but that in the case where it would be slow (= too many partitions), the HMS might just run itself out of memory trying to serve the request.

Reza mentions the Hive config which I think is the right thing to do by the HMS admin, since it avoids apps DoS'ing the server. Not sure what's the behavior there, but I hope if fails the call if there are too many partitions (instead of returning a subset). IMO that config seems to cover all the concerns here assuming the call will fail when you have too many partitions, no?

Yes, if the query tries to get more partitions than "hive.metastore.limit.partition.request", the query will fail. Using the hive config the user can judge if he wants to get all the partitions or not. I also think that config covers all the concerns

gatorsmile · 2018-10-03T16:28:26Z

Let us add a conf to control it? Failing fast is better than hanging. If users want to get all partitions, they can change the conf by themselves.

kmanamcheri · 2018-10-03T17:44:02Z

Let us add a conf to control it? Failing fast is better than hanging. If users want to get all partitions, they can change the conf by themselves.

@gatorsmile We already have a config option "spark.sql.hive.metastorePartitionPruning". If that is set to false, we will never push down the partitions to HMS. I will add "spark.sql.hive.metastorePartitionPruningFallback" which in addition to the previous one controls the fallback behavior. Irrespective of the value of Hive direct SQL, if we enable the pruning fallback, we will catch the exception and fallback to fetch all partitions. Does this sound like a reasonable compromise @mallman ?

gatorsmile · 2018-10-03T18:21:15Z

Yes. Let us add a conf for controlling the fallback. Please also add the test cases for verifying it. Thanks!

SparkQA · 2018-10-03T22:01:30Z

Test build #96899 has finished for PR 22614 at commit 2ad9cf4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kmanamcheri · 2018-10-05T16:13:50Z

@gatorsmile I have added the config option and an additional test.

Here's the new behavior

Setting spark.sql.metastorePartitionPruningFallback to 'false' will ALWAYS throw an exception if partition pushdown fails (Hive throws an exception). This is suggested for queries where you want to fail fast and you know that you have a large number of partitions.
Setting spark.sql.metastorePartitionPruningFallback to 'true' (this is the default setting) will ALWAYS catch exception from Hive and retry with fetching all partitions. However, to be helpful to users, Spark will read the directSql config value from Hive and provide good log messages on what the next steps to do.

@dongjoon-hyun @mallman @vanzin If these look good, can we move on this to merge? Thanks a lot for all the comments and discussions.

dongjoon-hyun · 2018-10-05T19:31:11Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

      .createWithDefault(true)

+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
+    buildConf("spark.sql.hive.metastorePartitionPruningFallback")


Should we use spark.sql.legacy prefix like SPARK-19724, @rxin and @cloud-fan ?

What is the reasoning for marking this as legacy?

spark.sql.hive.metastorePartitionPruning.fallback.enabled

SparkQA · 2018-10-05T20:01:13Z

Test build #96999 has finished for PR 22614 at commit cb0577b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala

SparkQA · 2018-10-06T01:58:44Z

Test build #97014 has finished for PR 22614 at commit f42bbec.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2018-10-06T03:19:18Z

The PR description and title may need to change accordingly. Can you update it?

kmanamcheri · 2018-10-08T14:51:08Z

@viirya I have updated the title and description.

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

mallman · 2018-10-08T16:17:11Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

+                  s"configuration setting ${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to " +
+                  "false as a work around, however this will result in degraded performance. " +
+                  "Please report a bug to Hive stating that direct SQL is failing consistently " +
+                  "for the specified query: https://issues.apache.org/jira/browse/HIVE")


I think we should remove the suggestion to file a Hive project bug. Even with the direct SQL configuration setting enabled, there are valid metastore deployments for which it will be ignored. For example, my understanding is that if the metastore uses MongoDB for its underlying storage, the direct SQL configuration setting will be ignored. That means a failure here is not a Hive bug with direct SQL.

mallman · 2018-10-08T16:25:23Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala

-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {


Change test name to

s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false and $partPruningFallbackKey=true"

?

sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala

vanzin · 2018-10-08T18:31:57Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

+      .doc("When true, enable fallback to fetch all partitions if Hive metastore partition " +
+           "push down fails. This is applicable only if partition pruning is enabled (see " +
+           s" ${HIVE_METASTORE_PARTITION_PRUNING.key}). Enabling this may degrade performance " +
+           "if there are a large number of partitions." )


"if there is"

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala

SparkQA · 2018-10-08T20:19:07Z

Test build #97119 has finished for PR 22614 at commit 544b2ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-10-08T20:39:29Z

As @tejasapatil suggested above, this fallback is not suggested to be on in a prod setting. It could also impact the system health (e.g., the other concurrent queries that are querying the same Hive metastore could be blocked). Thus, I would suggest to make the new conf as internal and make the configuration description more clear regarding the impact of this fallback.

gatorsmile · 2018-10-08T20:45:28Z

Also, #22614 (comment) proposes a better and safer solution by introducing exponential backoff with retries. @kmanamcheri Could you update your PR with this suggestion?

vanzin · 2018-10-08T20:55:25Z

a better and safer solution by introducing exponential backoff with retries

I'm confused about that suggestion. What is being retried? The user query? The HMS call? What is changing in between the retries that gives any hope that a future call will succeed? And if all the retries fail, do you fallback like this change is doing, or what?

kmanamcheri · 2018-10-08T21:31:12Z

@gatorsmile, @tejasapatil was reviewing the code before I added the new config option. I have asked him to review the new code. Lets see what his thoughts are on that. I have also asked him clarification on what he means by exponential backoff with retries.

I want to take a step back and revisit SPARK-17992 and in particular one of the comments from @ericl

For large tables, the degraded performance should be considered a bug as well.

How about this.

If direct sql is disabled, log a warning about degraded performance with this flag and fall back to fetching all partitions.
If direct sql is enabled, crash with a message suggesting to disable filesource partition management and report a bug.
That way, we will know if there are cases where metastore pruning fails with direct sql enabled.

It looks like a compromise was reached where we don't support fetching all the time (and only for a subset of cases). My suggested fix is a cleaner way of approaching it through a SQLConf instead of looking at the Hive config.

Thoughts @mallman @ericl

SparkQA · 2018-10-08T23:34:20Z

Test build #97126 has finished for PR 22614 at commit 01e2123.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-10-09T20:41:59Z

Based on my understanding, the solution of FB team is to retry the following commands multiple times:

getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]

This really depends on what is the actual errors that fail getPartitionsByFilterMethod. When there are many concurrent users share the same metastore, exponential backoff with retries is very reasonable since most of errors might be caused by timeout or similar reasons.

If it still fails, I would suggest to fail fast or depends on the conf value of spark.sql.hive.metastorePartitionPruning.fallback.enabled

kmanamcheri · 2018-10-09T22:43:26Z

Based on my understanding, the solution of FB team is to retry the following commands multiple times:
getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]

@gatorsmile hmm my understanding was different. I thought they were retrying the fetchAllpartitions method. Maybe @tejasapatil can clarify here?

This really depends on what is the actual errors that fail getPartitionsByFilterMethod. When there are many concurrent users share the same metastore, exponential backoff with retries is very reasonable since most of errors might be caused by timeout or similar reasons.

Doesn't this apply with every other HMS API as well? If so, shouldn't we be building a complete solution in HiveShim around this to do an exponential backoff with retries on every single HMS call in HiveShim?

If it still fails, I would suggest to fail fast or depends on the conf value of spark.sql.hive.metastorePartitionPruning.fallback.enabled

Ok I agree.

I think we need clarification from @tejasapatil on which call they retry.

rezasafi · 2018-11-06T19:37:47Z

@tejasapatil @kmanamcheri any update? thank you very much in advance.

AmplabJenkins · 2019-09-16T18:18:38Z

Can one of the admins verify this patch?

HyukjinKwon · 2019-09-17T00:39:36Z

Closing this due to author's inactivity.

HiveShim should expect MetaException from HMS in all cases

dddffca

kmanamcheri changed the title ~~HiveClient.getPartitionsByFilter should not throw an exception if HMS retries directSql~~ [SPARK-25561] [SQL] HiveClient.getPartitionsByFilter should not throw an exception if HMS retries directSql Oct 2, 2018

kmanamcheri changed the title ~~[SPARK-25561] [SQL] HiveClient.getPartitionsByFilter should not throw an exception if HMS retries directSql~~ [SPARK-25561][SQL] HiveClient.getPartitionsByFilter should not throw an exception if HMS retries directSql Oct 2, 2018

gatorsmile reviewed Oct 2, 2018

View reviewed changes

dongjoon-hyun reviewed Oct 2, 2018

View reviewed changes

Improved log messages

2ad9cf4

Provide SQLConf option for user and add unit tests

cb0577b

dongjoon-hyun reviewed Oct 5, 2018

View reviewed changes

gatorsmile reviewed Oct 5, 2018

View reviewed changes

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala Outdated Show resolved Hide resolved

gatorsmile reviewed Oct 5, 2018

View reviewed changes

sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala Outdated Show resolved Hide resolved

Rename the config option and update tests

f42bbec

kmanamcheri changed the title ~~[SPARK-25561][SQL] HiveClient.getPartitionsByFilter should not throw an exception if HMS retries directSql~~ [SPARK-25561][SQL] Implement a new config to control partition pruning fallback (if partition push-down to Hive fails) Oct 8, 2018

mallman reviewed Oct 8, 2018

View reviewed changes

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala Outdated Show resolved Hide resolved

Remove a comment which might be misleading

544b2ad

mallman reviewed Oct 8, 2018

View reviewed changes

sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala Outdated Show resolved Hide resolved

vanzin reviewed Oct 8, 2018

View reviewed changes

Minor changes to address PR comments.

01e2123

dongjoon-hyun added the SQL label Jun 14, 2019

HyukjinKwon closed this Sep 17, 2019

[SPARK-25561][SQL] Implement a new config to control partition pruning fallback (if partition push-down to Hive fails) #22614

[SPARK-25561][SQL] Implement a new config to control partition pruning fallback (if partition push-down to Hive fails) #22614

Uh oh!

Conversation

kmanamcheri commented Oct 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

kmanamcheri commented Oct 2, 2018

Uh oh!

vanzin commented Oct 2, 2018

Uh oh!

SparkQA commented Oct 2, 2018

Uh oh!

vanzin commented Oct 2, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun Oct 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kmanamcheri Oct 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gatorsmile commented Oct 3, 2018

Uh oh!

kmanamcheri commented Oct 3, 2018

Uh oh!

gatorsmile commented Oct 3, 2018

Uh oh!

SparkQA commented Oct 3, 2018

Uh oh!

kmanamcheri commented Oct 5, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Oct 5, 2018

Uh oh!

Uh oh!

Uh oh!

SparkQA commented Oct 6, 2018

kmanamcheri commented Oct 2, 2018 •

edited

Loading

dongjoon-hyun Oct 2, 2018 •

edited

Loading

kmanamcheri Oct 3, 2018 •

edited

Loading

kmanamcheri commented Oct 8, 2018 •

edited

Loading