Conversation

@HeartSaVioR (Contributor) commented Dec 12, 2019

What changes were proposed in this pull request?

This patch adds a close() method to the DataWriter interface, which will become the place to clean up resources.

Why are the changes needed?

The lifecycle of a DataWriter instance ends at either commit() or abort(). That leads data source implementors to feel they can place resource cleanup on either side, but abort() can be called when commit() fails, so they have to ensure they don't clean up twice if the cleanup is not idempotent.

Does this PR introduce any user-facing change?

It depends on the definition of user: developers of a custom DSv2 source have to add close() to their DataWriter implementations. It's OK to add close() with an empty body, since they should already handle resource cleanup in commit/abort, but they may want to migrate the cleanup logic to close(), as it avoids double cleanup. For end users of a provided DSv2 source (built-in or third party), there is no change.
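
To illustrate, a minimal sketch of a migrated writer (a hypothetical SketchFileDataWriter, not part of this patch; package paths as in Spark 3.0):

```scala
import java.io.{BufferedOutputStream, FileOutputStream, OutputStream}

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}

// Hypothetical writer: commit() and abort() no longer close the stream,
// because close() is now guaranteed to run exactly once afterwards.
class SketchFileDataWriter(path: String) extends DataWriter[InternalRow] {
  private val out: OutputStream =
    new BufferedOutputStream(new FileOutputStream(path))

  override def write(record: InternalRow): Unit = {
    // serialize the row and write it to `out` (format details omitted)
  }

  override def commit(): WriterCommitMessage = {
    out.flush()
    new WriterCommitMessage {} // placeholder message for the sketch
  }

  override def abort(): Unit = {
    // discard partial output if needed; no stream cleanup here anymore
  }

  // The single cleanup point, called after either commit() or abort().
  override def close(): Unit = out.close()
}
```

With cleanup living only in close(), commit() and abort() stay trivially idempotent.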

How was this patch tested?

Existing tests.

Review thread on the diff:

    CachedKafkaProducer.close(producerParams)
    }
    }
    def close(): Unit = {}
@HeartSaVioR (Contributor, Author):
This is safe; the previous implementation cleaned up the instance from the cache immediately, so it actually helps a bit, but it's no big deal even if we don't do it.

@cloud-fan (Contributor):
Is this related to adding the close() API?

@HeartSaVioR (Contributor, Author) Dec 12, 2019:
So there's a conflict on naming; the test code calls it close(). If we'd like to keep the code as it is, I can rename the previous method to invalidateProducer() and leave it at that.

@cloud-fan (Contributor):
What's the life cycle of Kafka producers? IIRC they were cached before, but that patch got reverted.

@HeartSaVioR (Contributor, Author):
So there's no "return" in the current Kafka producer cache; the cache evicts expired producers according to its policy. Previously we force-invalidated the Kafka producer when close() was explicitly called, since callers of close() were only using the producer temporarily (instead of running some query); the current code just lets the cache expire producers on policy in all cases.
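
A rough sketch of that eviction model, assuming Guava's cache (the names and the 10-minute expiry are hypothetical, not Spark's actual CachedKafkaProducer code):

```scala
import java.util.Properties
import java.util.concurrent.{Callable, TimeUnit}

import com.google.common.cache.{CacheBuilder, RemovalListener, RemovalNotification}
import org.apache.kafka.clients.producer.KafkaProducer

// Rough illustration of policy-driven eviction: callers never "return"
// a producer; the removal listener closes it once the cache expires it.
object ProducerCacheSketch {
  type Producer = KafkaProducer[Array[Byte], Array[Byte]]

  private val cache = CacheBuilder.newBuilder()
    .expireAfterAccess(10, TimeUnit.MINUTES) // hypothetical expiry policy
    .removalListener(new RemovalListener[String, Producer] {
      override def onRemoval(n: RemovalNotification[String, Producer]): Unit =
        n.getValue.close()
    })
    .build[String, Producer]()

  // Fetch a cached producer for this config, creating it on first use
  // (assumes serializers are already set in `params`).
  def acquire(key: String, params: Properties): Producer =
    cache.get(key, new Callable[Producer] {
      override def call(): Producer = new Producer(params)
    })
}
```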

@cloud-fan (Contributor) Dec 12, 2019:
SGTM then, if the life cycle of producers is controlled by the cache policy.

@HeartSaVioR (Contributor, Author):
Btw, I just renamed the original close() method to invalidateProducer() to avoid affecting the Kafka side.

Review thread on the diff:

    closeCalled = true
    writer.close(errorOrNull)
    }
    override def close(): Unit = {
@HeartSaVioR (Contributor, Author):
This change clearly shows the difference: DataWriter implementations no longer need to deal with possible double resource cleanup.
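
As a sketch of the pattern on the caller side (a simplified, hypothetical helper, not the exact Spark task code), close() becomes the single unconditional cleanup point:

```scala
import org.apache.spark.sql.connector.write.DataWriter

object WriteTaskSketch {
  // commit/abort decide the outcome; close() runs exactly once in finally,
  // so writer implementations never see double cleanup.
  def runWriteTask[T](writer: DataWriter[T], rows: Iterator[T]): Unit = {
    try {
      rows.foreach(writer.write)
      writer.commit()
    } catch {
      case t: Throwable =>
        writer.abort()
        throw t
    } finally {
      writer.close()
    }
  }
}
```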

@dongjoon-hyun (Member):

cc @cloud-fan

@HeartSaVioR (Contributor, Author):

cc. @rdblue as well

Btw, I was planning to cc after the build passed, but Jenkins hasn't come in for 20 minutes. Might there be an issue with the AMPLab build?

@SparkQA commented Dec 12, 2019

Test build #115203 has finished for PR 26855 at commit 26b0e25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member):

Yes, @HeartSaVioR. Jenkins has been very slow today.

@SparkQA commented Dec 12, 2019

Test build #115212 has finished for PR 26855 at commit 8058dbf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author):

retest this, please

@cloud-fan (Contributor):

LGTM except one comment. We should probably open another PR for that change and ask people who are familiar with Kafka to take a look (I am not).

@cloud-fan (Contributor):

retest this please

@SparkQA commented Dec 12, 2019

Test build #115226 has finished for PR 26855 at commit 8058dbf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 12, 2019

Test build #115240 has finished for PR 26855 at commit 21d03e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):

thanks, merging to master!

@HeartSaVioR (Contributor, Author):

Thanks all for reviewing and merging!

Btw, it looks like the merge script somehow missed updating the JIRA issue. Could you please take care of this? Thanks again!

@HeartSaVioR deleted the SPARK-30227 branch December 13, 2019 10:16