Skip to content

Conversation

@arunmahadevan
Copy link
Contributor

What changes were proposed in this pull request?

This builds on top of SPARK-24748 to report 'offset lag' as a custom metrics for Kafka structured streaming source.

This lag is the difference between the latest offsets in Kafka the time the metrics is reported (just after a micro-batch completes) and the latest offset Spark has processed. It can be 0 (or close to 0) if spark keeps up with the rate at which messages are ingested into Kafka topics in steady state. This measures how far behind the spark source has fallen behind (per partition) and can aid in tuning the application.

How was this patch tested?

Existing and new unit tests

Please review http://spark.apache.org/contributing.html before opening a pull request.

@SparkQA
Copy link

SparkQA commented Jul 19, 2018

Test build #93296 has finished for PR 21819 at commit c1fc3ca.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@arunmahadevan
Copy link
Contributor Author

@HeartSaVioR @HyukjinKwon @jose-torres @tdas would you mind taking a look?

@SparkQA
Copy link

SparkQA commented Aug 10, 2018

Test build #94582 has finished for PR 21819 at commit 7129c3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Write per-topic partition lag as json string
*/
def partitionLags(latestOffsets: Map[TopicPartition, Long],
processedOffsets: Map[TopicPartition, Long]): String = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

def partitionLags(
    latestOffsets: Map[TopicPartition, Long],
    processedOffsets: Map[TopicPartition, Long]): String = {

per https://github.com/databricks/scala-style-guide#spacing-and-indentation

Please feel free to address it with other comments later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed. would it be possible to add this to scala style checks ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please go ahead if possible.

*/
def partitionLags(latestOffsets: Map[TopicPartition, Long],
processedOffsets: Map[TopicPartition, Long]): String = {
val result = new HashMap[String, HashMap[Int, Long]]()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: HashMap.empty[String, HashMap[Int, Long]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had followed the style in other parts of the class. Addressed and refactored the other places as well.

@SparkQA
Copy link

SparkQA commented Aug 13, 2018

Test build #94699 has finished for PR 21819 at commit ffa20ba.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@HyukjinKwon
Copy link
Member

cc @koeninger as well

@SparkQA
Copy link

SparkQA commented Aug 14, 2018

Test build #94726 has finished for PR 21819 at commit ffa20ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@arunmahadevan
Copy link
Contributor Author

@HyukjinKwon , can you take it forward? Appreciate your effort and thanks in advance.

@HyukjinKwon
Copy link
Member

Let me leave this in few days in case someone has more comments on this.

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in 14d7c1c Aug 18, 2018
asfgit pushed a commit that referenced this pull request Sep 5, 2018
## What changes were proposed in this pull request?

Revert SPARK-24863 (#21819) and SPARK-24748 (#21721) as per discussion in #21721. We will revisit them when the data source v2 APIs are out.

## How was this patch tested?

Jenkins

Closes #22334 from zsxwing/revert-SPARK-24863-SPARK-24748.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants