[SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation by tdas · Pull Request #18107 · apache/spark

tdas · 2017-05-25T11:11:06Z

What changes were proposed in this pull request?

This PR makes a number of changes to the StateStore APIs and implementation.
The current state store API has several problems that cause too many transient objects to be created, which leads to memory pressure.

StateStore.get(): Option forces creation of Some/None objects for every get. Changed this to return the row or null.
StateStore.iterator(): (UnsafeRow, UnsafeRow) forces creation of a new tuple for each record returned. Changed this to return an UnsafeRowTuple which can be reused across records.
StateStore.updates() requires the implementation to keep track of updates, even though it is used minimally (only by Append mode in streaming aggregations). Removed updates() and updated StateStoreSaveExec accordingly.
StateStore.filter(condition) and StateStore.remove(condition) have been merged into a single API, getRange(start, end), which allows a state store to do optimized range queries (i.e. avoid full scans). Stateful operators have been updated accordingly.
Removed a lot of unnecessary row copies. Each operator copied rows before calling StateStore.put() even if the implementation did not require a copy. Whether to copy the row is now left to the implementation.
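To make the allocation argument concrete, here is a minimal sketch of a null-returning `get` and an iterator that reuses one holder object across records. It is not the code from this PR: `RowPair` and `SketchStore` are hypothetical names, and plain `String` keys and values stand in for `UnsafeRow` so the example runs without Spark.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical stand-in for the reusable pair described above.
final class RowPair(var key: String = null, var value: String = null) {
  // Overwrites both fields in place and returns the same instance.
  def withRows(k: String, v: String): RowPair = { key = k; value = v; this }
}

// Hypothetical, simplified store; String replaces UnsafeRow for illustration.
final class SketchStore {
  private val data = new JLinkedHashMap[String, String]()

  def put(key: String, value: String): Unit = { data.put(key, value); () }

  // Returns the stored value or null, so no Some/None wrapper is created.
  def get(key: String): String = data.get(key)

  // Every call to next() returns the SAME RowPair with updated fields.
  // Callers that need to retain a record must copy its fields first.
  def iterator: Iterator[RowPair] = {
    val pair = new RowPair()
    val entries = data.entrySet().iterator()
    new Iterator[RowPair] {
      override def hasNext: Boolean = entries.hasNext
      override def next(): RowPair = {
        val e = entries.next()
        pair.withRows(e.getKey, e.getValue)
      }
    }
  }
}
```

The trade-off of the reusable holder is that a reference obtained from one `next()` call is overwritten by the following call, which is why the copy decision has to be made explicitly by whoever keeps the record.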

Additionally,

Added a name to the StateStoreId so that each operator+partition can use multiple state stores (with different names).
Added a configuration that allows the user to specify which implementation to use.
Added new metrics to understand the time taken to update keys, remove keys and commit all changes to the state store. These metrics will be visible on the plan diagram in the SQL tab of the UI.
Refactored unit tests such that they can be reused to test any implementation of StateStore.

How was this patch tested?

Old and new unit tests

tdas · 2017-05-25T11:11:30Z

update this description

SparkQA · 2017-05-25T11:15:16Z

Test build #77362 has finished for PR 18107 at commit 03f5bf3.

This patch fails build dependency tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class StateStoreId(
case class StateStoreStats()
case class UnsafeRowTuple(var key: UnsafeRow = null, var value: UnsafeRow = null)
trait StateStoreWriter extends StatefulOperator


tdas · 2017-05-25T11:18:44Z

+    partitionId: Int,
+    name: String = "")
+
+case class StateStoreStats()


remove this

tdas · 2017-05-25T11:18:57Z

   */
  def remove(key: UnsafeRow): Unit

+  def getRange(start: Option[UnsafeRow], end: Option[UnsafeRow]): Iterator[UnsafeRowTuple]
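As a rough illustration of why a range API can avoid full scans, the sketch below backs a store with a sorted map and answers `getRange` from the ordered structure. Everything here is an assumption made for the example: `RangeStore` is a hypothetical class, keys are `Long` rather than `UnsafeRow`, the start bound is treated as inclusive and the end bound as exclusive, and plain tuples are returned for brevity instead of a reusable pair.

```scala
import scala.collection.mutable

// Hypothetical store with ordered Long keys (for example, event-time timestamps).
final class RangeStore {
  private val data = mutable.TreeMap.empty[Long, String]

  def put(key: Long, value: String): Unit = data.update(key, value)

  // None means "unbounded" on that side. The sorted map locates the bounds
  // by key order, so entries outside the range are never visited.
  def getRange(start: Option[Long], end: Option[Long]): Iterator[(Long, String)] =
    data.rangeImpl(start, end).iterator
}
```

A store without ordered keys can still satisfy the same signature by filtering a full iteration; the signature only gives implementations the opportunity to do better.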


tdas · 2017-05-25T11:19:10Z

-  def key: UnsafeRow
-  def value: UnsafeRow
+object StateStoreProvider {
+  def instantiate(


tdas · 2017-05-25T11:20:14Z

-              val key = getKey(row)
-              store.put(key.copy(), row.copy())
-              numUpdatedStateRows += 1
+            allUpdatesTimeMs += timeTakenMs {


This update accommodates the removal of StateStore.updates().

SparkQA · 2017-05-25T12:49:29Z

Test build #3755 has finished for PR 18107 at commit d645b41.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class StateStoreId(
case class StateStoreStats()
case class UnsafeRowTuple(var key: UnsafeRow = null, var value: UnsafeRow = null)
trait StateStoreWriter extends StatefulOperator

SparkQA · 2017-05-25T12:50:04Z

Test build #77363 has finished for PR 18107 at commit d645b41.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class StateStoreId(
case class StateStoreStats()
case class UnsafeRowTuple(var key: UnsafeRow = null, var value: UnsafeRow = null)
trait StateStoreWriter extends StatefulOperator

tdas · 2017-05-25T22:05:26Z

    expectedState = Some(5),                                  // state should change
    expectedTimeoutTimestamp = 5000)                          // timestamp should change

-  test("StateStoreUpdater - rows are cloned before writing to StateStore") {


This is not needed any more as the operator is not responsible for cloning the rows when writing to the store.

SparkQA · 2017-05-25T22:21:53Z

Test build #77386 has finished for PR 18107 at commit 324fc24.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-26T05:56:53Z

Test build #77402 has finished for PR 18107 at commit 3e49621.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class StateStoreId(
case class UnsafeRowPair(var key: UnsafeRow = null, var value: UnsafeRow = null)

zsxwing

Looks pretty good. My major comment is: prefer nanoTime over currentTimeMillis.

zsxwing · 2017-05-25T23:38:34Z


  def optimizerInSetConversionThreshold: Int = getConf(OPTIMIZER_INSET_CONVERSION_THRESHOLD)

+  def stateStoreProviderClass: Option[String] = getConf(STATE_STORE_PROVIDER_CLASS)


Also add this to StateStoreConf for consistency?

zsxwing · 2017-05-26T00:49:18Z

+
+  /** Records the duration of running `body` for the next query progress update. */
+  protected def timeTakenMs(body: => Unit): Long = {
+    val startTime = System.currentTimeMillis


nit: Use nanoTime instead
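For reference, a timing helper along the lines of this suggestion might look like the sketch below. `System.nanoTime` is a monotonic clock intended for measuring elapsed time, whereas `System.currentTimeMillis` follows the wall clock and can jump if the system time is adjusted. The enclosing `Timing` object is a hypothetical wrapper for the example; only the `timeTakenMs(body: => Unit): Long` shape comes from the diff.

```scala
import java.util.concurrent.TimeUnit

object Timing {
  /** Runs `body` once and returns the elapsed time in milliseconds. */
  def timeTakenMs(body: => Unit): Long = {
    val startNs = System.nanoTime()
    body
    val elapsedNs = System.nanoTime() - startNs
    // Convert only at the end so sub-millisecond precision is not lost early.
    math.max(TimeUnit.NANOSECONDS.toMillis(elapsedNs), 0L)
  }
}
```

A caller would accumulate the result into a metric, for example `allUpdatesTimeMs += Timing.timeTakenMs { ... }`.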

zsxwing · 2017-05-26T22:59:01Z

+      .internal()
+      .doc(
+        "The class used to manage state data in stateful streaming queries. This class must" +
+          "be a subclass of StateStoreProvider, and must have a zero-arg constructor.")


nit: missing space before be.

zsxwing · 2017-05-26T23:20:22Z

+      storeConfs: StateStoreConf,
+      hadoopConf: Configuration): Unit = {
+    throw new Exception("Successfully instantiated")
+


nit: extra empty line.

zsxwing · 2017-05-30T05:22:42Z

+      indexOrdinal: Option[Int], // for sorting the data
+      storeConf: StateStoreConf,
+      hadoopConf: Configuration): StateStoreProvider = {
+    val provider = Utils.getContextOrSparkClassLoader


nit: Use Utils.classForName(providerClass).
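The general pattern under discussion, loading an implementation by class name and requiring a zero-arg constructor, can be sketched as follows. This is not the PR's code: `SketchProvider`, `InMemorySketchProvider` and `ProviderLoader` are hypothetical names, and plain `Class.forName` is used instead of Spark's `Utils.classForName` so that the example has no Spark dependency.

```scala
// Hypothetical provider contract: created reflectively, then initialized.
trait SketchProvider {
  def init(name: String): Unit
  def name: String
}

// Hypothetical implementation with the required zero-arg constructor.
final class InMemorySketchProvider extends SketchProvider {
  private var storeName: String = ""
  override def init(name: String): Unit = { storeName = name }
  override def name: String = storeName
}

object ProviderLoader {
  def instantiate(providerClass: String, name: String): SketchProvider = {
    val cls = Class.forName(providerClass)
    // Fail early with a clear message if the class is not a provider.
    require(
      classOf[SketchProvider].isAssignableFrom(cls),
      s"$providerClass is not a subclass of SketchProvider")
    // A zero-arg constructor is needed because nothing is known about the
    // class at compile time; all configuration is passed through init().
    val provider = cls.getDeclaredConstructor().newInstance().asInstanceOf[SketchProvider]
    provider.init(name)
    provider
  }
}
```

Separating construction from `init` is what makes the zero-arg constructor requirement workable: the loader does not need to know any constructor signature in advance.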

zsxwing · 2017-05-30T05:35:53Z

          // Update and output modified rows from the StateStore.
          case Some(Update) =>

+            val updatesStartTimeMs = System.currentTimeMillis


nit: please use nanoTime

zsxwing · 2017-05-30T05:36:04Z

  override def output: Seq[Attribute] = child.output

  override def outputPartitioning: Partitioning = child.outputPartitioning
+


nit: extra empty lines

zsxwing · 2017-05-30T05:36:17Z

        case None => iter
      }

+      val updatesStartTimeMs = System.currentTimeMillis


nit: please use nanoTime

zsxwing · 2017-05-30T05:36:22Z

      CompletionIterator[InternalRow, Iterator[InternalRow]](result, {
-        watermarkPredicateForKeys.foreach(f => store.remove(f.eval _))
-        store.commit()
+        allUpdatesTimeMs += System.currentTimeMillis - updatesStartTimeMs


nit: please use nanoTime

zsxwing · 2017-05-30T05:50:10Z

+              override protected def getNext(): InternalRow = {
+                var removedValueRow: InternalRow = null
+                while(rangeIter.hasNext && removedValueRow == null) {
+                  val UnsafeRowPair(keyRow, valueRow) = rangeIter.next()


Case class's unapply will create a Tuple. You should not use this Scala syntactic sugar :)

That is true! I had assumed unapply would get desugared into something simple, but it's probably best not to rely on the Scala compiler so much.
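One way to sidestep the question of what the destructuring pattern compiles to is to read the fields of the pair directly. The sketch below shows that shape with a hypothetical `KeyValue` class (a plain class with `String` fields standing in for the row pair) and a hypothetical helper; it only illustrates the access style, not the operator logic in the diff.

```scala
// Hypothetical stand-in for the pair type: a plain class, not a case class.
final class KeyValue(var key: String = null, var value: String = null)

object PairAccess {
  // Returns the first non-empty value, or null if there is none.
  def firstNonEmptyValue(iter: Iterator[KeyValue]): String = {
    var found: String = null
    while (iter.hasNext && found == null) {
      val pair = iter.next()
      // Plain field reads: no extractor pattern and no intermediate wrapper.
      if (pair.value.nonEmpty) found = pair.value
    }
    found
  }
}
```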

zsxwing · 2017-05-30T20:01:37Z

LGTM pending tests.

SparkQA · 2017-05-30T21:49:51Z

Test build #77546 has finished for PR 18107 at commit baba63d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class UnsafeRowPair(var key: UnsafeRow = null, var value: UnsafeRow = null)

SparkQA · 2017-05-30T21:59:49Z

Test build #77547 has finished for PR 18107 at commit 5c0961c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-30T22:01:53Z

Test build #77549 has finished for PR 18107 at commit fdfdcab.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2017-05-30T22:32:45Z

Thanks! Merging to master.

tdas · May 25, 2017

Refactored StateStore APIs and added conf to choose StateStore implementation

d645b41

tdas force-pushed the SPARK-20376 branch from 03f5bf3 to d645b41 · May 25, 2017 11:17

tdas · May 25, 2017

Fixed bugs

324fc24

tdas · May 25, 2017

Added more docs

3e49621

zsxwing requested changes · May 30, 2017

tdas added 3 commits · May 30, 2017 12:30

Addressed comments

baba63d

Added docs for StateStoreConf

5c0961c

Few more nits addressed

fdfdcab

asfgit closed this in fa757ee · May 30, 2017

gatorsmile mentioned this pull request Jan 14, 2020

[SPARK-29450][SS] Measure the number of output rows for streaming aggregation with append mode #26104

Closed
