Conversation

@zsxwing
Member

@zsxwing zsxwing commented Apr 9, 2014

If DStream.slice() is called before StreamingContext.start(), zeroTime is still null and slice() throws a NullPointerException. Ideally, it should throw something more descriptive, such as a "ContextNotInitialized" exception.

https://issues.apache.org/jira/browse/SPARK-1382

This PR adds a check in slice() and a unit test for it.
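For context, a minimal repro sketch of the failure mode (hedged: the local-mode setup below is illustrative; only the slice()-before-start() call matters):

```scala
import scala.collection.mutable.Queue

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

object SliceBeforeStart {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("slice-before-start")
    val ssc = new StreamingContext(conf, Seconds(1))
    val stream = ssc.queueStream(new Queue[RDD[Int]]())

    // slice() is called before ssc.start(), so the DStream's zeroTime is still null.
    // Before this PR that surfaced as a NullPointerException; with the added check it
    // fails fast with a descriptive "has not been initialized" exception instead.
    stream.slice(Time(0), Time(1000))
  }
}
```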

@AmplabJenkins

Can one of the admins verify this patch?

@rxin
Contributor

rxin commented Apr 17, 2014

Jenkins, test this please.

Contributor

Can you make this a SparkException? All expected exceptions thrown by Spark should be SparkExceptions.
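Concretely, the suggestion is to throw SparkException rather than a bare Exception for this expected failure. A minimal stand-in sketch (not the actual DStream code; `isInitialized` and the fields here are simplified placeholders):

```scala
import org.apache.spark.SparkException

// Simplified stand-in for the guard in DStream.slice(): expected, user-facing
// failures should surface as SparkException rather than a bare Exception.
class UninitializedGuardExample {
  private var zeroTime: Option[Long] = None          // stand-in for DStream.zeroTime
  private def isInitialized: Boolean = zeroTime.isDefined

  def slice(fromMs: Long, toMs: Long): Seq[Long] = {
    if (!isInitialized) {
      throw new SparkException(s"$this has not been initialized")
    }
    Seq.empty                                         // real logic would collect RDDs here
  }
}
```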

Member Author

Sure. One more question: there are many `new Exception`s in DStream.scala. Is it necessary to change them to SparkException as well?

Contributor

If you can do it in this PR, that would be great!

On Wednesday, April 16, 2014, Shixiong Zhu [email protected] wrote:

In streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala:

@@ -725,6 +725,9 @@ abstract class DStream[T: ClassTag](
   * Return all the RDDs between 'fromTime' to 'toTime' (both included)
   */
  def slice(fromTime: Time, toTime: Time): Seq[RDD[T]] = {
+   if (!isInitialized) {
+     throw new Exception(this + " has not been initialized")

Sure. One more question: there are many `new Exception`s in DStream.scala. Is it necessary to change them to SparkException as well?

Reply to this email directly or view it on GitHub: https://github.com//pull/365/files#r11716967

@tdas
Contributor

tdas commented Apr 17, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14194/

@zsxwing
Member Author

zsxwing commented Apr 17, 2014

Changing to SparkException will not break the API, since SparkException is a subclass of Exception.
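A quick illustration of that compatibility point (a hedged sketch; the message text is made up):

```scala
import org.apache.spark.SparkException

// Caller code written against the old behavior keeps working: a catch clause for
// Exception also matches SparkException, because SparkException extends Exception.
try {
  throw new SparkException("stream has not been initialized")
} catch {
  case e: Exception => println(s"caught: ${e.getMessage}")
}
```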

@rxin
Contributor

rxin commented Apr 17, 2014

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14203/

@tdas
Contributor

tdas commented Apr 26, 2014

This doesn't quite merge cleanly, so I cherry-picked the commits, merged master, and issued a new PR #562.
I will merge that in ASAP. Can you please close this PR?

Thanks for the bug fix btw!

pwendell pushed a commit to pwendell/spark that referenced this pull request Apr 26, 2014
@zsxwing I cherry-picked your changes and merged the master. apache#365 had some conflicts once again!

Author: zsxwing <[email protected]>
Author: Tathagata Das <[email protected]>

Closes apache#562 from tdas/SPARK-1382 and squashes the following commits:

e2962c1 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-1382
20968d9 [zsxwing] Replace Exception with SparkException in DStream
e476651 [zsxwing] Merge remote-tracking branch 'origin/master' into SPARK-1382
35ba56a [zsxwing] SPARK-1382: Fix NPE in DStream.slice
asfgit pushed a commit that referenced this pull request Apr 26, 2014
@zsxwing I cherry-picked your changes and merged the master. #365 had some conflicts once again!

Author: zsxwing <[email protected]>
Author: Tathagata Das <[email protected]>

Closes #562 from tdas/SPARK-1382 and squashes the following commits:

e2962c1 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-1382
20968d9 [zsxwing] Replace Exception with SparkException in DStream
e476651 [zsxwing] Merge remote-tracking branch 'origin/master' into SPARK-1382
35ba56a [zsxwing] SPARK-1382: Fix NPE in DStream.slice

(cherry picked from commit 058797c)
Signed-off-by: Tathagata Das <[email protected]>
@zsxwing
Member Author

zsxwing commented Apr 26, 2014

Thank you for merging it.

@zsxwing zsxwing closed this Apr 26, 2014
@zsxwing zsxwing deleted the SPARK-1382 branch May 18, 2014 09:50
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Jul 25, 2017
…e#365)

* Submission client redesign to use a step-based builder pattern.

This change overhauls the underlying architecture of the submission
client, but it is intended to entirely preserve existing behavior of
Spark applications. Therefore users will find this to be an invisible
change.

The philosophy behind this design is to reconsider the breakdown of the
submission process. It operates off the abstraction of "submission
steps", which are transformation functions that take the previous state
of the driver and return the new state of the driver. The driver's state
includes its Spark configurations and the Kubernetes resources that will
be used to deploy it.

Such a refactor moves away from a features-first API design, which
considers different containers to serve a set of features. The previous
design, for example, had a container files resolver API object that
returned different resolutions of the dependencies added by the user.
However, it was up to the main Client to know how to intelligently
invoke all of those APIs. Therefore the API surface area of the file
resolver became untenably large, and it was not intuitive how it was
to be used or extended.

This design changes the encapsulation layout; every module is now
responsible for changing the driver specification directly. An
orchestrator builds the correct chain of steps and hands it to the
client, which then calls it verbatim. The main client then makes any
final modifications that put the different pieces of the driver
together, particularly to attach the driver container itself to the pod
and to apply the Spark configuration as command-line arguments.

* Add a unit test for BaseSubmissionStep.

* Add unit test for kubernetes credentials mounting.

* Add unit test for InitContainerBootstrapStep.

* unit tests for initContainer

* Add a unit test for DependencyResolutionStep.

* further modifications to InitContainer unit tests

* Use of resolver in PythonStep and unit tests for PythonStep

* refactoring of init unit tests and pythonstep resolver logic

* Add unit test for KubernetesSubmissionStepsOrchestrator.

* refactoring and addition of secret trustStore+Cert checks in a SubmissionStepSuite

* added SparkPodInitContainerBootstrapSuite

* Added InitContainerResourceStagingServerSecretPluginSuite

* style in Unit tests

* extremely minor style fix in variable naming

* Address comments.

* Rename class for consistency.

* Attempt to make spacing consistent.

Multi-line methods should have four-space indentation for arguments that
aren't on the same line as the method call itself... but this is
difficult to do consistently given how IDEs handle Scala multi-line indentation
in most cases.
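A hedged sketch of the step-based pattern described above. All names below (DriverSpec, SubmissionStep, and the sample steps) are illustrative stand-ins rather than the fork's actual classes:

```scala
// The driver's evolving state: its Spark configuration plus the Kubernetes
// resources that will be created to deploy it (simplified to strings here).
case class DriverSpec(sparkConf: Map[String, String], k8sResources: List[String])

// A submission step is a transformation from the previous driver state to the next.
trait SubmissionStep {
  def configure(spec: DriverSpec): DriverSpec
}

// Example step: set baseline application properties.
class BaseStep(appName: String) extends SubmissionStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(sparkConf = spec.sparkConf + ("spark.app.name" -> appName))
}

// Example step: resolve user dependencies into the driver's configuration.
class DependencyStep(jars: Seq[String]) extends SubmissionStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(sparkConf = spec.sparkConf + ("spark.jars" -> jars.mkString(",")))
}

// The orchestrator chooses the chain of steps; the client just folds over it
// verbatim, then applies any final modifications (attaching the driver container,
// turning the conf into command-line arguments, and so on).
object StepOrchestratorSketch {
  def run(steps: Seq[SubmissionStep]): DriverSpec =
    steps.foldLeft(DriverSpec(Map.empty, Nil))((spec, step) => step.configure(spec))

  def main(args: Array[String]): Unit = {
    val finalSpec = run(Seq(new BaseStep("demo"), new DependencyStep(Seq("app.jar"))))
    println(finalSpec)
  }
}
```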
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
…e#365)
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Clean the legacy Go projects path compatibility workarounds

Closes: theopenlab/openlab#123
Related-Bugs: theopenlab/openlab#100
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Mar 18, 2022
* AL-4217 skip view cache when lookupRelation

* update version to r44
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…an should be semantically equivalent (apache#365)

When canonicalizing `output` in `InMemoryRelation`, use `output` itself as the schema for determining the ordinals, rather than `cachedPlan.output`.

`InMemoryRelation.output` and `InMemoryRelation.cachedPlan.output` don't necessarily use the same exprIds. E.g.:
```
+- InMemoryRelation [c1#340, c2#341], StorageLevel(disk, memory, deserialized, 1 replicas)
      +- LocalTableScan [c1#254, c2#255]

```
Because of this, `InMemoryRelation` will sometimes fail to fully canonicalize, resulting in cases where two semantically equivalent `InMemoryRelation` instances appear to be semantically nonequivalent.

Example:
```
create or replace temp view data(c1, c2) as values
(1, 2),
(1, 3),
(3, 7),
(4, 5);

cache table data;

select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2) from data d2 group by all;
```
If plan change validation checking is on (i.e., `spark.sql.planChangeValidation=true`), the failure is:
```
[PLAN_VALIDATION_FAILED_RULE_EXECUTOR] The input plan of org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$2 is invalid: Aggregate: Aggregate [c1#78, scalar-subquery#77 [c1#78]], [c1#78, scalar-subquery#77 [c1#78] AS scalarsubquery(c1)#90L, count(c2#79) AS count(c2)#83L]
...
is not a valid aggregate expression: [SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION] The correlated scalar subquery '"scalarsubquery(c1)"' is neither present in GROUP BY, nor in an aggregate function.
```
If plan change validation checking is off, the failure is more mysterious:
```
[INTERNAL_ERROR] Couldn't find count(1)#163L in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find count(1)#163L in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
```
If you remove the cache command, the query succeeds.

The above failures happen because the subquery in the aggregate expressions and the subquery in the grouping expressions seem semantically nonequivalent since the `InMemoryRelation` in one of the subquery plans failed to completely canonicalize.

In `CacheManager#useCachedData`, two lookups for the same cached plan may create `InMemoryRelation` instances that have different exprIds in `output`. That's because the plan fragments used as lookup keys  may have been deduplicated by `DeduplicateRelations`, and thus have different exprIds in their respective output schemas. When `CacheManager#useCachedData` creates an `InMemoryRelation` instance, it borrows the output schema of the plan fragment used as the lookup key.
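To make the ordinal point concrete, a self-contained toy sketch (simplified stand-ins, not Catalyst's real classes): normalizing each relation's output against its own `output` erases the differing exprIds, so the two instances canonicalize identically.

```scala
object CanonicalizationSketch {
  // Toy stand-in for a Catalyst attribute: a name plus an instance-specific exprId.
  case class Attr(name: String, exprId: Long)

  // Canonicalize an attribute to its ordinal in the given schema, dropping the exprId.
  def canonicalize(a: Attr, schema: Seq[Attr]): String = s"attr#${schema.indexOf(a)}"

  def main(args: Array[String]): Unit = {
    // Two InMemoryRelation-like outputs for the same cached data, with different
    // exprIds because the lookup keys were deduplicated (c1#340/c2#341 vs c1#254/c2#255).
    val outputA = Seq(Attr("c1", 340), Attr("c2", 341))
    val outputB = Seq(Attr("c1", 254), Attr("c2", 255))

    // Normalizing each output against ITS OWN schema yields the same canonical form
    // ("attr#0", "attr#1"), which lets the two relations compare as equivalent.
    assert(outputA.map(canonicalize(_, outputA)) == outputB.map(canonicalize(_, outputB)))
  }
}
```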

The failure to fully canonicalize has other effects. For example, this query fails to reuse the exchange:
```
create or replace temp view data(c1, c2) as values
(1, 2),
(1, 3),
(2, 4),
(3, 7),
(7, 22);

cache table data;

set spark.sql.autoBroadcastJoinThreshold=-1;
set spark.sql.adaptive.enabled=false;

select *
from data l
join data r
on l.c1 = r.c1;
```

No.

New tests.

No.

Closes apache#44806 from bersprockets/plan_validation_issue.

Authored-by: Bruce Robbins <[email protected]>

(cherry picked from commit b80e8cb)

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Bruce Robbins <[email protected]>