Conversation

@timmylicheng
Contributor

@timmylicheng timmylicheng commented Jun 4, 2020

What changes were proposed in this pull request?

Add tests for PipelineManagerV2.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3679

How was this patch tested?

UT

@timmylicheng timmylicheng changed the title HDDS-3679 Add complelete tests for PipelineManagerV2. HDDS-3679 Add unit tests for PipelineManagerV2. Jun 4, 2020
@nandakumar131 nandakumar131 changed the title HDDS-3679 Add unit tests for PipelineManagerV2. HDDS-3679. Add unit tests for PipelineManagerV2. Jun 5, 2020
Contributor

@nandakumar131 nandakumar131 left a comment

Thanks @timmylicheng for working on this. I've added review comments inline.

@nandakumar131
Contributor

@timmylicheng TestPipelineManagerImpl is causing a JVM crash; can you please take a look?

[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hdds.scm.pipeline.TestPipelineManagerImpl
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm && /Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError '-javaagent:/Users/nvadivelu/.m2/repository/org/jacoco/org.jacoco.agent/0.8.5/org.jacoco.agent-0.8.5-runtime.jar=destfile=/Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm/target/jacoco.exec,includes=org.apache.hadoop.hdds.*:org.apache.hadoop.ozone.*' -jar /Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm/target/surefire/surefirebooter3475252050624724628.jar /Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm/target/surefire 2020-06-09T14-10-10_102-jvmRun1 surefire368614403722026491tmp surefire_05368121927827483281tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hdds.scm.pipeline.TestPipelineManagerImpl
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:511)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:458)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:299)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:247)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:991)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:837)
[ERROR] 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] 	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] 	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] 	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:956)
[ERROR] 	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
[ERROR] 	at org.apache.maven.cli.MavenCli.main(MavenCli.java:192)
[ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] 	at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm && /Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError '-javaagent:/Users/nvadivelu/.m2/repository/org/jacoco/org.jacoco.agent/0.8.5/org.jacoco.agent-0.8.5-runtime.jar=destfile=/Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm/target/jacoco.exec,includes=org.apache.hadoop.hdds.*:org.apache.hadoop.ozone.*' -jar /Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm/target/surefire/surefirebooter3475252050624724628.jar /Users/nvadivelu/Codebase/github/hadoop-ozone/hadoop-hdds/server-scm/target/surefire 2020-06-09T14-10-10_102-jvmRun1 surefire368614403722026491tmp surefire_05368121927827483281tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hdds.scm.pipeline.TestPipelineManagerImpl
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:670)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:116)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:445)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:421)
[ERROR] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[ERROR] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[ERROR] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[ERROR] 	at java.lang.Thread.run(Thread.java:748)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

@timmylicheng
Contributor Author

timmylicheng commented Jun 10, 2020

The dump shows it's related to RocksDB. Not sure if it's related to multiple DBs being merged. The stack looks weird to me. Any ideas? @nandakumar131 @elek @xiaoyuyao

# JRE version: Java(TM) SE Runtime Environment (8.0_211-b12) (build 1.8.0_211-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [librocksdbjni2954960755376440018.jnilib+0x602b8]  rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8

See the full dump at https://the-asf.slack.com/files/U0159PV5Z6U/F0152UAJF0S/hs_err_pid90655.log?origin_team=T4S1WH2J3&origin_channel=D014L2URB6E

@elek
Member

elek commented Jun 10, 2020

FYI: I can reproduce it locally on Linux.

It seems to disappear when I upgrade the rocksdb version in the main pom.xml:

-        <version>6.6.4</version>
+        <version>6.8.1</version>

I think it's good to upgrade, as multiple corruption issues have been fixed as of 6.8.1...

@timmylicheng
Contributor Author

> FYI: I can reproduce it locally on Linux.
>
> It seems to disappear when I upgrade the rocksdb version in the main pom.xml:
>
> -        <version>6.6.4</version>
> +        <version>6.8.1</version>
>
> I think it's good to upgrade, as multiple corruption issues have been fixed as of 6.8.1...

@elek Shall we make a separate commit to upgrade the rocksdb version?

@elek
Member

elek commented Jun 10, 2020

> @elek Shall we make a separate commit to upgrade the rocksdb version?

I am open to both approaches, but it seems to be a good idea to do it on master, too.

I am +1, in advance, if the build is green ;-)

(But we can also add it here temporarily, to check if it helps...)

@timmylicheng
Contributor Author

@elek Tests seem to pass here. I created https://issues.apache.org/jira/browse/HDDS-3776 to track the rocksdb upgrade.

@elek
Member

elek commented Jun 15, 2020

> Tests seem to pass here.

Without any rocksdb upgrade? Scary... It could be a new intermittent failure... I quickly uploaded the version bump in #1077 to avoid similar issues in the future. (Thanks for opening the issue.)

I am merging it, as I see all the comments from @nandakumar131 (thanks for the review!) have been addressed.

Thanks for the contribution, @timmylicheng.

@elek elek merged commit 8e86480 into apache:HDDS-2823 Jun 15, 2020
@nandakumar131
Contributor

Even after HDDS-3776, I'm still seeing the JVM crash caused by TestPipelineManagerImpl.

@nandakumar131
Contributor

The crash is not due to a RocksDB bug. We are trying to access the DB after closing it, which is what causes the crash.

The BackgroundPipelineCreator thread is trying to access RocksDB after the DB has been closed by JUnit's @After method.

C  [librocksdbjni5084871938501755619.jnilib+0x10243e]  _ZN7rocksdb6DBImpl3PutERKNS_12WriteOptionsEPNS_18ColumnFamilyHandleERKNS_5SliceES8_+0xe
C  [librocksdbjni5084871938501755619.jnilib+0x181b1]  _Z18rocksdb_put_helperP7JNIEnv_PN7rocksdb2DBERKNS1_12WriteOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayiiSA_ii+0x131
j  org.rocksdb.RocksDB.put(JJ[BII[BIIJ)V+0
j  org.rocksdb.RocksDB.put(Lorg/rocksdb/ColumnFamilyHandle;Lorg/rocksdb/WriteOptions;[B[B)V+23
j  org.apache.hadoop.hdds.utils.db.RDBTable.put([B[B)V+14
j  org.apache.hadoop.hdds.utils.db.RDBTable.put(Ljava/lang/Object;Ljava/lang/Object;)V+9
j  org.apache.hadoop.hdds.utils.db.TypedTable.put(Ljava/lang/Object;Ljava/lang/Object;)V+26
j  org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerV2Impl.addPipeline(Lorg/apache/hadoop/hdds/protocol/proto/HddsProtos$Pipeline;)V+14
j  sun.reflect.GeneratedMethodAccessor7.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+40
J 1739 C2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x000000010a9f7ba8 [0x000000010a9f7b00+0xa8]
j  org.apache.hadoop.hdds.scm.ha.MockSCMHAManager$MockRatisServer.process(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/ratis/protocol/Message;+131
j  org.apache.hadoop.hdds.scm.ha.MockSCMHAManager$MockRatisServer.submitRequest(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/hadoop/hdds/scm/ha/SCMRatisResponse;+14
j  org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;+31
J 1944 C1 org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x000000010aa7728c [0x000000010aa76c40+0x64c]
j  com.sun.proxy.$Proxy9.addPipeline(Lorg/apache/hadoop/hdds/protocol/proto/HddsProtos$Pipeline;)V+16
j  org.apache.hadoop.hdds.scm.pipeline.PipelineManagerV2Impl.createPipeline(Lorg/apache/hadoop/hdds/protocol/proto/HddsProtos$ReplicationType;Lorg/apache/hadoop/hdds/protocol/proto/HddsProtos$ReplicationFactor;)Lorg/apache/hadoop/hdds/scm/pipeline/Pipeline;+66
j  org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator.createPipelines()V+130
j  org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator$$Lambda$20.run()V+4
j  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j  java.util.concurrent.FutureTask.run()V+42
j  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+30
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  java.lang.Thread.run()V+11

Ideally, test case execution should have finished by the time the @After method runs. In our case, the @After method and the BackgroundPipelineCreator are running at the same time, causing an intermittent JVM crash.
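A minimal sketch of a teardown ordering that would avoid the race, assuming the pipeline manager exposes a close() that stops the BackgroundPipelineCreator thread; the class, field names, and types below are illustrative assumptions, not the actual TestPipelineManagerImpl code:

```java
import java.io.File;

import org.apache.hadoop.fs.FileUtil;
import org.junit.After;

public class TeardownOrderingSketch {

  // Illustrative placeholders: in the real test these would be the concrete
  // pipeline manager and RocksDB-backed store created in the @Before method.
  private AutoCloseable pipelineManager;
  private AutoCloseable dbStore;
  private File testDir;

  @After
  public void cleanup() throws Exception {
    // Close the pipeline manager first; this is assumed to stop the
    // BackgroundPipelineCreator thread so it no longer calls createPipeline()
    // and writes into the store via addPipeline().
    if (pipelineManager != null) {
      pipelineManager.close();
    }
    // Only once no background writers remain, close the RocksDB-backed store.
    if (dbStore != null) {
      dbStore.close();
    }
    // Finally remove the temporary metadata directory.
    if (testDir != null) {
      FileUtil.fullyDelete(testDir);
    }
  }
}
```

The exact API does not matter; the point is only the ordering: whatever stops the BackgroundPipelineCreator must have completed before the DB is closed, so that no createPipeline() call can race with the @After teardown.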

@nandakumar131
Contributor

Created HDDS-3890 for the same.
