
Submitting a graph computation job fails with the error: rocksdb chk error #12

@wgb1990

Description

Exception in thread "rpc-executor-1" com.antgroup.geaflow.common.exception.GeaflowRuntimeException: 
************
ERR_ID:
     RUN-00000002
CAUSE:
     Run error: rocksdb chk error
ACTION:
     Please check your code, or contact admin.
DETAIL:

************
	at com.antgroup.geaflow.store.rocksdb.RocksdbClient.checkpoint(RocksdbClient.java:142)
	at com.antgroup.geaflow.store.rocksdb.BaseRocksdbStore.archive(BaseRocksdbStore.java:75)
	at com.antgroup.geaflow.cluster.system.RocksdbClusterMetaKVStore.flush(RocksdbClusterMetaKVStore.java:54)
	at com.antgroup.geaflow.cluster.system.ClusterMetaStore.flush(ClusterMetaStore.java:158)
	at com.antgroup.geaflow.cluster.resourcemanager.DefaultResourceManager.persist(DefaultResourceManager.java:288)
	at com.antgroup.geaflow.cluster.resourcemanager.DefaultResourceManager.onRegister(DefaultResourceManager.java:252)
	at com.antgroup.geaflow.cluster.resourcemanager.DefaultResourceManager.lambda$onSuccess$2(DefaultResourceManager.java:201)
	at com.antgroup.geaflow.cluster.resourcemanager.DefaultResourceManager.withLock(DefaultResourceManager.java:276)
	at com.antgroup.geaflow.cluster.resourcemanager.DefaultResourceManager.onSuccess(DefaultResourceManager.java:194)
	at com.antgroup.geaflow.cluster.clustermanager.AbstractClusterManager.handleRegisterResponse(AbstractClusterManager.java:190)
	at com.antgroup.geaflow.cluster.clustermanager.AbstractClusterManager.access$000(AbstractClusterManager.java:47)
	at com.antgroup.geaflow.cluster.clustermanager.AbstractClusterManager$1.onSuccess(AbstractClusterManager.java:160)
	at com.antgroup.geaflow.cluster.clustermanager.AbstractClusterManager$1.onSuccess(AbstractClusterManager.java:152)
	at com.antgroup.geaflow.cluster.rpc.impl.AbstractRpcEndpointRef$1.onSuccess(AbstractRpcEndpointRef.java:75)
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1132)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.rocksdb.RocksDBException: While open a file for appending: /tmp/geaflow1686752263277105350/framework/cluster/geaflow1686752263277105350-1686799890038/0_chk1.tmp/CURRENT: No such file or directory
	at org.rocksdb.Checkpoint.createCheckpoint(Native Method)
	at org.rocksdb.Checkpoint.createCheckpoint(Checkpoint.java:51)
	at com.antgroup.geaflow.store.rocksdb.RocksdbClient.checkpoint(RocksdbClient.java:140)
	... 17 more
2023-06-15 03:31:33 ERROR Driver:135 - driver exception
com.antgroup.geaflow.common.exception.GeaflowRuntimeException: 
************
ERR_ID:
     RUN-00000002
CAUSE:
     Run error: rocksdb chk error
ACTION:
     Please check your code, or contact admin.
DETAIL:

************
	at com.antgroup.geaflow.store.rocksdb.RocksdbClient.checkpoint(RocksdbClient.java:142) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.store.rocksdb.BaseRocksdbStore.archive(BaseRocksdbStore.java:75) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.system.RocksdbClusterMetaKVStore.flush(RocksdbClusterMetaKVStore.java:54) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.system.ClusterMetaStore.flush(ClusterMetaStore.java:158) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.DriverContext$PipelineCheckpointFunction.doCheckpoint(DriverContext.java:85) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.common.ReliableContainerContext.checkpoint(ReliableContainerContext.java:45) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.Driver.executePipelineInternal(Driver.java:108) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.Driver.lambda$executePipeline$1(Driver.java:96) ~[geaflow-geaflow.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_372]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_372]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_372]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_372]
Caused by: org.rocksdb.RocksDBException: while link file to /tmp/geaflow1686752263277105350/framework/cluster/geaflow1686752263277105350-1686799890038/0_chk1.tmp/000011.sst: /tmp/geaflow1686752263277105350/framework/cluster/geaflow1686752263277105350-1686799890038/0/000011.sst: File exists
	at org.rocksdb.Checkpoint.createCheckpoint(Native Method) ~[geaflow-geaflow.jar:?]
	at org.rocksdb.Checkpoint.createCheckpoint(Checkpoint.java:51) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.store.rocksdb.RocksdbClient.checkpoint(RocksdbClient.java:140) ~[geaflow-geaflow.jar:?]
	... 11 more
2023-06-15 03:31:33 ERROR DriverEndpoint:51 - execute pipeline failed: java.util.concurrent.ExecutionException: com.antgroup.geaflow.common.exception.GeaflowRuntimeException: 
************
ERR_ID:
     RUN-00000002
CAUSE:
     Run error: rocksdb chk error
ACTION:
     Please check your code, or contact admin.
DETAIL:

************
com.antgroup.geaflow.common.exception.GeaflowRuntimeException: java.util.concurrent.ExecutionException: com.antgroup.geaflow.common.exception.GeaflowRuntimeException: 
************
ERR_ID:
     RUN-00000002
CAUSE:
     Run error: rocksdb chk error
ACTION:
     Please check your code, or contact admin.
DETAIL:

************
	at com.antgroup.geaflow.cluster.driver.Driver.executePipeline(Driver.java:100) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.Driver.executePipeline(Driver.java:46) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.rpc.impl.DriverEndpoint.executePipeline(DriverEndpoint.java:45) [geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.rpc.proto.DriverServiceGrpc$MethodHandlers.invoke(DriverServiceGrpc.java:266) [geaflow-geaflow.jar:?]
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) [geaflow-geaflow.jar:?]
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) [geaflow-geaflow.jar:?]
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) [geaflow-geaflow.jar:?]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) [geaflow-geaflow.jar:?]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) [geaflow-geaflow.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_372]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_372]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_372]
Caused by: java.util.concurrent.ExecutionException: com.antgroup.geaflow.common.exception.GeaflowRuntimeException: 
************
ERR_ID:
     RUN-00000002
CAUSE:
     Run error: rocksdb chk error
ACTION:
     Please check your code, or contact admin.
DETAIL:

************
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_372]
	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_372]
	at com.antgroup.geaflow.cluster.driver.Driver.executePipeline(Driver.java:98) ~[geaflow-geaflow.jar:?]
	... 11 more
Caused by: com.antgroup.geaflow.common.exception.GeaflowRuntimeException: 
************
ERR_ID:
     RUN-00000002
CAUSE:
     Run error: rocksdb chk error
ACTION:
     Please check your code, or contact admin.
DETAIL:

************
	at com.antgroup.geaflow.store.rocksdb.RocksdbClient.checkpoint(RocksdbClient.java:142) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.store.rocksdb.BaseRocksdbStore.archive(BaseRocksdbStore.java:75) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.system.RocksdbClusterMetaKVStore.flush(RocksdbClusterMetaKVStore.java:54) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.system.ClusterMetaStore.flush(ClusterMetaStore.java:158) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.DriverContext$PipelineCheckpointFunction.doCheckpoint(DriverContext.java:85) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.common.ReliableContainerContext.checkpoint(ReliableContainerContext.java:45) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.Driver.executePipelineInternal(Driver.java:108) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.Driver.lambda$executePipeline$1(Driver.java:96) ~[geaflow-geaflow.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_372]
	... 3 more
Caused by: org.rocksdb.RocksDBException: while link file to /tmp/geaflow1686752263277105350/framework/cluster/geaflow1686752263277105350-1686799890038/0_chk1.tmp/000011.sst: /tmp/geaflow1686752263277105350/framework/cluster/geaflow1686752263277105350-1686799890038/0/000011.sst: File exists
	at org.rocksdb.Checkpoint.createCheckpoint(Native Method) ~[geaflow-geaflow.jar:?]
	at org.rocksdb.Checkpoint.createCheckpoint(Checkpoint.java:51) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.store.rocksdb.RocksdbClient.checkpoint(RocksdbClient.java:140) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.store.rocksdb.BaseRocksdbStore.archive(BaseRocksdbStore.java:75) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.system.RocksdbClusterMetaKVStore.flush(RocksdbClusterMetaKVStore.java:54) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.system.ClusterMetaStore.flush(ClusterMetaStore.java:158) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.DriverContext$PipelineCheckpointFunction.doCheckpoint(DriverContext.java:85) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.common.ReliableContainerContext.checkpoint(ReliableContainerContext.java:45) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.Driver.executePipelineInternal(Driver.java:108) ~[geaflow-geaflow.jar:?]
	at com.antgroup.geaflow.cluster.driver.Driver.lambda$executePipeline$1(Driver.java:96) ~[geaflow-geaflow.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_372]
	... 3 more
Exception in thread "main" io.grpc.StatusRuntimeException: UNKNOWN
	at io.grpc.Status.asRuntimeException(Status.java:526)
	at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:576)
	at com.antgroup.geaflow.cluster.client.PipelineResult.isSuccess(PipelineResult.java:37)
	at com.antgroup.geaflow.cluster.client.PipelineResult.get(PipelineResult.java:53)
	at com.antgroup.geaflow.dsl.runtime.engine.GQLPipeLine.execute(GQLPipeLine.java:116)
	at com.antgroup.geaflow.dsl.runtime.engine.GeaFlowGqlClient.main(GeaFlowGqlClient.java:56)
2023-06-15 03:31:33 WARN  RpcServiceImpl:63 - *** shutting down gRPC server since JVM is shutting down
2023-06-15 03:31:33 WARN  RpcServiceImpl:63 - *** shutting down gRPC server since JVM is shutting down
2023-06-15 03:31:33 WARN  RpcServiceImpl:63 - *** shutting down gRPC server since JVM is shutting down
2023-06-15 03:31:33 WARN  RpcServiceImpl:65 - *** server shut down
2023-06-15 03:31:33 WARN  RpcServiceImpl:65 - *** server shut down
2023-06-15 03:31:33 WARN  RpcServiceImpl:65 - *** server shut down
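Both failing stack traces end in `RocksdbClient.checkpoint` working on the same staging directory `0_chk1.tmp`. One is triggered from `DefaultResourceManager.persist` on thread `rpc-executor-1`, and the other from the driver's `PipelineCheckpointFunction.doCheckpoint`. RocksDB builds a checkpoint in a `<dir>.tmp` staging directory before moving it into place, so two overlapping checkpoint calls for the same target could plausibly produce exactly these two errors (`CURRENT: No such file or directory` and `000011.sst: File exists`). This is an unconfirmed reading of the log. The sketch below shows one way to serialize checkpoint creation per target directory. `SerializedCheckpointer` and `CheckpointAction` are hypothetical names that are not part of GeaFlow, and the callback stands in for the real RocksDB call.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper, not part of GeaFlow: allows at most one checkpoint at a
 * time per target directory, so two callers never share the "<dir>.tmp"
 * staging directory that RocksDB uses while building a checkpoint.
 */
public final class SerializedCheckpointer {

    /** Stand-in for the real checkpoint call (assumption, e.g. a RocksJava wrapper). */
    @FunctionalInterface
    public interface CheckpointAction {
        void create(Path checkpointDir) throws Exception;
    }

    // One lock per normalized absolute target path; entries are never evicted in this sketch.
    private final ConcurrentMap<Path, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void checkpoint(Path checkpointDir, CheckpointAction action) throws Exception {
        Path key = checkpointDir.toAbsolutePath().normalize();
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            // A second caller for the same directory blocks here until the
            // previous checkpoint has finished with its staging directory.
            action.create(key);
        } finally {
            lock.unlock();
        }
    }
}
```

With RocksJava on the classpath, the action could wrap `Checkpoint.create(db)` followed by `createCheckpoint(dir.toString())`. A lock only prevents overlapping calls: if both flushes really target checkpoint id 1, the second call could still fail with RocksDB's "Directory exists" check once the first has completed, unless the caller removes or versions the earlier directory.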

Labels: question (further information is requested)