In a cluster environment, after instance metadata is modified through the OpenAPI, there is a small probability that the metadata becomes inconsistent across nodes and stays that way. #11934
Comments
Metadata changes made through the API are synchronized via the raft protocol. If an inconsistency appears, check whether the problematic node has dropped out of the raft group relative to the other nodes; see alipay-jraft.log.
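As one way to do that check, a minimal sketch, assuming the /nacos/v1/core/cluster/nodes member-list endpoint available in recent server versions (it reports member health; the per-raft-group details remain in alipay-jraft.log as noted above). The server address is a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClusterNodesCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder server address; run against each cluster member and
        // compare the reported member lists and node states.
        String url = "http://127.0.0.1:8848/nacos/v1/core/cluster/nodes";
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```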
Also, your usage scenario may not be quite right. Metadata is generally meant to store static attribute markers of an instance, such as AZ, version, and labels; it should not hold dynamic business content.
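As an illustration of that recommendation, a minimal sketch using the Java client, where metadata carries only static attribute markers set once at registration. The server address, service name, IP/port, and metadata keys are placeholders, not taken from this issue:

```java
import java.util.HashMap;
import java.util.Map;

import com.alibaba.nacos.api.naming.NamingFactory;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.pojo.Instance;

public class StaticMetadataRegistration {
    public static void main(String[] args) throws Exception {
        // Connect to the nacos cluster (placeholder address).
        NamingService naming = NamingFactory.createNamingService("127.0.0.1:8848");

        Instance instance = new Instance();
        instance.setIp("10.0.0.10");   // placeholder instance address
        instance.setPort(8080);
        instance.setEphemeral(true);   // ephemeral (temporary) instance

        // Static attribute markers only: AZ, version, labels.
        Map<String, String> metadata = new HashMap<>();
        metadata.put("az", "zone-a");
        metadata.put("version", "1.0.0");
        metadata.put("label", "canary");
        instance.setMetadata(metadata);

        naming.registerInstance("demo-service", instance);
    }
}
```

Per the comment above, dynamic business content such as work order numbers would be kept outside of instance metadata.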
Two of the nacos nodes are deployed on the same virtual machine; the remaining nacos node is deployed on another machine within the local area network.
2024-04-01 14:57:54,443 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:54,945 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:55,447 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:55,948 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:56,450 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:56,951 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:57,453 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:57,955 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:58,456 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:58,958 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:59,459 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:57:59,961 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:00,462 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:00,964 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:01,465 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:01,966 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:02,468 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:02,970 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:03,471 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:03,973 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:04,474 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:04,976 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:05,477 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:05,979 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:06,481 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:06,982 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:07,484 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:07,986 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:08,487 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:08,988 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
114:8847: alipay-jraft.log excerpt:
2024-04-01 14:57:07,928 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-1995,5,main].
2024-04-01 14:57:07,928 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2037,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].
2024-04-01 14:57:07,928 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2031,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].
2024-04-01 14:57:08,181 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2040,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2038,5,main].
2024-04-01 14:57:08,429 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2041,5,main].
2024-04-01 14:57:08,429 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2038,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2041,5,main].
2024-04-01 14:57:08,429 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2041,5,main].
2024-04-01 14:57:08,610 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2039,5,main].
2024-04-01 14:57:08,683 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2037,5,main].
2024-04-01 14:57:08,683 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2037,5,main].
2024-04-01 14:57:08,683 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2037,5,main].
2024-04-01 14:57:08,931 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2031,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].
2024-04-01 14:57:08,931 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2039,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].
2024-04-01 14:57:09,188 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].
2024-04-01 14:57:09,686 WARN Fail to unlock with Replicator [state=Probe, statInfo=<running=IDLE, firstLogIndex=1, lastLogIncluded=0, lastLogIndex=1, lastTermIncluded=0>, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2031,5,main].
115:8847 alipay-jraft.log excerpt:
2024-04-01 14:58:01,465 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:01,966 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:02,468 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:02,970 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:03,471 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:03,973 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:04,474 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:04,976 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:05,477 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:05,979 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:06,481 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:06,982 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:07,484 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:07,986 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:08,487 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:08,988 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:09,490 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:09,991 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:10,493 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:10,994 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:11,496 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:11,998 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:12,499 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
2024-04-01 14:58:13,001 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.
This is the alipay-jraft.log.2024-03-27.0 log from one nacos node in the cluster; at the same point in time the other two nacos nodes were normal.
2024-03-27 14:30:37,429 INFO Deleting snapshot /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/snapshot_8812.
2024-03-27 14:30:37,438 INFO Renaming /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/temp to /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/snapshot_8812.
2024-03-27 14:30:37,438 INFO Deleting snapshot /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/snapshot_8781.
2024-03-27 14:53:39,910 INFO Node <naming_persistent_service_v2/10.0.102.115:7847> term 1 start preVote.
2024-03-27 14:53:39,910 INFO onStopFollowing: LeaderChangeContext [leaderId=10.0.102.114:7847, term=1, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader 10.0.102.114:7847.]].
2024-03-27 14:53:40,261 WARN Channel in TRANSIENT_FAILURE state: 10.0.102.114:7849.
2024-03-27 14:53:40,261 WARN Channel in SHUTDOWN state: 10.0.102.114:7849.
2024-03-27 14:53:40,262 INFO Peer 10.0.102.114:7849 is connected.
2024-03-27 14:53:40,493 WARN Channel in TRANSIENT_FAILURE state: 10.0.102.114:7847.
2024-03-27 14:53:40,493 WARN Channel in SHUTDOWN state: 10.0.102.114:7847.
2024-03-27 14:53:40,493 INFO Peer 10.0.102.114:7847 is connected.
2024-03-27 14:53:40,503 INFO Node <naming_persistent_service_v2/10.0.102.115:7847> received PreVoteResponse from 10.0.102.114:7849, term=1, granted=false.
2024-03-27 14:53:40,506 INFO Node <naming_persistent_service_v2/10.0.102.115:7847> received PreVoteResponse from 10.0.102.114:7847, term=1, granted=false.
2024-03-27 14:53:40,533 WARN [GRPC] failed to send response. io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
This looks like the previously closed issues. My jraft version is also 1.3.8 with the bolt package excluded; the problem seems similar to #952h #1029 #10259.
If a jraft bug put one node into a bad state, the only option is probably to upgrade the version and see whether that helps.
Should I upgrade the jraft version in the nacos source code, or just upgrade the nacos version directly?
Upgrade the nacos version; upgrading only the jraft version may cause incompatibilities.
OK.
No more responses from the author; it seems the new version has solved this problem.
Not yet; still to be verified.
Describe the bug
A clear and concise description of what the bug is.
Under a nacos cluster, service a has two instances, 001 and 002. Instance 001 is registered normally on all three nodes of the nacos cluster (as an ephemeral instance).
The instance modifies its own custom metadata by calling the nacos OpenAPI (the metadata stores work order numbers for certain business states).
After running for a while, with more and more data stored and frequent modifications, the metadata of instance 001 became inconsistent across the three nacos nodes.
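For reference, a rough sketch (not taken from the issue itself) of the kind of Open API update described above, using the v1 instance-update endpoint; the server address, service name, IP, port, and the workOrder metadata key are hypothetical placeholders:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class InstanceMetadataUpdate {
    public static void main(String[] args) throws Exception {
        // Dynamic business value pushed into metadata (the pattern the
        // maintainer advises against): a hypothetical work order number.
        String metadata = URLEncoder.encode("{\"workOrder\":\"WO-20240401-001\"}",
                StandardCharsets.UTF_8);

        // PUT /nacos/v1/ns/instance updates an existing instance,
        // including its metadata (placeholder addresses and names).
        String url = "http://127.0.0.1:8848/nacos/v1/ns/instance"
                + "?serviceName=service-a&ip=10.0.0.10&port=8080"
                + "&ephemeral=true&metadata=" + metadata;

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url))
                        .PUT(HttpRequest.BodyPublishers.noBody())
                        .build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```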
Expected behavior
A clear and concise description of what you expected to happen.
The expectation is that, in a cluster environment, the service and the service's own data stay consistent across nodes.
Actual behavior
A clear and concise description of what actually happened.
The official documentation does not state whether instance metadata is, or is not, kept consistent across the cluster.
How to Reproduce
Steps to reproduce the behavior:
Desktop (please complete the following information):
Linux and Windows clients, nacos sdk 1.4; server 2.2.0.
Additional context
Add any other context about the problem here.