ServiceDiscoveryRegistryDirectory #14254

Closed
3 of 4 tasks
yipixiaofeiyang opened this issue May 29, 2024 · 22 comments
Labels
component/need-triage Need maintainers to triage type/need-triage Need maintainers to triage

Comments

@yipixiaofeiyang

Pre-check

  • I am sure that all the content I provide is in English.

Search before asking

  • I had searched in the issues and found no similar issues.

Apache Dubbo Component

Java SDK (apache/dubbo)

Dubbo Version

Dubbo Java 3.1.5 -> 3.1.6
JDK 1.7
CentOS 8
Spring Boot 2.7.3

Steps to reproduce this issue

This issue only affects the Spring Boot + Dubbo consumer; the provider works fine.
The consumer service runs on two machines.

Steps:
Start the consumer service normally, then restart it with kill <pid> (note: not kill -9). While it is restarting, keep sending requests to the service's Spring HTTP endpoint.

Result:

[dubbo java-3.1.5]
Requests are processed normally.

[dubbo java-3.1.6]
Requests fail with:
org.apache.dubbo.rpc.RpcException: Directory of type ServiceDiscoveryRegistryDirectory
The full error message is included below.

Upgrading to the latest Dubbo Java version (3.2.13) does not fix it either.

What you expected to happen

org.apache.dubbo.rpc.RpcException: Directory of type ServiceDiscoveryRegistryDirectory already destroyed for service com.myb.saas.data.service.CoalpitService from registry nacos://192.168.100.70:8848/org.apache.dubbo.registry.RegistryService?application=admin&backup=192.168.100.71:8848,192.168.100.72:8848&dubbo=2.0.2&group=dubbo&logger=slf4j&metadata-type=remote&namespace=53a4c18c-0804-4734-8de2-57aa99f27833&pid=2133071&qos.accept.foreign.ip=false&qos.enable=false&register-mode=instance&release=3.1.6&serialize.check.status=WARN&timestamp=1716956513980
at org.apache.dubbo.rpc.cluster.directory.AbstractDirectory.list(AbstractDirectory.java:184)
at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.list(AbstractClusterInvoker.java:408)
at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:333)
at com.alibaba.csp.sentinel.adapter.dubbo3.SentinelDubboConsumerFilter.syncInvoke(SentinelDubboConsumerFilter.java:82)
at com.alibaba.csp.sentinel.adapter.dubbo3.SentinelDubboConsumerFilter.invoke(SentinelDubboConsumerFilter.java:66)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
at org.apache.dubbo.rpc.cluster.router.RouterSnapshotFilter.invoke(RouterSnapshotFilter.java:46)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:100)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
at org.apache.dubbo.rpc.protocol.dubbo.filter.FutureFilter.invoke(FutureFilter.java:52)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
at com.alibaba.csp.sentinel.adapter.dubbo3.DubboAppContextFilter.invoke(DubboAppContextFilter.java:47)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
at org.apache.dubbo.rpc.cluster.filter.support.ConsumerClassLoaderFilter.invoke(ConsumerClassLoaderFilter.java:40)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
at org.apache.dubbo.rpc.cluster.filter.support.ConsumerContextFilter.invoke(ConsumerContextFilter.java:120)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CallbackRegistrationInvoker.invoke(FilterChainBuilder.java:194)
at org.apache.dubbo.rpc.cluster.support.wrapper.AbstractCluster$ClusterFilterInvoker.invoke(AbstractCluster.java:92)
at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:103)
at org.apache.dubbo.registry.client.migration.MigrationInvoker.invoke(MigrationInvoker.java:284)
at org.apache.dubbo.rpc.proxy.InvocationUtil.invoke(InvocationUtil.java:56)
at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:75)

Anything else

No response

Are you willing to submit a pull request to fix on your own?

  • Yes I am willing to submit a pull request on my own!

Code of Conduct

@yipixiaofeiyang yipixiaofeiyang added component/need-triage Need maintainers to triage type/need-triage Need maintainers to triage labels May 29, 2024
@yipixiaofeiyang
Author

I'm not sure whether this should be considered a bug, but it does happen after the upgrade, and the resulting user experience is poor.

@wcy666103
Contributor

I don't understand this scenario. Killing the process triggers the destroy logic, which sets a flag; after that, calling a business method again throws this exception.


I can't see what practical significance this scenario has.
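To make the flag mechanism concrete, here is a minimal, hypothetical model of what this comment describes. `DirectorySketch` is an invented name and the code uses only the JDK; it is not Dubbo's `AbstractDirectory`, which has far more state. It only assumes the behaviour visible in the stack trace: once the directory is destroyed, listing invokers throws instead of returning providers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of a consumer-side service directory.
// NOT Dubbo's AbstractDirectory: it only mimics the reported behaviour,
// i.e. once destroy() has run, list() refuses to return invokers.
class DirectorySketch {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);
    private final String serviceName;
    private final List<String> invokers; // provider addresses, e.g. "host:port"

    DirectorySketch(String serviceName, List<String> invokers) {
        this.serviceName = serviceName;
        this.invokers = Collections.unmodifiableList(new ArrayList<>(invokers));
    }

    // Every RPC call goes through list() to pick a provider.
    List<String> list() {
        if (destroyed.get()) {
            throw new IllegalStateException(
                "Directory already destroyed for service " + serviceName);
        }
        return invokers;
    }

    // Run from a shutdown hook: it only flips the flag and does not wait
    // for callers that are still using the directory.
    void destroy() {
        destroyed.set(true);
    }
}
```

If the web container is still dispatching HTTP requests on its worker threads when `destroy()` runs, every request that reaches `list()` afterwards fails this way. That would be consistent with the errors appearing only in the short window while the consumer is shutting down.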

@yipixiaofeiyang
Author


First kill the running consumer service, then restart it. Sending requests to the Spring endpoint continuously during this process triggers the problem. Some people online claim it happens because the Dubbo service shuts down before the Spring service does. We compared the changes between 3.1.5 and 3.1.6 but could not find the cause. Could you please help analyze it?

@wcy666103
Copy link
Contributor

What do you mean by "continuously accessing the Spring interface during this process"? I'm trying to reproduce it; can you provide a demo?

@yipixiaofeiyang
Author


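Let me just write in Chinese; I'm Chinese!
What I mean is: while the Dubbo consumer service is restarting, I use Postman to keep sending requests to its public Spring HTTP endpoint.
[screenshot]
Something like what the screenshot above shows.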

@wcy666103
Contributor

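That's... really strange. In principle, once the JVM instance stops and restarts, all the flags inside it should be back in their initial state. And what is the point of continuously requesting the Spring Web endpoint at that moment? Aren't they in the same JVM, so haven't they both stopped? I haven't been able to reproduce it yet.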

@yipixiaofeiyang
Author


The setup is Spring Boot + Dubbo, with two servers behind a load balancer.

@yipixiaofeiyang
Author


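It is strange indeed. After the Spring service is killed, in theory it stops accepting external requests and traffic should go to the other server. So suppose some requests arrive right at the moment the Spring service is shutting down: if the Dubbo service is still alive, there really should be no error. The worry is that Dubbo gets destroyed before the Spring service has shut down; I've seen that explanation online... But 3.1.5 behaves correctly, which suggests this should be fixable at the code level. It's a headache.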
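The hypothesis above is about ordering: if the RPC client is torn down while the web layer is still dispatching requests, those requests fail. Below is a minimal JDK-only sketch of the opposite ordering. Every name in it is hypothetical: an `ExecutorService` stands in for the web container's worker pool and `RpcClientSketch` stands in for the Dubbo consumer. It uses no Spring or Dubbo APIs and does not claim to show how either framework actually orders its shutdown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an RPC consumer whose directory can be destroyed.
class RpcClientSketch {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);

    String call() {
        if (destroyed.get()) {
            throw new IllegalStateException("directory already destroyed");
        }
        return "ok";
    }

    void destroy() {
        destroyed.set(true);
    }
}

// Sketch of the ordering argued for above: stop the web layer and drain
// in-flight requests first, and only then destroy the RPC client.
class OrderedShutdownSketch {
    // Submits `requests` calls, shuts down in order, returns how many calls failed.
    static int runAndShutdown(int requests) {
        RpcClientSketch client = new RpcClientSketch();
        ExecutorService web = Executors.newFixedThreadPool(4); // "web container" workers
        AtomicInteger failures = new AtomicInteger();

        for (int i = 0; i < requests; i++) {
            web.submit(() -> {
                try {
                    client.call();
                } catch (IllegalStateException e) {
                    failures.incrementAndGet();
                }
            });
        }

        // 1. Stop accepting new work (already submitted tasks still run).
        web.shutdown();
        // 2. Wait a bounded time for in-flight requests to finish.
        try {
            web.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        // 3. Only now tear down the RPC client.
        client.destroy();
        return failures.get();
    }
}
```

With this ordering, `runAndShutdown(100)` reports no failures, because every queued task completes before `destroy()`. If `client.destroy()` were moved before step 1, tasks still queued on the pool would hit the exception. That is the failure mode reported in this issue.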

@xixingya
Contributor

This may solve your problem: https://cn.dubbo.apache.org/zh-cn/overview/mannual/java-sdk/advanced-features-and-usage/others/graceful-shutdown/

server:
  shutdown: graceful
dubbo:
  application:
    name: dubbo-springboot-demo-provider
    shutwait: 30000

@yipixiaofeiyang
Author


I tried this a long time ago: with any Dubbo version >= 3.1.6, it reproduces.
org.apache.dubbo.rpc.RpcException: Directory of type ServiceDiscoveryRegistryDirectory already destroyed for service xxx
I couldn't resist trying it again anyway:
[screenshot]

@xixingya
Contributor

@yipixiaofeiyang Can you find the following keyword in the log file?

Run shutdown hook now.

If not, make sure you have added server.shutdown=graceful.
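Both the `kill <pid>` step (as opposed to `kill -9`) and the "Run shutdown hook now." log line involve JVM shutdown hooks. Below is a minimal JDK-only sketch; `ShutdownHookSketch` and `register` are hypothetical names. According to the `Runtime.addShutdownHook` contract, the JVM starts all registered hooks in an unspecified order and runs them concurrently. So if two frameworks each register their own hook, the JDK itself does not guarantee that one finishes before the other starts.

```java
// Minimal sketch of a JVM shutdown hook, the JDK mechanism that lets cleanup
// code run when the process receives `kill <pid>` (SIGTERM). Hooks do not run
// on `kill -9` (SIGKILL), because that signal cannot be caught.
class ShutdownHookSketch {
    // Registers `cleanup` as a shutdown hook and returns the hook thread,
    // so the caller can later remove it with Runtime.removeShutdownHook.
    static Thread register(String name, Runnable cleanup) {
        Thread hook = new Thread(cleanup, name);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

For example, `ShutdownHookSketch.register("cleanup-hook", () -> System.out.println("cleaning up"));` prints its message when the process is stopped with `kill <pid>`, but not with `kill -9 <pid>`.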

@xixingya
Contributor

I compared the changes between 3.1.5 and 3.1.6 and could not find the problem. Can you provide a demo that reproduces it?

@yipixiaofeiyang
Author


Yes, the "Run shutdown hook now." line is in the log; that part is fine.

@yipixiaofeiyang
Author


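The demo is simple: one provider and one consumer. The consumer needs to run on two servers with nginx load-balancing between them, and Nacos is the registry, so deployment is a bit of a hassle.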

@xixingya
Contributor


Can you upload your demo code to GitHub?

@yipixiaofeiyang
Author


Sure, I'll just attach it:
provider-parent.zip

@yipixiaofeiyang
Author


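The startup script on the server is below. Its effect is essentially the same as Spring's server.shutdown=graceful. We have always used it this way in production, and it worked perfectly before Dubbo 3.1.6, so I haven't bothered switching to the graceful setting.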
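```bash
#!/bin/bash
# Stop the running instance (SIGTERM first, SIGKILL after 30 s), then start a new one.
# Assumes serverName is set earlier in the script (not shown).
pid=$(ps -ef | grep java | grep "${serverName}" | grep -v grep | awk '{print $2}')
echo "$pid"

waitcount=0
while [ -n "$pid" ]; do
    if [ "$waitcount" -eq 0 ]; then
        kill $pid        # SIGTERM: JVM shutdown hooks run (unquoted: may hold several PIDs)
    fi
    sleep 1
    waitcount=$((waitcount + 1))
    if [ "$waitcount" -eq 30 ]; then
        kill -9 $pid     # SIGKILL: shutdown hooks do NOT run
    fi
    pid=$(ps -ef | grep java | grep "${serverName}" | grep -v grep | awk '{print $2}')
done

# JVM options go before -jar; capture stderr in the log as well.
nohup /usr/jdk1.8.0_201/bin/java -Xms700m -Xmx700m -jar *.jar > log.txt 2>&1 &
```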

@xixingya
Contributor

After trying it: graceful shutdown works well on 3.1.5 and on 3.1.6, but does not work on 3.2.11 or 3.2.13. @yipixiaofeiyang

@yipixiaofeiyang
Author


Okay, then I'll wait for a new version and try again. In the meantime, I catch the RpcException and return a message saying the service is being upgraded and to try again later. I think clients can accept that.
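The workaround described here can be sketched as a small wrapper around the RPC call. `StandInRpcException` and `UpgradeAwareCaller` are hypothetical names. The stand-in exception exists only so the sketch compiles without Dubbo on the classpath; in a real consumer you would catch `org.apache.dubbo.rpc.RpcException` (a `RuntimeException`) instead.

```java
import java.util.function.Supplier;

// Stand-in for org.apache.dubbo.rpc.RpcException, defined only so this
// sketch compiles without Dubbo on the classpath.
class StandInRpcException extends RuntimeException {
    StandInRpcException(String message) {
        super(message);
    }
}

// Hypothetical helper: run an RPC call and turn a failure during a restart
// into a user-facing "try again later" message instead of a raw stack trace.
class UpgradeAwareCaller {
    static final String RETRY_LATER =
        "The service is being upgraded. Please try again later.";

    static String callOrFallback(Supplier<String> rpcCall) {
        try {
            return rpcCall.get();
        } catch (StandInRpcException e) {
            // A real consumer would log e here before degrading.
            return RETRY_LATER;
        }
    }
}
```

A retry inside the same process would not help here. Once the consumer's directory has been destroyed, every further call on that instance fails the same way. A user-facing message, or a retry that the client or load balancer sends to the other server, is the more realistic fallback.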

@github-project-automation github-project-automation bot moved this from Todo to Done in Dubbo Board May 31, 2024
@985177520

@xixingya Has this been fixed yet?

@yipixiaofeiyang
Author


I tried the latest version, and the problem is still there.

@laywin
Contributor

laywin commented Nov 22, 2024

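I compared the versions, and in fact this problem exists in all of them. It is hard for Dubbo to guarantee graceful shutdown on the consumer side, because shutdown and invocations run concurrently.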

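During shutdown, the consumer side sets a flag indicating that the invoker has been reclaimed; any request made after that point fails with the error above. The provider side's shutdown logic is different: it stops accepting new requests by sending a readonly event and unregistering, then waits (up to a timeout) so that in-flight requests can finish where possible.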
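The provider-side pattern described in this comment can be modelled in a few lines of plain Java. This is a hypothetical sketch, not Dubbo's code: `ProviderShutdownSketch` is an invented name, a read-only flag stands in for sending readonly and unregistering, and a polled in-flight counter stands in for the timeout wait. The consumer side, by contrast, only has the flag, with no drain step.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the provider-side graceful shutdown pattern:
// 1) go read-only so new requests are rejected,
// 2) wait, up to a timeout, for in-flight requests to drain.
class ProviderShutdownSketch {
    private final AtomicBoolean readOnly = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger();

    // Call before handling a request; returns false if it was rejected.
    // Counting first and checking the flag second means no request can be
    // accepted after shutdown() has observed zero in-flight requests.
    boolean tryBegin() {
        inFlight.incrementAndGet();
        if (readOnly.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    // Call when an accepted request finishes.
    void end() {
        inFlight.decrementAndGet();
    }

    // Returns true if all in-flight requests finished before the timeout.
    boolean shutdown(long timeout, TimeUnit unit) {
        readOnly.set(true); // stands in for "send readonly + unregister"
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out with requests still running
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The timeout plays the same role as the `shutwait` setting suggested earlier in the thread: it bounds how long shutdown waits for in-flight requests before giving up.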

Projects
Archived in project
Development

No branches or pull requests

5 participants