Documentation: changing Eureka renewal frequency *WILL* break the self-preservation feature of the server #373
Comments
It's fine for demos and development work (where you often want a faster turnaround and don't care about self preservation mode). If you'd like to explain what's happening it would be great to see a pull request for the documentation (it's right there in github next to the source code). |
I understand, but before asking for changes in the documentation I think my analysis needs to be reviewed by someone with a better understanding of Eureka's internals, to make sure the behaviour and consequences I describe are correct.

The point is that the Eureka server makes the implicit assumption that clients send their heartbeats at a fixed rate of one every 30 seconds. If two instances are registered in the registry, the server expects to receive

Now, if clients send their heartbeats twice as fast (every 15s), the server receives 8 heartbeats per minute and keeps receiving 4 per minute if you lose one of the two instances. Hence the self-protection mode is not activated...

This example shows the consequence of using a heartbeat frequency other than 30s: it breaks the self-protection mechanism.

The initial registration is actually triggered by the first heartbeat: the client tries to send the first heartbeat and receives a "not found" answer from the server, which means the server doesn't know the instance. The client then immediately attempts to register the instance. This process only happens 30s (

You can always speed up the initial registration process by lowering the value of

During demo/development, if you want to detect "dead" instances quicker, I would suggest playing with the

Hope all of this makes sense. |
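For readers who want to see the handshake spelled out, here is a toy model of the heartbeat-then-register sequence described in the comment above. It is only a sketch: `FakeRegistry`, `Client` and `simulate` are hypothetical names, not Eureka classes, and the 204 returned on registration is an illustrative status code. The only behaviour taken from the thread is that the first heartbeat goes out one interval after startup, gets a "not found" answer, and is followed immediately by a registration call.

```python
# Toy model of the heartbeat-then-register handshake.
# FakeRegistry and Client are hypothetical names, not Eureka classes.

class FakeRegistry:
    """In-memory stand-in for the Eureka server registry."""

    def __init__(self):
        self.instances = set()

    def heartbeat(self, instance_id):
        # Unknown instances get a "not found" answer.
        return 200 if instance_id in self.instances else 404

    def register(self, instance_id):
        self.instances.add(instance_id)
        return 204  # illustrative "registered" status


class Client:
    def __init__(self, instance_id, registry):
        self.instance_id = instance_id
        self.registry = registry
        self.log = []

    def tick(self, now):
        """Run once per heartbeat interval; `now` is seconds since startup."""
        status = self.registry.heartbeat(self.instance_id)
        self.log.append((now, "heartbeat", status))
        if status == 404:
            # The server does not know us yet: register right away.
            registered = self.registry.register(self.instance_id)
            self.log.append((now, "register", registered))


def simulate(intervals=3, heartbeat_interval=30):
    """Return the client's event log after `intervals` heartbeat ticks."""
    client = Client("service-a", FakeRegistry())
    for n in range(1, intervals + 1):
        client.tick(n * heartbeat_interval)
    return client.log
```

Calling `simulate()` shows nothing at all happening before the 30 second mark, which is why lowering the heartbeat interval looks so tempting during development.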
Totally makes sense, but I'm not sure there is anyone with a better understanding of Eureka internals at this point (at least not a regular visitor to this project). Even the people I know at Netflix probably won't want to go into any more detail than that (most people just use it after all). We need some of your analysis in the documentation really, plus some sensible guidelines, and defaults that don't stop people from making progress quickly when they are getting started. |
Does that mean you feel as lonely as me? |
Here is a first attempt to answer the question "Why does it take so long to register an instance with Eureka?"

(1) Client Registration

When using the default configuration, registration happens at the first heartbeat sent to the server. Since the client just started, the server doesn't know anything about it and replies with a 404, forcing the client to register. The client then immediately issues a second call with all the registration information. The client is now registered. The first heartbeat happens 30 seconds after startup (

(2) Server ResponseCache

The server maintains a response cache that is updated every 30s by default ( However, your instance may appear in the web UI just after registration. This is because the web front-end bypasses the response cache used by the REST API... If you know the instanceId, you can still get some details from Eureka about it by calling So, it may take up to another 30s for other clients to discover your newly registered instance.

(3) Client cache refresh

Eureka clients maintain a cache of the registry information. This cache is refreshed every 30 seconds by default (`eureka.client.registryFetchIntervalSeconds`). So again, it may take another 30s before a client decides to refresh its local cache and discover newly registered instances.

(4) LoadBalancer refresh

The load balancer used by Ribbon gets its information from the local Eureka client. It also maintains a local cache to avoid calling the discovery client for every request. This cache is refreshed every 30s ( Note: this local cache is apparently required only to reduce the cost of obtaining server information from the used

In the end, if you are not lucky, it may take up to 2 minutes before your newly registered instance starts receiving traffic from other clients. |
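The "up to 2 minutes" figure is just the sum of the four 30s intervals listed above. A small sketch of that arithmetic follows; the stage labels and the `worst_case_delay` helper are made up for illustration, and only the 30s defaults come from the comment.

```python
# Worst-case propagation delay across the four stages listed above.
# Stage names are descriptive labels, not configuration keys.
DEFAULT_STAGES = {
    "client registration (first heartbeat)": 30,
    "server response cache update": 30,
    "client registry cache refresh": 30,
    "Ribbon load balancer cache refresh": 30,
}


def worst_case_delay(stages):
    """Sum the refresh intervals, in seconds.

    In the unlucky case every cache refreshed just before the change
    reached it, so each stage adds one full interval.
    """
    return sum(stages.values())


# Hypothetical dev profile where every interval is lowered to 5 seconds.
DEV_STAGES = {name: 5 for name in DEFAULT_STAGES}
```

With the defaults, `worst_case_delay(DEFAULT_STAGES)` gives 120 seconds; the hypothetical 5s profile gives 20 seconds. The stages are independent timers, so the typical delay is lower than the worst case.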
I'm trying to figure out how to get the delay between server start and registration in Zuul as low as possible in development only. With the defaults, we are stuck twiddling our thumbs for like 2 min every time we restart a service and then want to hit it through our zuul proxy. The information from @brenuart has helped a little, but I still can't seem to get it right. I would love some help on getting a configuration for spring.profiles: dev that has the whole registration process down to the seconds. This is what I've tried so far:
Service Config:
Right now, Zuul isn't fully noticing that the service is down (I just get a blank page and not a forwarding error like usual). Maybe once we figure this out, it could be added to the documentation that @brenuart wrote? If I should post this over on SO instead, let me know. Also, @brenuart, I think your documentation is much better than what's there, but it would be great to add all the options (like the |
Not much time for the moment, but I'll try to extend the coverage of "my" doc. Maybe I should coordinate with @spencergibb to incorporate those few lines into the official doc and have them reviewed. |
Another stack overflow question. http://stackoverflow.com/questions/33921557/understanding-spring-cloud-eureka-server-parameters-and-configuration I'm going to take a shot next week. |
Related docs #203 |
I just ran a test on my local machine with the following extreme configuration: Eureka server: server: eureka: Two service end: server: spring: eureka: server: spring: eureka: I observed that when one of the services went down, the other service got the updated instance list from the Eureka server quickly, in no more than 5 seconds in my environment. I think that's suitable for dev/test purposes. But I wonder whether there's a simple way to set Ribbon's serverListRefreshInterval value in a Spring Boot application? @brenuart |
Self-preservation is a mechanism by which the Eureka registry stops expiring entries when it detects that a significant number of services did not renew their lease in time. This protects the registry from clearing all entries when a (partial) network failure occurs. |
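To make the interaction with the heartbeat interval concrete, here is a rough sketch of the two-instance example discussed earlier in the thread. It rests on assumptions that are not stated in this thread: a renewal threshold of 85% of the expected heartbeats, and self-preservation kicking in when the renewals received in the last minute are at or below that threshold. The function names are hypothetical. The one thing taken from the thread is that the server always assumes two heartbeats per instance per minute, whatever the clients actually do.

```python
# Sketch of the self-preservation arithmetic for the two-instance example.
ASSUMED_HEARTBEATS_PER_MIN = 2    # server-side assumption: one heartbeat every 30 s
RENEWAL_PERCENT_THRESHOLD = 0.85  # assumed default, not stated in this thread


def renewal_threshold(registered_instances):
    """Renewals per minute the server wants to see before it expires leases."""
    expected = registered_instances * ASSUMED_HEARTBEATS_PER_MIN
    return int(expected * RENEWAL_PERCENT_THRESHOLD)


def renewals_received(alive_instances, heartbeat_interval_s):
    """Heartbeats actually arriving per minute from the instances still alive."""
    return alive_instances * (60 // heartbeat_interval_s)


def self_preservation_active(registered, alive, heartbeat_interval_s):
    # Assumed comparison: at or below the threshold means "stop expiring entries".
    received = renewals_received(alive, heartbeat_interval_s)
    return received <= renewal_threshold(registered)
```

With two registered instances the threshold works out to 3 renewals per minute. Losing one instance at the default 30s interval leaves 2 renewals per minute, so self-preservation activates. At a 15s interval the surviving instance alone still sends 4 per minute, and the server never notices that half of the registry is gone.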
I've created a blog post with the details of Eureka here, that fills in some missing detail from Spring doc or Netflix blog. It is the result of several days of debugging and digging through source code. |
@asarkar would you mind if we (or you via PR) integrate your information into our documentation? |
@spencergibb I can PR it, but there are some areas that I'd like to see elaborated, especially regions and zones. My post also links to 2 open tickets that I believe should be answered/closed first, because my post refers to them. |
@brenuart Your configuration is not correct, right one is |
I have a similar issue and have posted a question here. Could you please help me? https://stackoverflow.com/questions/70648380/what-would-be-the-best-self-preservation-time-parameter-configuration-for-eureka |
The application can call Eureka's API to force the refresh state at startup or shutdown |
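One possible reading of this suggestion, sketched under assumptions: Eureka's REST interface lets a caller take an instance out of service or de-register it explicitly, instead of waiting for the lease to expire. The helpers below only build the requests and do not send them. The function names, base URL, app name and instance id are hypothetical, and the `/apps/{app}/{instance}` path layout is my understanding of the Eureka REST operations, so check it against your server version before relying on it.

```python
# Build (but do not send) requests against Eureka's REST interface.
# Helper names and example values are hypothetical.
from urllib.request import Request


def out_of_service_request(base_url, app_id, instance_id):
    """PUT that asks the server to mark the instance OUT_OF_SERVICE."""
    url = f"{base_url}/apps/{app_id}/{instance_id}/status?value=OUT_OF_SERVICE"
    return Request(url, method="PUT")


def deregister_request(base_url, app_id, instance_id):
    """DELETE that removes the instance from the registry on shutdown."""
    url = f"{base_url}/apps/{app_id}/{instance_id}"
    return Request(url, method="DELETE")
```

A shutdown hook could pass the result to `urllib.request.urlopen`. Other clients would still only see the change after their own cache refreshes.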
In the section "Why is it slow to register a Service?" of the documentation, it is said that one can speed up the client registration process by sending heartbeats more frequently (the default interval is 30 seconds).
The documentation also says it might not be a good idea in production, without giving much explanation of the consequences.
One should be aware that the self-preservation feature of the Eureka server assumes clients are sending their heartbeat every 30 seconds, and this is not configurable. Using a different value will therefore break that functionality. It is definitely not a good idea to play with that parameter...