Affected version
0.0.0-dev
Current and expected behavior
We lost data in a demo because NiFi was complaining about not reaching ZooKeeper and the health checks did not notice it.
Simply restarting the pod solved the problem, which would have happened automatically if the livenessProbe had detected the problem.
Currently the livenessProbe looks like:
While the numbers themselves are arguable (e.g. why have an initialDelaySeconds when we have a startup probe?) and a readinessProbe is missing, the most important thing is that a simple check on the port is not enough.
Possible solution
We should instead use the NiFi REST API (https://nifi.apache.org/docs/nifi-docs/rest-api/) to check the actual node health. The most complicated part will be auth, I fear (e.g. add a static user with an operator-created random secret and put it in the authentication chain).
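A rough sketch of what such a check could look like, e.g. as a small script run from an exec probe. Everything specific here is an assumption and not something decided in this issue: the endpoint path `/nifi-api/flow/cluster/summary`, the `clusterSummary`, `clustered` and `connectedToCluster` field names, and bearer-token auth all need to be verified against the REST API docs for the NiFi version in use. Whether the cluster summary actually reflects the ZooKeeper problem described above is also untested.

```python
import json
import sys
import urllib.request

# Assumed endpoint; verify against the REST API docs for your NiFi version.
SUMMARY_PATH = "/nifi-api/flow/cluster/summary"


def node_is_healthy(payload: dict) -> bool:
    """Decide health from an (assumed) cluster summary response body."""
    summary = payload.get("clusterSummary")
    if not isinstance(summary, dict):
        # Unexpected response shape: treat as unhealthy rather than guess.
        return False
    if not summary.get("clustered", False):
        # Standalone node: a well-formed answer from the API counts as healthy.
        return True
    # Clustered node: healthy only while it reports being connected.
    return bool(summary.get("connectedToCluster", False))


def fetch_summary(base_url: str, token: str, timeout: float = 5.0) -> dict:
    """Fetch the cluster summary; the bearer token is a hypothetical auth choice."""
    request = urllib.request.Request(
        base_url.rstrip("/") + SUMMARY_PATH,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def probe(base_url: str, token: str) -> int:
    """Return 0 if the node looks healthy, 1 otherwise (probe-style exit code)."""
    try:
        healthy = node_is_healthy(fetch_summary(base_url, token))
    except Exception as exc:  # network errors, HTTP errors, invalid JSON
        print(f"health check failed: {exc}", file=sys.stderr)
        return 1
    return 0 if healthy else 1
```

The decision logic is kept separate from the HTTP call so it can be tested without a running NiFi, and any error while fetching counts as unhealthy so a hung or unreachable API would fail the probe.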
Additional context
No response
Environment
No response
Would you like to work on fixing this bug?
yes