Fix system tests #872
I noticed that it randomly failed establishing connection. Raising it from 10 to 30 seconds seems to have fixed it, at least, when running in GKE.
and wait until stream port is ready only when the feature flag stream_queue is enabled
to 1 minute. We still have 3 more minutes before the annotation is removed.
to get RabbitMQCluster resource. Also provided a description of asserted generation values to help troubleshoot when it occurs.
TODO: Provide an assertion message that gives us information about the PVC, such as events or conditions, so that we know why it has not expanded.
Even though the RabbitMQ diagnostics tool says that the stream port is ready to accept connections, the fact of the matter is that the stream client cannot connect on the first attempt. The stream client connects via the k8s nodePort, not directly to the stream port.
to send a message via stream protocol
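The retry behaviour described in these commits (a client that cannot connect on the first attempt, and a connection timeout raised from 10 to 30 seconds) could be sketched roughly as below. This is only an illustration, not the code in this PR: `retry_until`, its parameters and the injected `clock`/`sleep` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until(connect: Callable[[], T], timeout_s: float, interval_s: float = 1.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds or timeout_s has elapsed.

    Hypothetical helper: connect is any callable that raises OSError
    (for example ConnectionRefusedError) while the port is not reachable.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return connect()
        except OSError as err:
            # Give up if another wait would take us past the deadline.
            if clock() + interval_s > deadline:
                raise TimeoutError(f"no connection after {timeout_s}s") from err
            sleep(interval_s)
```

A call such as `retry_until(open_stream_connection, timeout_s=30)` would then tolerate a failed first attempt, where `open_stream_connection` stands for whatever function the test uses to dial the nodePort.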
ChunyiLyu
left a comment
Looks good in general! Just one comment about the stream test:
One issue I see with the change to the stream test is that the test now checks for the feature flag, which requires a running RMQ. If we are using RMQ without the feature, the test will deploy a dedicated cluster just to find out that stream is not supported, then skip the test and tear down the RMQ. Personally, I would prefer verifying the image before deploying the RabbitMQ to save unnecessary cluster creation. What do you think?
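To make the "verify the image first" idea concrete, here is a rough sketch that reads the version from an image tag before anything is deployed. It assumes the tag starts with the RabbitMQ version (as the official `rabbitmq` images do) and that streams require 3.9.0 or later; the function names are made up for the example, and a custom image may not follow this tag scheme at all.

```python
import re
from typing import Optional, Tuple

# Assumption: streams are available from RabbitMQ 3.9.0 onwards.
STREAMS_MIN_VERSION = (3, 9, 0)


def image_version(image: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from an image reference, or None."""
    last_segment = image.rsplit("/", 1)[-1]
    # Only treat the text after ':' as a tag if it is in the last path
    # segment; otherwise it is a registry port such as localhost:5000.
    tag = image.rsplit(":", 1)[-1] if ":" in last_segment else ""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", tag)
    if match is None:
        return None
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def supports_streams(image: str) -> bool:
    """True only when the tag clearly names a version with streams."""
    version = image_version(image)
    return version is not None and version >= STREAMS_MIN_VERSION
```

With this, `supports_streams("rabbitmq:3.8.21")` is false and the stream test could be skipped without creating a cluster. An untagged or digest-pinned image also yields false, which is the conservative choice.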
There seems to be a legitimate
@ChunyiLyu I see what you mean, but I thought that given that the [...] The reason I changed it was that I thought it was more reliable and simpler to check straight away whether the feature was enabled rather than checking for versions.
@ChunyiLyu I have moved the test case for the stream protocol back to where it was originally, i.e. together with the stomp and mqtt test cases. This way, as you pointed out, we do not deploy a RabbitMQ cluster unnecessarily if the stream plugin/feature is not enabled.
ansd
left a comment
Currently, the readiness probe checks whether the AMQP port is open. As soon as the AMQP port is open, the Pod will start accepting traffic. However, we observe in the system tests that other ports take longer to open, causing the system tests to fail.
When looking at the logs:
2021-10-15 13:39:26.629506+02:00 [info] <0.44.0> Application rabbitmq_management started on 'rabbit@host'
2021-10-15 13:39:26.630140+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:26.931683+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:27.233728+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:27.535770+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:27.837307+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:28.138570+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:28.440774+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:28.742584+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:29.044627+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:29.346577+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:29.648615+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:29.950628+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:30.252812+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:30.554050+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:30.855862+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:31.157550+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:31.459744+02:00 [info] <0.891.0> MQTT: will wait for 300 more ms for cluster members to join before triggering a Raft leader election
2021-10-15 13:39:31.781635+02:00 [debug] <0.897.0> mqtt_node: ra_log:init recovered last_index_term {0,0} first index 0
2021-10-15 13:39:31.823148+02:00 [debug] <0.897.0> mqtt_node: recover -> recover in term: 0 machine version: 1
2021-10-15 13:39:31.823761+02:00 [debug] <0.897.0> mqtt_node: recovering state machine version 0:1 from index 0 to 0
we see why the MQTT plugin takes longer to initialise (compared to the AMQP port being opened). It needs to trigger a Raft leader election, which takes about 5 seconds in the output above.
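The figure of roughly 5 seconds can be checked from the timestamps of the first "will wait" line and the first `mqtt_node` line in the log above. The snippet below is just that arithmetic; the `timestamp` helper is a name invented for this example.

```python
from datetime import datetime


def timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff+HH:MM' of a log line."""
    date, clock = line.split()[:2]
    return datetime.fromisoformat(f"{date} {clock}")


first_wait = ("2021-10-15 13:39:26.630140+02:00 [info] <0.891.0> MQTT: will wait "
              "for 300 more ms for cluster members to join before triggering a "
              "Raft leader election")
recovered = ("2021-10-15 13:39:31.781635+02:00 [debug] <0.897.0> mqtt_node: "
             "ra_log:init recovered last_index_term {0,0} first index 0")

elapsed = (timestamp(recovered) - timestamp(first_wait)).total_seconds()
print(f"{elapsed:.3f}s")  # 5.151s
```

This matches the seventeen 300 ms waits logged in between (17 × 0.3 s = 5.1 s).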
I'd favour solving the root cause of this issue instead of implementing a workaround in the system test, because users will observe the same issue.
One possible solution would be to define an MQTT readiness check if the MQTT plugin is enabled (instead of the AMQP readiness check), because we know that the MQTT port takes longer to open than the AMQP port.
WDYT?
This fixes #863 (WIP)
Summary Of Changes
The issues that I identified were mainly these:
status.conditions.rabbitmq.com/pluginsUpdatedAt were very time sensitive. I managed to reproduce it a few times, so it seemed that the 30-second timeout was too tight. We could ramp it up to 1 minute without any issues. The next timeout would be 4 minutes, which is when we check that the annotation should no longer exist.