DO NOT MERGE Try to get logs for flakes #12519

Open: dcorbacho wants to merge 9 commits into main from trigger-ci-flake

dcorbacho
Contributor

No description provided.

@dcorbacho dcorbacho force-pushed the trigger-ci-flake branch 7 times, most recently from 583d13c to 9c4dce9 Compare October 29, 2024 10:09
@dcorbacho dcorbacho closed this Oct 29, 2024
@dcorbacho dcorbacho reopened this Oct 29, 2024
@dcorbacho dcorbacho force-pushed the trigger-ci-flake branch 5 times, most recently from d8c4417 to 8905177 Compare November 6, 2024 18:11
@dcorbacho dcorbacho force-pushed the trigger-ci-flake branch 8 times, most recently from 8f228d3 to 58f3425 Compare November 18, 2024 07:56
@dcorbacho dcorbacho force-pushed the trigger-ci-flake branch 7 times, most recently from 73f9569 to 4b43ca7 Compare November 25, 2024 14:56
@dcorbacho dcorbacho force-pushed the trigger-ci-flake branch 2 times, most recently from c0d7ecf to b96742b Compare November 25, 2024 15:17
dumbbell and others added 7 commits November 25, 2024 16:19
[Why]
This was the first solution put in place to prevent the temporary hidden
node from connecting to the node that started it in order to write any
printed messages. Because of that connection, the nodes queried by the
temporary hidden node found out about the parent node and opened an
Erlang distribution connection to it. This polluted the known nodes list.

Later, however, the temporary hidden node was started with the
`standard_io` connection option. This prevented the temporary hidden
node from knowing about the node that started it, solving the problem in
a cleaner way.

[How]
This commit removes that piece of code, which is now useless. This makes
the query code much simpler to understand.

[Why]
This impacts what is reported by the catch because it caught exceptions
emitted by code that is supposed to be called later. An example is the
assert in the last clause of `query_node_props2/3`.

[Why]
In CI, we observe some timeouts in the Erlang distribution connections
between the temporary hidden node and the nodes it queries. This
obviously affects peer discovery.

[How]
We introduce some query retries to reduce the risk of an incomplete
query.

While here, we move the sorting of queried nodes from the last clause of
`query_node_props2/3` (executed in the temporary hidden node) to the
function that sets up the temporary hidden node and asks for these
queries. This way, the debug messages from that sorting are logged by
RabbitMQ out of the box.
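
The retry-then-sort flow described in this commit message can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the Erlang code from the commit: `query_with_retries`, `sort_queried_nodes`, `max_attempts` and the `start_time` sort key are all hypothetical names chosen for the example, and a timeout is modelled as a `TimeoutError` raised by a caller-supplied `query_node` function.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List, Tuple

log = logging.getLogger("peer_discovery_sketch")


def query_with_retries(
    nodes: Iterable[str],
    query_node: Callable[[str], Dict[str, Any]],
    max_attempts: int = 3,  # hypothetical default, not taken from the commit
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Query every node, retrying only the nodes that timed out.

    Returns the collected properties and the nodes that never answered.
    """
    results: Dict[str, Dict[str, Any]] = {}
    pending = list(nodes)
    for attempt in range(1, max_attempts + 1):
        if not pending:
            break
        still_failing = []
        for node in pending:
            try:
                results[node] = query_node(node)
            except TimeoutError:
                log.debug("attempt %d: query to %s timed out", attempt, node)
                still_failing.append(node)
        pending = still_failing
    return results, pending


def sort_queried_nodes(
    results: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Sort in the caller, so debug output goes through the caller's logger.

    The sort key (start_time, then node name) is an assumption for the sketch.
    """
    ordered = sorted(results.items(), key=lambda kv: (kv[1]["start_time"], kv[0]))
    log.debug("sorted nodes: %s", [name for name, _ in ordered])
    return ordered
```

Retrying only the nodes that failed keeps the extra cost proportional to the number of timeouts, and doing the sort outside the query helper mirrors the point above: the log lines are emitted where the normal logger is available.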

2 participants