IGNITE-13012 Make node connection checking rely on the configuration. Simplify node ping routine. #7835

Vladsz83 · 2020-05-22T10:08:23Z

This PR is the first step toward improving and speeding up node failure detection. The goal is simple, predictable and configurable node pinging.

Fixes:

Connection failure detection is now kept within IgniteConfiguration.failureDetectionTimeout instead of 500ms + IgniteConfiguration.failureDetectionTimeout.
The connection checking interval in TCP discovery now depends on the configured failure detection timeout. The previous fixed 500ms is now only the minimal interval. This makes node pinging robust and keeps the failure detection timeout accurate.
Removed the additional connection check. This premature node ping was also driven by received messages. For example, if node 2 receives no message from the previous node 1 within some time, it pings the next node 3 early instead of waiting for the regular ping. This caused confusion and gave no considerable guarantees.
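A minimal Java sketch of deriving a connection-check interval from the failure detection timeout while keeping the old 500ms as a floor. The divisor `CHECKS_PER_TIMEOUT` and all class and method names are hypothetical; the PR text does not state the exact formula used in ServerImpl.

```java
/**
 * Sketch: derive the TCP discovery connection-check interval from the
 * configured failure detection timeout, never going below 500 ms.
 * CHECKS_PER_TIMEOUT is a hypothetical divisor, not Ignite's actual value.
 */
public class ConnCheckInterval {
    /** Former fixed interval; after the change it acts only as a lower bound. */
    static final long MIN_CON_CHECK_INTERVAL_MS = 500;

    /** Assumption: how many checks should fit into one failure detection timeout. */
    static final long CHECKS_PER_TIMEOUT = 4;

    static long connectionCheckInterval(long failureDetectionTimeoutMs) {
        if (failureDetectionTimeoutMs <= 0)
            throw new IllegalArgumentException("failureDetectionTimeout must be positive");

        // Scale with the configured timeout, but clamp to the 500 ms minimum.
        return Math.max(MIN_CON_CHECK_INTERVAL_MS, failureDetectionTimeoutMs / CHECKS_PER_TIMEOUT);
    }

    public static void main(String[] args) {
        // 10 s timeout -> 2.5 s between checks; 1 s timeout -> clamped to 500 ms.
        System.out.println(connectionCheckInterval(10_000)); // 2500
        System.out.println(connectionCheckInterval(1_000));  // 500
    }
}
```

Tying the interval to the timeout means several checks fit into one timeout window regardless of how it is configured, while the floor prevents overly aggressive pinging with very small timeouts.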

Behavior changes:

TcpDiscoveryConnectionCheckMessage is not sent if there is message traffic within the actual failure detection timeout, because any message checks the connection.
The failure detection timeout is now an overall timeout since the last message sent, not a timeout on the current message exchange.
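The two behavior changes can be sketched as two small predicates. This is an illustration under assumptions: the method names are hypothetical, and the traffic window passed to `needConnectionCheck` is a parameter because the exact window ServerImpl compares against is not spelled out here.

```java
/**
 * Sketch of the behavior changes, using hypothetical names:
 * 1) skip the explicit connection-check message while other traffic is flowing;
 * 2) measure the failure detection timeout from the last successfully sent
 *    message, not per message exchange.
 */
public class ConnCheckRules {
    /** True if nothing was sent within the window, so an explicit check message is needed. */
    static boolean needConnectionCheck(long nowMs, long lastSentMs, long windowMs) {
        return nowMs - lastSentMs >= windowMs;
    }

    /**
     * Time left before the next node is considered failed. Time already spent
     * since the last successful send is deducted, so a slow exchange cannot
     * restart the clock.
     */
    static long remainingTimeout(long nowMs, long lastSentMs, long failureDetectionTimeoutMs) {
        return Math.max(0, failureDetectionTimeoutMs - (nowMs - lastSentMs));
    }

    public static void main(String[] args) {
        long timeout = 10_000;
        long window = 2_500; // assumed traffic window
        long lastSent = 0;

        // Traffic 300 ms ago: no explicit check message, 9.7 s of budget left.
        System.out.println(needConnectionCheck(300, lastSent, window)); // false
        System.out.println(remainingTimeout(300, lastSent, timeout));   // 9700

        // 12 s of silence: budget exhausted, the connection is treated as failed.
        System.out.println(remainingTimeout(12_000, lastSent, timeout)); // 0
    }
}
```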

rkondakov · 2020-05-22T10:37:12Z

Please check the correctness of the jira issue number in the PR heading. It looks like it is incorrect. IGNITE-13021 is about the new SQL engine, not about the connectivity.

Vladsz83 · 2020-05-22T11:00:48Z

Please check the correctness of the jira issue number in the PR heading. It looks like it is incorrect. IGNITE-13021 is about the new SQL engine, not about the connectivity.

Fixed to IGNITE-13012. Thanks!


modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/ServerImpl.java

.../core/src/test/java/org/apache/ignite/internal/GridFailFastNodeFailureDetectionSelfTest.java


modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/TcpDiscoveryImpl.java

modules/core/src/test/java/org/apache/ignite/spi/discovery/tcp/ConnectionCheckTest.java

modules/core/src/test/java/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySelfTest.java


Vladsz83 · 2020-06-10T13:36:27Z

@sergey-chugunov-1985, you seem to have a lot of experience with TcpDiscoverySpi. Could you take a look at this ticket too?

sergey-chugunov-1985 · 2020-06-23T12:27:56Z

modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/ServerImpl.java

                hasRemoteSrvNodes = ring.hasRemoteServerNodes();

-            if (hasRemoteSrvNodes) {
+            if (hasRemoteSrvNodes)


Why not call the updateLastSentMessageTime method here as well?

Vladsz83 · 2020-06-23

We haven't successfully sent the message here: we haven't received RES_OK.

As you can see, we call updateLastSentMessageTime() only after successfully reading spi.readReceipt or a proper TcpDiscoveryHandshakeResponse. These are the places where we are sure the message was sent and the connection is OK.
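The rule from this review exchange can be sketched as follows: the last-sent timestamp advances only on a confirmed delivery, never merely because bytes were written. Only the name `updateLastSentMessageTime` comes from the thread; the class, the `Reply` enum standing in for `RES_OK`, and the explicit clock parameter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch: the last-sent timestamp moves forward only once delivery is
 * confirmed (a receipt or a proper handshake response).
 * Class and enum names are hypothetical.
 */
public class LastSentTracker {
    /** Hypothetical stand-in for the receipt codes read from the socket. */
    enum Reply { RES_OK, NO_REPLY }

    private final AtomicLong lastSentMsgTime = new AtomicLong();

    /** Mirrors updateLastSentMessageTime() from the review thread, with an explicit clock. */
    void updateLastSentMessageTime(long nowMs) {
        lastSentMsgTime.set(nowMs);
    }

    /** Called after a write; the timestamp is touched only on a confirmed receipt. */
    void onSendCompleted(Reply reply, long nowMs) {
        if (reply == Reply.RES_OK)
            updateLastSentMessageTime(nowMs);
    }

    long lastSentMessageTime() {
        return lastSentMsgTime.get();
    }

    public static void main(String[] args) {
        LastSentTracker t = new LastSentTracker();
        t.onSendCompleted(Reply.RES_OK, 1_000);
        t.onSendCompleted(Reply.NO_REPLY, 2_000); // written but unconfirmed: ignored
        System.out.println(t.lastSentMessageTime()); // 1000
    }
}
```

Updating the timestamp on an unconfirmed send would extend the failure detection budget for a connection that may already be broken.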


IGNITE-13021 : First impl.

3b66856

Vladsz83 changed the title ~~IGNITE-13021 Make node connection checking rely on the configuration. Simplify node ping routine.~~ IGNITE-13012 Make node connection checking rely on the configuration. Simplify node ping routine. May 22, 2020

Vladsz83 added 11 commits May 22, 2020 17:56

IGNITE-13012 : merged with master. Minor fixes.

675c069

IGNITE-13012 : merged with master. Minor fixes.

e4ddf05

IGNITE-13012 : halt timeouts on the ping.

bf93ac1

Merge remote-tracking branch 'origin/IGNITE-13012' into IGNITE-13012

e729270

# Conflicts: # modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/ServerImpl.java

IGNITE-13012 : +test.

47a9f7d

IGNITE-13012 : redeem of the timeouts. Fixed test.

2c929fa

IGNITE-13012 : redeem of the timeouts. Fixed test.

245943a

Merge branch 'master' into IGNITE-13012

f7d58ae

IGNITE-13012 : Fixed tests. + a test.

8f4dabf

IGNITE-13012 : fix of coordinator failure test.

62f5d6a

IGNITE-13012 : test fix

dc23756

anton-vinogradov requested changes Jun 1, 2020


Vladsz83 added 13 commits June 2, 2020 12:46

IGNITE-13012 : Reverted tests. Failure detection timeout is shared with previous message.

7089343

IGNITE-13012 : fix.

3515f40

IGNITE-13012 : + test.

9dca4f1

IGNITE-13012 : + test fix.

bd00c20

IGNITE-13012 : +10ms as the timer granulation.

c464725

IGNITE-13012 : test fixes.

5370831

IGNITE-13012 : + 10ms as acceptable code delay.

a9ad35e

IGNITE-13012 : test redeemed.

a8fad43

IGNITE-13016 : faster test.

45c426f

Revert "IGNITE-13134 : test duration fix.

0d58fe4

IGNITE-13012 : faster test.

7f7a608

IGNITE-13012 : minority.

dde7e7c

Merge branch 'master' into IGNITE-13012

7b40043

Vladsz83 added 4 commits June 15, 2020 16:51

IGNITE-13012 : spelling fix.

e1b9735

IGNITE-13012 : empty lines.

1b07dd5

IGNITE-13012 :renaming.

a4be000

IGNITE-13012 :renamings. Removes test.

d9c3108

anton-vinogradov approved these changes Jun 15, 2020


reverted removal of 'public'

71435c2

sergey-chugunov-1985 reviewed Jun 23, 2020


Vladsz83 added 2 commits June 23, 2020 16:12

IGNITE-13012 : removed redundant hasRemoteSrvNodes

322242a

Merge branch 'master' into IGNITE-13012

be6e2ed

anton-vinogradov merged commit 87aad40 into apache:master Jun 24, 2020

Vladsz83 deleted the IGNITE-13012 branch June 24, 2020 10:22

kartiksomani pushed a commit to kartiksomani/ignite that referenced this pull request Sep 15, 2020

IGNITE-13012 Make node connection checking rely on the configuration. Simplify node ping routine. (apache#7835)

5ad02b1
