Fix: TCP setevent no longer throws bogus error message#2986
Fix: TCP setevent no longer throws bogus error message#2986mwinkel-dev merged 6 commits intoMDSplus:alphafrom
Conversation
|
This WIP must be carefully scrutinized to ensure it is not a breaking change for the "events" feature of MDSplus. When reviewing this proposed fix, it will likely be useful to study the individual commits. Commit 9874518: This is the minimum change needed to fix the problem reported by PPPL. The issue is that because PR #2288 redefined Commit 23e2581: Updates the Commit 9366af2: The Commits 6721c2e and 1079327: For clarity, the separate Commit 8d7e819: Significant rewrite of the |
|
To manually test these changes, the following configuration was used.
|
| free(ansarg.ptr); | ||
| } | ||
|
|
||
| if (tries >= MAX_TRIES) { |
There was a problem hiding this comment.
Question: When a bad server is encountered, should the function continue sending the event to the remaining servers (as per this WIP)? Or should the function instead immediately terminate?
| else | ||
| { | ||
| int len = (int)((argc > 2) ? strlen(argv[2]) : 0); | ||
| status = MDSEvent(argv[1], len, argv[2]); |
There was a problem hiding this comment.
Note that the old code was returning an MDSplus status code (32-bits, 3 fields) when the function terminates. This is problematic because MDSplusSUCCESS = 65545, which when treated as a C exit code (a byte value) is a failure (i.e., echo $? displays 9 which is a non-zero value, thus denotes a C failure).
| default: | ||
| printhelp(argv[0]); | ||
| return 1; | ||
| return C_ERROR; |
There was a problem hiding this comment.
Note that this is a change in behavior. The define is C_ERROR = -1, which echo $? displays as 255.
The conjecture is that very few users have shell scripts that check the status code of wfevent. Thus, this change is likely very low risk.
Alternatively, we could add a new define, C_FAILURE = 1, and use it instead of C_ERROR.
| { | ||
| receive_thread_ids[idx] = searchOpenServer(receive_servers[idx]); | ||
| if(receive_thread_ids[idx] < 0) | ||
| if(receive_thread_ids[idx] <= INVALID_CONNECTION_ID) |
There was a problem hiding this comment.
Note that around line 413 below, an error condition is detected whereupon a socket is set to zero (i.e., stdin) which seems a bit odd. The code works "as is", so it is probably OK to keep it.
|
There appears to be yet another issue with TCP events, thus this WIP definitely needs more work. To test TCP events, it is advisable to use three virtual machines, call them A, B and C. A sets The test consists of the following:
The above test revealed that C would only receive the Josh has confirmed that TCP events should work without using UDP on any of the VMs. (Similar behavior was seen months ago, but it was incorrectly assumed that TCP events were designed to convert to UDP at the target system.) Thus, additional investigation is required. |
|
Experiments prove that TCP events are indeed being converted to UDP events at the "target" system. This occurs with both TCL's Here are the details . . . Both TCL and the |
|
A tentative conclusion from additional experiments is that UDP is the backbone of TCP events. Assume four servers: A, B, C, and D. B and C are on the same network segment. D executes a It is difficult to test the above configuration when all the four systems are VMs are on the same virtual network. However, based on experiments and perusing the source code, it seems likely that UDP is the backbone with TCP being used for the first and last network hop in the chain. More investigation is required. |
|
An additional experiment using debug print statements proves the preceding conjecture that TCP events are converted to / from UDP events. |
WhoBrokeTheBuild
left a comment
There was a problem hiding this comment.
These all look like good changes, and I think the risks are minimal. LGTM
Fixes issue #2977 .