Fix: TCP setevent no longer throws bogus error message by mwinkel-dev · Pull Request #2986 · MDSplus/mdsplus

mwinkel-dev · 2025-11-18T01:45:32Z

Fixes issue #2977.

mwinkel-dev · 2025-11-18T02:10:32Z

This WIP must be carefully scrutinized to ensure it is not a breaking change for the "events" feature of MDSplus. When reviewing this proposed fix, it will likely be useful to study the individual commits.

Commit 9874518: This is the minimum change needed to fix the problem reported by PPPL. Because PR #2288 redefined INVALID_CONNECTION_ID as -1 instead of zero, the TCP setevent code no longer handled a connection ID of zero correctly. It reconnected as connection ID = 1, which did send the event but also generated a bogus error message. The fix in this first commit was sufficient to prove that the root cause had been identified. However, it is only a partial fix because it works only if the mds_event_target environment variable contains a single TCP event server.
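The effect of the sentinel change can be illustrated with a minimal sketch. Only INVALID_CONNECTION_ID and its value of -1 come from this discussion; the two helper functions are hypothetical and are not the code in MdsEvents.c.

```c
#include <stdbool.h>

/* Value stated in this PR: PR #2288 redefined the sentinel as -1. */
#define INVALID_CONNECTION_ID (-1)

/* Hypothetical "old style" check that still assumes zero is the invalid ID. */
bool is_valid_old(int conid)
{
  return conid > 0;
}

/* Hypothetical "new style" check: anything above the sentinel is usable,
 * so connection ID 0 is accepted. */
bool is_valid_new(int conid)
{
  return conid > INVALID_CONNECTION_ID;
}
```

With the old-style check, a perfectly good connection ID of 0 is rejected, which is consistent with the needless reconnect as connection ID = 1 described above.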

Commit 23e2581: Updates the MdsEvents.c file to use the INVALID_CONNECTION_ID define throughout, thereby eliminating the confusion caused by the old code (which incorrectly assumes zero is the invalid ID).

Commit 9366af2: The sendRemoteEvent() function returns an MDSplus status code (32-bit integer with three fields) up the call stack, which is what the TCL setevent feature needs. However, the separate setevent utility runs at the operating system prompt and thus should use C exit codes when terminating. For clarity, this is done with the C_OK and C_ERROR defines.

Commits 6721c2e and 1079327: For clarity, the separate wfevent utility was also changed to use C_OK and C_ERROR.

Commit 8d7e819: Significant rewrite of the sendRemoteEvent() function to correctly process a list of servers specified by the mds_event_target environment variable. The existing code was only retrying a connection on the first failed server that was encountered. This rewrite assumes that an effort should be made to write to every server in the list even if some servers cannot be reached. An alternate design would be to terminate as soon as the first failing server is encountered.
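The "try every server" design can be sketched roughly as follows. This is not the actual sendRemoteEvent() code: send_to_all(), fake_send(), MAX_SERVER_LEN, and the value chosen for MAX_TRIES are all assumptions made for illustration (MAX_TRIES appears in the diff, but its value is not shown in this PR).

```c
#include <string.h>

#define MAX_TRIES 3          /* assumed value, for illustration only */
#define MAX_SERVER_LEN 256   /* hypothetical limit on one "host:port" entry */

/* Hypothetical send callback: returns 1 if the event reached the server. */
typedef int (*send_fn)(const char *server, const char *event);

/* Stub used for demonstration: any server whose name starts with "down"
 * is treated as unreachable. It also counts how often it was called. */
int fake_send_calls = 0;
int fake_send(const char *server, const char *event)
{
  (void)event;
  fake_send_calls++;
  return strncmp(server, "down", 4) != 0;
}

/* Walk a semicolon-separated server list, retrying each server up to
 * MAX_TRIES times, and keep going after a failure.
 * Returns the number of servers that could not be reached. */
int send_to_all(const char *targets, const char *event, send_fn send)
{
  int failures = 0;
  const char *p = targets;
  while (*p != '\0')
  {
    size_t len = strcspn(p, ";");
    if (len >= MAX_SERVER_LEN)
    {
      failures++; /* entry too long for this sketch; count it as failed */
    }
    else if (len > 0)
    {
      char server[MAX_SERVER_LEN];
      memcpy(server, p, len);
      server[len] = '\0';
      int sent = 0;
      for (int tries = 0; tries < MAX_TRIES && !sent; tries++)
        sent = send(server, event);
      if (!sent)
        failures++;
    }
    p += len;
    if (*p == ';')
      p++; /* skip the separator */
  }
  return failures;
}
```

For example, send_to_all("a:8000;down:8001;b:8002", "test_event", fake_send) reports one failure but still delivers to both reachable servers. The alternate design mentioned above would instead return as soon as the first server fails.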

mwinkel-dev · 2025-11-18T02:24:58Z

To manually test these changes, the following configuration was used.

Two VMs, A and B, were used. (The operating system doesn't matter.)
Server A had four terminal windows open, with an mdsip process running in each window. (Each mdsip process used a different port.) When server A was a Rocky Linux VM, the software firewall, firewalld, was configured to allow TCP traffic on the various mdsip ports. The firewall was also configured for UDP traffic on port 4000.
Server B ran a developer build with this WIP. To force TCP events to be used, this statement was executed: export mds_event_target="<server A>:<port 1>;<A>:<port 2>;<A>:<port 3>;<A>:<port 4>". Note that the double quotes and semicolon are important.
On server B, two terminal windows were opened. One was used to run the standalone wfevent utility to confirm that TCP events were indeed being sent. The other terminal window was used to run 1) TCL's setevent command and 2) the separate setevent utility.
After running a test, the messages written by the mdsip processes on server A were also reviewed.
To test a failed connection, wfevent was started in one of B's terminal windows, then one of the mdsip processes on A was terminated, and then the TCL setevent or the separate setevent was issued on B.

mwinkel-dev · 2025-11-18T02:27:52Z

mdsshr/MdsEvents.c

-      free(ansarg.ptr);
    }
+
+    if (tries >= MAX_TRIES) {


Question: When a bad server is encountered, should the function continue sending the event to the remaining servers (as per this WIP)? Or should the function instead immediately terminate?

mwinkel-dev · 2025-11-18T02:39:50Z

setevent/setevent.c

  else
  {
    int len = (int)((argc > 2) ? strlen(argv[2]) : 0);
    status = MDSEvent(argv[1], len, argv[2]);


Note that the old code was returning an MDSplus status code (32 bits, 3 fields) when the function terminates. This is problematic because MDSplusSUCCESS = 65545, which, when treated as a C exit code (a byte value), is a failure (i.e., echo $? displays 9, a non-zero value that denotes a C failure).
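The arithmetic behind this can be checked with a small sketch. The helper names are hypothetical, C_OK = 0 is an assumption (the PR only states C_ERROR = -1), and the "odd status means success" test reflects the usual MDSplus status convention rather than code taken from this PR.

```c
/* Value quoted in this review comment. */
#define MDSPLUS_SUCCESS_VALUE 65545

#define C_OK 0        /* assumed: conventional C success exit code */
#define C_ERROR (-1)  /* value stated later in this PR */

/* Only the low 8 bits of an exit status are visible to the shell. */
int low_byte(int status)
{
  return status & 0xFF;
}

/* Hypothetical mapping from an MDSplus status to a C exit code,
 * assuming the low bit of the status signals success. */
int exit_code_from_status(int status)
{
  return (status & 1) ? C_OK : C_ERROR;
}
```

Here low_byte(MDSPLUS_SUCCESS_VALUE) is 9, because 65545 = 65536 + 9, which matches what echo $? displays. Mapping the status before returning from main avoids reporting success as a shell failure.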

mwinkel-dev · 2025-11-18T02:44:50Z

wfevent/wfevent.c

      default:
        printhelp(argv[0]);
-        return 1;
+        return C_ERROR;


Note that this is a change in behavior. The define is C_ERROR = -1, which echo $? displays as 255.

The conjecture is that very few users have shell scripts that check the status code of wfevent. Thus, this change is likely very low risk.

Alternatively, we could add a new define, C_FAILURE = 1, and use it instead of C_ERROR.
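To compare the two candidate values empirically, the following POSIX-only sketch forks a child that exits with a given code and reports what the parent observes. The function name observed_exit_status() is hypothetical, and the sketch assumes a Unix-like system where fork() and waitpid() are available.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child process that exits with `code` and return the exit status
 * seen by the parent, or -1 if the child could not be run or did not
 * exit normally. */
int observed_exit_status(int code)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit(code); /* child: terminate immediately with the requested code */

  int wstatus = 0;
  if (waitpid(pid, &wstatus, 0) < 0)
    return -1;
  return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling observed_exit_status(-1) corresponds to C_ERROR, and observed_exit_status(1) corresponds to the proposed C_FAILURE. Both are non-zero, so a script that only tests for success or failure would behave the same with either define. Only scripts that compare against the exact value 1 would notice the change.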

mwinkel-dev · 2025-11-18T02:51:01Z

mdsshr/MdsEvents.c

  {
    receive_thread_ids[idx] = searchOpenServer(receive_servers[idx]);
-    if(receive_thread_ids[idx] < 0)
+    if(receive_thread_ids[idx] <= INVALID_CONNECTION_ID)


Note that around line 413 below, an error condition is detected whereupon a socket is set to zero (i.e., stdin) which seems a bit odd. The code works "as is", so it is probably OK to keep it.

mwinkel-dev · 2025-11-18T21:18:14Z

There appears to be yet another issue with TCP events, thus this WIP definitely needs more work.

To test TCP events, it is advisable to use three virtual machines, call them A, B and C.

A sets mds_event_target to point to B. B runs the mdsip process that relays the TCP event. C sets mds_event_server to B. On all VMs, make sure the software firewall permits TCP traffic on the desired port.

The test consists of the following:

On C, do wfevent test_event
On A, do setevent test_event
On C, check to see if the test_event was received
On B, check the messages displayed by the mdsip process to troubleshoot any issues.

The above test revealed that C would only receive the test_event if B's software firewall was configured to allow UDP traffic on port 4000 (the default for MDSplus UDP events). Because the three VMs are on the same virtual network, it is probable that B is sending out UDP events that are received by A and C.

Josh has confirmed that TCP events should work without using UDP on any of the VMs. (Similar behavior was seen months ago, but it was incorrectly assumed that TCP events were designed to convert to UDP at the target system.)

Thus, additional investigation is required.

mwinkel-dev · 2025-11-18T23:15:10Z

Experiments prove that TCP events are indeed being converted to UDP events at the "target" system. This occurs with both TCL's setevent command and the separate setevent utility. Fixing TCP events will likely be complicated, so that task should probably be split off as a new issue / PR.

Here are the details . . .

Both TCL and the setevent utility call the MDSEvent() function of MdsEvents.c, which detects the mds_event_target environment variable and eventually causes sendRemoteEvent() to be called. That function sends a TDI expression to the "target" system that causes it to evaluate the setevent.fun TDI function, which in turn causes the "target" to also call its MDSEvent() function. However, on the "target", the mds_event_target and mds_event_server environment variables are undefined, and thus the target's MDSEvent() calls MDSUdpEvent(). Thus the "target" converts TCP events into UDP events.
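The dispatch described above can be modeled with a simplified sketch. The enum and choose_route() are hypothetical and do not mirror the real MDSEvent() code. The sketch looks only at mds_event_target, ignores mds_event_server, and treats an empty value as unset, which is an assumption.

```c
#include <stddef.h>

/* Hypothetical labels for the two paths described in this comment. */
typedef enum
{
  ROUTE_REMOTE_TCP, /* corresponds to calling sendRemoteEvent() */
  ROUTE_LOCAL_UDP   /* corresponds to calling MDSUdpEvent() */
} event_route;

/* Decide how an event would be sent, given the value of the
 * mds_event_target environment variable (NULL when it is not set). */
event_route choose_route(const char *mds_event_target)
{
  if (mds_event_target != NULL && mds_event_target[0] != '\0')
    return ROUTE_REMOTE_TCP;
  return ROUTE_LOCAL_UDP;
}
```

On the originating host, choose_route(getenv("mds_event_target")) would select the TCP path. On the "target", where the variable is undefined, the same call selects UDP, which is the conversion observed in the experiments.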

references TDI function

setevent.fun

MDSEvent

mwinkel-dev · 2025-11-19T03:28:14Z

A tentative conclusion from additional experiments is that UDP is the backbone of TCP events.

Assume four servers: A, B, C, and D. B and C are on the same network segment. D executes a wfevent test_event command that uses TCP to listen to an mdsip process on C. The A server does a setevent test_event that uses TCP to send an event to an mdsip process on B. B converts the TCP event to a UDP event, which is seen by the mdsip process on C, which in turn then uses TCP to notify D.

It is difficult to test the above configuration when all four systems are VMs on the same virtual network. However, based on experiments and perusing the source code, it seems likely that UDP is the backbone, with TCP being used for the first and last network hops in the chain.

More investigation is required.

mwinkel-dev · 2025-11-19T16:20:37Z

An additional experiment using debug print statements proves the preceding conjecture that TCP events are converted to / from UDP events.

WhoBrokeTheBuild

These all look like good changes, and I think the risks are minimal. LGTM

mwinkel-dev added 6 commits November 17, 2025 12:45

Fix: first TCP setevent no longer throws bogus error message

9874518

Fix: rewrite setevent using INVALID_CONNECTION_ID

23e2581

Fix: the setevent utility now uses C exit codes

9366af2

Fix: wfevent now uses C exit defines for clarity

6721c2e

Fix: wfevent now how consistent style on returns

1079327

Fix: rewrite sendRemoteEvent() so retries on each server if needed

8d7e819

mwinkel-dev self-assigned this Nov 18, 2025

mwinkel-dev added the bug, US Priority, and tool/event labels Nov 18, 2025

mwinkel-dev requested review from WhoBrokeTheBuild, heidthecamp, joshStillerman and santorofer November 18, 2025 02:53

WhoBrokeTheBuild approved these changes Nov 19, 2025

mwinkel-dev marked this pull request as ready for review November 19, 2025 17:03

mwinkel-dev merged commit 89fdb73 into MDSplus:alpha Nov 19, 2025
1 check passed

mwinkel-dev mentioned this pull request Nov 19, 2025

mdstcl cli setevent fails on first attempt #2977

Closed

mwinkel-dev merged 6 commits into MDSplus:alpha from mwinkel-dev:mw-2977-setevent
