Fix: reduce open files due to dispatcher #2740

Merged: zack-vii merged 6 commits into alpha from zck_dispatch on Apr 16, 2024
@zack-vii
Contributor

This is related to issue #2731 and may fix some if not all of the related problems.

  • reuse the action server's connection id (prevent extra sockets if a connection already exists)
  • set dispatched flag before dispatching to prevent possible race condition
  • lock Clients list in ServerQAction and check socket before reusing

@zack-vii zack-vii changed the title Fix: reduce open files bue to dispatcher Fix: reduce open files due to dispatcher Apr 10, 2024
@zack-vii
Contributor Author

Bummer, I cannot access Jenkins, so it's unclear to me what went wrong. I will test locally tomorrow.

@mwinkel-dev
Contributor

Hi @zack-vii,

Thanks for submitting this proposed fix for GA's leaking sockets, Issue #2731.

In addition to the files you've changed, I've also found it useful to change some additional files. My conjecture, perhaps wrong, is that the final solution will be a melding of our two proposed fixes.

I am now building your PR locally on my dev system and running it through my suite of tests. Will post results here within an hour or two.

@mwinkel-dev
Contributor

mwinkel-dev commented Apr 11, 2024

Hi @zack-vii,

Testing of this PR demonstrated that this PR fixes the primary case, but fails on an edge case.

Pro -- This PR is a server-side fix for the primary cause of leaked sockets (i.e., the situation GA encountered). It is thus a more elegant solution than the client-only fix that I created. I agree that this PR should be used. On Thursday or Friday, I will complete a review of this PR.

Con -- If mdstcl dispatches actions so quickly that it overloads the action server, then some actions will be marked as "failed" by the client. Failed actions remain connected to the action server, but never receive a SrvJobFINISHED so are zombie actions. Thus they leak sockets. Even with this PR 2740, my testing was able to cause ~100 sockets to leak. (I am working on a client-side fix for that edge case. However, you are more familiar with this code than I am. A-OK for us both to work on it and then compare our solutions.)

Other -- Looks like the Client_do_message() of Client.h needs a Client_remove(c, factive) call in the case SrvJobFINISHED section.

Next Steps -- I appreciate the assistance and collaboration. Here is my suggestion on how we should proceed:

  • I'll do a detailed review of this PR, plus do some more testing.
  • So that after Jenkins builds this PR, I can approve it.
  • Whereupon you can merge it into alpha.
  • We'll then add a new PR for the "failed action" edge case.
  • I'll add a third PR to clean up some other cruft that I've spotted in the dispatcher.
  • After we are satisfied that all is OK, then we'll cherry-pick the associated PRs from alpha to GA's branch.
  • And then GA's branch will be built and the RPMs distributed to GA.

Addendum
Many of the following posts by @mwinkel-dev are conjectures and thus wrong. For the summary of the investigation, refer to the post in Issue #2731 referenced by this link.
#2731 (comment)

@mwinkel-dev mwinkel-dev added the bug, US Priority, and tool/tcl labels Apr 11, 2024
@zack-vii
Contributor Author

Regarding the Con: I think the expected behavior is that the server holds one connection to each action server. That is, even after the phase ends, the shot is over, and the cycle begins anew, the connection may be reused.
The issue on the client side was that it did not properly detect whether the socket was disconnected. As far as I can tell from the current implementation, there is only one detached thread that handles the listening socket for incoming action_server replies as well as the established connections.

My simple but effective TDI script for testing is:

_root=getenv('MDSPLUS_DIR')
setenv(CONCAT('test_path=',_root,'/test_path'))
_dispatch=BUILD_DISPATCH(2, 'DUMMY', 'INIT', 50, "")
_task=BUILD_PROGRAM(1, "echo test")
_action=BUILD_ACTION(`_dispatch, `_task)
treeopennew('test', 1)
treeaddnode("act", _n, 'ACTION')
treeputrecord("act", `_action)
FOR (_i=1;_i<=3000;_i++) EXECUTE("treeaddnode($1, _n, 'ACTION');treeputrecord(`$1, `$2);", EXECUTE("DECOMPILE(`$)", _i), _action)
treewrite()
treeclose()
setenv('DUMMY=localhost:30000')
tcl('set tree test/shot=1')
tcl('dispatch act')
tcl('dispatch/build')
SPAWN(CONCAT('mdsip -s -p 30000 -h ', _root, '/testing/mdsip.hosts&'))
WRITE(1, _out)
WAIT(1)
tcl('dispatch/phase INIT')
WAIT(1)
SPAWN('killall mdsip')
SPAWN(CONCAT('mdsip -s -p 30000 -h ', _root, '/testing/mdsip.hosts&'))
WAIT(1)
tcl('dispatch/phase INIT')
WAIT(1)
tcl('dispatch/close')
SPAWN('killall mdsip')

It can be invoked from the development environment:

# after checkout; usual setup
./bootstrap
./configure --debug  # . . .
# enter development environment with all env vars set
make tests-env
# ch dir to root of repo
cd $MDSPLUS_DIR
# create folder for test tree
mkdir test_path
# update bins
make
# run test
gdb --args tditest dispatch-test.tdi
# update source ; goto update bins

The WAIT(1) calls are good spots to inspect the state (gdb: Ctrl+C) with p *Clients and p *Clients->next. You will hopefully find that Clients->next becomes NULL soon after the mdsip service is terminated. You can comment out (#) or move around the SPAWN calls, or even fire up an external server (make sure the env vars are set), to observe the behavior when the dispatch server restarts or stays alive in between dispatches.

It would be interesting to see how this holds up if you add an active monitor server or more involved action_servers.

@zack-vii
Contributor Author

I may have found the issue with the python dcl_dispatcher_test.py. When checking for an existing connection, we need to check if the conid is still valid.

@mwinkel-dev
Contributor

Hi @zack-vii,

Thanks for the additional detail.

My test harness is comparable, and I do see the one retained connection that you describe. That works well and is not a concern.

The edge cases I am investigating will likely not arise often during practical use of mdstcl. However, it is a big deal if leaked sockets force GA to reboot the entire physical server. Thus, I have been stress testing the dispatch feature of mdstcl to characterize its failure modes. (My goal is to confirm that if the edge cases do arise, that it won't be necessary to reboot the server every day.)

The current edge case I have been investigating consists of slow actions (e.g., wait(2.0);1;) with mdstcl dispatching at 10 actions per second or faster. That is how I ended up with ~100 leaked sockets.

Thanks for mentioning the action monitor and different dispatch phases. I will also do experiments with those features.

@mwinkel-dev
Contributor

Hi @zack-vii,

I've studied every line of this PR -- it is a nice fix! And the code refactoring also adds clarity.

After my lunch break, I will submit a review and approval for this PR.

@mwinkel-dev
Contributor

Hi @zack-vii,

Regarding ServerQAction.c file, I like the changes you made.

  • The send_reply() function now always calls cleanup_client() thereby preventing the leaking sockets. (The old code only cleaned up when an error occurred.)
  • Refactoring AttachPort() into two functions makes the code easier to understand.
  • Refactoring RemoveClient() into two functions also adds clarity.
  • Excellent that a mutex now protects the manipulations of the Clients linked list in these functions: add_client(), find_client() and remove_client().
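To illustrate the last point, here is a minimal sketch of a mutex-protected singly linked list. It is not the code from ServerQAction.c: the node type, the `sketch_` function names, and the use of a connection id as the lookup key are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the Clients list.
 * Field and function names are illustrative, not the MDSplus definitions. */
typedef struct client_node
{
  int conid; /* connection id used as the lookup key (assumption) */
  struct client_node *next;
} client_node_t;

static client_node_t *clients = NULL;
static pthread_mutex_t clients_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a client unless one with the same conid already exists.
 * Returns 1 if a node was added, 0 on duplicate or allocation failure. */
int sketch_add_client(int conid)
{
  client_node_t **p;
  int added = 0;
  pthread_mutex_lock(&clients_lock);
  for (p = &clients; *p != NULL; p = &(*p)->next)
    if ((*p)->conid == conid)
      break;
  if (*p == NULL)
  {
    client_node_t *node = malloc(sizeof(*node));
    if (node != NULL)
    {
      node->conid = conid;
      node->next = NULL;
      *p = node; /* append at the tail */
      added = 1;
    }
  }
  pthread_mutex_unlock(&clients_lock);
  return added;
}

/* Returns 1 if a client with this conid is in the list, else 0. */
int sketch_find_client(int conid)
{
  client_node_t *c;
  int found = 0;
  pthread_mutex_lock(&clients_lock);
  for (c = clients; c != NULL; c = c->next)
    if (c->conid == conid)
    {
      found = 1;
      break;
    }
  pthread_mutex_unlock(&clients_lock);
  return found;
}

/* Unlink and free the client with this conid. Returns 1 if removed. */
int sketch_remove_client(int conid)
{
  client_node_t **p;
  int removed = 0;
  pthread_mutex_lock(&clients_lock);
  for (p = &clients; *p != NULL; p = &(*p)->next)
    if ((*p)->conid == conid)
    {
      client_node_t *dead = *p;
      *p = dead->next;
      free(dead);
      removed = 1;
      break;
    }
  pthread_mutex_unlock(&clients_lock);
  return removed;
}
```

Every traversal and every modification happens with the lock held, so a thread that removes a node cannot invalidate a pointer that another thread is still following.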

In the ServerDispatchPhase.c file, moving the actions[i].dispatched = 1 line eliminates a race condition.
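The reordering can be sketched as follows. This is an illustration under assumptions, not the code from ServerDispatchPhase.c: the reduced action struct, the `sketch_dispatch()` helper, and the send callback are hypothetical. The rollback on failure follows the PR's commit message ("set dispatched early; unset if dispatching failed").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced action record; only the flag discussed above. */
typedef struct
{
  int dispatched; /* 1 once the action has been handed to a server */
} sketch_action_t;

/* Hypothetical send callback: returns 1 on success, 0 on failure. */
typedef int (*sketch_send_fn)(sketch_action_t *action);

/* Mark the action as dispatched *before* sending, so that a completion
 * handler that runs immediately sees a consistent state. Roll the flag
 * back if sending failed. */
int sketch_dispatch(sketch_action_t *action, sketch_send_fn send)
{
  int ok;
  assert(action != NULL && send != NULL);
  action->dispatched = 1;
  ok = send(action);
  if (!ok)
    action->dispatched = 0;
  return ok;
}

/* Test double for a very fast action: it inspects the flag the way a
 * completion handler would, and succeeds only if it is already set. */
int sketch_send_fast(sketch_action_t *action)
{
  return action->dispatched == 1;
}

/* Test double for a send that fails. */
int sketch_send_fail(sketch_action_t *action)
{
  (void)action;
  return 0;
}
```

With the flag set only after the send, `sketch_send_fast` would observe `dispatched == 0`, which is the window the reordering closes.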

Note though that on the client-side, I encountered a different race condition. After the client calls SendArg(), it then immediately calls GetAnswerInfoTS() to read the status handshake (i.e., the value 1 for success). However, the action was completed by the server before the client had finished processing the GetAnswerInfoTS(). Normally that would not occur. But the stress test of hundreds of actions dispatched at once caused that scenario to arise. With my client-only fix (and actions that evaluate very quickly), occasionally one of the receiver threads was tearing down the connection / socket at the same time that the main thread's GetAnswerInfoTS() was using the connection, which usually resulted in a crash.

We won't be using my client-side fix, thus the specific problem I created with that fix vanishes.

However, I do wonder what will happen if the server-side fix kills both sockets (i.e., cleans up the client) while the client's main thread is in the midst of the GetAnswerInfoTS(). I believe that the client will detect that as an error and will correctly handle that error. However, I will do additional testing to confirm that is true.

@mwinkel-dev
Contributor

I will approve this PR after Jenkins is able to build it on all platforms. It presently fails on RHEL7 and Windows.

RHEL7

/opt/jenkins/workspace/MDSplus_PR-2740/rhel7/servershr/ServerQAction.c: In function 'find_client':
/opt/jenkins/workspace/MDSplus_PR-2740/rhel7/servershr/ServerQAction.c:897:3: error: 'for' loop initial declarations are only allowed in C99 mode
   for (ClientList **p = &Clients; *p != NULL; p = &(*p)->next)
   ^
/opt/jenkins/workspace/MDSplus_PR-2740/rhel7/servershr/ServerQAction.c:897:3: note: use option -std=c99 or -std=gnu99 to compile your code
/opt/jenkins/workspace/MDSplus_PR-2740/rhel7/servershr/ServerQAction.c: In function 'remove_client':
/opt/jenkins/workspace/MDSplus_PR-2740/rhel7/servershr/ServerQAction.c:924:3: error: 'for' loop initial declarations are only allowed in C99 mode
   for (ClientList **p = &Clients; *p != NULL; p = &(*p)->next)
   ^
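For reference, a loop of this shape builds in pre-C99 mode once the loop variable is declared ahead of the `for` statement. The sketch below uses a hypothetical `sketch_list_t` in place of the real `ClientList`. It shows one possible way to satisfy the older compiler and is not necessarily the change that was committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal list type; the real ClientList has more fields. */
typedef struct sketch_list
{
  int sock;
  struct sketch_list *next;
} sketch_list_t;

/* Returns the address of the link that points at the node with `sock`,
 * or the address of the terminating NULL link if there is no such node.
 * The loop variable is declared at the top of the block, which compiles
 * in C89/gnu89 mode as well as in C99 and later. */
sketch_list_t **sketch_find_link(sketch_list_t **head, int sock)
{
  sketch_list_t **p; /* declared outside the for statement */
  assert(head != NULL);
  for (p = head; *p != NULL; p = &(*p)->next)
  {
    if ((*p)->sock == sock)
      break;
  }
  return p;
}
```

The alternative hinted at by the compiler note is to build with `-std=c99` or `-std=gnu99`, which keeps the original declaration style.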

Windows

/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c: In function 'setup_client':
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:949:14: error: format '%d' expects argument of type 'int', but argument 3 has type 'SOCKET' {aka 'long long unsigned int'} [-Werror=format=]
  949 |       MDSMSG("setup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |              ^~~~~~~~~~~~~~~~~~~~~~             ~~~~
      |                                                 |
      |                                                 SOCKET {aka long long unsigned int}
/opt/jenkins/workspace/MDSplus_PR-2740/windows/_include/mdsmsg.h:79:25: note: in definition of macro '__MDSMSG'
   79 |     pos += sprintf(pos, __VA_ARGS__);              \
      |                         ^~~~~~~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:949:7: note: in expansion of macro 'MDSMSG'
  949 |       MDSMSG("setup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |       ^~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:949:33: note: format string is defined here
  949 |       MDSMSG("setup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |                                ~^
      |                                 |
      |                                 int
      |                                %I64d
In file included from /opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:46:
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:949:14: error: format '%d' expects argument of type 'int', but argument 3 has type 'SOCKET' {aka 'long long unsigned int'} [-Werror=format=]
  949 |       MDSMSG("setup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |              ^~~~~~~~~~~~~~~~~~~~~~             ~~~~
      |                                                 |
      |                                                 SOCKET {aka long long unsigned int}
/opt/jenkins/workspace/MDSplus_PR-2740/windows/_include/mdsmsg.h:79:25: note: in definition of macro '__MDSMSG'
   79 |     pos += sprintf(pos, __VA_ARGS__);              \
      |                         ^~~~~~~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:949:7: note: in expansion of macro 'MDSMSG'
  949 |       MDSMSG("setup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |       ^~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:949:33: note: format string is defined here
  949 |       MDSMSG("setup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |                                ~^
      |                                 |
      |                                 int
      |                                %I64d
In file included from /opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:46:
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c: In function 'cleanup_client':
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:962:12: error: format '%d' expects argument of type 'int', but argument 3 has type 'SOCKET' {aka 'long long unsigned int'} [-Werror=format=]
  962 |     MDSMSG("cleanup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |            ^~~~~~~~~~~~~~~~~~~~~~~~             ~~~~
      |                                                 |
      |                                                 SOCKET {aka long long unsigned int}
/opt/jenkins/workspace/MDSplus_PR-2740/windows/_include/mdsmsg.h:79:25: note: in definition of macro '__MDSMSG'
   79 |     pos += sprintf(pos, __VA_ARGS__);              \
      |                         ^~~~~~~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:962:5: note: in expansion of macro 'MDSMSG'
  962 |     MDSMSG("cleanup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |     ^~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:962:33: note: format string is defined here
  962 |     MDSMSG("cleanup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |                                ~^
      |                                 |
      |                                 int
      |                                %I64d
In file included from /opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:46:
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:962:12: error: format '%d' expects argument of type 'int', but argument 3 has type 'SOCKET' {aka 'long long unsigned int'} [-Werror=format=]
  962 |     MDSMSG("cleanup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |            ^~~~~~~~~~~~~~~~~~~~~~~~             ~~~~
      |                                                 |
      |                                                 SOCKET {aka long long unsigned int}
/opt/jenkins/workspace/MDSplus_PR-2740/windows/_include/mdsmsg.h:79:25: note: in definition of macro '__MDSMSG'
   79 |     pos += sprintf(pos, __VA_ARGS__);              \
      |                         ^~~~~~~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:962:5: note: in expansion of macro 'MDSMSG'
  962 |     MDSMSG("cleanup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |     ^~~~~~
/opt/jenkins/workspace/MDSplus_PR-2740/windows/servershr/ServerQAction.c:962:33: note: format string is defined here
  962 |     MDSMSG("cleanup connection %d " SVRJOB_PRI, sock, SVRJOB_VAR(job));
      |                                ~^
      |                                 |
      |                                 int
      |                                %I64d
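The Windows failure comes from printing a `SOCKET` with `%d`. One portable option, assumed here for illustration rather than taken from the PR, is to widen the handle to `uintmax_t` and use the matching `PRIuMAX` conversion from `<inttypes.h>`. The `sketch_socket_t` typedef and `sketch_format_socket()` are hypothetical names, and the sketch does not use the project's `MDSMSG` macro.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in so the sketch builds anywhere: on Windows the real
 * SOCKET is an unsigned pointer-sized integer, on POSIX a socket is an int. */
#ifdef _WIN32
typedef uintptr_t sketch_socket_t;
#else
typedef int sketch_socket_t;
#endif

/* Format a socket handle without a width mismatch: widen it to uintmax_t
 * and print it with PRIuMAX. Note that an invalid POSIX socket (-1) would
 * print as a large unsigned value with this approach. */
int sketch_format_socket(char *buf, size_t len, sketch_socket_t sock)
{
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, "setup connection %" PRIuMAX, (uintmax_t)sock);
}
```

Because the argument type and the conversion specifier are chosen together, the same line compiles cleanly with `-Werror=format` regardless of how wide the platform's socket type is.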

@zack-vii
Contributor Author

> However, I do wonder what will happen if the server-side fix kills both sockets (i.e., cleans up the client) while the client's main thread is in the midst of the GetAnswerInfoTS(). I believe that the client will detect that as an error and will correctly handle that error. However, I will do additional testing to confirm that is true.

The server should never cancel out of a regular mdsip request unless it times out between the messages of the same request, or is terminated or interrupted in some out-of-the-ordinary way. Hence the result of a request should be independent of the state of the task (scheduled, executing, done). Moreover, if a job is scheduled, it should return success. A race condition may arise only if the reply is lost. Because of TCP, we would probably run into a timeout, assuming the dispatch was incomplete when it actually was not. That should be very rare and requires an underperforming network considering the traffic.

@zack-vii
Contributor Author

@mwinkel-dev thanks for pointing out the issues with the Jenkins checks. It looks like they are known issues that I simply paid no attention to. I will see if I can sort them out over the weekend... the power of docker ;)

@mwinkel-dev
Contributor

Hi @zack-vii,

Thanks for answering my questions. And also for coming up with a much better fix than I had on the client-side.

Monday is a holiday for us. So we'll resume work on this next week.

@zack-vii
Contributor Author

Fingers crossed. It went through on my machine, but if it fails, please point me to the failing platforms.

@smithsp

smithsp commented Apr 16, 2024

@zack-vii @mwinkel-dev It looks like it passed. :-)

Contributor

@mwinkel-dev mwinkel-dev left a comment

This server-side fix should be viewed as a partial fix of Issue #2731.

  • Testing shows that during normal usage scenarios, this PR prevents sockets from leaking.
  • However, it does not address some "edge cases" that can cause many sockets to leak when mdstcl and/or the action server are overloaded.

The summary of this code review is in the following post.
#2740 (comment)

The full details consist of this post and all following posts.
#2740 (comment)

Note: -- It is probable that the complete fix of Issue #2731 will require additional PRs.

@mwinkel-dev
Contributor

Hi @zack-vii,

I have just approved this PR. It successfully passed the Jenkins build. And also passed much (but not all) of my testing. It can now be merged to the "alpha" branch.

Hi @smithsp,

This PR is a partial fix of Issue #2731. In my opinion, it is robust enough to handle GA's normal workflows. However, there are some "edge cases" that still leak sockets (albeit at a much slower rate than when GA had to reboot a server on 25-Mar-2024). You should decide if you want this partial fix now, or instead wish to wait until the full fix is available.

If you want this partial fix now, it will take us a day or so to cherry-pick it into the GA branch, build the packages, do another round of testing, and distribute the build to GA.

@zack-vii zack-vii merged commit a194645 into alpha Apr 16, 2024
@zack-vii zack-vii deleted the zck_dispatch branch April 16, 2024 04:53
@zack-vii
Contributor Author

@mwinkel-dev: can you give me some details about the edge cases that are still leaking?

@mwinkel-dev
Contributor

Hi @zack-vii,

I am testing the "action server" as I would any web server -- normal load (which is A-OK), spike load (fails), heavy continuous load (still to do) and so forth.

For the spike load test, I have mdstcl dispatch hundreds of actions faster than the action server can execute them. In that scenario, some of the dispatched actions are reported as failing because they've lost connection to the action server. Each of those failed actions leaks a socket. I have generated ~100 leaked sockets with that test.
#2740 (comment)
#2740 (comment)

I have also noticed in the code that the action server has a limited port range of 100 to 200 ports. It is possible (probable?) that the spike load test consumes all of those ports.

Although the spike load condition is unlikely to arise during normal workflows at GA, if it ever does then it might force GA to reboot the physical server. Which disrupts the work of many users and thus is a serious problem.

My hunch is that the fix for the spike load test will involve a client-side (mdstcl) fix. And instead of doing load balancing and dynamically adjusting the number of available receiver threads, a simpler approach would simply be to add a throttle to mdstcl so it can never dispatch more actions than available receiver threads. (I already wrote such a throttle as part of my client-side only approach for solving Issue #2731.)
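A throttle of the kind described can be sketched as a counter guarded by a mutex. This is not the throttle written for the client-side approach mentioned above, which is not shown in this thread. The `sketch_throttle_t` type, its functions, and the non-blocking "try" interface are all assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical throttle: caps the number of in-flight actions. */
typedef struct
{
  pthread_mutex_t lock;
  int in_flight; /* actions dispatched but not yet finished */
  int limit;     /* e.g. the number of available receiver threads */
} sketch_throttle_t;

void sketch_throttle_init(sketch_throttle_t *t, int limit)
{
  assert(t != NULL && limit > 0);
  pthread_mutex_init(&t->lock, NULL);
  t->in_flight = 0;
  t->limit = limit;
}

/* Try to reserve a slot before dispatching. Returns 1 on success, or 0 if
 * the limit is reached and the caller should retry later. */
int sketch_throttle_try_acquire(sketch_throttle_t *t)
{
  int ok = 0;
  pthread_mutex_lock(&t->lock);
  if (t->in_flight < t->limit)
  {
    t->in_flight++;
    ok = 1;
  }
  pthread_mutex_unlock(&t->lock);
  return ok;
}

/* Release a slot when an action finishes or fails. */
void sketch_throttle_release(sketch_throttle_t *t)
{
  pthread_mutex_lock(&t->lock);
  if (t->in_flight > 0)
    t->in_flight--;
  pthread_mutex_unlock(&t->lock);
}
```

A dispatcher would call `sketch_throttle_try_acquire()` before sending each action and `sketch_throttle_release()` when the corresponding reply arrives. A blocking variant could wait on a condition variable instead of returning 0.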

Now that your client-side fix (i.e., this PR) has been merged to alpha, I will continue my investigation / testing. When I find the root cause of the problem, I will report my findings via GitHub (i.e., in a new issue).

@smithsp

smithsp commented Apr 16, 2024

@mwinkel-dev We would like to take you up on your offer to get a cherry-pick version of this partial fix to our GA version with RPM kits. Thank you for your efforts.

@mwinkel-dev
Contributor

Hi @smithsp,

OK, will do. Here are the steps.

  • This evening, I will repeat my socket tests to make sure the alpha branch (with this partial-fix PR merged in) behaves the same way it did when I tested it prior to the merge.
  • On Wednesday morning we will start the process (e.g., cherry-pick this PR into the GA branch, etc).
  • After the RPMs are produced, I will repeat the socket testing (plus run the full suite of automated tests).
  • And then we'll distribute the RPMs to GA.

Note: -- Unless GA objects, we will also include PR #2735 in the cherry-pick to the GA branch. That is a simple change that can be useful when troubleshooting multi-threaded code. It would be useful to have that feature in the GA branch too.

@mwinkel-dev
Contributor

mwinkel-dev commented Apr 17, 2024

Hi @smithsp and @sflanagan,

Before this PR was merged to the alpha branch, it passed all automated tests that Jenkins runs (on all platforms).

After it was merged to alpha, it also passed the following manual tests (performed on Ubuntu 20.04).

  • did not leak sockets under a normal workload
  • did not leak sockets under a spike load of 400 actions (see note below)
  • re-ran the IDL tests and they passed
  • passed the MATLAB tests

Caveats

  • This fix has not been tested with the "action monitor".
  • Has only been tested with a single computer running both the mdstcl dispatch and the "action service". Stated another way, it has not been tested with a pair of computers: one running mdstcl, the other running the "action service".
  • Has not been tested while other people were using the same computer.
  • My testing has only been done on Ubuntu 20.04; have not done testing on RHEL8.
  • My manual testing has only used the dispatch /build and dispatch /phase statements of mdstcl. Other dispatch related features have not been tested (e.g., abort server, dispatch /check, dispatch /close, and so forth).
  • The spike load test results are presently a mystery. The log files indicate that this spike load test worked OK. But there are also reasons to question the results. (The inconsistency compared to previous spike load tests might be caused by a slightly different test harness.)

Summary
This PR worked well after being merged to alpha. It likely covers all normal workflows. If it fails, it will likely be when dealing with edge cases.

@zack-vii
Contributor Author

> I have also noticed in the code that the action server has a limited port range of 100 to 200 ports. It is possible (probable?) that the spike load test consumes all of those ports.

The 100 ports are given by the default of the MDSIP_REPLY_PORT_RANGE (? or similar) env var, which I think ranges over 8800-8899. It was used as one port per action server. I think this is no longer the case, as there is one listening port that handles all replies.

I will try to replicate your setup. Do you dispatch the actions in a single thread or multi-thread?

@mwinkel-dev
Contributor

mwinkel-dev commented Apr 17, 2024

Hi @zack-vii,

This post has three topics: PR #2740 behavior (versus prior), client-side cleanup, and my test harness.

1) With PR 2740 versus Without
Regarding last evening's testing of PR #2740 (in the alpha branch), I just realized that all 1,200 dispatched actions shared a single connection ID. My recollection (perhaps wrong) was that prior to the PR, the "action service" created a separate connection ID for each action. Does sharing the same connection ID across all actions pose any problems? (The log file of the test indicates that the 1,200 actions probably did all execute correctly.) I will repeat this test after breakfast today to make sure my recollection is correct.

2) Client-Side Cleanup
The mdstcl client also maintains data structures. But they probably aren't being cleaned-up. As per my earlier post -- "Looks like the Client_do_message() of Client.h needs a Client_remove(c, factive) call in the case SrvJobFINISHED section."
#2740 (comment)

3) My Test Harness
I use mdstcl to dispatch actions via the dispatch phase command. My test cases typically dispatch in multiples of 400 actions. Although mdstcl is a multi-threaded program, it only uses a single thread to dispatch actions.

However, my test harness is also based on the client-side only fix that I created for Issue #2731. I have made numerous changes to my test harness to eliminate features that are now handled by your PR #2740. It is thus possible (probable?) that I am observing a bug in my test harness and not in the dispatch phase feature of mdstcl. After the build / release of the GA branch has been completed, I will investigate and let you know whether it is a bug in my code or not. And when my test harness is stable, I will also make it available to you.

Summary
For GA, the most important questions to answer are 1) and 2) above. Item 3) can be ignored for now.

Addendum
Regarding item 1), the errors log file for the "action service" only shows a single connection was made. I was instead expecting 400 connections made (even if the connection ID was reused).

For 2), I experimented with an approach that allowed the "receiver" threads to access the "thread-static information" in the "main" thread. My experimental code worked OK when it was configured to just read the linked list of connections. However, if it was configured to delete connections, then it caused deadlock when under heavy load. My hope is that with this PR #2740, we can ignore the client-side data structures -- however more testing should be done to confirm that doing so is OK.

Regarding 3), my test harness presently consists of a single instance of mdstcl. It is thus not a multi-threaded test (i.e., not running multiple instances of mdstcl by using multiple terminal windows).

@mwinkel-dev
Contributor

mwinkel-dev commented Apr 17, 2024

Hi @zack-vii -- Likely root cause of the edge case that leaks sockets is simply the difference between re-using an existing connection and creating new connections.

  • It appears that during normal usage, a single connection is opened between mdstcl and the "action service" -- and that one connection is used for dispatching all actions to the "action service". This is based on observing stable_7.96.9, alpha of 14-Apr-2024 (immediately prior to this PR 2740), and alpha of 16-Apr-2024 (includes PR 2740).
  • My test harness for the edge case is probably deleting the connection and re-creating it for every single action. Which if so, is definitely an abnormal workflow.

I will do a few more experiments this evening to confirm if this conjecture is indeed true.

@mwinkel-dev
Contributor

mwinkel-dev commented Apr 18, 2024

Hi @zack-vii,

Major mystery solved. And yes, my test harness was too extreme.

The crux of the matter is that the architectures of the mdsip and action services are not well documented by the comments in the source code. Thus as a maintenance programmer, one makes guesses based on patterns seen in small regions of the code without understanding the broader context. I guessed wrong. However, as those guesses fail, one learns the architecture of the services by trial and error.

Now that I have a better understanding of the services, I can answer my own questions.

1) With PR 2740 and Without
Yes, the architecture is designed to re-use a single connection to dispatch all actions. Thus stable_7.96.9 (the version GA was running prior to November 2023), alpha of 14-Apr-2024 (prior to PR #2740), and alpha of 16-Apr-2024 (with PR #2740) are all working as designed.

2) Client-side Cleanup
Definitely should not have a Client_remove() call in the SrvJobFINISHED case. The architecture is designed to re-use the connection, and thus the connection should be retained no matter how many actions have completed. The cases that do use Client_remove() are those that are associated with a defective connection. Although connections should not be removed when actions finish, perhaps there are other client-side cleanup tasks that should be done.

3) My Test Harness
My test harness was closing sockets and creating a new connection after each action executed (i.e., whenever an action's "receiver" thread received the SrvJobFINISHED reply). And thus was creating hundreds of connections at the same time that hundreds of actions were being executed. That is an abnormal work flow for mdstcl, and thus a very extreme stress test. Yes, mdstcl and the "action service" should be able to handle that stress test without leaking sockets. But it is an improbable workflow, and thus the release of PR #2740 is low risk.

Addendum
Apparently the "action service" only has one "receiver" thread. Which is a standard design for a web server. (But when I glanced at the high-level routines, I assumed that the service was starting a "receiver" thread for each action.)

@mwinkel-dev
Contributor

mwinkel-dev commented Apr 18, 2024

Prior to this PR, the sockets were being leaked in the server_connect() routine of ServerSendMessage.c. For each action, that function was always calling ConnectToMds() which in turn always creates a new connection.

This PR fixes that by changing server_connect() to reuse connections. It only calls ConnectToMds() in the occasional instance when it is unable to reuse a connection.
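The reuse logic can be sketched as a small cache keyed by server name. This is an illustration under assumptions, not the code in ServerSendMessage.c: the cache layout, `sketch_connect()` (a stand-in for an expensive call such as ConnectToMds()), and `sketch_conid_is_valid()` are hypothetical, and the real validity check inspects the actual connection.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_SERVERS 8
#define SKETCH_INVALID_CONID (-1)

/* Hypothetical cache entry: one connection id per "host:port" string. */
typedef struct
{
  char server[64];
  int conid;
} sketch_entry_t;

static sketch_entry_t sketch_cache[SKETCH_MAX_SERVERS];
static int sketch_cache_len = 0;
static int sketch_connect_calls = 0; /* counts how often we really connect */

/* Stand-in for an expensive connect; it only hands out increasing ids. */
static int sketch_connect(const char *server)
{
  (void)server;
  return 100 + sketch_connect_calls++;
}

/* Stand-in for a liveness check on an existing connection id. */
static int sketch_conid_is_valid(int conid)
{
  return conid != SKETCH_INVALID_CONID;
}

/* Return a connection id for `server`, reusing a cached one when it is
 * still valid and connecting only when there is nothing to reuse. */
int sketch_server_connect(const char *server)
{
  int i;
  assert(server != NULL);
  for (i = 0; i < sketch_cache_len; i++)
  {
    if (strcmp(sketch_cache[i].server, server) == 0)
    {
      if (!sketch_conid_is_valid(sketch_cache[i].conid))
        sketch_cache[i].conid = sketch_connect(server); /* reconnect */
      return sketch_cache[i].conid;
    }
  }
  if (sketch_cache_len == SKETCH_MAX_SERVERS)
    return SKETCH_INVALID_CONID; /* cache full in this sketch */
  strncpy(sketch_cache[sketch_cache_len].server, server,
          sizeof(sketch_cache[0].server) - 1);
  sketch_cache[sketch_cache_len].server[sizeof(sketch_cache[0].server) - 1] = '\0';
  sketch_cache[sketch_cache_len].conid = sketch_connect(server);
  return sketch_cache[sketch_cache_len++].conid;
}

/* Mark a cached connection as dropped, e.g. after a send error. */
void sketch_mark_dropped(const char *server)
{
  int i;
  for (i = 0; i < sketch_cache_len; i++)
    if (strcmp(sketch_cache[i].server, server) == 0)
      sketch_cache[i].conid = SKETCH_INVALID_CONID;
}
```

Dispatching many actions to the same server then costs one connect call in total, plus one more each time the connection is found to be dropped, instead of one new socket per action.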

The other changes made by this PR are also useful (refactoring for clarity, using mutex to protect data structures, and so forth).

mwinkel-dev pushed a commit that referenced this pull request Apr 24, 2024
* Fix: reuse action_server connection id in ServerConnect; avoid duplicates in list

* Fix: set dispatched early; unset if dispatching failed; prevent race on fast actions

* Fix: lock Clients in ServerQAction; cleanup and check before use

* Fix: reconnect dropped connections

* Fix: use correct windows SOCKET print format

* Fix: satisfy rhel7 c standard
WhoBrokeTheBuild pushed a commit that referenced this pull request May 21, 2024
* Fix: reuse action_server connection id in ServerConnect; avoid duplicates in list

* Fix: set dispatched early; unset if dispatching failed; prevent race on fast actions

* Fix: lock Clients in ServerQAction; cleanup and check before use

* Fix: reconnect dropped connections

* Fix: use correct windows SOCKET print format

* Fix: satisfy rhel7 c standard
WhoBrokeTheBuild added a commit that referenced this pull request May 21, 2024
* Gm apd java (#2729)

* Improve APD support for Java interface

* Improve APD support for Java - forgotten files

* Commit packages

* When activate debug trace, now compiles without error. (#2735)

This fixes Issue 2734.

* Fix: reduce open files due to dispatcher (#2740)

* Fix: reuse action_server connection id in ServerConnect; avoid duplicates in list

* Fix: set dispatched early; unset if dispatching failed; prevent race on fast actions

* Fix: lock Clients in ServerQAction; cleanup and check before use

* Fix: reconnect dropped connections

* Fix: use correct windows SOCKET print format

* Fix: satisfy rhel7 c standard

* Gm apd thin cpp (#2742)

* Added APD support in C++ thin client

* Added tdi fun

* Added TDI FUn

* Fix commands

* Gm new marte (#2743)

* more parameters for marte2_simulink_generic

* Proceed with the new implementation

* Proceed

* Proceed

* Proceed

* Proceed

* Proceed

* proceed

* Proceed

* Proceed

* Partially tested version

* Added execution times recording

* Proceed

* Proceed with debugging

* Proceed

* Proceed

* Proceed

* Fixes for multisampled acquisition

* Remove quotes from string parameters

* Minor fixes

* Proceed debugging

* Debugging

* More channels

* Debug Distributed configuration

* Fix signal recording for synchronized inputs

* Further debug

* Further debug

* Small fixes

* Close to final version

* Forgotten fix

* Make port visible, fix parameter name

* unaligned nids

* Increase DiscontinuityFactor

* Discontinuityfactor

* More channels

* Proceed with the new implementation

* Proceed

* Proceed

* Proceed

* Proceed

* Proceed

* proceed

* Proceed

* Proceed

* Partially tested version

* Added execution times recording

* Proceed

* Proceed with debugging

* Proceed

* Proceed

* Proceed

* Fixes for multisampled acquisition

* Remove quotes from string parameters

* Minor fixes

* Proceed debugging

* Debugging

* More channels

* Debug Distributed configuration

* Fix signal recording for synchronized inputs

* Further debug

* Further debug

* Small fixes

* Close to final version

* Forgotten fix

* Make port visible, fix parameter name

* unaligned nids

* Increase DiscontinuityFactor

* Discontinuityfactor

* More channels

* Packages updated

* Remove print

* Remove error messages

---------

Co-authored-by: mdsplus <mdsplus@roactive2.rfx.local>

* Docs: Improve documentation for getSegment* python wrappers (#2732)

Add explanation and rename parameters for:
* getSegmentLimits
* getSegmentList

* Fix: Update JAVASOURCE to 8 to support JDK 17 (#2747)

* Fix: improve mdstcl's error handling and add comments (#2746)

* add comments regarding action service

* send_reply() now does cleanup_client() on bad socket

* explain mdstcl's receiver thread cannot access main thread's connection list

* Improve handling of non-MDSplus error codes

* add comments regarding action dispatch

* add comment explaining receiver thread select loop

* Fix: multiple string escape warnings thrown by python 12 (#2748)

```
mdsplus/pydevices/RfxDevices/FAKECAMERA.py:40: SyntaxWarning: invalid escape sequence '\C'
  {'path': ':EXP_NODE', 'type': 'text', 'value': '\CAMERATEST::FLIR:FRAMES'},

mdsplus/pydevices/RfxDevices/PLFE.py:220: SyntaxWarning: invalid escape sequence '\#'
  '^(\#[0-5][01]([01][0-9][0-9]|2[0-4][0-9]|25[0-5])){6}$', msg)

mdsplus/pydevices/RfxDevices/CYGNET4K.py:361: SyntaxWarning: invalid escape sequence '\E'
  self.serialIO(b'\x55\x99\x66\x11\x50\EB', None)

mdsplus/pydevices/RfxDevices/CYGNET4K.py:461: SyntaxWarning: invalid escape sequence '\8'
  return self.setValue(b'\81\x82', min(0xFFF, value), True)

mdsplus/pydevices/MitDevices/dt100.py:161: SyntaxWarning: invalid escape sequence '\.'
  regstr = '([0-9\.]*) [0-9] ST_(.*)\r\n'
```

The \CAMERATEST became \\CAMERATEST
The regex strings should be python r-strings `r""`, but to maintain backwards compatibility, we're using \\
The broken hex-codes now have x in them
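
A short sketch of why these fixes are safe, using only values quoted in the warnings above. It assumes, based on the note that the hex codes "now have x in them", that `\EB` was meant to be the single byte `\xEB`. A doubled backslash and a raw string spell the same value, and `\x` followed by two hex digits yields exactly one byte.

```python
# Doubling the backslash keeps the literal valid on old and new interpreters.
path_escaped = '\\CAMERATEST::FLIR:FRAMES'
# A raw string spells the same value without doubling.
path_raw = r'\CAMERATEST::FLIR:FRAMES'
same_path = path_escaped == path_raw

# '\x' followed by two hex digits is one byte, so this command is 6 bytes long.
command = b'\x55\x99\x66\x11\x50\xEB'
last_byte = command[-1]

print(same_path, len(command), hex(last_byte))  # True 6 0xeb
```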

* Build: Resolve linker error after updating the windows builder to Fedora 39 (#2749)

* Build: Resolve linker error after updating the windows builder to Fedora 39

This appeared after updating the mdsplus/builder:windows docker image to Fedora 39, and Wine to 9.0
The newer libxml2 tried to link dynamically unless we explicitly set LIBXML_STATIC

* Hopefully fix the MdsTreeNodeTest

It turns out that this was failing previously, but we weren't properly catching the error

* Fix errors in windows build from newer gcc

* Docs: Update sites.csv (#2615)

add Startorus Fusion in Xi'an, China

* Fix: mdsip now sends proper auth status back to the client (#2752)

Fixes issues #2750 and #2652

* Fix: mdstcl's `show current` no longer segfaults when no tree paths defined (#2754)

* Fix: "show current" no longer segfaults when no tree paths defined

* Fix: corrected typo in error message

* Use original error message so tests pass

* Fix: Add Debian 12 and Ubuntu 24.04 and support GCC 12+ (#2753)

* Build: Add Debian 12 and Ubuntu 24.04

* Add extra flags for GCC 12+ and stub imp for Python 3.12

GCC 12+ triggers a bunch of false positive warnings (which we treat as errors)
This adds AX_C_FLAGS to configure those `-Wno-*` flags for GCC 12+
`cmdExecute.c` now uses snprintf to avoid buffer overflow warnings, also generated by GCC 12+
`compound.py.in` now supports Python 3.12+

* compound.py now supports Python 2.7.. again

---------

Co-authored-by: Stephen Lane-Walsh <slwalsh@psfc.mit.edu>

* Fix: Improve error messaging when calling Setup Device in jTraverser (#2744)

* Improve error messaging when calling Setup Device in jTraverser

e.getMessage() sometimes returned null, but just e will always print something
Add a printStackTrace() for InvocationTargetException exceptions to show the encapsulated error

* Add import for InvocationTargetException

* Build: Fix off-by-one versions produced by Jenkins (#2756)

This fixes the bug where `--os=bootstrap` wasn't receiving the version from `--version=x.y.z`
However, confusingly, this also changes the Jenkinsfile to not use that feature, and instead use `git tag` in order to embed the proper git information as well as the proper version information
The `--os=bootstrap` and `--version` fix is still included just so that it doesn't break if someone else tries to use it

* Build: Increase default test timeout to 1h (#2757)

When the build server(s) are at capacity, it's not unreasonable for a test to take more than 10 seconds, which was the old default timeout
This sets the default to 1h, and removes the overrides in various tests

* Gm fix filter (#2755)

* Allow filtering data from MinMax resampling; remove useless thread in jServer

* Fix compile error

* Remove debug message

* Make Windows Compiler happy

* Build: Fix 'HEAD' in `show version` and tag error (#2758)

Jenkins builds in a detached HEAD state, which caused bootstrap to use HEAD as the branch name
We pass --branch= to the bootstrap call in Jenkins, but $BRANCH wasn't being passed into the bootstrap docker container
Also, attempts to build alpha versions with tags that already existed failed

* Fix: mdstcl show version tag and links (#2760)

Fixes Issue #2759

* Feature: CompileTree will exit with non-zero status code for error messages. (#2446)

Error messages should also go to stderr.

* Build: Add package override for ubuntu and debian (#2761)

Override sections for Ubuntu 24 and Debian Bookworm were added.

* Fix: Python release version tag (#2764)

* Feature: Add "Date:" to show version output (#2767)

Implements #2766

Example:
```
$ mdstcl sho ver

MDSplus version: 7.140.75
----------------------
  Release:  alpha_release-7-140-75
  Date:     Thu May 16 17:43:14 UTC 2024
  Browse:   https://github.com/MDSplus/mdsplus/tree/alpha_release-7-140-75
  Download: https://github.com/MDSplus/mdsplus/releases/tag/alpha_release-7-140-75
```

* Fix: remove abort flag from RfxDevices DIO2 initialization (#2769)

Fixes issue #2768

* Fix: Missing repo metadata signing (#2770)

This will hopefully fix the lack of signed metadata files that are preventing us from automatically publishing releases

---------

Co-authored-by: GabrieleManduchi <gabriele.manduchi@igi.cnr.it>
Co-authored-by: mwinkel-dev <122583770+mwinkel-dev@users.noreply.github.com>
Co-authored-by: Timo Schroeder <zack-vii@users.noreply.github.com>
Co-authored-by: mdsplus <mdsplus@roactive2.rfx.local>
Co-authored-by: Josh Stillerman <jas@psfc.mit.edu>
Co-authored-by: Fernando Santoro <44955673+santorofer@users.noreply.github.com>
Co-authored-by: Louwrensth <Louwrensth@users.noreply.github.com>
WhoBrokeTheBuild pushed a commit that referenced this pull request Jun 30, 2025
WhoBrokeTheBuild added a commit that referenced this pull request Jun 30, 2025
* Gm marte2 updates (#2324)

* Fix:improve error messages

* Some minor fixes

Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>

* Fix: explicitly setting the WRTD_TICKNS from tree node value (#2323)

* Fix: explicitly setting the WRTD_TICKNS from tree node value

* Modify some node options

* Clean the node entries for better readability

* Fix: Add validation of sample rate frequencies for SC devices (#2328)

* Feature: Add extern entrypoint CamXlateLogicalname (#2321)

A user wants to call xlate_logicalname in camshr.  New entrypoint
added to expose this routine.

Closes issue:
https://github.com/MDSplus/mdsplus/issues/2319

* Fix: change wait mode from 'Default' to 'Busy' (#2336)

Default Wait mode does not seem to work in all circumstances.  this
commit changes it to 'Busy' wait instead.

Also: Encode strings for Python3 compatibility.

* Feature: New TDI python function that query Influxdb data from an MDSplus node (#2348)

* Feature: New TDI python fun to read data from Influxdb from an MDSplus tree

* Only use getTimeContext. Query was improved to be more general

* Add small comment

* Read credentials from file

* Change input order. Add address and credentials

* Address and credentials input as MDSplus tree nodes

* Add small comment

* Consolidate WHERE and time start and end time

* Refactor code

* Several changes to input parameters and variable names

* Fix to the 435st class so that the trig_time node is correctly populated with the start time of the shot

* Add file in kernel packaging list

* Fix: remove #pragma once from the code (#2349)

User reported that #pragma once, which is a nonstandard feature
is not supported by their compiler.  As it is 'nonstandard' it
seemed sensible to revert to #ifndef _FILENAME_EXT  instead

* Fix: changed decompile(), which is no longer supported, to toString() (#2351)

Co-authored-by: mdsplus <mdsplus@scdevail.rfx.local>

* Fix: Get clock plan and calculate the correct nanosec per ticks for the ACQ2106 WRTD TICKNS. (#2355)

* Fix: Set the WRTD_TICKNS from loaded clock plan

* Fix the value of the 5M12 frequency

* Improve some comments

* Improve error handling

* Remove 2 unused nodes

* Remove un-necessary check logic and use sync_role query to check plan

* Remove a small bug when building the plan string

* Fix: Influxdb signal with delta step (#2356)

* Refactor code to include Influx aggregation parameter to deal with setTimeContext() delta step

* Add debug statements

* Add comment remove TODO

* Add changes to 435st and 423st so that the start time is recorded at the beginning

* Build: clang-format

* Build: python use setuptool as default (#2366)

i merge it now so i have time to fix it in case it does not work

* Fix: matlab support for 2021a; removed deprecated stuff (#2365)

* Fix: matlab support for 2021a; removed deprecated stuff

* Build: updated matlab packages

* Fix: python exceptions, pep, and fixed bytes_list (#2369)

* Fix: python setup, setupkw (#2376)

manually tested with and without setuptools,

* Gm tree thread (#2371)

* Updates to support library for National Devices

* Fix:required changes for trees in threads

These changes are required after an MDSplus update that requests reopening a tree for every created thread

* New and updated devices

* Update rpm

* Fix wrong nid

* Fix wrong nid

* Python3 compatibility

* Remove commented

* e Please enter the commit message for ynund comments r changes. Lines starting

* type

* Wrong fix

* MARTE2_SUPERVISOR improvements

* Further devices

* Updated pkg

* Open tree in every thread

* clang-format

* some python cleanup

Co-authored-by: mdsplus <mdsplus@mcpsl.nbtf>
Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>
Co-authored-by: cloud <timo.schroeder@ipp.mpg.de>

* Fix: on some systems UIDs are > 16 bits (alternative) (#2379)

* Fix: on some systems UIDs are > 16 bits

On systems using active directory uids can be constructed from
AD SIDs and may have bits in their high word set.  This PR
addresses https://github.com/MDSplus/mdsplus/issues/2375

If there are bits set in the high word of the UID then do not OR in
the group.

When displaying in TCL, if the low 16 bits of the owner do not
translate to a user, then try to translate all 32 bits to an owner
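
The rule above can be illustrated with a small sketch. It assumes the owner field packs the group id into the high 16 bits and the user id into the low 16 bits; that layout is inferred from the description, not taken from the treeshr source, and `pack_owner` and `owner_candidates` are hypothetical helper names.

```python
def pack_owner(uid, gid):
    """Combine uid and gid into one 32-bit owner value (assumed layout)."""
    if uid >> 16:
        # The uid already uses the high word, so the group is not OR-ed in.
        return uid & 0xFFFFFFFF
    return ((gid & 0xFFFF) << 16) | uid


def owner_candidates(owner):
    """Uids to try when displaying: the low 16 bits first, then all 32 bits."""
    return [owner & 0xFFFF, owner]


small = pack_owner(1000, 100)          # classic 16-bit uid plus group
large = pack_owner(0x00ABCDEF, 100)    # uid built from an AD SID, group dropped
```

A display routine would walk `owner_candidates(owner)` in order and stop at the first value that resolves to a user name.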

* Fix: remove grp from tcl and added flag2 bit for 32uid

* Tests: fixed owner to show uid only

Co-authored-by: Josh Stillerman <jas@psfc.mit.edu>

* Fix: add list fall-back implementation to support older servers (#2383)

* Fix: add list fall-back implementation to support older servers

MdsIpTunnel is special MdsIpFile, support space ' ' as %20

* Fix: handle Process stream closed

* Fix: fixed/suppressed all warnings

* Build: added possible suppression for existing definite in dlopen

* Fix: check message header before trying to read MSGLEN more bytes

* Fix: drop read and trigger install by env var, default should be by PYTHONPATH (#2388)

* mdsplus-api: improve mdsip, all by String (#2389)

* Gm fix cpp ctx (#2394)

* Fix:change incorrect jScope property file

* Fix:handle tree context in different threads

A different tree context is now created whenever the thread owning the tree object has changed

* Fix:tree context in multithreaded applications

This version computes the correct, thread-dependent, context

* Fix error compiler

* Fix compiler error

* Make sure ctx vector is empty

* Windows portability

* Windows compiler issues

* Remove Finalize()

* Fix leaks

Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>

* Mitica update (#2402)

* Fix : Asynchronous data segments acquisition fixed.

:

* Fix: Add device reset, nisync_reset, in init operation

* Added devices

* Added MARTe2 devices

* Fix: Added delay of 500ms in the trigger method to ensure the correct synchronization of timing signals.

* Updated interface

* Updated deployment

Co-authored-by: mdsplus <mdsplus@mcpsl.nbtf>
Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>

* Jas add compression methods two (#2407)

* Feature: add NCI attribute compression_method

Use the NCI field spare2 to hold a 1 byte compression_method which
will be used to determine which of the supported (distributed)
compression methods will be used to compress this node.

1 --> 'standard'
2 --> 'gip'

this is a work in progress.
Still need:
  tcl_set_node - keep a stack of error strings since now we have a
                 warning, which will keep going.

  tcl dir/full - display this
  tdi SETNCI and GETNCI
  treeshr (use the compression method)

* 

Use the NCI field spare2 to hold a 1 byte compression_method which
will be used to determine which of the supported (distributed)
compression methods will be used to compress this node.

1 --> 'standard'
2 --> 'gip'

this is a work in progress.
Still need:
  tcl_set_node - keep a stack of error strings since now we have a
                 warning, which will keep going.

  tcl dir/full - display this
  tdi SETNCI and GETNCI
  treeshr (use the compression method)

* Fix: use DESCRIPTOR_CSTRING not DESCRIPTOR macro (#2409)

The DESCRIPTOR macro uses the size of its argument.  Since the code
    is passing a pointer to a string, instead of an array, this is wrong.

* Gm fix marte2 (#2411)

* Fix:change incorrect jScope property file

* Fix:possible deadlock in state machine

Removed Mode=ExpectsReply in generated configuration file

Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>

* Fix: can not get cli_get_value a qualifier twice (#2410)

* Fix: can not get cli_get_value a qualifier twice

Can not and should not anyway ask for the value of the same
qualifier more than one time.

This change gets the compression method, if specified, outside of
the node loop.

* Feature: New Acq2106 423ELF transient device (#2241)

* Feature: Add transient capture for 2106 423ELF

* Remove calibration comment

* Add Debian and Redhat entries for deploy packaging

* Change INIT parameter name

* Change if statement logic for the init() parameter

* Remove double calls to acq400_hapi

* Fix transient init() input parameter

* Add to store a call to getUUT()

* Add extra comment. Arm from INIT

* Feature: mdsplus-api: trust in jsch when handling config (#2422)

* Reverted PR 2361 changes that added nix support. (#2430)

Co-authored-by: Daniel Austin <daustin@zap.energy>

* Fix: move tree copy to run() in MDSWorker thread (#2425)

* Fix: Setting of Ac2106 435 Signal Conditioning Gains and Offsets in a more efficient way, i.e using PyEpics. (#2432)

* Fix: add new setChanScaleGlobal() to correctly set the global gains and offsets of the signal

* Different approach to setting gains and offsets: no global values

* Remove def_gains and def_offset from parts values

* Change setGainsOffsets to only set when node has a new value

* Improve the change setGainsOffsets to only set when node has a new value

* Add 3 threads for each call to setGainsOffsets() to change the gains and offsets for each card

* Using EPICS calls to set all the gains. Ask for the hostname of the ACQ.

* Bring back the setting of offsets, for consistency

* Improved comment on the EPICS PV variable definitions

* Remove threading when setting each of the sites' gains

* Feature: New acq2106 435ELF transient device (#2240)

* Feature: Add transient capture for 2106 435ELF

* Improve node definition formatting

* Add raw input and expression to get the calibrated input

* Add calibrated signal

* Add calibration expression in the input node

* Add Numpy right_shift() to the raw data

* Remove commented out calibration expression section from store

* Add comment on right_shift() usage

* Add deploy packaging for debian and redhat noarch

* Change INIT parameter name and some formatting

* Change if statement logic

* Add second argument to call to init()

* Change resampling parameter value to be False/True value

* Small formatting changes

* Some formatting

* Try to fix a broken rebase. Code is same as alpha mdsplus + the two new changes

* Another small function parameters formatting

* Fix bug on setting the NACC when defaulting to nacc=1

* Reword a warning message

* Fix: Gm fix bagel (#2438)

* Fix:change incorrect jScope property file

* Fix:wrong management of nid data in getXXX() methods

* Fix:correct behavior of downsampling between threads

* Use PickSampleGAM

* Use PickSampleGAM

* Same behavior in subsampling also for GAMs

Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>

* Fix: Gm fix bagel (#2440)

* Fix:change incorrect jScope property file

* Fix:wrong management of nid data in getXXX() methods

* Fix:correct behavior of downsampling between threads

* Use PickSampleGAM

* Use PickSampleGAM

* Same behavior in subsampling also for GAMs

* Fix field misalignment

Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>

* Gm redp dac (#2443)

* Fix:change incorrect jScope property file

* Fix:wrong management of nid data in getXXX() methods

* Feature:RedPitaya DAC

Added device & support for RedPitaya DAC device

* packages updated

Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>

* Feature: Gm marte2 dtt (#2457)

* Added MARTe2 DTT Devices

* Added Drag&Drop for MARTe2 device fields

* Feature:new MARTe2 devices

Added MARTe2 devices for DTT simulation and UDP communication

* rpm updated

* Feature:add the possibility of passing var args via array in MDSplus:… (#2459)

* Feature:add the possibility of passing var args via array in MDSplus::compileWithArgs and MDSplus::executeWithArgs

* Fix:need to change name of compileWithArgs and executeWithArgs for array args

* Back to original name

* added debian-bullseye (#2460)

* added debian11

* fixed package building script for modern python

Co-authored-by: cloud <cloud@ipp.mpg.de>

* Add a fixed-length string specifier to the printf for buildtag() (#2475)

There was an error generated if your branch was 12 or more characters. The string would be truncated by git_revision.sh, but it would throw an error about printing a "non-null-terminated string".

* Gm sync redp (#2478)

* Feature:new redpitaya functionality

Added the possibility of synchronizing to a 1 MHz clock that is maintained in sync with the system clock

* Feature:
 message for your changes. Lines starting

Co-authored-by: AndreaRigoni <andrea.rigoni@igi.cnr.it>

* Add configuration for building on Ubuntu 22.04 (#2485)

* Add configuration for building on Ubuntu 22.04
Fix errors reported by gcc 11

* Add Ubuntu22 to linux.xml

* Gm fix connection (#2489)

* Fix:Wrong shell files

Shell files CompileTree and DecompileTree did not make correct reference to java classes (now in mds.jtraverser package)
On many linux systems xinetd /etc/rc.d/init.d/xinetd  does not exist anymore and service command must be called instead.

* Fix:Connection object

Fixed wrong check of connectionId returned by MdsConnect. It can be zero

Co-authored-by: mdsplus <mdsplus@ropc1.rfx.local>

* Gm wrtd timer (#2491)

* Support for WRTD timer

* Added devices

* Fix:WRTDTimer related devices and dw setup (#2497)

* Gm wrtd driver (#2502)

* Fix:MARTE2_COMPONENT & MARTE2_STREAM

* Fix:DeviceInputs bean

* Fix: Gm wrtd driver (#2504)

* Fix:MARTE2_COMPONENT & MARTE2_STREAM

* Fix:DeviceInputs bean

* Fix DTT MARTe2 Device setup

* Fix MARTe2 DeviceInputs Bean

* Fix wrong segment len for OUT TIME

* Fix:pass missed ronly flag passed to Tree constructor

Co-authored-by: mdsplus <mdsplus@scdevail.rfx.local>

* Use PickSampleGAM when required (#2505)

Co-authored-by: mdsplus <mdsplus@scdevail.rfx.local>

* Fix wrong type in MDSWriter (#2506)

Co-authored-by: mdsplus <mdsplus@scdevail.rfx.local>

* Fix: APD/EmptyData deserialization (#2518)

* fix: APD/EmptyData deserialization

* Fix: cleaner and faster code by eliminating __getattr__

* Add socket/service files for running per-connection mdsip servers with systemd (#2510)

* Fix: Support thick mixed with local and distributed in a tree path (#2526)

* Fix failure to get/set shot id with :: at end of path

When getting/setting the current shot with a multi-part tree path that ends with ::,
we currently fail to traverse the list of paths.

For example:
```
$ export "cmod_path=/tmp;alcdata-new::"
$ mdstcl show current cmod
Connect failed to host: /tmp;alcdata-new
Failed to get shotid.
```

This commit allows traversal of the list of paths, so this now works:
```
$ export "cmod_path=/tmp;alcdata-new::"
$ mdstcl show current cmod
Current shot is 1170112002
```

If there is a more standard way to traverse the list of paths, I would love to use it.
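
The traversal described above amounts to splitting the path on `;` and treating each entry separately. The following is a simplified Python sketch of that idea, not the treeshr implementation: `split_tree_path` is a hypothetical helper, and it assumes that `::` separates a host name from an optional directory.

```python
def split_tree_path(tree_path):
    """Split a ';'-separated tree path into (host, directory) entries.

    host is None for a purely local entry.
    """
    entries = []
    for part in tree_path.split(';'):
        part = part.strip()
        if not part:
            continue                      # ignore empty entries
        host, sep, directory = part.partition('::')
        if sep:
            entries.append((host, directory))   # remote entry, e.g. "host::"
        else:
            entries.append((None, part))        # local directory
    return entries


entries = split_tree_path('/tmp;alcdata-new::')
print(entries)  # [(None, '/tmp'), ('alcdata-new', '')]
```

Each entry can then be tried in turn, so a local directory that lacks the shot id no longer prevents the remote host from being consulted.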

* Feedback from Josh

* Fix: Bug in PR #2526 (#2533)

Replace exp with experiment_lower in TreeGetCurrentShotId() and TreeSetCurrentShotId().

* Fix:corrected missing copy of isEdit flag in Tree(Tree *) constructor (#2534)

* Major update of NI6683, NI6368EV and fix on CRIO_MPAG devices (#2539)

Co-authored-by: mdsplus <mdsplus@mcpsl-pcf.codac.iter.org>
Co-authored-by: Andrea Rigoni <andrea.rgn@gmail.com>

* Add RHEL9 build, remove xinetd package dependency, bug fixes (#2541)

Add rhel9.opts to use mdsplus/builder:rhel9 (which is based on Rocky Linux 9.1, as CentOS 9 does not exist)
Remove xinetd as a package requirement in linux.xml.
If /etc/xinet.d/ is present, it will still install the config files, but the packages will no longer depend on it. This is because we now have systemd files as an alternative, and RHEL9 has officially dropped support for it, so this seems like a good time to cut ties.
When installing the `mdsplus-*-kernel` package on a system without xinetd or systemd, the post install script will throw errors trying to copy files. Instead, we wrap them in if statements to only copy the files if those directories exist.
Replace `.getiterator()` with `.iter()` to fix error when using `xml.ElementTree` with python3 in `redhat_build_rpms.py` and `alpine_build_apks.py`
This already seemed to be fixed in `debian_build_debs.py`
Removed seemingly duplicate include for `libdc1394_support*.so` in linux.xml
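
For context on the `.getiterator()` change: `Element.getiterator()` was deprecated and then removed from `xml.etree.ElementTree` in Python 3.9, while `Element.iter()` is available on both older and newer interpreters. A minimal sketch follows; the XML snippet is made up for illustration and is not the real linux.xml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical packaging description, only for demonstration.
xml_text = """
<packages>
  <package name="kernel"/>
  <package name="python"/>
</packages>
"""

root = ET.fromstring(xml_text)
# iter() walks the element and all of its descendants, filtered by tag.
names = [pkg.get("name") for pkg in root.iter("package")]
print(names)  # ['kernel', 'python']
```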

* Fix: standardizing on pass by reference for exception handling (#2544)

* Fix: standardizing on pass by reference for exception handling

* Went too far removing .what(). In this case it was needed.

* Missed changing two files to pass by reference

* Refactoring per Gabriele's comments.

Undo removal of what() from printf
Undo removal of what() from ostream <<

* Improved the setting of gains and offsets (#2484)

* Improved setChanScale format, setGainOffsets and added computeGains()

* Fixed setGainOffsets()

* Improved argument names for setChanScale()

* Fixed resampled node call to setChanScale()

* Fixed bug in setGainsOffsets

* Fixed bug in computeGains()

* Modified setChanScale for D-Tacq firmware is v498 or greater

* Improved comment

* Improved comment in setChanScale

* Added a resetting function for the Gains and offsets, in case this is needed to solve Amy's findings

* Reverted changes added to reset gains

* Fixed a bug in setGainOffsets()

* Added possibility that the SC_GAIN node contains a list of G1 and G2, not just G12

* Fixed adding the possibility that the SC_GAIN node contains a list of G1 and G2, not just G12

* Added error handling for the gain inputs

* Corrected error text

* Removed try/except from setGainsOffsets. Added a float conversion instead.

* In the 423/435 stream devices, dev.copy() was moved back to the main process. (#2477)

* Moved the tree copy (dev.copy()) from the MDSWorker thread back to the main process thread.

* Moved the creation of chans[] and decim[] to MDSWorker thread.

* For 423 devices, dev.copy() has also been moved

* Converted self.chans and self.decim to chans and decim.

* Added static variable NUM_CHANS_PER_SITE

* Corrected the calls to the static variable NUM_CHANS_PER_SITE

* Create a tuple containing an instance of the tree to be used in the MDSWorker thread (instead of using .copy())

* Break tuple into 3 variables

Replace info tuple with tree, shot and path to bring a copy of the tree to the worker thread.

* Gm dtacq (#2562)

* New devices&forms for DTACQ devices

* Make Java happy

* Added devices

* Import under try

* Remove print

---------

Co-authored-by: AndreaRigoni <andrea.rigoni@igi.cnr.it>

* Gm matlab apd (#2573)

* Use APD to map MATLAB (arrays of)structures

* Feature:allow any MATLAB type including recursive (array of) structures be stored and retrieved in MDSplus via APD

Some fixes required as well as MATLAB routine extensions

* Added support routines for generic structure storage

* removed debug printf

---------

Co-authored-by: AndreaRigoni <andrea.rigoni@igi.cnr.it>

* Update influxSignal.py (#2577)

Change the * to a + in the regex replace triggered by deltaTime being set. This was leading to a strange bug (possibly caused by going from python2 to python3) where it would match the entire fieldKey, and then match the empty space at the end of the line. 

So instead of:
`MEAN(fVal)`
it would become:
`MEAN(fVal)MEAN()`

.. which would then make InfluxDB reject our query.
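
The behavior described above can be reproduced with a minimal sketch. The pattern and replacement below are illustrative guesses, not the actual code in influxSignal.py. Since Python 3.7, `re.sub` also replaces an empty match that directly follows a non-empty one, so a `*` quantifier produces a second, empty replacement at the end of the string.

```python
import re

field_key = 'fVal'

# '*' also matches the empty string left after 'fVal' has been consumed.
with_star = re.sub(r'(\w*)', r'MEAN(\1)', field_key)

# '+' requires at least one character, so there is no trailing empty match.
with_plus = re.sub(r'(\w+)', r'MEAN(\1)', field_key)

print(with_star)  # MEAN(fVal)MEAN()  (Python 3.7 and later)
print(with_plus)  # MEAN(fVal)
```

Before Python 3.7 the trailing empty match was skipped, which is consistent with the bug appearing only after a Python upgrade.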

* Gm update devices (#2578)

* Fix:last fixes for DTACQ support

* Forgotten interface

* Fixed EPICS time; Removed deadclock source

* to allow alpha pull

* A few fixes after MDSPlus course for DEMO ADC and Streamed ADC

* Added support for ANY usage

* Added dispatch monitor features

* Support library for camera devices updates

* Device setup updates

* Python devices updates

* Tdi device updates

* Test programs for ELAD front end modules

* New device setup form

* New python devices

* New functions support

* Test Elad front end program

* Fix xmin xmax limit in EvaluationShots method

* Added RFX devices

* forgotten fix

* make python3 compatible

* Added semicolon to avoid debug messages (#2582)

* Gm fix connection (#2606)

* Fix:Connection thread safe C++ object

Made now thread safe

* Fix

* Removed comments

* Fix:invalid tree context in Data::execute (#2621)

* Gm fix devices (#2622)

* Fix:fixed several issues

* Fix:fixed last issue

* Fix:fixed another last  issue

* Fix:fixed another last  issue

* Added devices

* Allow APD data be accessed via thin client via Connection.get() (#2620)

* Fix:Connection thread safe C++ object

Made now thread safe

* Fix

* Removed comments

* Feature:allow APD data be returned by Connection.get

Implemented by retrieving serialized version and deserializing it locally

* Fix: 2625 - start connection IDs at 1 (not zero)  (#2626)

* Proposed fix for Issue 2625

* Proposed change (part 2) for Issue 2625

* Edit a comment for Issue 2625

* Fix: 2625 - Restore do loop for integrity check

---------

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Revert "Fix: 2625 - start connection IDs at 1 (not zero)  (#2626)" (#2627)

This reverts commit 389ed8679a61c6907be9d3cf0b4ae0cb72bcd57f.

* Fix: 2625 v2 - only change IDL for socket 0 issue (#2628)

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Revert "Fix: 2625 v2 - only change IDL for socket 0 issue (#2628)" (#2630)

This reverts commit 443018ead5ae95bb74e89da924d795de5ab449a7.

* Fix: 2625 v4 -- IDL socket 0 issue and partial revert of PR #2620 (#2635)

* Fix: 2625 - IDL only by replace keyword_set()

* Fix: 2625 - Integrity check when add connection

* Fix: 2625 v4 - update debian packaging for IDL

* Fix: 2625 - correct typo in comment

* Fix: 2625 v4 - mdsdisconnect now correctly passes connection ID to the C library, mdsipshr

* Fix: 2625 v4 - more explanation of why mds_keyword_set() is needed

* Fix: 2625 v4 - revert PR 2620 connection.py change

* Fix: 2625 v4 -- add new IDL file to package for RHEL / Rocky

---------

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Gm rfx devices (#2636)

* Feature:new RFX devices

* Removed obsolete device

* Demo ready for ICALEPCS

* Fix unsupported Use case

* added devices

* import serial in try block

* It broke another thing

* Fix:untested use cases (#2641)

* Fix: Change forking to simple in the systemd mdsip service (#2644)

Type=forking is expecting the parent process to die, which I thought ours was doing but apparently isn't

* Issue 2625 - IDL test harness (#2643)

* Issue 2625 - IDL test harness

* Issue 2625 -- exclude IDL test harness from packaging

* Issue 2625 - correct IDL-2638-loop test, comment failing tests for future bug fixes

---------

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Fix: dir /full segfault (#2647)

* Fix: dir /full segfault #2646

This bug was introduced when alternate compression methods were added
commit 9829bf64dd7f3b9158f4d98172594fc523e4b746

That code allocated the NCI byte `spare2` to hold the index of the
compression method.  It assumed that if it was not filled in it would be
zero.  This turned out not to be the case for some nodes in some trees.

This PR adds a DBI that says whether or not to look at this byte.
** The default value of this DBI is to ignore the compression method byte. **
To enable alternate compression methods, which have seldom (never?) been
used, set the DBI DbiADVANCED_COMPRESSION to true.

- open the tree for edit
- call TreeSetDbi with DbiADVANCED_COMPRESSION set to 1
- write the tree

There will be TCL verbs to set and get this database attribute.

The database characteristics (DBI) have been initialized to zero since
https://github.com/MDSplus/mdsplus/blame/263cc8bce631b291f17ae2c3e09164c55a067b93/treeshr/TreeOpen.c#L659

Example that sets the flag:
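```
#include <stdio.h>
#include <stdlib.h>
#include <dbidef.h>  /* DBI_ITM and the Dbi* item codes */
#include <treeshr.h> /* TreeOpenEdit, TreeSetDbi, TreeWriteTree */

int main()
{
  int status;
  int length;
  int one = 1;
  DBI_ITM cmp_itm[2];
  const char *tree = "test";
  const int shot = 42;

  /* Item list: set DbiADVANCED_COMPRESSION to 1, then terminate the list */
  cmp_itm[0].buffer_length = 1;
  cmp_itm[0].code = DbiADVANCED_COMPRESSION;
  cmp_itm[0].pointer = (void *)&one;
  cmp_itm[0].return_length_address = &length;
  cmp_itm[1].code = DbiEND_OF_LIST;

  /* The tree must be open for edit before the DBI can be changed */
  status = TreeOpenEdit(tree, shot);
  if (!(status & 1)) {
    printf("open - status = %d\n", status);
    exit(1);
  }
  status = TreeSetDbi(cmp_itm);
  if (!(status & 1)) {
    printf("SetDBI - status = %d\n", status);
    exit(1);
  }
  /* Write the same tree and shot that were opened above */
  status = TreeWriteTree(tree, shot);
  if (!(status & 1)) {
    printf("Write - status = %d\n", status);
    exit(1);
  }
  return 0;
}
```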

---------

Co-authored-by: Stephen Lane-Walsh <slwalsh@psfc.mit.edu>
Co-authored-by: Fernando Santoro <fsantoro@psfc.mit.edu>

* Gm redpitaya (#2653)

* Added RFX_TRIGUART device and support C code

* New RedPitaya with synchronous 1MHz clock

* New RedPitaya with synchronous 1MHz clock

* removed useless file

* New RedPitaya release

* New RedPitaya release

---------

Co-authored-by: AndreaRigoni <andrea.rigoni@igi.cnr.it>

* Added Basler USB cameras support (#2666)

* Build: Initial Jenkinsfile (#2672)

* Initial Jenkinsfile

Add a basic Jenkinsfile that will build and test
Includes fixes for deploy/*.sh scripts that didn't work on the new jenkins

* Add build badge to README, add --rm to platform_build.sh

* Remove docker networks from build.sh

* Build: Add cleanWs() after build to the Jenkinsfile (#2675)

* Build: Fix build.sh to test the ExitCode from docker (#2677)

After modifying build.sh to be more Ctrl+C friendly, it no longer tests the result of the docker command properly
This will wait for `docker logs` to return, meaning the container has exited, and then it will check the ExitCode property through `docker inspect`.

* Build: Fix test event port, collect and import test results into Jenkins (#2678)

The event port is now computed from OSList.indexOf(OS) instead of $EXECUTOR_NUMBER
Artifacts will now archive regardless of the build result

Add followSymlinks: false to archiveArtifacts
Replace glob.glob with pathlib.Path.glob because it doesn't follow symlinks

* Build: Isolate each build/test in their own network (#2680)

Add --dockernetwork to build.sh
Default to "jenkins-$EXECUTOR_NUMBER-$OS_INDEX"
    e.g. jenkins-3-ubuntu22
Create/Use/Delete the network with docker, if specified

* Gm fix apd (#2661)

* Fix for APD management

* Fix APD arguments (now serialized)

* Fix:Required to handle serialized arguments

* Fix compiler error

* Fix:handle returned code errors, added support for writing APD data

* Remove ` in expression

* remove `

* $ requires a different handling

* Workaround for Tdi Compiler bugs

* Fix missing parenthesis

* Fix forgotten Dtype definition for List

* Fix:fixed wrong size and shape

* Added cell support for MATLAB interface

* Fix string array

* non numeric data support added

* Packages updated

* Make tdi test tab happy

---------

Co-authored-by: AndreaRigoni <andrea.rigoni@igi.cnr.it>

* Add define for DbiTREE_VERSION to alpha (#2682)

* Build: Add IDL tests to the Jenkinsfile (#2679)

* Work on adding the IDL tests to the Jenkinsfile

Refactor idl/testing/run_tests.py to use argparse instead of env variables
Add defaults so we don't have to specify them all in the Jenkinsfile
Leave a stub for adding MATLAB later

* Fix Jenkins truncating JUnit logs

* Add set +x to quiet sourcing setup.sh

* Build: Restructure the Jenkinsfile (#2685)

* Restructure the Jenkinsfile

Remove the dependency on 'camunda-community'
Use a try/finally to collect test results immediately, instead of in the global post {}

* Groovy syntax error

* Add withEnvironment

* Groovy syntax error

* Groovy syntax error

* Build: Begin work on publishing (#2686)

Add the publishing step, which will:

Calculate the new version with get_new_version.py, which replaces commit_type_check.sh
Build a release with that version for each distro with build.sh --release=VERSION
Publish each distro with build.sh --publish=VERSION
Create a github release
All debian and redhat platforms will also generate tgz files for the GitHub release
One tgz for /usr/local/mdsplus, and one for all the debs or rpms

* Build: Treat unprefixed commits as "Fix:" (#2692)

* Build: Give the tgz files absolute paths (#2694)

* Build: Add darren as an Admin so he can build PRs (#2699)

* more parameters for marte2_simulink_generic (#2700)

* Build: Sign packages with Jenkins, upload Windows installers to GitHub (#2701)

Rename `tarfiles/` folder to `packages/`

* Build: Actually publish to the real directory (#2702)

* Fix: Release versioning was always 1 behind (#2705)

Jenkins now re-runs bootstrap after calculating and tagging the new version
This also moves the --release/--publish=VERSION into a separate parameter called --version=VERSION
This now properly passes the BRANCH/RELEASE_VERSION into bootstrap for us to override the git info

* Fix: Restructure the Jenkinsfile to better support versioning (#2707)

Move the Calculate Version step to the top
Call Bootstrap once with the version (or 0.0.0 for PRs)
Instead of having Test Packaging/Release steps, just have one Release step

* Build: add MATLAB tests (not using mdsip) (#2674)

* First draft of MATLAB tests

* Exclude the MATLAB tests from packaging

* Build: MATLAB tests rewritten using argparse

---------

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Fix: Switch to pyproject.toml and stub setup.py for better pip / setuptools compatibility (#2698)

* Change to pyproject.toml with stub setup.py

* actually this requires setuptools >= 60.0.0 for the blank setup.py

* add pyproject.toml to dist files

* enable python2.7 and setuptools<60 support

* using slightly cleaner api

* change correspondence email

* Build: configure PYTHONPATH for IDL test harness (#2711)

* Build: configure PYTHONPATH for IDL test harness

* Build:  minor edit to MDSPLUS_DIR for the IDL test

* Build: another syntax correction for MDSPLUS_DIR on IDL tests

* Build: enhance IDL test harness by adding write tests and more read tests (#2656)

* Build: enhanced IDL tests now use argparse

* Build: parameterize the "write" tree for IDL tests

- no longer uses hard coded tree name and shot number
- also changed file permissions on run_tests.py

* Build: use temporary directory for transient files

---------

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Build: parameterize the "write" tree for MATLAB tests (#2712)

* Build: parameterize the "write" tree for MATLAB tests

- no longer uses hard-coded tree name and shot number
- also deleted a comment that was no longer applicable

* Build: MATLAB tests now use a temporary directory

---------

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Revert "Feature: PR #2620 and PR #2661 (#2720)"

* Revert "Gm fix apd (#2661)"

This reverts commit bde7c51300e2fa7747c2a71800da42b63dde713e.

* Revert "Allow APD data be accessed via thin client via Connection.get() (#2620)"

This reverts commit d996b0ce5a7f2a840db49ff8dbe60b652b53090e.

* Build: correctly format expected output for a MATLAB test (#2722)

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Add 'Release' as a commit prefix (#2727)

Using 'Release' will now use the exact version that commit is tagged with, instead of trying to bump the version based on the other commits
Add comments to get_new_version.py

* Gm apd java (#2729)

* Improve APD support for Java interface

* Improve APD support for Java - forgotten files

* Commit packages

* Fix: reduce open files due to dispatcher (#2740)

* Fix: reuse action_server connection id in ServerConnect; avoid duplicates in list

* Fix: set dispatched early; unset if dispatching failed; prevent race on fast actions

* Fix: lock Clients in ServerQAction; cleanup and check before use

* Fix: reconnect dropped connections

* Fix: use correct windows SOCKET print format

* Fix: satisfy rhel7 c standard

* Gm apd thin cpp (#2742)

* Added APD support in C++ thin client

* Added tdi fun

* Added TDI FUn

* Fix commands

* Gm new marte (#2743)

* more parameters for marte2_simulink_generic

* Proceed with the new implementation

* Proceed

* Proceed

* Proceed

* Proceed

* Proceed

* proceed

* Proceed

* Proceed

* Partially tested version

* Added execution times recording

* Proceed

* Proceed with debugging

* Proceed

* Proceed

* Proceed

* Fixes for multisampled acquisition

* Remove quotes from string parameters

* Minor fixes

* Proceed debugging

* Debugging

* More channels

* Debug Distributed configuration

* Fix signal recording for synchronized inputs

* Further debug

* Further debug

* Small fixes

* Close to final version

* Forgotten fix

* Make port visible, fix parameter name

* unaligned nids

* Increase DiscontinuityFactor

* Discontinuityfactor

* More channels

* Proceed with the new implementation

* Proceed

* Proceed

* Proceed

* Proceed

* Proceed

* proceed

* Proceed

* Proceed

* Partially tested version

* Added execution times recording

* Proceed

* Proceed with debugging

* Proceed

* Proceed

* Proceed

* Fixes for multisampled acquisition

* Remove quotes from string parameters

* Minor fixes

* Proceed debugging

* Debugging

* More channels

* Debug Distributed configuration

* Fix signal recording for synchronized inputs

* Further debug

* Further debug

* Small fixes

* Close to final version

* Forgotten fix

* Make port visible, fix parameter name

* unaligned nids

* Increase DiscontinuityFactor

* Discontinuityfactor

* More channels

* Packages updated

* Remove print

* Remove error messages

---------

Co-authored-by: mdsplus <mdsplus@roactive2.rfx.local>

* Docs: Improve documentation for getSegment* python wrappers (#2732)

Add explanation and rename parameters for:
* getSegmentLimits
* getSegmentList

* Fix: improve mdstcl's error handling and add comments (#2746)

* add comments regarding action service

* send_reply() now does cleanup_client() on bad socket

* explain mdstcl's receiver thread cannot access main thread's connection list

* Improve handling of non-MDSplus error codes

* add comments regarding action dispatch

* add comment explaining receiver thread select loop

* Build: Resolve linker error after updating the windows builder to Fedora 39 (#2749)

* Build: Resolve linker error after updating the windows builder to Fedora 39

This appeared after updating the mdsplus/builder:windows docker image to Fedora 39, and Wine to 9.0
The newer libxml2 tried to link dynamically unless we explicitly set LIBXML_STATIC

* Hopefully fix the MdsTreeNodeTest

It turns out that this was failing previously, but we weren't properly catching the error

* Fix errors in windows build from newer gcc

* Fix: mdstcl's `show current` no longer segfaults when no tree paths defined (#2754)

* Fix: "show current" no longer segfaults when no tree paths defined

* Fix: corrected typo in error message

* Use original error message so tests pass

* Fix: Add Debian 12 and Ubuntu 24.04 and support GCC 12+ (#2753)

* Build: Add Debian 12 and Ubuntu 24.04

* Add extra flags for GCC 12+ and stub imp for Python 3.12

GCC 12+ triggers a bunch of false positive warnings (which we treat as errors)
This adds AX_C_FLAGS to configure those `-Wno-*` flags for GCC 12+
`cmdExecute.c` now uses snprintf to avoid buffer overflow warnings, also generated by GCC 12+
`compound.py.in` now supports Python 3.12+

* compound.py now supports Python 2.7.. again

---------

Co-authored-by: Stephen Lane-Walsh <slwalsh@psfc.mit.edu>

* Build: Fix off-by-one versions produced by Jenkins (#2756)

This fixes the bug where `--os=bootstrap` wasn't receiving the version from `--version=x.y.z`
However, confusingly, this also changes the Jenkinsfile to not use that feature, and instead use `git tag` in order to embed the proper git information as well as the proper version information
The `--os=bootstrap` and `--version` fix is still included just so that it doesn't break if someone else tries to use it

* Build: Increase default test timeout to 1h (#2757)

When the build server(s) are at capacity, it's not unreasonable for a test to take more than 10 seconds, which was the old default timeout
This sets the default to 1h, and removes the overrides in various tests

* Build: Fix 'HEAD' in `show version` and tag error (#2758)

Jenkins builds in a detached HEAD state, which caused bootstrap to use HEAD as the branch name
We pass --branch= to the bootstrap call in Jenkins, but $BRANCH wasn't being passed into the bootstrap docker container
Also, attempts to build alpha versions with tags that already existed failed

* Fix: mdstcl show version tag and links (#2760)

Fixes Issue #2759

* Build: Add package override for ubuntu and debian (#2761)

Override sections for Ubuntu 24 and Debian Bookworm were added.

* Feature: Add "Date:" to show version output (#2767)

Implements #2766

Example:
```
$ mdstcl sho ver

MDSplus version: 7.140.75
----------------------
  Release:  alpha_release-7-140-75
  Date:     Thu May 16 17:43:14 UTC 2024
  Browse:   https://github.com/MDSplus/mdsplus/tree/alpha_release-7-140-75
  Download: https://github.com/MDSplus/mdsplus/releases/tag/alpha_release-7-140-75
```

* Fix: Missing repo metadata signing (#2770)

This will hopefully fix the lack of signed metadata files that are preventing us from automatically publishing releases

* Feature: jScope improvements (#2774)

Feature:jScope improvements

1) Expressions allowed also for signal labels
2) Allowed definition of last (absolute) times window by setting in xmin <window size in s>:

* Docs: Update GitHub Bug Report Template (#2775)

Request both a client and server version
Add instructions for getting the client and server versions
Request the installation methods

* Fix: sign the RPM package repo's metadata (#2773)

* Fix: sign the RPM package repo's metadata

* Now uses the MDSplus signing key that is on the build system.

* Minor edit for conciseness regarding GNUPGHOME

* Build: Lower test timeout to 20 minutes (#2776)

Some tests are spinning, so I'm lowering the global timeout to 20 minutes
We will need to investigate the spinning tests

* feature: jtraverser2 dynamically convert non-native floating formats (#2616)

* feature: jtraverser2 dynamically convert non-native floating formats

* Test: mdsplus-api - added FLOAT_Test for F, D, and G floats

* Fix: mdsplus-api - fixed support for d, g, and f floats; work around mdsip limitations; fixed ROPRAND

* Test: mdsplus-api - added tests for FLOATArray

* Build: bootstrap - export PYTHON for scripts like python/generate_tests (#2733)

* Build: Fix version bumping (#2778)

get_new_version.py was never resetting the minor or patch portions of the version
So 1.2.3 with a feature commit would give you 1.3.3, instead of 1.3.0
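
Example (a minimal sketch of the intended behavior, not the actual get_new_version.py; the function name `bump_version` and the commit type strings are assumptions made for illustration):
```python
def bump_version(version, commit_type):
    """Return the next version string for the given commit type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if commit_type == "feature":
        # A feature bumps the minor number and resets the patch number to 0.
        return "%d.%d.%d" % (major, minor + 1, 0)
    # Any other commit type (e.g. "fix") is treated as a patch-level bump here.
    return "%d.%d.%d" % (major, minor, patch + 1)


print(bump_version("1.2.3", "feature"))
print(bump_version("1.2.3", "fix"))
```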

* Build: Monkeypatch nose to support python3 (#2779)

In python3:
  collections.Callable -> collections.abc.Callable

So we test for collections.Callable and then monkeypatch collections if it's not there
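
Example (a minimal sketch of this kind of monkeypatch; the exact code in the PR may differ):
```python
import collections
import collections.abc

# Newer Python 3 releases no longer expose collections.Callable.
# Restore the alias so that older libraries such as nose can still find it.
if not hasattr(collections, "Callable"):
    collections.Callable = collections.abc.Callable

print(collections.Callable is collections.abc.Callable)
```

The patch has to run before the library that expects `collections.Callable` is imported.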

* fix: clean rfa de-/encoding (#2783)

* build: ignore test output (#2782)

* Build: MATLAB struct tests (#2787)

* first struct and cell tests

* coverage results commented and correct comparison string added

* Feature:add methods dim_of and units_of to C++ TreeNode (#2786)

Feature: Make it compatible with python API

* fix: make TreeGetRecord call _TreeGetRecord(*TreeCtx(), nid_in, dsc) (#2784)

* Fix: XNCI writes corrupting node data (alternative) (#2785)

* test: TreeSegments also affects xncis

* fix: cleanup attribute updates; only unlink record if segment is written

* return if put dsc failed

* fix: TreeGetRecord preserve program flow and status

* Fix: Error importing MDSplus with numpy 2.x (#2797)

Wrap the import of string_ and unicode_ with a try/except in case numpy >=2 is used.
Fixes #2793

* Fix: local RHEL builds don't sign RPM metadata (#2801)

This fixes Issue #2800.

* Fix: Remove obsolete public RPM package signing key, and fix other RPM signing issues. (#2798)


Deleting the obsolete public RPM signing key revealed several issues with RPM signing.
The initial PR #2773 was incomplete and caused a cascade of problems: #2777, #2800, and #2801.
This PR fixes all of those issues.

* works with both signed and unsigned builds
* the “repo” package now is only for signed builds, no "repo" for unsigned
* the “gpg-pubkey” now has the correct fingerprint for the current key
* obsolete public key (used as a placeholder) has been deleted
* only includes a copy of the public key if doing a signed build
* checks the package repo’s metadata on signed builds

* Changes to support numpy 2+ (#2809)

Replace instances of `.array(copy=False)` with `.asarray()` as per numpy errors
Add error handling to scalar and array types to handle numpy 2+ throwing OverflowError exceptions when passed a negative or oversized number for the given type, e.g. `Uint8(-1)`
This now explicitly calls the equivalent ctypes type to do the integer wrapping for us, which will be quite slow by comparison
Rewrite the `data_scalars` and `data_arrays` test cases to use more sensible numbers that can fit within a Int8, plus some cleanup
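
Example (a minimal sketch of integer wrapping through ctypes, using only the standard library; the helper names `wrap_uint8` and `wrap_int8` are hypothetical and are not part of the MDSplus Python API):
```python
import ctypes


def wrap_uint8(value):
    # ctypes integer types do no overflow checking, so an out-of-range
    # value wraps modulo 2**8 instead of raising an exception.
    return ctypes.c_uint8(value).value


def wrap_int8(value):
    # Same idea for the signed 8-bit type.
    return ctypes.c_int8(value).value


print(wrap_uint8(-1))
print(wrap_int8(200))
```

Going through a ctypes object for each value is what makes this path slow compared to a plain numpy conversion.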

* Fix: putMany (#2814)

* Fix: putMany

There are two issues fixed, the first is that the `EXPORT` on `PutManyExecute` was missing, leading to the `LibKEYNOTFOU` error when calling the method from TDI
The second is in `putManyObj`, where we were passing `argsData->getDscs()` as `mdsdsc_t` pointers, when they were in fact `Data` pointers.
There is now a temporary vector that holds the results of `convertToDsc()` for each `Data`, and then calls `freeDsc` at the end of the function.

This fixes #2813 and unblocks the mdsthin putMany test

* Remove range-based for to make RHEL7 happy

* Build: Allocate ports to each test (#2815)

This work was originally part of the CMake branch, which I am breaking up into smaller PRs
This will allow the tests to run in parallel, so long as `$TEST_INDEX` and `$TEST_PORT_OFFSET` are set appropriately.
The list of ports allocated to each test is in `testing/ports.csv`
Each test has been updated to use the specific ports from that CSV

* Feature: Deprecate TreeNOTOPEN in favor of TreeNOT_OPEN (#2816)

* Fix: Deprecate TreeNOTOPEN in favor of TreeNOT_OPEN

This work was originally part of the CMake branch, which I am breaking up into smaller PRs
This is not quite as open and shut as I would like, but there's only one file that uses `TreeNOTOPEN` and that's in `TreeSetNci.c`
That and the two names are very confusing
Setting `TreeNOTOPEN` as deprecated will inform how the exception classes are generated, and should clean some things up
We could consider removing the `TreeNOTOPEN` returns from `TreeSetNci.c`, however I'm worried this will break someone else's code that was looking for those values.

* Replace TreeNOTOPEN with TreeNOT_OPEN

The only place it was used was in `treeshr/TreeSetNci.c`

* Feature: Use python to generate files (#2818)

* Feature: Use python to generate files

This work was originally part of the CMake branch, which I am breaking up into smaller PRs
This is being tagged as "Feature:" instead of "Build:" due to how many files it touches, I want to update the minor version number.

Add opcodes.csv to be parsed by gen-* scripts
Convert opcbuiltins.h and tdishr.h into generated files, generated by opcodes.csv
Update, Rename and Add gen-* scripts
Move yacc/lex files out of yylex/ directories
Update bootstrap to call new gen-* scripts

* Fixes to support Python 2

* Run autopep8 on all gen-* scripts

Rename `gen-python-compound.py` to `gen-python-MDSplus-compound.py` to match the format of:
`gen-include-tdishr.py` -> `include/tdishr.h`
`gen-tdishr-TdiHash.py` -> `tdishr/TdiHash.h`
Which works except for the yacc/lex and messages/exceptions generation scripts.
Remove shebang line from `gen-messages-exceptions.py` and `gen-python-MDSplus-compound.py`
Remove executable permission on those files as well, they should only be called as `python3 deploy/gen-[name].py`

* Refactor, Delete tdishr.h

Actually delete tdishr.h, which should have been done already, it will now be generated by `gen-include-tdishr.py`
Refactor file opens to use `with` statements
Refactor multi-line strings to chains of single-line strings, to better handle indentation

* Fix TdiHash.c generation, Add include/tdishr.h to gitignore

Fix whitespace in intermediate `TdiHash.c.in` file

* Deleting opcbuiltins.h.. again

* Feature: Fix compiler warnings, Refactor and Cleanup (#2817)

* Feature: Fix compiler warnings, Refactor and Cleanup

Add fixes for many compiler errors
Mostly this is just adding `__attribute__((unused))`
Fix some of the formatting broken by clang-format
Replace many `sprintf` with `snprintf`
Rename `wait` to `short_wait` to avoid name conflicts in `UdpEventsTest`
Remove unused test functions for labview

* Remove line instead of commenting it out

* Remove VMS syntax (#2824)

* Size of TAG_NAME is 24 (#2822)

Co-authored-by: Josh Stillerman <jas@psfc.mit.edu>

* Fix: Handling of java options in configure script (#2827)

* Fix: Handling of java options in configure script

The first fixed issue was that whenever providing --disable-java or --enable-java with any value, the conditional ENABLE_JAVA was set to TRUE.
This meant that even with --disable-java running the tests would also run the java tests even though the java classes were not built.

The second fixed issue was that a warning message about a missing JDK was always displayed even with --disable-java.

* Fix: compound.py imp deprecation (#2843)

* suppress already-handled deprecation

* prefer specific exception

* add reference to deprecation disclosure

* Fix: systemd incorrectly reporting services failed (#2846)

* Fix: quiet python_module_remove.sh error (#2848)

Silence the errors while attempting to uninstall all copies of the MDSplus python packages

* Build: Add heidthecamp to the Jenkinsfile (#2850)

This will allow heidthecamp to trigger Jenkins builds

* Fix: the Fortran aliases now declare the remote calls (#2851)

* Build: Add include/opcbuiltins.h to the .gitignore (#2852)

This is due to PR #2818 that generates opcbuiltins.h during bootstrapping

* Fix: Close socket in MDSUdpEventCan (#2847)

* Fix: Close socket in MDSUdpEventCan

Fixes #2830
When a UDP event listener is cancelled, the thread was killed but the socket was not closed.

* Hopefully fix tsan data race

* Feature: Python API to return variable length NCIs (#2853)

Allow the Python API to return variable length node lists for:
* MEMBER_NIDS
* CHILDREN_NIDS
* CONGLOMERATE_NIDS

This works by checking the NCI code and getting the corresponding 'number_of_{x}' property to determine the size of the buffer

This will, however, add an extra C function call whenever those NCIs are accessed, but this is more correct.

Fixes #2788

* Build: Always use --dockernetwork for Jenkinsfile (#2855)

Currently we only use a unique --dockernetwork for the Test stage
This will add it to all stages to ensure a unique network is always available

* Build: Update Linux dependencies (#2849)

* Build: Add motif dependencies to linux.xml

Add a platform-specific dependency for each platform for motif

* Remove erroneous package requirement

* Resolve unclosed tags

* Resolving missing package libxml2

* Fix: package installation process

Change linux_build_packages.py to correctly fall back to platform if distribution is not set.
Update linux.xml to use the fallback for common packages.

* Feature: Remove dynamic linking to readline in tdic (#2854)

* Feat: Remove dynamic linking to readline package in tdic

Update Makefile to remove -ldl.
Update tdic.c to no longer dynamically link files.
This avoids errors with libreadline not packaging with a non-versioned .so.

* Removed references to the no-readline case, as it is no longer supported

* Fix: C++ TreeNode remove unnecessary Tree duplication (#2861)

* Fix:Avoid Tree duplication

The same Tree instance is now shared among TreeNodes created from the same Tree. This avoids the
proliferation of new contexts and, consequently, of new open sockets in distributed configurations

* Fix:avoid proliferation of open trees

When creating a new tree node from a tree, the tree reference is passed straight. To avoid multiple free() calls the tree is not freed at TreeNode destructor

* Fix:compile error

added forgotten }

* Fix: Cleanup PR#2861 (#2862)

Remove printf
Reword comments

* Fix: the find_fun() function now checks the status variable

* Fix: simplify the findfile_fun() function

* Fix: remove a stray comment from the findfile_fun() function

* Fix: undo most changes to findfile_fun()

* Fix: Apple Silicon - define MACOS_ARM64

* Fix: Apple Silicon - add a define

* Fix: Apple Silicon - add () to #if defined check

* Fix: restore an #endif inadvertently deleted from mdsplus.h

* Fix: Apple Silicon - move MACOS_ARM64 define to configure.ac

---------

Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>

* Fix: delete references to deprecated data types in TdiShrExt.c (#2864)

* dtypes: comment out deprecated data types

* Fix: delete deprecated data types

* Fix: Apple Silicon - use the correct declarations for variadic TDI functions (#2863)

On Apple Silicon, this eliminates most of the segfaults caused by variadic functions.   When the correct function declarations are used, the compiler is able to generate the correct code for Apple Silicon.

This change has no impact on other platforms.

* Fix: Apple Silicon - defines for use with LibCallgFfi() (#2867)

MDSplus only uses the `libffi` library on Apple Silicon.  This PR is preparation for future PRs that add the `libffi` specific code.

Partial fix for Issue #2597.

* Fix: Apple Silicon -- traverser now calls TDI functions directly (#2869)

Directly calling `TdiCompile` and `TdiDecompile` eliminates segfault on Apple Silicon.

This is a partial fix for Issue #2597.

* Fix: add missing va_end() statements (#2868)

* Revert "Fix: Apple Silicon -- traverser now calls TDI functions directly (#2869)" (#2872)

This reverts commit fff60ae80593f19588894e65302b6ca92e744ad1.

The traverser will be fixed with a new PR that uses casts.

* Revert "Fix: Apple Silicon - defines for use with LibCallgFfi() (#2867)" (#2873)

This reverts commit ad9b55ffecc25b20ad16b9610251f79e32440429.

Apple Silicon PRs were being submitted in top-down order.
Reverting this PR so can change to bottom-up order.

* Fix: Apple Silicon - fix traverser segfault associated with TDI intrinsic functions (#2874)

* Fix: IDL tests now have correct paths (#2870)

* Fix: add test for darwin, which is what IDL returns for MacOS (#2876)

* Fix: Apple Silicon -- default path for PyLib environment variable (#2878)

* Fix: correct paths in MATLAB tests (#2879)

* Feature: Enable optional args in python wrapper createPulse (#2866)

* python Tree createPulse - now possible to pass optional arguments to _TreeCreatePulseFile. A bit related to issue #1326

* createPulse: use isinstance(...TreeNode); cast int(...) node_or_nid; copy_only_this Boolean by default

* updated header to reflect copy_only_this is Boolean

* Fix: errors with asan and tsan (#2881)

Update `strtok` to `strtok_r` in `ReplaceAliasTrees` to prevent issues when multithreading. The internal `saveptr` was being clobbered, so this moves it into the function, and therefore safely in the thread.
Reorder the `io_disconnect` function for the thread backend. The cleanup of the last request/response messages were being interrupted by the `pthread_cancel`, which is fixed by moving the `pipe_close` calls above it, matching the windows implementation.

* Fix: Apple Silicon - add libffi for calling variadic functions (#2877)

* MdsLib: include libroutines.h

* MACOS_ARM64: calling LibCallgFfi

* LibCallgFfi: added the function

* LibCallgFfi: enums defined

* LibCallgFfi: enum members renamed VA_N_FIXED_ARGS

* LibCallgFfi: define MDSPLUS_USE_FFI

* LibCallgFfi: revisions as per review of PR 2877

* LibCallgFfi: added status = to the t4012.c device to prevent compiler errors on alpha

* fixed_args: changes to TdiCall.c

* LibCallgFfi: now uses NOT_VARIADIC in TdiCall.c

* TdiCall: switch to MDSPLUS_USE_FFI, etc

* TdiCall: fix indentation

* fixed_args: change to lex

* fixed_args: update Java and TDI tests

* TdiCall: refactor the LibCallg sections

* LibCallgFfi: eliminate MDS_BYPASS_FFI define by adding boolean parameter to interlude()

* TdiCall: ignore compiler warning about unused parameter

* TdiCall: remove unnecessary strdup and strtok

* LibCallgFfi: add comments about limitations

* TdiCall: changes reviewer suggested re tdi_call()

* TdiCall: fix bypass_ffi declaration

* TdiCall: fix num_fixed_args declaration

* TdiExtFunction: now only executes TDI *.fun files

* TdiExtFunction: remove rest of unneeded code

* Fix: remote_submit_helper.fun now accepts zero id and has a timeout (#2889)

* Fix: remote_submit_helper.fun now accepts zero id and has a timeout

* Fix: MdsConnect.fun now has correct check for invalid connection id

* Fix: correct typo in MdsConnect.fun

* Fix: LibCallg() now has similar edits to LibCallgFfi() (#2892)

* LibCallg: similar edits to LibCallgFfi

* Fix: assert() replaced with abort() in LibCallg* routines

* Fix: ensure SsINTERNAL not treated as OK status (#2893)

* Fix: mdsip_server no longer uses broken compression level (#2894)

* Fix: delete extra arg to GetAnswerInfoTS() (#2895)

* Fix: improve tditest help message (#2896)

* Fix: correct miscellaneous typos, add comments, disable a debug (#2897)

* Fix: treatment of scalars in TdiTrans.c (#2832)

* Fix:remove race conditions leading to deadlock (#2904)

* Fix:remove race conditions leading to deadlock

Removed several useless synchronized methods and moved waveform computation to the main thread

* Fix:insufficient timing precision due to the usage of float instead of double

A new routine (GetXYSignalDoubleTimes) has been added that receives the time window info as double. It will be used by new jScope versions; the older routine is retained for previous jScope versions.
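
Example (a minimal sketch of why single precision is insufficient for large time values; the sample value, an absolute time in epoch seconds, and the helper names are assumptions for illustration and are not taken from jScope):
```python
import struct


def through_float32(value):
    # Round-trip the value through a 32-bit IEEE float.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def through_float64(value):
    # Round-trip the value through a 64-bit IEEE double.
    return struct.unpack("<d", struct.pack("<d", value))[0]


t = 1700000000.125  # hypothetical absolute time in seconds

print("float32 error:", abs(through_float32(t) - t))
print("float64 error:", abs(through_float64(t) - t))
```

A 32-bit float carries only 24 significant bits, so at this magnitude adjacent representable values are more than a second apart, while a double still resolves well below a microsecond.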

* Fix:remove race conditions leading to deadlock

Added small suggested changes.

* Fix:remove race conditions leading to deadlock

Added forgotten import

* Fix:remove race conditions leading to deadlock

reported suggested small changes

---------

Co-authored-by: mdsplus administrator <mdsplus@aidevel.rfx.local>

* Fix: mdsip compression (#2900)

- Set `dlen` after potentially flipping the bytes in `msglen`
- Correctly pass the compressed length to `uncompress()`
- Use the correct type (unsigned) for the compressed message length

* Remove swing from MdsConnection (#2908)

Having 'javax.swing' in MdsConnection was adversely affecting other areas of the code

* Fix: mdsip services zombie processes (#2645)

* Fix: added fix for the mdsip spinning issues

* Changed the way we checked nbytes to avoid mdsip services from becoming zombies

* Feature: Add python copyTo method to Tree and TreeNode (#2857)

* Feature: Add python copyTo method to Tree and TreeNode

This adds the copyTo method to Tree and TreeNode in the python API
copyTo allows you to copy a portion of a tree to another tree, optionally:
* Filtering unwanted nodes
* Copying data
* Copying tags
* Copying XNCIs
* Copying NCIs

Example:
```
t1 = Tree('testsrc', -1, 'READONLY')
t2 = Tree('testdst', -1, 'EDIT')

def skip_devices(node):
    return (node.conglomerate_elt == 0)

t1.BRANCH.copyTo(t2.SECTION, node_filter=skip_devices, copy_xnci=False)

t1.close()
t2.write()
t2.close()
```

* Remove python 3.6 f-strings, replace mdsExceptions with _exc

* Fix Typos

* Rename src to self and add an alias

* Fix Tree.copyTo extra argument

* Update docstring for TreeNode.copyTo

* Further sanitize tag names

* Update TreeNode.copyTo to update tree paths in data

Remove decompile/compile for node data, as this lost floating point precision
Add _update_tree_paths which recursively finds tree nodes in data structures and rewrites them to point at the destination tree
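The idea behind this change can be sketched in plain Python. This is not the code of `_update_tree_paths`: `NodeRef` and `retarget` are hypothetical stand-ins, and ordinary lists, tuples, and dicts take the place of MDSplus data structures. The sketch walks a nested structure, rewrites only the node references, and passes every other value through untouched, so numbers never go through a text round trip.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class NodeRef:
    """Hypothetical stand-in for a reference to a node in a named tree."""
    tree: str
    path: str


def retarget(data: Any, dst_tree: str) -> Any:
    """Return a copy of `data` with every NodeRef pointing at `dst_tree`."""
    if isinstance(data, NodeRef):
        return replace(data, tree=dst_tree)
    if isinstance(data, list):
        return [retarget(item, dst_tree) for item in data]
    if isinstance(data, tuple):
        return tuple(retarget(item, dst_tree) for item in data)
    if isinstance(data, dict):
        return {key: retarget(value, dst_tree) for key, value in data.items()}
    # Anything else (numbers, strings, ...) is returned unchanged.
    return data
```

Because floats are returned as the same objects, a value such as `0.1 + 0.2` keeps its full binary representation after the rewrite.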

* Add missing getDefault() to _update_tree_paths

* Add workaround for weird Tree/NID scenario

* Feature: Update interpolate.fun to match interpolate_ga.fun (#2845)

This appears to be a more efficient approach to array creation, which was previously handled by a while loop.

* Fix: Add upgrades to PG devices (#2571)

* Fix: some upgrades to comply with Python3

* Change to using factory() instead of Acq2106_TIGA

* Change transition times to be in 1/10 of usec

* Improve comments and readability of parts array

* Add wrpg device with signals as outputs instead

* Add wrpg device with signals as outputs

* Remove print statements and comment on the limits of STL states

* Remove print statements, comment on the limits of STL states and add device to noarch packages

* Remove print statements, comment on the limits of STL states and add device to noarch packages

* Remove forgotten print statements and comments

* Update README.md (#2614)

fix the text about returning from the map

* Fix: type compatibility when compiling with Intel Fortran (ifort) (#2531)

* Fix: dispatching any action causes actmon/actlog to crash with SIGSEGV (#2835)

* Fix: dispatching any action causes actmon/actlog to crash with SIGSEGV

* Rename Svr -> Srv

* Always use 2 arguments for callback_done signature

* Fix: recursive calls to CheckIn

* Fix: Amend CommandDone and action_done signature

* Fix: Add actlog test

* Fix: copy of event data for remote events (#2906)

* Fix: fix typo using addEvent instead of removeEvent

* Fix: start test for java remote event handling

* Fix: Test wfevent with local and remote events

* Fix: Fix copy data from remote event

* Fix: typos and uninitialized memory

* Skip wfevent tests

* Skip java tests as well

* Cleanup bad git merge

* Patch for compatibility with numpy >= 2.3

---------

Co-authored-by: GabrieleManduchi <gabriele.manduchi@igi.cnr.it>
Co-authored-by: GabrieleManduchi <andrea.rgn@gmail.com>
Co-authored-by: Fernando Santoro <44955673+santorofer@users.noreply.github.com>
Co-authored-by: Josh Stillerman <jas@psfc.mit.edu>
Co-authored-by: mdsplus <mdsplus@scdevail.rfx.local>
Co-authored-by: cloud <timo.schroeder@ipp.mpg.de>
Co-authored-by: Timo Schroeder <zack-vii@users.noreply.github.com>
Co-authored-by: mdsplus <mdsplus@mcpsl.nbtf>
Co-authored-by: Daniel Austin <dan@fluffynukeit.com>
Co-authored-by: Daniel Austin <daustin@zap.energy>
Co-authored-by: cloud <cloud@ipp.mpg.de>
Co-authored-by: AndreaRigoni <andrea.rigoni@igi.cnr.it>
Co-authored-by: mdsplus <mdsplus@ropc1.rfx.local>
Co-authored-by: mdsplus <mdsplus@mcpsl-pcf.codac.iter.org>
Co-authored-by: mwinkel-dev <122583770+mwinkel-dev@users.noreply.github.com>
Co-authored-by: Mark Winkel <mwinkel@psfc.mit.edu>
Co-authored-by: Fernando Santoro <fsantoro@psfc.mit.edu>
Co-authored-by: fmolon <31769662+fmolon@users.noreply.github.com>
Co-authored-by: Darren Garnier <garnier@mit.edu>
Co-authored-by: mdsplus <mdsplus@roactive2.rfx.local>
Co-authored-by: f-trx <162459233+f-trx@users.noreply.github.com>
Co-authored-by: Antoine Merle <antoine.merle@epfl.ch>
Co-authored-by: Gregorio L. Trevisan <gtrevisan@users.noreply.github.com>
Co-authored-by: heidthecamp <heidthecamp@gmail.com>
Co-authored-by: vadim-at-te <vadim.nemytov@tokamakenergy.co.uk>
Co-authored-by: mdsplus administrator <mdsplus@aidevel.rfx.local>
Co-authored-by: heidthecamp <heidcamp@mit.edu>
Co-authored-by: Mitchell Clark <45748401+ModestMC@users.noreply.github.com>
Co-authored-by: Darren Garnier <garnier@psfc.mit.edu>

Labels

- bug: An unexpected problem or unintended behavior
- tool/tcl: Relates to the Tree Control Language or mdstcl prompt
- US Priority
