Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue984 refactor to implement scheduled polls (second pull replaces first.) #985

Closed
wants to merge 13 commits into from

Conversation

petersilva
Copy link
Contributor

@petersilva petersilva commented Mar 19, 2024

Makes subsctantial progree on #974 but does not close it entirely.

resubmission of #984
(the original PR was clutterred up with rebasing... want to do a squash commit so won't rebase for this PR.)

So this implements the refactoring for #974 .

It worries me, in that I'm not sure what side effects the change will have. @reidsunderland do you have an easy way to test if the queueing thing is fixed by the branch? The branch is: https://github.com/MetPX/sarracenia/tree/issue984
The testing in the flow tests is pretty limited, just a straight vanilla SFTP with a short sleep, a VIP and failover.

Things I am wondering about:

It doesn't do a poll on startup, currently. it waits for one interval before the first poll. easy to fix... should we?

So now in poll, the loop towards the end of the run() routine is only used to sleep for 1 second at a time... and the real waiting is done in the scheduled.poll plugin instead.

In watch... still using long sleeps in that loop at the end... Is this an opportunity to simplify sr_watch also... so it also uses the same methods? Is there an advantage to doing it that way?

I'm wondering if there is a way to have watch use a shared duplicate suppression cache the same way as poll does, populating itself with the output exchange binding queue. Same benefit as poll, run on two machines with a vip and failover.

wondering if this change might be an opportunity to simplify sarracenia.flow/init.py run() routine. It feels too complicated.

@petersilva petersilva changed the title Issue984 second pull replaces first. Issue984 refactor to implement scheduled polls (second pull replaces first.) Mar 19, 2024
@reidsunderland
Copy link
Member

i'll test on dev

@reidsunderland
Copy link
Member

I do think this "It doesn't do a poll on startup, currently. it waits for one interval before the first poll. easy to fix... should we?" should be fixed.

If something is wrong, a tail of the logs after starting the config should show poll errors. If it doesn't poll right away you might not see possible errors.

@petersilva
Copy link
Contributor Author

petersilva commented Mar 20, 2024

  • fixed the doesn't poll on startup thing.

Also:

  • found a wacky infinite loop where:
    • localtz is utc-4, so in utc it is already tommorrow... when calculating "tommorrow" it would say midnight, but midnight was an hour ago, so sleep(<0) ... lovely. only happens betwen 20 and 24h local time.
    • Also requires local running process to be not in utc... so unlikely to occur on our servers. but annoying on development.
    • change tommorrow calculation to be tz aware and utc and that fixes it.

@reidsunderland
Copy link
Member

It never does the poll:

2024-03-20 11:29:28,581 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 11:34:38,848 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 11:39:49,118 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 11:44:59,399 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 11:50:09,651 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 11:55:19,925 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:00:30,199 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:03:40,352 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:08:50,626 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:14:00,911 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:19:11,190 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:24:21,469 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:29:31,738 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:34:42,000 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:39:52,278 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:45:02,548 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:50:12,828 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 12:55:23,066 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-03-20 13:00:33,341 [INFO] sarracenia.flowcb.scheduled.poll gather waiting for next poll

from show:

'housekeeping': 300,
'sanity_log_dead': 450.0,
'sleep': 1,
'scheduled_interval': 3600.0,

It also seems odd that housekeeping doesn't appear to run.

I was testing 735ff80. I'll try again with the latest commit on the branch.


With the latest branch, I see:

2024-03-20 15:41:22,729 [INFO] sarracenia.flowcb.scheduled update_appointments for 2024-03-20 15:41:22.729926+00:00: []

@petersilva
Copy link
Contributor Author

just double checking... I edited the top post to change branches... are you testing with:

https://github.com/MetPX/sarracenia/tree/issue984

@reidsunderland
Copy link
Member

yes, issue984

@reidsunderland
Copy link
Member

With the latest commit, it is polling, posting and both nodes are acking reasonably quickly. So it's looking good! But I still don't see housekeeping running.

@petersilva
Copy link
Contributor Author

I pushed a force of running housekeeping... try that. I will keep looking at flow/Flow/run() to see if it can be simplified.

@reidsunderland
Copy link
Member

should we have a default scheduled_interval? If sleep and scheduled_interval are both not set in the config, sleep defaults to -1 and scheduled_interval doesn't get set.

@petersilva
Copy link
Contributor Author

I think I'd like to merge this one as-is, as all the patches in here break the tests, so it's best to do a single squash-commit.
This commit will be pretty big, so would prefer to get this merged, then keep working.

Here is a branch with follow up work https://github.com/MetPX/sarracenia/tree/issue984_flow_run_refactor
refactoring the Flow/run() routine... which has the result of making polls work like all the other components transfers/downloads can be done with them. The main point was to break up that routine which was 288 lines. The re-factoring brought the routine's length down to 159 (still too long, but more mangeable.) and allowed me to understand anough to do change the behaviour.

After this gets merged, will work on a PR for that (based on the updated development, likely build by cherrypicking.)
The commits in that other branch don't need to be merged, none break anything, just progressive re-factoring.
So I'll add setting a default to that other branch, and just cherry-pick it with the others.

@petersilva
Copy link
Contributor Author

withdrawing this one in favour of: #991

@petersilva petersilva closed this Mar 21, 2024
@petersilva petersilva deleted the issue984 branch May 7, 2024 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants