Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce sleep() in CAP library code #2189

Merged
merged 19 commits into from
Apr 2, 2024

Conversation

rajan-chari
Copy link
Contributor

@rajan-chari rajan-chari commented Mar 28, 2024

Why are these changes needed?

Related issue number

Resolves #2088 Roadmap item: (Remove Sleeps from Actor creation, ActorSender, ActorConnector in the CAP framework)

Checks

@rajan-chari rajan-chari changed the title Rajan/reduce sleeps Reduce sleep() in CAP library code Mar 28, 2024
@rajan-chari rajan-chari requested review from ekzhu and kinnym March 28, 2024 17:39
Copy link
Collaborator

@kinnym kinnym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically we dont want to wait infinitely.. generally not a good idea. To fix it we can modify the existing code to this. Your timeout is your worst case scenario.
// Assuming _start_event is already defined somewhere
// Create an instance of threading.Event() if it's not already created
self._start_event = threading.Event()

//Wait for the event with a timeout of 5 seconds
if not self._start_event.wait(timeout=5):
// If the event didn't occur within 5 seconds
// Do something else or raise an exception
print("Event didn't occur within 5 seconds. Proceeding with other actions.")
else:
// If the event occurred within 5 seconds
// Proceed with the rest of the code
print("Event occurred. Proceeding with other actions.")

Copy link
Collaborator

@kinnym kinnym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise the changes look good.

@codecov-commenter
Copy link

codecov-commenter commented Mar 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 37.94%. Comparing base (32fbfa2) to head (9f0bbcc).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2189   +/-   ##
=======================================
  Coverage   37.94%   37.94%           
=======================================
  Files          77       77           
  Lines        7780     7780           
  Branches     1666     1666           
=======================================
  Hits         2952     2952           
  Misses       4579     4579           
  Partials      249      249           
Flag Coverage Δ
unittests 37.93% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rajan-chari
Copy link
Contributor Author

rajan-chari commented Mar 31, 2024

Typically we dont want to wait infinitely.. generally not a good idea. To fix it we can modify the existing code to this. Your timeout is your worst case scenario. // Assuming _start_event is already defined somewhere // Create an instance of threading.Event() if it's not already created self._start_event = threading.Event()

//Wait for the event with a timeout of 5 seconds if not self._start_event.wait(timeout=5): // If the event didn't occur within 5 seconds // Do something else or raise an exception print("Event didn't occur within 5 seconds. Proceeding with other actions.") else: // If the event occurred within 5 seconds // Proceed with the rest of the code print("Event occurred. Proceeding with other actions.")

Thanks for the suggestion, Kiran. I see where you are coming from in case there is an error in signaling logic, this would prevent Actor startup from freezing.

I tried the suggested approach. Here are my observations:

  1. In debugger, even when there are no issues, the waiting threads exit. This makes the Actor classes brittle and difficult to debug. In this case, attempting to avoid possible incorrect states, puts the code in incorrect states. Since the 5 seconds is just a heuristic, it's hard to guarantee correct behavior.

  2. My preference here is to fix issues instead of providing a path for possibly incorrect behavior in the core framework. I try to ensure that all path's in the signaling thread are correct. Please let me know if you see any logic issues, for example, exceptions might cause things to not be signaled.

This change would make it harder to use the framework (at least under Debug). I concluded that the best approach here is to rely on correct event signaling behavior and correct logic in threading code, so I rolled back the changes.

@rajan-chari
Copy link
Contributor Author

rajan-chari commented Mar 31, 2024

Merge seems to be prevented by checking a file that should be excluded. I created a PR to disable the file check. This has been merged to main. Not sure how to proceed here. The exclusion has also been committed to this branch.

@sonichi
Copy link
Contributor

sonichi commented Mar 31, 2024

Could you fix the code formatting error?

@rajan-chari
Copy link
Contributor Author

Could you fix the code formatting error?

Running into this issue: #2190

@rajan-chari
Copy link
Contributor Author

Could you fix the code formatting error?

All set.

@ekzhu ekzhu added this pull request to the merge queue Apr 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 2, 2024
@ekzhu ekzhu added this pull request to the merge queue Apr 2, 2024
Merged via the queue into microsoft:main with commit db30ec8 Apr 2, 2024
23 checks passed
@rajan-chari rajan-chari deleted the rajan/reduce-sleeps branch April 2, 2024 11:18
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
* 1) Removed most framework sleeps 2) refactored connection code

* pre-commit fixes

* pre-commit

* ignore protobuf files in pre-commit checks

* Fix duplicate actor registration

* refactor change

* Nicer printing of Actors

* 1) Report recv_multipart errors 4) Always send 4 parts

* AutoGen generate_reply expects to wait indefinitely for an answer.  CAP can wait a certain amount and give up.   In order to reconcile the two, AutoGenConnector is set to wait indefinitely.

* pre-commit formatting fixes

* pre-commit format changes

* don't check autogenerated proto py files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Roadmap] CAP
5 participants