Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support changing manual configuration while running #383

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

ianpaul10
Copy link
Contributor

@ianpaul10 ianpaul10 commented Oct 24, 2024

Ref #380

  • Load config at the start of each find task
  • Only load config and return the new values if the file's getmtime has been updated, otherwise return the cached peers
  • If anything fails when trying to read the config file, return the cached peers (or empty dict if it's never been loaded)
  • Add new cleanup peers task that removes peers from the known_peers list if they no-longer exist in the dict but still exist in the known_peers list
  • Test that starts with 2 peers in the config file, adds a 3rd to the config, checks that it gets picked up by a node, and then remove it from the config, and check that it gets forgotten by the node.

@ianpaul10 ianpaul10 force-pushed the feat/manual-disc-follow-up branch 5 times, most recently from d551110 to 2f955cf Compare October 24, 2024 03:41
@ianpaul10
Copy link
Contributor Author

ianpaul10 commented Nov 9, 2024

@AlexCheema This PR should address the follow-up to updating the manual config while exo is running. Please take a look when you get a chance!

Copy link
Contributor

@AlexCheema AlexCheema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need it to be asyncified. Then it's good to go.


def _get_peers(self):
try:
current_mtime = os.path.getmtime(self.network_config_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this blocking?
Blocking every 1 second is going to slow exo down significantly since it runs on one thread.

Perhaps we can move these operations to a thread pool executor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. I updated all calls to self.network_config_path to run within a single thread pool executor. I think things should now be non-blocking and thread safe

if self._cached_peers is not None and self._last_modified_time is not None and current_mtime <= self._last_modified_time:
return self._cached_peers

topology = NetworkTopology.from_path(self.network_config_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is from_path blocking? Same comment as above. Can we make all I/O operations to self.network_config_path happen on a thread pool executor with one thread?

exo/networking/manual/manual_discovery.py Show resolved Hide resolved
@ianpaul10
Copy link
Contributor Author

@AlexCheema Thanks for the notes, reading from the config file should now be non-blocking

Comment on lines 74 to 85
async def task_clean_up_peers_from_config(self):
if DEBUG_DISCOVERY >= 2: print("Starting task to clean up peers from config...")
while True:
peers_from_config = await self._get_peers()
if peers_from_config:
peers_to_remove = [peer for peer in self.known_peers.keys() if peer not in peers_from_config]

for peer in peers_to_remove:
if DEBUG_DISCOVERY >= 2: print(f"{peer} is no longer found in the config but is currently a known peer. Removing from known peers...")
self.known_peers.pop(peer, None)

await asyncio.sleep(1.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is needed. I think this may be overcomplicating it.

The way StandardNode is set up, it should be possible to simply return the latest set of peers, i.e. your _get_peers is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, TIL a bit more about StandardNode, thanks!

I thought it was necessary to handle cleaning up nodes that were originally added to the config file (and thus added to self.known_peers) but removed afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants