
Eval working #463 (Draft)

blindcrone wants to merge 7 commits into main from eval-working
Conversation

blindcrone (Contributor):

Added facilities for processing examples (which currently consist of an input, a target, and a length), and a means of evaluating them against a (currently hard-defaulted) loss function in a distributed fashion across the shards of the exo network.

A lot of the piping here will make it easier to do distributed training.
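
As a rough illustration of the shape of this, here is a minimal sketch; the names below (`Example`, `node.process_example`) are hypothetical, not exo's actual API:

```python
# A rough sketch, assuming hypothetical names, of the example record and
# distributed-evaluation flow described in the PR summary.
from dataclasses import dataclass

@dataclass
class Example:
    input: list[int]   # token ids fed into the first shard
    target: list[int]  # token ids the model's output is scored against
    length: int        # number of positions that count toward the loss

async def evaluate(node, examples, loss_fn) -> float:
    """Average loss over examples, each forwarded across the network's shards."""
    total = 0.0
    for ex in examples:
        # Hypothetical call: run the input through the shard chain and have
        # the final shard apply the (currently hard-defaulted) loss function.
        total += await node.process_example(ex, loss_fn)
    return total / len(examples)
```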

```python
if DEBUG >= 2: print(f"computed target from: {base_shard} {target_index}, {self.topology}. target shard: {target_shard}")
target_peer = next((p for p in self.peers if p.id() == target_id), None)
if not target_peer:
    raise valueerror(f"peer for {target_index} not found")
```
A reviewer (Contributor):
I think this should be ValueError
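
For reference, Python exception names are case-sensitive, so the lowercase `valueerror` would itself fail with a `NameError` when this branch is hit; the corrected line would be:

```python
raise ValueError(f"peer for {target_index} not found")
```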

@dtnewman (Contributor):

@blindcrone I put in some comments that I hope are useful. Note that these are all somewhat stylistic/superficial, since I haven't actually fetched or tested the branch on my machine.

blindcrone force-pushed the eval-working branch 4 times, most recently from 6bc2e01 to 1a1f6c6 on November 19, 2024 at 14:13.
blindcrone (Contributor, Author) commented on Nov 19, 2024:

Okay, this now in theory trains across nodes on MLX. I'll need to add the ability to save the weights somewhere to see how well it actually does, and it'd be nice to get tinygrad working this way too.

Also, I think the loss used in the backprop approximation isn't exactly a correct approximation, so if anyone remembers the backprop equations better and has a suggestion, please do educate me.
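
For what it's worth, an exact (rather than approximated) formulation is possible if each downstream shard sends the gradient of the loss with respect to its input back upstream, where it becomes the cotangent for the upstream shard's backward pass; the chain rule then composes exactly across the shard boundary. A minimal sketch of that idea on MLX, using hypothetical two-shard functions rather than exo's actual API:

```python
# A minimal sketch (hypothetical names, not exo's API) of exact pipeline
# backprop across two shards using MLX's vjp.
import mlx.core as mx

w_a = mx.random.normal((4, 8))  # parameters held by shard A
w_b = mx.random.normal((8, 1))  # parameters held by shard B
x = mx.random.normal((2, 4))    # example input
y = mx.random.normal((2, 1))    # example target

def shard_a(x):
    return mx.tanh(x @ w_a)             # first half of the model

def shard_b(h):
    return mx.mean((h @ w_b - y) ** 2)  # second half, ending in the loss

# Forward pass: shard A's activations are shipped to shard B
# (over the network, in exo's case).
h = shard_a(x)

# Backward pass on shard B: seed the scalar loss with cotangent 1.0 and
# recover the gradient of the loss w.r.t. B's input h.
(loss,), (grad_h,) = mx.vjp(shard_b, [h], [mx.array(1.0)])

# grad_h travels back to shard A and seeds A's backward pass.
(_,), (grad_x,) = mx.vjp(shard_a, [x], [grad_h])

print(loss.item(), grad_x.shape)
```

Per-shard parameter gradients (for `w_a`, `w_b` here) would come from `mx.grad` or `mx.value_and_grad` run locally on each shard; only the input-gradients need to cross the network.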
