[doc] document MoE model approach and current solutions #14725
Conversation
sgugger left a comment
Thanks for adding this!
As a reference for future PRs, it would be nice to put the images in a dataset hosted on hf.co. Putting them in the repo weighs it down significantly (+72kb here), and they can't be removed from the history without altering it. Images in the docs can also be taken from a dataset; see SegFormer for example: https://huggingface.co/docs/transformers/master/en/model_doc/segformer#overview
See how it was added here: https://github.com/huggingface/transformers/blame/master/docs/source/model_doc/segformer.rst#L44-L45
I have opened this PR to clarify this in the contribution guide: #14738
@LysandreJik maybe it would make it more obvious to everyone if we moved all images there. They won't disappear from history but there is no
Yes, that's a very good call!
That's a very good point about not blowing up the repo, @LysandreJik! Will this actually work well, though? Aren't browsers these days capable of blocking remote URLs that aren't on the same domain for security-conscious users?
We have tested it with @NielsRogge: it works well for the SegFormer example above, and everything resides on huggingface.co. I'll take care of migrating all the images (including the one above) in a PR this week, and will ensure that the images are correctly linked.
How have you tested it? Do you have the security tools installed, like the Privacy Badger plugin and the like? For context/background: many websites install all kinds of trackers, which can be external JS code, an image, etc. Tools like Privacy Badger will often block any such third-party URLs if configured to do so, which may catch images as well. Since I have Privacy Badger, I can test this if you give me an example of a doc that includes remote images.
Thank you!
We empirically tested it. I use Brave, which blocks ads and trackers, but I don't use Privacy Badger, so I'd be happy for you to tell me how it works for you! Please let me know if you see the images here: https://github.com/LysandreJik/transformers/blob/documentation-images/docs/source/parallelism.md#zero-data-parallel
and here: https://huggingface.co/docs/transformers/master/en/model_doc/segformer#overview
I have no issue with those, which are currently hosted in the following dataset: https://huggingface.co/datasets/huggingface/documentation-images/tree/main
If this is sufficient testing for you, let me know and I'll finalize/open the PR to remove the images and rely on the external dataset instead.
Thank you for sharing how you tested it, @LysandreJik. I tested with Privacy Badger and everything works: it doesn't consider those images trackers. So all is good! Thank you!
Perfect, thank you for testing it out!
This PR introduces the reader to the novel MoE Transformer architecture, which offers much faster training at the cost of much larger memory requirements.
While most of the existing solutions live in the TensorFlow/TPU world, DeepSpeed has introduced a PyTorch solution based on Megatron-DeepSpeed.
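To make the MoE idea concrete, here is a minimal, illustrative sketch of top-1 expert routing in pure Python. All names (`make_expert`, `top1_route`) are hypothetical, and the gate scores are hard-coded for the example; real implementations such as DeepSpeed-MoE use a learned softmax gate and all-to-all communication to dispatch tokens across devices.

```python
# Hypothetical sketch of top-1 Mixture-of-Experts routing (not the
# actual DeepSpeed API). Each "expert" is a stand-in feed-forward
# function; a real MoE layer would use full FFN sub-networks.

def make_expert(weight):
    # Trivial expert: scale every feature of the token by `weight`.
    return lambda token: [weight * v for v in token]

def top1_route(token, gate_scores, experts):
    # Route the token to the single expert with the highest gate score.
    # In a trained model, gate_scores come from a learned gating network.
    best = max(range(len(experts)), key=lambda i: gate_scores[i])
    return experts[best](token)

experts = [make_expert(w) for w in (1.0, 2.0, 3.0)]
# Gate picks expert 1 (score 0.7), so the token is scaled by 2.0.
out = top1_route([1.0, 2.0], gate_scores=[0.1, 0.7, 0.2], experts=experts)
print(out)  # [2.0, 4.0]
```

Because each token activates only one expert's parameters, compute per token stays roughly constant while total parameter count (and hence memory) grows with the number of experts, which is the trade-off described above.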
@sgugger