Introduce improvements from OSLO #571
Comments
@sdtblck I saw you posted an issue regarding OSLO PP. Is there anything in PP you would like to improve?
#2 sounds quite clever and I strongly support it. Given that we are very far from the mainline DeepSpeed repo, would #1 involve a lot of unnecessary labor compared to doing it after we get back to the main version of DeepSpeed? #3 seems like a low-priority nice-to-have. I don't have any plans to use that normalization, though I'm sure some people might. That said, 90% of the use this library gets is currently internal to EleutherAI AFAIK, so things other people might want to use seem like a low priority.
#2: I'm going to create a new branch in the current neox repo and experiment. #1: This feature does not exist in DeepSpeed, so there is no need to worry about DeepSpeed upstream. Since I've already built it into a usable form in OSLO, it should be easy to add. #3: I totally agree with you. In addition, if there are any further parts that you would like to improve or experiment with, even if they have nothing to do with OSLO, please feel free to assign some tasks to me. I will happily help the neox project.
@hyunwoongko Ah, I think I misread your comments about #1 :) In that case I would certainly be interested in experimenting with it :) Honestly, far and away the most helpful thing you could do is figure out how to bring us back in line with the main DeepSpeed branch. I know that's a big ask though, so no worries if it's a bit daunting. In terms of building out the library, the other most important things on the horizon are #479 and #215. There are also some outstanding abandoned PRs with optimizers like Shampoo that would be nice to have cleaned up and finished. In terms of general library maintenance, #469 and various documentation improvements such as #506, #484, and #458 would all be quite helpful. We could also always use help designing and orchestrating experiments. We can happily provide the compute for anyone willing to do the work… DM me on Slack if you're interested.
@hyunwoongko -- Would you like to restart this effort?
@Quentin-Anthony sounds great.
AOTAutograd is a novel engine provided by functorch that can fuse all parts of a neural network. I added it to OSLO recently, and it makes training much faster. I would like to add this to GPTNeoX; what do you think? It would be nice to implement this on the DeeperSpeed side as well.
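For reference, here is a minimal sketch of what wrapping a module with AOTAutograd could look like, assuming a functorch build that exposes `memory_efficient_fusion` under `functorch.compile` (the exact entry point has moved between versions, and this is not the OSLO integration itself):

```python
# Hedged sketch: fuse a small module's forward and backward graphs with
# AOTAutograd via functorch. Assumes functorch is installed and that
# functorch.compile.memory_efficient_fusion is available; fusion with the
# default NVFuser backend requires CUDA.
import torch
from functorch.compile import memory_efficient_fusion

class MLP(torch.nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.fc1 = torch.nn.Linear(dim, 4 * dim)
        self.fc2 = torch.nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))

mlp = MLP().cuda()
# Traces the forward and backward passes ahead of time and hands both
# graphs to a fusing compiler, so the GeLU and surrounding pointwise ops
# can be collapsed into fewer kernels.
fused_mlp = memory_efficient_fusion(mlp)

x = torch.randn(8, 1024, device="cuda", requires_grad=True)
out = fused_mlp(x)
out.sum().backward()
```

In a full GPTNeoX model the same wrapping would presumably be applied per transformer layer rather than to the whole network at once.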
OSLO changed Megatron's MPU so that it can handle odd embedding sizes. Therefore, there is no need to add meaningless padding tokens, which can increase memory efficiency. Using this, I was also able to implement the TP automerging function, which can merge 70+ transformers architectures without checkpoint conversion scripts.
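To illustrate the idea (not OSLO's actual implementation), here is a small sketch of how a vocabulary that is not divisible by the tensor-parallel world size can be split without padding; the helper name is hypothetical:

```python
# Hypothetical helper: split a vocab of arbitrary size across TP ranks
# without padding, by giving the first (vocab_size % world_size) ranks
# one extra row each. Illustrative only; not the OSLO/NeoX code.
def vocab_range_for_rank(vocab_size: int, world_size: int, rank: int):
    base = vocab_size // world_size
    remainder = vocab_size % world_size
    start = rank * base + min(rank, remainder)
    size = base + (1 if rank < remainder else 0)
    return start, start + size

# Example: GPT-2's 50257-token vocab over 4 ranks -> shard sizes
# 12565, 12564, 12564, 12564 with no padding rows added.
for rank in range(4):
    print(rank, vocab_range_for_rank(50257, 4, rank))
```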
FusedRMSNorm was recently added to Apex and has been merged into OSLO. NeoX 20B doesn't seem to use RMSNorm, but this might be helpful.
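As a rough sketch of what using it might look like, assuming an Apex build that ships `apex.normalization.FusedRMSNorm` with its CUDA extensions compiled (interface details may vary by version):

```python
# Hedged sketch: drop-in use of Apex's fused RMSNorm kernel.
# Assumes a recent Apex build with CUDA extensions installed.
import torch
from apex.normalization import FusedRMSNorm

hidden_size = 4096
norm = FusedRMSNorm(hidden_size, eps=1e-6).cuda()

x = torch.randn(2, 128, hidden_size, device="cuda")
# RMS-normalizes over the last dimension and applies a learned scale,
# without the mean-centering step that LayerNorm performs.
y = norm(x)
```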
I will continue to write up the parts that I can improve.