Introduce improvements from OSLO #571

Open
hyunwoongko opened this issue Feb 23, 2022 · 6 comments
Labels: feature request

Comments

hyunwoongko (Member) commented Feb 23, 2022

  1. AOTAutograd is a new engine provided by functorch that can fuse all parts of a neural network. I added it to OSLO recently, and it makes training much faster. I would like to add this to GPTNeoX as well; what do you think? It would also be nice to implement it on the DeeperSpeed side. (A minimal sketch of the idea follows this list.)

  2. OSLO changed Megatron's MPU so that it can handle odd embedding sizes (sizes that do not divide evenly across tensor-parallel ranks). Therefore there is no need to add meaningless padding tokens, which improves memory efficiency, and this also let me implement the TP auto-merging feature. Note that it can merge 70+ Transformers architectures without checkpoint conversion scripts. (See the partitioning sketch after this list.)

  3. FusedRMSNorm was recently added to Apex, and it has been merged into OSLO. NeoX 20B doesn't seem to use RMSNorm, but this might still be helpful. (See the usage sketch after this list.)
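
On item 1, here is a minimal sketch of what hooking a module up to AOTAutograd could look like, using functorch's `memory_efficient_fusion` wrapper; the toy model, sizes, and CUDA assumption below are placeholders, not GPT-NeoX code:

```python
# Sketch only: a toy module fused with functorch's AOTAutograd-based wrapper.
# The model, sizes, and device assumptions are placeholders.
import torch
import torch.nn as nn
from functorch.compile import memory_efficient_fusion

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).cuda()

# AOTAutograd traces the forward and backward graphs ahead of time and hands
# them to a fusing compiler, so chains of pointwise ops get fused.
fused_model = memory_efficient_fusion(model)

x = torch.randn(8, 1024, device="cuda", requires_grad=True)
out = fused_model(x)
out.sum().backward()
```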
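On item 2, the padding-free idea can be illustrated with a small partitioning helper: instead of padding the vocab up to a multiple of the tensor-parallel size, the first few ranks simply own one extra embedding row. This is a sketch of the concept, not OSLO's actual MPU code, and `vocab_range` is a hypothetical helper name:

```python
# Sketch only: partition an embedding table whose size is NOT divisible by the
# tensor-parallel world size, instead of padding the vocab to a multiple of it.
def vocab_range(vocab_size: int, tp_rank: int, tp_world_size: int):
    """Return the [start, end) slice of the vocab owned by `tp_rank`.

    The first `vocab_size % tp_world_size` ranks get one extra row, so every
    token id is covered without any padding embeddings.
    """
    base = vocab_size // tp_world_size
    remainder = vocab_size % tp_world_size
    start = tp_rank * base + min(tp_rank, remainder)
    end = start + base + (1 if tp_rank < remainder else 0)
    return start, end

# Example: a 50,257-token vocab split across 4 tensor-parallel ranks.
print([vocab_range(50257, r, 4) for r in range(4)])
# -> [(0, 12565), (12565, 25129), (25129, 37693), (37693, 50257)]
```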
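On item 3, a minimal sketch of using Apex's FusedRMSNorm, with a plain-PyTorch fallback when Apex is not installed; the dimension and epsilon are placeholders:

```python
# Sketch only: prefer Apex's fused RMSNorm kernel, fall back to a reference
# PyTorch implementation if Apex is unavailable.
import torch
import torch.nn as nn

try:
    from apex.normalization import FusedRMSNorm as RMSNorm
except ImportError:
    class RMSNorm(nn.Module):
        """Reference RMSNorm: scale activations by their root-mean-square."""

        def __init__(self, dim: int, eps: float = 1e-8):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x):
            rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
            return x * rms * self.weight

norm = RMSNorm(1024)
y = norm(torch.randn(2, 16, 1024))
```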


I will keep adding to this issue as I find more parts that can be improved.

hyunwoongko changed the title from "AOTAutograd support" to "Introduce improvements from OSLO" on Feb 23, 2022
@hyunwoongko (Member Author)

@sdtblck I saw you posted an issue regarding OSLO PP. Is there anything in PP you would like to improve?

@StellaAthena (Member)

#2 sounds quite clever and I strongly support it.

Given that we are very far from the mainline DeepSpeed repo, would #1 involve a lot of unnecessary labor compared to doing it after we get back to the main version of DeepSpeed?

#3 seems like a low-priority nice-to-have. I don't have any plans to use that normalization, though I'm sure some people might. That said, AFAIK ~90% of the use this library currently gets is internal to EleutherAI, so features aimed mainly at other users seem like a low priority.

hyunwoongko (Member Author) commented Feb 25, 2022

@StellaAthena

#2 I'm going to create a new branch in the current neox repo and experiment.

#1 This feature does not exist in DeepSpeed, so there is no need to worry about upstream DeepSpeed. Since I've already built it into a usable form in OSLO, it should be easy to add.

#3 I totally agree with you.

In addition, if there are any other parts you would like to improve or experiment with, even if they have nothing to do with OSLO, please feel free to assign tasks to me. I'm happy to help the NeoX project.

@StellaAthena (Member)

@hyunwoongko Ah I think I misread your comments about #1 :) In that case I would certainly be interested in experimenting with it :)

Honestly, far and away the most helpful thing you could do is figure out how to bring us back in-line with the main DeepSpeed branch. I know that’s a big ask though, so no worries if it’s a bit daunting.

In terms of building out the library, the other most important things on the horizon are #479 and #215. There are also some outstanding, abandoned PRs with optimizers like Shampoo that it would be nice to have cleaned up and finished. In terms of general library maintenance, #469 and various documentation improvements such as #506, #484, and #458 would all be quite helpful.

We could also always use help designing and orchestrating experiments. We can happily provide the compute for anyone willing to do the work… DM me on Slack if you’re interested.

StellaAthena added the "feature request" label on Feb 26, 2022
@Quentin-Anthony (Member)

@hyunwoongko -- Would you like to restart this effort?

hyunwoongko (Member Author) commented May 20, 2023

@Quentin-Anthony Sounds great.
