Commit
Showing 141 changed files with 2,901 additions and 2,064 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# Contributing | ||
GPT-NeoX welcomes your contributions! | ||
|
||
## Prerequisites | ||
GPT-NeoX uses [pre-commit](https://pre-commit.com/) to ensure that formatting is | ||
consistent across GPT-NeoX. First, ensure that `pre-commit` is installed with | ||
`pip install pre-commit`. Next, the pre-commit hooks must be installed once | ||
before commits can be made: | ||
```bash | ||
pre-commit install | ||
``` | ||
Please install `clang-format` from Conda: | ||
```bash | ||
conda install clang-format | ||
``` | ||
|
||
Afterwards, our suite of formatting tests runs automatically before each `git commit`. You | ||
can also run these manually: | ||
```bash | ||
pre-commit run --all-files | ||
``` | ||
If a formatting test fails, it will fix the modified code in place and abort | ||
the `git commit`. After looking over the changes, you can `git add <modified files>` | ||
and then repeat the previous `git commit` command. | ||
|
||
|
||
## Testing | ||
GPT-NeoX tracks two types of tests: unit tests and more costly model convergence tests. | ||
Unit tests are found in `tests/unit/` and the model convergence tests are found in | ||
`tests/model/`. | ||
|
||
### Unit Tests | ||
[PyTest](https://docs.pytest.org/en/latest/) is used to execute tests. PyTest can be | ||
installed from PyPI via `pip install pytest`. Simply invoke `pytest --forked` to run the | ||
unit tests: | ||
```bash | ||
pytest --forked tests/unit/ | ||
``` | ||
You can also provide the `-v` flag to `pytest` to see additional information about the | ||
tests. Note that [pytest-forked](https://github.com/pytest-dev/pytest-forked) and the | ||
`--forked` flag are required to test CUDA functionality in distributed tests. | ||
|
||
### Model Tests | ||
To execute model tests, first install GPT-NeoX. Next, execute the model test driver: | ||
```bash | ||
cd tests/model/ | ||
pytest run_sanity_check.py | ||
``` | ||
Note that the `--forked` flag is not necessary for the model tests. | ||
|
||
## Contributor License Agreement | ||
This project welcomes contributions and suggestions. Most contributions require you to | ||
agree to a Contributor License Agreement (CLA) declaring that you have the right to, and | ||
actually do, grant us the rights to use your contribution. For details, visit | ||
https://cla-assistant.io/EleutherAI/gpt-neox. | ||
|
||
When you submit a pull request, a CLA bot will automatically determine whether you need | ||
to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply | ||
follow the instructions provided by the bot. You will only need to do this once across | ||
all repos using our CLA. | ||
|
||
## New Feature Contribution Guidelines | ||
Unlike a bug fix or an improvement to an existing feature (where users usually submit a PR directly and we review it), adding a new feature to GPT-NeoX requires several steps: (1) proposal and discussion, (2) implementation and verification, (3) release and maintenance. This general guideline applies to all new feature contributions. Core GPT-NeoX team members may complete step 1 internally. | ||
|
||
### Step 1: Proposal and Discussion | ||
We ask users to first post their intended feature in an issue. This issue needs to include: | ||
|
||
* A description of the proposed feature. | ||
* The motivation: why the feature will be useful to GPT-NeoX users. | ||
* A rough design of how you will implement the feature inside GPT-NeoX. | ||
* (Important) Results or planned experiments to demonstrate the effectiveness and correctness of the feature. | ||
  * If the feature only affects performance and does not affect training convergence, we require testing on a fraction of training to demonstrate that the training and validation losses are consistent with the baseline, and that the performance is better than the baseline. | ||
  * If the feature does affect training convergence, we require testing on a full training run to demonstrate that the feature achieves better or on-par final model quality and training performance compared to the baseline. | ||
|
||
Based on the issue, we will discuss the merits of the new feature and decide whether to accept or decline the proposal. Once the proposal is accepted and we have confirmed the design and implementation plan, we are ready for step 2. | ||
|
||
### Step 2: Implementation and Verification | ||
The contributor will then implement the feature, and the GPT-NeoX team will provide guidance and help as needed. The required deliverables include: | ||
|
||
* A PR to [EleutherAI/GPT-NeoX](https://github.com/EleutherAI/gpt-neox) including (1) the feature implementation, (2) unit tests, (3) documentation, and (4) example usage. | ||
* In the implementation (code, documentation, tutorial), we require the feature author to record their GitHub username as a contact method for future questions/maintenance. | ||
|
||
After receiving the PR, we will review it and merge it after any necessary tests and fixes. | ||
|
||
### Step 3: Release and Maintenance | ||
After the PR is merged, we will announce the feature on our website (with credit to the feature author). We ask the feature author to commit to the maintenance of the feature. |
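The formatting and unit-test steps in the contributing guide above can be chained into a single check to run before opening a PR. The following is a minimal sketch and is not part of the commit: the function names `have_tool` and `pre_pr_checks` are hypothetical, and it assumes it is run from the repository root with `pre-commit`, `pytest`, and the pytest-forked plugin already installed.

```shell
# Hypothetical helper (not part of the repository): run the checks that the
# contributing guide asks for before a pull request is opened.

# Succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Run the formatting hooks, then the unit tests; stop at the first failure.
pre_pr_checks() {
    for tool in pre-commit pytest; do
        if ! have_tool "$tool"; then
            echo "missing: $tool (install with: pip install $tool)"
            return 1
        fi
    done
    # Same commands as in the guide; --forked needs the pytest-forked plugin.
    pre-commit run --all-files || return 1
    pytest --forked tests/unit/
}
```

Calling `pre_pr_checks` after sourcing the snippet runs both stages in order. Because the formatting hooks fix files in place, a failure at the first stage most likely means there are changes to review and `git add` before trying again.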
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
-# Copyright (c) 2021, EleutherAI | ||
+# Copyright (c) 2024, EleutherAI | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
|
@@ -12,7 +12,7 @@ | |
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
-FROM nvidia/cuda:11.7.1-devel-ubuntu20.04 | ||
+FROM nvidia/cuda:12.1.1-devel-ubuntu22.04 | ||
|
||
ENV DEBIAN_FRONTEND=noninteractive | ||
|
||
|
@@ -21,20 +21,20 @@ LABEL org.opencontainers.image.version = "2.0" | |
LABEL org.opencontainers.image.authors = "[email protected]" | ||
LABEL org.opencontainers.image.source = "https://www.github.com/eleutherai/gpt-neox" | ||
LABEL org.opencontainers.image.licenses = " Apache-2.0" | ||
-LABEL org.opencontainers.image.base.name="docker.io/nvidia/cuda:11.7.1-devel-ubuntu20.04" | ||
+LABEL org.opencontainers.image.base.name="docker.io/nvidia/cuda:12.1.1-devel-ubuntu22.04" | ||
|
||
#### System package (uses default Python 3 version in Ubuntu 20.04) | ||
RUN apt-get update -y && \ | ||
apt-get install -y \ | ||
-git python3.9 python3-dev libpython3-dev python3-pip sudo pdsh \ | ||
-htop llvm-9-dev tmux zstd software-properties-common build-essential autotools-dev \ | ||
+git python3-dev libpython3-dev python3-pip sudo pdsh \ | ||
+htop tmux zstd software-properties-common build-essential autotools-dev \ | ||
nfs-common pdsh cmake g++ gcc curl wget vim less unzip htop iftop iotop ca-certificates ssh \ | ||
rsync iputils-ping net-tools libcupti-dev libmlx4-1 infiniband-diags ibutils ibverbs-utils \ | ||
rdmacm-utils perftest rdma-core nano && \ | ||
update-alternatives --install /usr/bin/python python /usr/bin/python3 1 && \ | ||
update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1 && \ | ||
-pip install --upgrade pip && \ | ||
-pip install gpustat | ||
+python -m pip install --upgrade pip && \ | ||
+python -m pip install gpustat | ||
|
||
### SSH | ||
RUN mkdir /var/run/sshd && \ | ||
|
@@ -88,24 +88,31 @@ RUN mkdir -p /home/mchorse/.ssh /job && \ | |
echo 'export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/mpi/lib:/usr/local/mpi/lib64:$LD_LIBRARY_PATH' >> /home/mchorse/.bashrc | ||
|
||
#### Python packages | ||
-RUN pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117 && pip cache purge | ||
-COPY requirements/requirements.txt . | ||
-COPY requirements/requirements-wandb.txt . | ||
-COPY requirements/requirements-onebitadam.txt . | ||
-COPY requirements/requirements-sparseattention.txt . | ||
-COPY requirements/requirements-flashattention.txt . | ||
-RUN pip install -r requirements.txt && pip install -r requirements-onebitadam.txt | ||
-RUN pip install -r requirements-sparseattention.txt | ||
-RUN pip install -r requirements-flashattention.txt | ||
-RUN pip install -r requirements-wandb.txt | ||
-RUN pip install protobuf==3.20.* | ||
-RUN pip cache purge | ||
+RUN python -m pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 | ||
+COPY requirements/* ./ | ||
+RUN python -m pip install --no-cache-dir -r requirements.txt && pip install -r requirements-onebitadam.txt | ||
+RUN python -m pip install -r requirements-sparseattention.txt | ||
+RUN python -m pip install -r requirements-flashattention.txt | ||
+RUN python -m pip install -r requirements-wandb.txt | ||
+RUN python -m pip install protobuf==3.20.* | ||
+RUN python -m pip cache purge | ||
|
||
## Install APEX | ||
-RUN pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex.git@a651e2c24ecf97cbf367fd3f330df36760e1c597 | ||
+# Detect the architecture and install Apex accordingly | ||
+RUN ARCH=$(uname -m) && \ | ||
+if [ "$ARCH" = "x86_64" ]; then \ | ||
+wget https://github.com/segyges/not-nvidia-apex/releases/download/jan-2024/apex-0.1-cp310-cp310-linux_x86_64.zip && \ | ||
+unzip ./apex-0.1-cp310-cp310-linux_x86_64.zip && \ | ||
+python -m pip install ./apex-0.1-cp310-cp310-linux_x86_64.whl; \ | ||
+else \ | ||
+# Install Apex directly from source for other architectures | ||
+python -m pip install -r requirements-apex-pip.txt && \ | ||
+python -m pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings --global-option=--cpp_ext --config-settings --global-option=--cuda_ext git+https://github.com/NVIDIA/apex.git@141bbf1cf362d4ca4d94f4284393e91dda5105a5; \ | ||
+fi | ||
|
||
COPY megatron/fused_kernels/ megatron/fused_kernels | ||
-RUN python megatron/fused_kernels/setup.py install | ||
+WORKDIR /megatron/fused_kernels | ||
+RUN python setup.py install | ||
|
||
# Clear staging | ||
RUN mkdir -p /tmp && chmod 0777 /tmp | ||
|
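The Apex step in the Dockerfile diff above branches on the machine architecture: on `x86_64` it installs a prebuilt wheel, and on anything else it builds Apex from source. A minimal sketch of that decision as a standalone shell function follows. The function name `apex_install_plan` and the labels it prints are hypothetical and used only for illustration; the function installs nothing.

```shell
# Hypothetical illustration of the architecture branch: it only prints which
# install path would be taken.
apex_install_plan() {
    # Use the first argument if given, otherwise ask the running machine.
    arch="${1:-$(uname -m)}"
    if [ "$arch" = "x86_64" ]; then
        echo "prebuilt-wheel"
    else
        echo "build-from-source"
    fi
}

apex_install_plan            # reports the plan for the current machine
apex_install_plan aarch64    # prints: build-from-source
```

The `cp310` tag in the wheel file name denotes CPython 3.10, which is the default `python3` on Ubuntu 22.04, so the prebuilt path appears to rely on the new base image's default interpreter.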