4 changes: 2 additions & 2 deletions megatron/arguments.py
@@ -22,7 +22,7 @@
import deepspeed

from megatron.enums import PositionEmbeddingType
from megatron.model.glu_activations import GLU_ACTIVATIONS
Member
Not the first time I've seen that; removing the __init__ file in megatron/model would help, I think.

Contributor Author (@stas00, Aug 22, 2021)

This will require adapting all these:

./pretrain_gpt.py:from megatron.model import GPTModel, GPTModelPipe
./megatron/training.py:from megatron.model import Float16Module
./megatron/training.py:from megatron.model import DistributedDataParallel as LocalDDP
./megatron/optimizer/__init__.py:from megatron.model import LayerNorm
./megatron/schedules.py:from megatron.model import DistributedDataParallel as LocalDDP
./megatron/schedules.py:from megatron.model import Float16Module
./megatron/text_generation_utils.py:from megatron.model import DistributedDataParallel as LocalDDP
./megatron/text_generation_utils.py:from megatron.model import Float16Module
./megatron/model/realm_model.py:from megatron.model import BertModel
./megatron/model/transformer.py:from megatron.model import LayerNorm
./megatron/model/bert_model.py:from megatron.model import LayerNorm
./megatron/model/gpt_model.py:from megatron.model import LayerNorm
./tasks/zeroshot_gpt/evaluate.py:from megatron.model import GPTModel
./tasks/zeroshot_gpt/evaluate.py:from megatron.model import DistributedDataParallel as LocalDDP
./tasks/zeroshot_gpt/evaluate.py:from megatron.model import Float16Module
./checkpoint-analysis.ipynb:    "from megatron.model import GPTModel\n",
./pretrain_bert.py:from megatron.model import BertModel
./pretrain_t5.py:from megatron.model import T5Model
./tools/generate_samples_gpt.py:from megatron.model import GPTModel
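
A hedged sketch of what that adaptation would imply: once megatron/model/__init__.py stops re-exporting names, every `from megatron.model import X` in the list above must become a direct submodule import. The toy package below (hypothetical names, not the real Megatron tree) shows the before/after:

```python
import os
import sys
import tempfile

# Build a throwaway package with a bare __init__.py, mimicking what
# megatron/model would look like after its re-exports are removed.
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "model")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()  # no re-exports
with open(os.path.join(pkg_dir, "layer_norm.py"), "w") as f:
    f.write("class LayerNorm:\n    pass\n")

sys.path.insert(0, tmp)

# Old style, relying on the package-level re-export, now fails:
try:
    from model import LayerNorm
    raise AssertionError("unreachable: the re-export no longer exists")
except ImportError:
    pass

# New style: import directly from the submodule that defines the name.
from model.layer_norm import LayerNorm
```

This is the mechanical rewrite each file in the grep output would need, e.g. `from megatron.model import LayerNorm` becoming `from megatron.model.fused_layer_norm import ...` or wherever the class actually lives.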

Contributor Author

I'm all for it if we keep running into this issue; happy for you or someone else to tackle it.

Contributor Author

We can create an issue for this; it should be trivial for anybody, as this is all basic Python.

Member

Let's create an issue for now, I guess.

Contributor Author

Are you already doing it? If you haven't started, I will create it.

Contributor Author

Done: #73

Member

Sorry, I'll try to take 5 minutes to do it whenever I have time.

Contributor Author

As I commented, the issue has already been created!

Let's keep some simple issues open for contributors; that way it's easier for them to ease into contributing.

import megatron


def parse_args(extra_args_provider=None, defaults={},
@@ -315,7 +315,7 @@ def _add_network_size_args(parser):
help='Define position embedding type ("absolute" | "rotary"). "absolute" by default.'
)
group.add_argument('--glu-activation', type=str,
choices=GLU_ACTIVATIONS.keys(),
choices=megatron.model.glu_activations.GLU_ACTIVATIONS.keys(),
help='GLU activations to use.'
)

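The diff above sidesteps the import-time dependency by deferring the attribute lookup: `import megatron` at the top of arguments.py, then `megatron.model.glu_activations.GLU_ACTIVATIONS` only when the parser is actually built. A minimal sketch of the pattern with a toy package (hypothetical names; the real failure mode in Megatron depends on its own __init__ chain):

```python
import os
import sys
import tempfile
import textwrap

# Toy package reproducing the shape of the fix: pkg/__init__.py pulls in
# pkg.a while pkg is itself still initializing, so a module-level
# "from pkg.b import ACTIVATIONS" inside pkg.a is fragile across load
# orders. Importing only the package and deferring attribute access to
# call time avoids the problem.
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from pkg import a\n")
with open(os.path.join(pkg_dir, "b.py"), "w") as f:
    f.write("ACTIVATIONS = {'geglu': None, 'swiglu': None}\n")
with open(os.path.join(pkg_dir, "a.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import pkg  # cheap: only binds the (possibly partial) package

        def choices():
            # Attribute lookup deferred until call time, when pkg.b
            # is guaranteed to be fully importable.
            return pkg.b.ACTIVATIONS.keys()
    """))

sys.path.insert(0, tmp)
import pkg
import pkg.b  # loaded well before choices() runs, as in arguments.py

print(sorted(pkg.a.choices()))  # → ['geglu', 'swiglu']
```

Same trade-off as in the PR: the fully qualified `megatron.model.glu_activations.GLU_ACTIVATIONS.keys()` is wordier, but nothing from megatron.model is resolved while modules are still importing each other.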
2 changes: 1 addition & 1 deletion tests/test_activations.py
@@ -43,7 +43,7 @@ def test_swiglu(self):

# from megatron.testing_utils import require_torch_bf16
# @require_torch_bf16
# def test_bf16_jit(self):
# def test_bf16_jit(self):
# x_bf16 = self.x.to(torch.bfloat16)
# for activation_fn in GLU_ACTIVATIONS.values():
# output = activation_fn(x_bf16)