NVIDIA / Megatron-LM Public

Fork 2.4k
Star 10.6k

Pull requests: NVIDIA/Megatron-LM

Labels 11 Milestones 0

153 Open 239 Closed

Pull requests list

[Update] Print training log in rank0

#1296 opened Nov 21, 2024 by shijungg

support qwen2 hf<->mcore ckpt converter

#1290 opened Nov 19, 2024 by wenyujin333

Fix: misnamed sharded instead of common in checkpoint

#1289 opened Nov 16, 2024 by prrathi

Fix: Resolve multimodal model errors and update README usage instructions

#1286 opened Nov 13, 2024 by singleheart

Set torch.multiprocessing start method as 'spawn'

#1285 opened Nov 12, 2024 by hxdtest

Fix a bug in optimizer's mix_lr/max_lr when args.override_opt_param_scheduler==True

#1284 opened Nov 12, 2024 by lyuwen

Huvu/update t5 attentionmasktype

#1273 opened Nov 4, 2024 by huvunvidia

Update t5_model.py

#1271 opened Nov 2, 2024 by huvunvidia

Enable huggingface tokenizer

#1268 opened Oct 30, 2024 by msiddaiah

fix: remove unnecessary trailing comma in statement

#1265 opened Oct 29, 2024 by singleheart

Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining

#1262 opened Oct 28, 2024 by dhia680 (3 comments)

[ENHANCEMENT] Add support for Apex RMSNorm for use in qk-norm

#1261 opened Oct 28, 2024 by wdevazelhes (5 comments)

Add support to process gzip files

#1260 opened Oct 28, 2024 by puneeshkhanna

Make it an option to use TransformerEngine activation function in FFN block

#1233 opened Oct 21, 2024 by guyueh1 (4 comments)

[Wrong spelling] Update training.py

#1229 opened Oct 21, 2024 by zyqhnu (1 comment)

Typo fix in readme

#1223 opened Oct 17, 2024 by alexchen4ai (2 comments)

support qwen2 and siglip weight conversion script to enable training …

#1221 opened Oct 16, 2024 by tao-githup (2 comments)

readme spelling correction

#1216 opened Oct 13, 2024 by jonassteinberg1 (2 comments)

[Functions] Support Packed_seq_params in Megatron-LM

#1215 opened Oct 12, 2024 by Baibaifan

Embedding

#1209 opened Oct 10, 2024 by rachitgarg91

Dev/optimizer offloading

#1205 opened Oct 10, 2024 by lostkevin

fix bugs for multi_latent_attention

#1203 opened Oct 9, 2024 by xqiangx1991

Use consistent assert message

#1195 opened Oct 3, 2024 by youzagou (2 comments)

Expose cp_comm_type in ModelParallelConfig

#1160 opened Sep 27, 2024 by zochaoq

Enabling UCC backend for PP communication

#1157 opened Sep 24, 2024 by youngeunkwon0405
