Commit 51f34ac

Add release blog of v0.4.0 (#165)
**Description**

Add release blog of v0.4.0

**Major Revision**

- Add release blog of v0.4.0
- Update the version of MS-AMP to 0.4
- Fix bug in DeepSpeed

1 parent 9ac98df commit 51f34ac

9 files changed: +57 -15 lines changed

README.md

+1-1
@@ -2,7 +2,7 @@

 __MS-AMP__ is an automatic mixed precision package for deep learning developed by Microsoft.

-📢 [v0.3.0](https://github.com/Azure/MS-AMP/releases/tag/v0.3.0) has been released!
+📢 [v0.4.0](https://github.com/Azure/MS-AMP/releases/tag/v0.4.0) has been released!

 ## _Check [aka.ms/msamp/doc](https://aka.ms/msamp/doc) for more details._

docs/developer-guides/using-docker.mdx

+8-8
@@ -16,19 +16,19 @@ You need to [clone the code](./development.md#set-up) first before building the

 <Tabs
 groupId='gpu-platform'
-defaultValue='cuda-12.1'
+defaultValue='cuda-12.2'
 values={[
-{label: 'CUDA-12.1', value: 'cuda-12.1'},
+{label: 'CUDA-12.2', value: 'cuda-12.2'},
 {label: 'CUDA-11.8', value: 'cuda-11.8'},
 ]
 }>
-<TabItem value='cuda-12.1'>
+<TabItem value='cuda-12.2'>

 ```bash
 export DOCKER_BUILDKIT=1
 docker buildx build \
 --platform linux/amd64 --cache-to type=inline,mode=max \
---tag msamp-dev-cuda121 --file dockerfile/torch2.1-cuda12.1.dockerfile .
+--tag msamp-dev-cuda122 --file dockerfile/torch2.1-cuda12.2.dockerfile .
 ```

 </TabItem>
@@ -48,21 +48,21 @@ docker buildx build \

 <Tabs
 groupId='gpu-platform'
-defaultValue='cuda-12.1'
+defaultValue='cuda-12.2'
 values={[
-{label: 'CUDA-12.1', value: 'cuda-12.1'},
+{label: 'CUDA-12.2', value: 'cuda-12.2'},
 {label: 'CUDA-11.8', value: 'cuda-11.8'},
 ]
 }>
-<TabItem value='cuda-12.1'>
+<TabItem value='cuda-12.2'>

 ```bash
 docker run \
 -itd --name=msamp-dev \
 --privileged --net=host --ipc=host \
 --gpus=all \
 -w /root -v /mnt:/mnt \
-msamp-dev-cuda121 bash
+msamp-dev-cuda122 bash
 ```

 </TabItem>

docs/user-tutorial/container-images.mdx

+2
@@ -25,6 +25,8 @@ You can use MS-AMP image by `ghcr.io/azure/msamp:${tag}`, available tags are lis

 | Tag | Description |
 |-------------------|------------------------------------|
+| v0.4.0-cuda12.2 | MS-AMP v0.4.0 with CUDA 12.2 |
+| v0.4.0-cuda11.8 | MS-AMP v0.4.0 with CUDA 11.8 |
 | v0.3.0-cuda12.1 | MS-AMP v0.3.0 with CUDA 12.1 |
 | v0.3.0-cuda11.8 | MS-AMP v0.3.0 with CUDA 11.8 |
 | v0.2.0-cuda12.1 | MS-AMP v0.2.0 with CUDA 12.1 |

msamp/__init__.py

+1-1
@@ -100,6 +100,6 @@ def initialize(model, optimizer=None, opt_level='O1', use_te=False): # noqa:
 return cast_model, cast_optimizer


-__version__ = '0.3.0'
+__version__ = '0.4.0'
 __author__ = 'Microsoft'
 __all__ = ['clip_grad_norm_', 'initialize']

msamp/deepspeed/runtime/engine.py

+5-1
@@ -11,6 +11,7 @@
 FP16, BFLOAT16, logger, DeepSpeedEngine, instrument_w_nvtx, log_dist, \
 see_memory_usage, DummyOptim, DeepSpeedZeroOptimizer, DeepSpeedZeRoOffload, \
 PipelineModule, ZeroStageEnum
+from deepspeed.utils.timer import NoopTimer
 from deepspeed.moe.utils import is_moe_param
 from deepspeed.accelerator import get_accelerator

@@ -191,7 +192,8 @@ def _configure_zero_optimizer(self, optimizer):
 ZeROOptimizer: zero optimizer.
 """
 zero_stage = self.zero_optimization_stage()
-timers = self.timers if self.wall_clock_breakdown() else None
+timers = self.timers if self.wall_clock_breakdown() else NoopTimer()
+model_dtype, gradient_accumulation_dtype = self.get_data_types()

 if optimizer is None:
 optimizer = DummyOptim(list(self.module.parameters()))
@@ -232,6 +234,7 @@ def _configure_zero_optimizer(self, optimizer):
 clip_grad=self.gradient_clipping(),
 contiguous_gradients=contiguous_gradients,
 reduce_bucket_size=self.zero_reduce_bucket_size(),
+use_multi_rank_bucket_allreduce=self.zero_multi_rank_bucket_allreduce(),
 allgather_bucket_size=self.zero_allgather_bucket_size(),
 dp_process_group=self.data_parallel_group,
 expert_parallel_group=self.expert_parallel_group if self.has_moe_layers else None,
@@ -248,6 +251,7 @@ def _configure_zero_optimizer(self, optimizer):
 round_robin_gradients=round_robin_gradients,
 has_moe_layers=self.has_moe_layers,
 fp16_master_weights_and_gradients=self.fp16_master_weights_and_gradients(),
+gradient_accumulation_dtype=gradient_accumulation_dtype,
 communication_data_type=self.communication_data_type,
 elastic_checkpoint=self.zero_elastic_checkpoint()
 )
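
Note on the `timers` change in `_configure_zero_optimizer`: replacing `None` with `NoopTimer()` looks like the null-object pattern, presumably because the optimizer calls the timers without first checking for `None`. The following is a minimal sketch of that idea only. `NoopTimerSketch`, `select_timers`, and `optimizer_step` are hypothetical names made up for illustration; they are not DeepSpeed's or MS-AMP's actual code.

```python
class NoopTimerSketch:
    """Hypothetical stand-in for a no-op timer (not DeepSpeed's class)."""

    class _Timer:
        def start(self):
            pass

        def stop(self):
            pass

    def __init__(self):
        self._timer = self._Timer()

    def __call__(self, name):
        # Every timer name maps to the same do-nothing timer.
        return self._timer


def select_timers(real_timers, wall_clock_breakdown):
    """Same shape as the patched line, using the hypothetical class."""
    return real_timers if wall_clock_breakdown else NoopTimerSketch()


def optimizer_step(timers):
    """Hypothetical consumer that times its work unconditionally."""
    timers("step").start()
    total = sum(range(10))
    timers("step").stop()
    return total


def none_timers_fail():
    """Show that passing None breaks a consumer that never checks for it."""
    try:
        optimizer_step(None)
    except TypeError:
        return True
    return False
```

With the no-op object, `optimizer_step(select_timers(None, False))` runs normally, while `optimizer_step(None)` raises `TypeError` because `None` is not callable.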
+36
@@ -0,0 +1,36 @@
+---
+slug: release-msamp-v0.4
+title: Releasing MS-AMP v0.4
+author: Yuxiang Yang
+author_title: MS-AMP Team
+author_url: https://github.com/tocean
+tags: [MS-AMP, announcement, release]
+---
+
+We are very happy to announce that **MS-AMP version 0.4.0** is officially released today!
+
+You can install and try MS-AMP by following the [Getting Started Tutorial](https://azure.github.io/MS-AMP/docs/getting-started/installation).
+
+## MS-AMP 0.4.0 Release Notes
+
+### MS-AMP Improvements
+
+- Improve GPT-3 performance by optimizing FP8 gradient accumulation with kernel fusion
+- Support FP8 in FSDP
+- Support DeepSpeed+TE+MSAMP and add cifar10 example
+- Support MSAMP+TE+DDP
+- Update DeepSpeed to the latest version
+- Update TransformerEngine to v1.1 and flash-attn to the latest version
+- Support CUDA 12.2
+- Fix several bugs in DeepSpeed integration
+
+### MS-AMP-Examples Improvements
+
+- Improve documentation for data processing in GPT-3
+- Add launch script for pretraining GPT-6b7
+- Use new API of TransformerEngine in Megatron-LM
+
+### Document Improvements
+
+- Add Docker usage to the Installation page
+- Explain how to run the FSDP and DeepSpeed+TE+MSAMP examples on the "Run Examples" page

website/docusaurus.config.js

+1-1
@@ -91,7 +91,7 @@ module.exports = {
 announcementBar: {
 id: 'supportus',
 content:
-'📢 <a href="https://azure.github.io/MS-AMP/blog/release-msamp-v0.3">v0.3.0</a> has been released! ' +
+'📢 <a href="https://azure.github.io/MS-AMP/blog/release-msamp-v0.4">v0.4.0</a> has been released! ' +
 '⭐️ If you like MS-AMP, give it a star on <a target="_blank" rel="noopener noreferrer" href="https://github.com/Azure/MS-AMP">GitHub</a>! ⭐️',
 },
 algolia: {

website/package-lock.json

+2-2
Some generated files are not rendered by default.

website/package.json

+1-1
@@ -1,6 +1,6 @@
 {
 "name": "msamp-website",
-"version": "0.3.0",
+"version": "0.4.0",
 "private": true,
 "scripts": {
 "docusaurus": "docusaurus",
