
Conversation

@ghosthamlet
Owner

No description provided.

jeffra and others added 30 commits December 11, 2020 10:05
* fix arch flags, add PTX

* bug fix

Co-authored-by: Jeff Rasley <[email protected]>
* [doc] xref to hostfile discussion

It wasn't clear where to find what was meant by `hostfile`, so this adds a link to where it's discussed.

* remove whitespace
Co-authored-by: Samyam Rajbhandari <[email protected]>
Allow DeepSpeed models to be initialized with optimizer=None

Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
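The `optimizer=None` change above lets an engine be constructed without an optimizer (e.g. for inference-only use). The pattern can be sketched generically as below; this is an illustrative wrapper, not DeepSpeed's actual engine code, and the `Engine` class and `step` method here are assumptions for demonstration:

```python
class Engine:
    """Minimal sketch: an engine wrapper that tolerates optimizer=None,
    e.g. for inference-only runs or externally managed optimization."""

    def __init__(self, model, optimizer=None):
        self.model = model
        self.optimizer = optimizer  # may legitimately be None

    def step(self):
        # Only step when an optimizer was actually provided;
        # with optimizer=None this is a harmless no-op.
        if self.optimizer is not None:
            self.optimizer.step()
```

With this guard, call sites do not need to special-case the optimizer-less configuration.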
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.10.10 to 1.11.0.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/master/CHANGELOG.md)
- [Commits](sparklemotion/nokogiri@v1.10.10...v1.11.0)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <[email protected]>
* Remove a very verbose print statement.

* Update engine.py
* Add Linear warmup+decay lr schedule
Update lr schedule unit tests

* LR scheduler unit tests for LR Range Test and 1Cycle

* Disable yapf to preserve parameterization

* Disable test_pipe.py for CI debugging

* Disable test_lr_scheduler for CI debugging

* Disable test_lr_scheduler for CI debugging

* Enable all unit tests for CI debugging

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Cheng Li <[email protected]>

Co-authored-by: Jeff Rasley <[email protected]>
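The linear warmup+decay schedule added above can be sketched as a standalone function; this is a minimal illustration of the schedule shape, not DeepSpeed's `WarmupDecayLR` implementation, and the function name and parameters are assumptions:

```python
def linear_warmup_decay_lr(step, base_lr, warmup_steps, total_steps):
    """Ramp the LR linearly from 0 to base_lr over warmup_steps,
    then decay it linearly back to 0 by total_steps."""
    if step < warmup_steps:
        # Warmup phase: fraction of base_lr proportional to progress.
        return base_lr * step / warmup_steps
    # Decay phase: remaining fraction of the post-warmup window.
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)
```

For example, with `base_lr=0.1`, `warmup_steps=10`, `total_steps=100`, the LR rises to 0.1 at step 10 and reaches 0 at step 100.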
* move workspace memory-allocation to PyTorch

* refine the code based on the comments

* remove unnecessary options

* remove bsz from set_seq_len function
jeffra and others added 29 commits February 18, 2021 16:20
Invalid param name

Thanks.
* fix the bias-add precision and indexing, and add layer-norm-eps as a configurable parameter for the transformer

* add ACC_HALF config

* use `defined` to check whether ACC_HALF is defined
Hi, I took a look at the code of `column_sum_reduce` and have two questions:
   1. The goal of `column_sum_reduce` is to compute the column sums of an input matrix of shape [rows, width], so the result should have shape [width], right? The condition used to check `pos` does not seem correct for that.
   2. The CUDA kernel implementation assumes that threads with the same threadIdx.y are grouped into a thread_block_tile, with blockDim = (32, 32). According to the NVIDIA slides at https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf, a thread block tile is a subset of the threads of a thread block, divided into tiles in row-major order. Doesn't that mean threads with the same threadIdx.x are grouped into a thread_block_tile?
Thanks!
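For reference, the expected semantics being questioned above can be stated as a plain-Python reduction; this re-states the kernel's contract (per-column sums of a [rows, width] matrix) and is not the CUDA code itself:

```python
def column_sum_reference(inp):
    """Reference semantics for column_sum_reduce: given an input
    matrix of shape [rows, width] (a list of row lists), return
    the per-column sums -- a vector of length width."""
    rows = len(inp)
    width = len(inp[0])
    out = [0.0] * width
    for r in range(rows):          # every row contributes to every column
        for c in range(width):
            out[c] += inp[r][c]
    return out

# rows=3, width=2: column 0 holds [0, 2, 4], column 1 holds [1, 3, 5]
x = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
```

Any tiling or thread-indexing scheme in the CUDA kernel must reproduce exactly this result, which is what the question about threadIdx.x vs. threadIdx.y grouping hinges on.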

Co-authored-by: Reza Yazdani <[email protected]>
* fixing buffers in transformer kernel when gelu-checkpoint is enabled

* fixing the test issue for other memory optimization flags

* fixing a bug for when attn_dropout_checkpoint is enabled
* Squash stage3 v1 (#146)

Co-authored-by: Samyam <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Samyam Rajbhandari <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: eltonzheng <[email protected]>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)

Co-authored-by: Jeff Rasley <[email protected]>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)

Co-authored-by: Jeff Rasley <[email protected]>

* formatting

* megatron external params

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: eltonzheng <[email protected]>
…t in the website (#799)

* add optimizers and schedules to rtd

* update ds website and fix links

* add optimizers and schedules to rtd

* update ds website and fix links

* add flops profiler to rtd

* fix

Co-authored-by: Shaden Smith <[email protected]>
* Control ZeRO wall clock timers

* Disable more ZeRO3 debug prints

Co-authored-by: Jeff Rasley <[email protected]>
* fix log(0) & 1/log(1) bugs

* simplify

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Cheng Li <[email protected]>
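The kind of log(0) and 1/log(1) failure fixed above (log of zero is -inf; log of one is zero, so its reciprocal divides by zero) can be guarded as in this illustrative sketch; the `safe_inv_log` function and the clamping epsilon are assumptions for demonstration, not the actual DeepSpeed fix:

```python
import math

EPS = 1e-12  # hypothetical clamp; the real fix may use a different guard

def safe_inv_log(x):
    """Compute 1/log(x) while avoiding the two singular points:
    log(0) == -inf, and 1/log(1) == division by zero."""
    x = max(x, EPS)              # clamp the argument so log(0) never occurs
    denom = math.log(x)
    if abs(denom) < EPS:         # x ~ 1: keep the denominator bounded away from 0
        denom = math.copysign(EPS, denom if denom != 0 else 1.0)
    return 1.0 / denom
```

Clamping the input and the denominator keeps the result finite across the whole domain instead of raising or returning inf/nan.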
Admin-merging this pure-doc PR, which does not trigger the build.
@ghosthamlet ghosthamlet merged commit 517357e into ghosthamlet:master Mar 15, 2021