
Conversation

@ghosthamlet
Owner

No description provided.

jeffra and others added 30 commits December 11, 2020 10:05
* fix arch flags, add PTX

* bug fix

Co-authored-by: Jeff Rasley <[email protected]>
* [doc] xref to hostfile discussion

It wasn't clear where to find what was meant by `hostfile`, so this adds a link to where it's discussed.

* remove whitespace
Co-authored-by: Samyam Rajbhandari <[email protected]>
Allow DeepSpeed models to be initialized with optimizer=None

Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
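The `optimizer=None` change above lets an engine be constructed without an optimizer (e.g. for inference-only use). The pattern can be sketched generically as below; this is an illustrative wrapper, not DeepSpeed's actual engine code, and the `Engine` class and `step` method here are assumptions for demonstration:

```python
class Engine:
    """Minimal sketch: an engine wrapper that tolerates optimizer=None,
    e.g. for inference-only runs or externally managed optimization."""

    def __init__(self, model, optimizer=None):
        self.model = model
        self.optimizer = optimizer  # may legitimately be None

    def step(self):
        # Only step when an optimizer was actually provided;
        # with optimizer=None this is a harmless no-op.
        if self.optimizer is not None:
            self.optimizer.step()
```

With this guard, call sites do not need to special-case the optimizer-less configuration.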
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.10.10 to 1.11.0.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/master/CHANGELOG.md)
- [Commits](sparklemotion/nokogiri@v1.10.10...v1.11.0)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <[email protected]>
* Remove a very verbose print statement.

* Update engine.py
* Add Linear warmup+decay lr schedule
Update lr schedule unit tests

* LR scheduler unit tests for LR Range Test and 1Cycle

* Disable yapf to preserve parameterization

* Disable test_pipe.py for CI debugging

* Disable test_lr_scheduler for CI debugging

* Disable test_lr_scheduler for CI debugging

* Enable all unit tests for CI debugging

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Cheng Li <[email protected]>

Co-authored-by: Jeff Rasley <[email protected]>
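The linear warmup+decay schedule added above can be sketched as a standalone function; this is a minimal illustration of the schedule shape, not DeepSpeed's `WarmupDecayLR` implementation, and the function name and parameters are assumptions:

```python
def linear_warmup_decay_lr(step, base_lr, warmup_steps, total_steps):
    """Ramp the LR linearly from 0 to base_lr over warmup_steps,
    then decay it linearly back to 0 by total_steps."""
    if step < warmup_steps:
        # Warmup phase: fraction of base_lr proportional to progress.
        return base_lr * step / warmup_steps
    # Decay phase: remaining fraction of the post-warmup window.
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)
```

For example, with `base_lr=0.1`, `warmup_steps=10`, `total_steps=100`, the LR rises to 0.1 at step 10 and reaches 0 at step 100.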
* move workspace memory-allocation to PyTorch

* refine the code based on the comments

* remove unnecessary options

* remove bsz from set_seq_len function
jeffra and others added 29 commits February 18, 2021 16:20
Invalid param name

Thanks.
* fix the bias-add precision and indexing, and add layer-norm-eps as a configurable parameter for the transformer

* add ACC_HALF config

* use `defined` to check whether ACC_HALF is defined
Hi, I took a look at the code of `column_sum_reduce` and have two questions:
   1. The goal of `column_sum_reduce` is to compute the column sums of an input matrix of shape [rows, width], so the result should have shape [width], right? The condition used to check `pos` does not seem correct for that.
   2. The CUDA kernel implementation assumes that threads with the same threadIdx.y are grouped into a thread_block_tile, with blockDim = (32, 32). According to the NVIDIA slides at https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf, a thread block tile is a subset of the threads of a thread block, divided into tiles in row-major order. Doesn't that mean threads with the same threadIdx.x are grouped into a thread_block_tile?
Thanks!
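For reference, the expected semantics being questioned above can be stated as a plain-Python reduction; this re-states the kernel's contract (per-column sums of a [rows, width] matrix) and is not the CUDA code itself:

```python
def column_sum_reference(inp):
    """Reference semantics for column_sum_reduce: given an input
    matrix of shape [rows, width] (a list of row lists), return
    the per-column sums -- a vector of length width."""
    rows = len(inp)
    width = len(inp[0])
    out = [0.0] * width
    for r in range(rows):          # every row contributes to every column
        for c in range(width):
            out[c] += inp[r][c]
    return out

# rows=3, width=2: column 0 holds [0, 2, 4], column 1 holds [1, 3, 5]
x = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
```

Any tiling or thread-indexing scheme in the CUDA kernel must reproduce exactly this result, which is what the question about threadIdx.x vs. threadIdx.y grouping hinges on.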

Co-authored-by: Reza Yazdani <[email protected]>
* fixing buffers in transformer kernel when gelu-checkpoint is enabled

* fixing the test issue for other memory optimization flags

* fixing a bug for when attn_dropout_checkpoint is enabled
* Squash stage3 v1 (#146)

Co-authored-by: Samyam <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Samyam Rajbhandari <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: eltonzheng <[email protected]>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)

Co-authored-by: Jeff Rasley <[email protected]>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)

Co-authored-by: Jeff Rasley <[email protected]>

* formatting

* megatron external params

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: eltonzheng <[email protected]>
…t in the website (#799)

* add optimizers and schedules to rtd

* update ds website and fix links

* add optimizers and schedules to rtd

* update ds website and fix links

* add flops profiler to rtd

* fix

Co-authored-by: Shaden Smith <[email protected]>
* Control ZeRO wall clock timers

* Disable more ZeRO3 debug prints

Co-authored-by: Jeff Rasley <[email protected]>
* fix log(0) & 1/log(1) bugs

* simplify

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Cheng Li <[email protected]>
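The kind of log(0) and 1/log(1) failure fixed above (log of zero is -inf; log of one is zero, so its reciprocal divides by zero) can be guarded as in this illustrative sketch; the `safe_inv_log` function and the clamping epsilon are assumptions for demonstration, not the actual DeepSpeed fix:

```python
import math

EPS = 1e-12  # hypothetical clamp; the real fix may use a different guard

def safe_inv_log(x):
    """Compute 1/log(x) while avoiding the two singular points:
    log(0) == -inf, and 1/log(1) == division by zero."""
    x = max(x, EPS)              # clamp the argument so log(0) never occurs
    denom = math.log(x)
    if abs(denom) < EPS:         # x ~ 1: keep the denominator bounded away from 0
        denom = math.copysign(EPS, denom if denom != 0 else 1.0)
    return 1.0 / denom
```

Clamping the input and the denominator keeps the result finite across the whole domain instead of raising or returning inf/nan.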
Admin-merging this pure-doc PR, which does not trigger the build.
@ghosthamlet ghosthamlet merged commit 517357e into ghosthamlet:master Mar 15, 2021