Skip to content

Conversation

@stas00
Copy link
Contributor

@stas00 stas00 commented Feb 11, 2022

This PR adds the the missing epsilon to formula to match the code


It's actually very unlikely that the denominator will ever be zero in this class - unless each input is zero, since this is the root of sum of squares of numbers.

I don't know if it's faster to add epsilon than to check if all inputs are zero, since the output will be zero then.

@eqy

@eqy
Copy link
Collaborator

eqy commented Feb 11, 2022

I'm not sure this formula is accurate, as the epsilon value is used during the computation of the root-mean-square:

U c_invvar = rsqrt(sigma2 + epsilon);
rather than after.

@stas00
Copy link
Contributor Author

stas00 commented Feb 11, 2022

Thank you for catching that, @eqy. You're correct - perhaps we just document this nuance as in my last edit?

@stas00 stas00 changed the title [FusedRMSNorm doc] add epsilon to formula [FusedRMSNorm doc] document where epsilon is added Feb 11, 2022
@crcrpar
Copy link
Collaborator

crcrpar commented Feb 11, 2022

thank you @stas00

@crcrpar crcrpar merged commit c8c00ef into NVIDIA:master Feb 11, 2022
@stas00 stas00 deleted the patch-1 branch February 11, 2022 18:38
@crcrpar crcrpar added this to the 22.03 milestone Feb 22, 2022
hubertlu-tw pushed a commit to ROCm/apex that referenced this pull request Apr 15, 2022
* [FusedRMSNorm doc] add epsilon to formula

* correct

* better wording
jithunnair-amd pushed a commit to ROCm/apex that referenced this pull request Aug 5, 2022
* FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (NVIDIA#1274)

* FusedRMSNorm based on FusedLayerNorm

* refactor duplicated kernels

* delete comments

* delete comments

* cleanup

* cleanup

* cleanup, fixed clobbering forward_affine_mixed_dtypes

* fix pybind naming and add MixedFused test

* undo skipping

* check elementwise_affine

* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

Oof, nice catch, thanks

Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>

Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>

* fix and generate docs for FusedRMSNorm (NVIDIA#1285)

* [FusedRMSNorm doc] document where epsilon is added (NVIDIA#1295)

* [FusedRMSNorm doc] add epsilon to formula

* correct

* better wording

* Fix some bugs

* Optimize HostRMSNormGradient and HostApplyRMSNorm for AMD GPUs

* Fix NaN issues in FusedRMSNorm

* Update test_fused_layer_norm.py

* Skip test_fused_layer_norm.TestAutocastFusedRMSNorm on ROCm

* Use at::cuda::warp_size() instead of at::cuda::getCurrentDeviceProperties()->warpSize

Co-authored-by: eqy <eddiey@nvidia.com>
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants