Conversation

@tymat commented Jun 28, 2025

  • Fix LayerNorm.forward() to use tensor operations instead of scalar operations
  • Replace sum_keepdim()/size with mean_keepdim() to preserve gradients
  • Use broadcast_add() with epsilon tensor instead of scalar addition
  • Fix ops::layer_norm_slow() with same gradient-preserving changes
  • Update ops::layer_norm() to use slow implementation for proper gradients
  • Add comprehensive gradient flow test (now passes with 100% gradient flow)
  • Add numerical equivalence test to ensure accuracy is maintained
  • Fix training issues where LayerNorm parameters weren't being updated

Resolves the gradient propagation bug where only 33% of parameters received gradients during backpropagation, preventing proper model training (#3011). A rough sketch of the tensor-op forward pass and of the gradient-flow check is included below.
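A minimal sketch of the gradient-preserving forward pass described above, assuming candle's broadcasting tensor API; the function name and signature are illustrative, not the exact code in this PR:

```rust
use candle_core::{Result, Tensor, D};

/// Illustrative gradient-preserving LayerNorm forward over the last dimension.
fn layer_norm_forward(x: &Tensor, weight: &Tensor, bias: &Tensor, eps: f64) -> Result<Tensor> {
    // Mean as a tensor op (mean_keepdim) so autograd keeps tracking it,
    // instead of sum_keepdim() divided by a plain scalar size.
    let mean = x.mean_keepdim(D::Minus1)?;
    let centered = x.broadcast_sub(&mean)?;
    // Variance over the last dimension, again kept as a tensor.
    let var = centered.sqr()?.mean_keepdim(D::Minus1)?;
    // Epsilon added as a broadcastable tensor rather than a detached scalar.
    let eps_t = Tensor::new(eps, x.device())?.to_dtype(x.dtype())?;
    let denom = var.broadcast_add(&eps_t)?.sqrt()?;
    let normalized = centered.broadcast_div(&denom)?;
    // Scale and shift with the learnable parameters.
    normalized.broadcast_mul(weight)?.broadcast_add(bias)
}
```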

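And a hedged sketch of the kind of gradient-flow check the new test adds, reusing the illustrative `layer_norm_forward` above; the shapes and loss here are assumptions, not the PR's exact test:

```rust
use candle_core::{DType, Device, Result, Tensor, Var};

/// Checks that both LayerNorm parameters receive gradients after backprop.
fn check_layer_norm_gradients() -> Result<()> {
    let dev = Device::Cpu;
    let weight = Var::ones(8, DType::F32, &dev)?;
    let bias = Var::zeros(8, DType::F32, &dev)?;
    let x = Tensor::randn(0f32, 1f32, (4, 8), &dev)?;

    let y = layer_norm_forward(&x, weight.as_tensor(), bias.as_tensor(), 1e-5)?;
    // Reduce to a scalar loss and backpropagate.
    let loss = y.sqr()?.mean_all()?;
    let grads = loss.backward()?;

    // With the tensor-op forward pass, both parameters should get gradients.
    assert!(grads.get(&weight).is_some(), "no gradient flowed to weight");
    assert!(grads.get(&bias).is_some(), "no gradient flowed to bias");
    Ok(())
}
```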
@AlpineVibrations

Great. It would be awesome to have more training code examples and workflows with candle.

@ivarflakstad (Member)

Hey! Thanks for this :)

I think we'll have to implement this in the optimized kernels as well before we can merge.
I assume all the variants (CPU, CUDA, Metal) suffer from the same issue?
