Expose weight initialization bounds for LayerNorm, Projection, Conv2D, Conv3D #926
base: main
Conversation
src/fairseq2/models/jepa/factory.py
Outdated
init_module(proj, std=init_std)

with torch.no_grad():
    proj.weight.div_(math.sqrt(2.0 * layer_idx))
Is this accurate? In the reference implementation, I see that the scaling is done with layer_idx + 1 instead of layer_idx: https://github.com/facebookresearch/jepa/blob/main/src/models/vision_transformer.py#L150
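For reference, a minimal sketch of the rescaling with the off-by-one fixed, assuming layer_idx is zero-based (the proj module and layer_idx value below are illustrative stand-ins, not the factory's actual state):

```python
import math

import torch
from torch import nn

# Illustrative stand-ins for the attention output projection and its zero-based
# block index; in the actual factory these come from the model being built.
proj = nn.Linear(768, 768)
layer_idx = 0

# With a zero-based index, dividing by sqrt(2.0 * layer_idx) degenerates for the
# first block (division by zero), whereas sqrt(2.0 * (layer_idx + 1)) scales
# block i by 1 / sqrt(2 * (i + 1)), matching the reference implementation.
with torch.no_grad():
    proj.weight.div_(math.sqrt(2.0 * (layer_idx + 1)))
```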
Good catch, yes, this was the mistake. Thanks @cbalioglu.
src/fairseq2/nn/utils/module.py
Outdated
@@ -570,3 +571,44 @@ def get_module_size(module: Module) -> ModuleSizeInfo:
        info.total_size_bytes += size_bytes

    return info


def normalize_truncate(
I think we can use PyTorch's trunc_normal_ instead of this function. This is also noted in the reference implementation here: https://github.com/facebookresearch/jepa/blob/main/src/utils/tensors.py#L18-L19
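For illustration, a minimal sketch of what the reviewer suggests, using torch.nn.init.trunc_normal_ directly (the tensor shape, std, and bounds below are arbitrary):

```python
import torch
from torch import nn

weight = torch.empty(768, 768)

# trunc_normal_ fills the tensor with values drawn from N(mean, std^2),
# truncated to the interval [a, b], which is what a hand-rolled
# normalize/truncate helper would otherwise have to implement.
nn.init.trunc_normal_(weight, mean=0.0, std=0.02, a=-2.0, b=2.0)
```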
src/fairseq2/nn/utils/module.py
Outdated
    tensor.clamp_(min=a, max=b)


def init_truncated_uniforma_weights_and_bias(
I would prefer this function to live in JEPA's factory.py rather than in this file, which is typically meant for much more generic module helper functions.
src/fairseq2/models/jepa/factory.py
Outdated
init_module(proj, std=init_std)

with torch.no_grad():
    proj.weight.div_(math.sqrt(2.0 * layer_idx))
Note that I performed this change to make sure that secondary reset_parameters() calls result in identical weight initialization.
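To illustrate the point with a hypothetical module (not fairseq2's actual Projection): keeping the depth-dependent rescaling inside reset_parameters() means a later call, e.g. after materializing a meta-device module, re-applies the same procedure instead of losing the scaling or dividing the weights a second time.

```python
import math

import torch
from torch import nn


class ScaledProjection(nn.Linear):
    """Hypothetical projection whose reset_parameters() owns both the random
    draw and the depth-dependent rescaling, so repeated calls follow the
    exact same initialization procedure."""

    def __init__(self, dim: int, layer_idx: int, init_std: float = 0.02) -> None:
        self.layer_idx = layer_idx
        self.init_std = init_std
        super().__init__(dim, dim)  # calls reset_parameters()

    def reset_parameters(self) -> None:
        nn.init.trunc_normal_(self.weight, std=self.init_std)
        if self.bias is not None:
            nn.init.zeros_(self.bias)
        # Rescaling here, rather than once in the factory after construction,
        # keeps secondary reset_parameters() calls consistent: the weights are
        # re-drawn and re-scaled instead of being divided again.
        with torch.no_grad():
            self.weight.div_(math.sqrt(2.0 * (self.layer_idx + 1)))
```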
@@ -373,6 +370,13 @@ def init_projection(proj: Linear) -> None:
        dtype=self._dtype,
    )

    # rescale the last layer
Remnant from old commits?
What does this PR do? Please describe:
Most fairseq2 modules run the standard (Xavier) initialization function, or a uniform/constant weight initialization.
Sometimes users want to experiment with different algorithms as well, for example when they want to set the bounds of the weights manually. This was the case for the JEPA model.
This PR adds an init_fn parameter to the common modules (Projection, TransformerEncoderLayer, LayerNorm).

Fixes #{issue number}
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: