Initialization for Log Alpha #5

hxu105 · 2025-01-04T23:03:39Z

Howdy, thank you for sharing this amazing work. I have some questions about the log alpha initialization. For both Llama and GPT2, you have initialized the log alphas with N(10, 0.01). Is there a specific reason for choosing such a sharp Gaussian? What would be the impact of using a standard normal or another initialization instead?
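To make the comparison concrete, here is a minimal sketch of what the two initializations would look like if each log alpha is passed through a sigmoid to give a gate value in (0, 1). That sigmoid gating is an assumption on my part and is not confirmed by the repository; I am also assuming the 0.01 in N(10, 0.01) is a standard deviation rather than a variance. The helper name `sample_gate_values` is hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate_values(mean: float, std: float, n: int, seed: int = 0) -> list[float]:
    """Draw n log alphas from N(mean, std) and map each through a sigmoid.

    Hypothetical helper: the sigmoid gate is an assumed parameterization,
    not something taken from the repository's code.
    """
    rng = random.Random(seed)
    return [sigmoid(rng.gauss(mean, std)) for _ in range(n)]


# Sharp Gaussian from the question; 0.01 is assumed to be the std.
sharp = sample_gate_values(mean=10.0, std=0.01, n=1000)

# Standard normal alternative from the question.
standard = sample_gate_values(mean=0.0, std=1.0, n=1000)

print("N(10, 0.01): min gate", min(sharp), "max gate", max(sharp))
print("N(0, 1):     min gate", min(standard), "max gate", max(standard))
```

Under these assumptions, the sharp initialization would start every gate almost fully open and nearly identical to the others, while a standard normal would start the gates scattered on both sides of 0.5. Whether that is the actual motivation is exactly what I am asking about.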
