
Conversation

@cebtenzzre (Member)

ref https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/blob/7dbbc90392e2f80f3d3c277d6e90027e55de9125/config.json#L13

This is a quick-and-dirty fix since this code is going to be replaced anyway. It would be more correct to read layer_norm_eps when we convert to GGUF, and load that hyperparameter from the GGUF at inference time.
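As a rough sketch of the "more correct" approach (not the actual convert script, and the function name here is hypothetical), the converter would read `layer_norm_eps` from the model's Hugging Face `config.json` and carry it into the GGUF metadata rather than hard-coding it:

```python
import json

def read_layer_norm_eps(config_text: str, default: float = 1e-12) -> float:
    """Read layer_norm_eps from a Hugging Face config.json string.

    Hypothetical helper for illustration: the converter would store this
    value as GGUF metadata so inference can load it instead of hard-coding.
    """
    config = json.loads(config_text)
    # all-MiniLM-L6-v2's config.json sets layer_norm_eps to 1e-12
    return float(config.get("layer_norm_eps", default))

# Example: a trimmed-down config.json
example = '{"layer_norm_eps": 1e-12, "hidden_size": 384}'
print(read_layer_norm_eps(example))
```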

The difference between an epsilon of 1e-6 in LLaMA 1 and 1e-5 in LLaMA 2 caused a significant difference in perplexity, so they added this parameter to ggml_norm and ggml_rms_norm soon after LLaMA 2 came out. Until the switch to GGUF, it defaulted to 5e-6, a suitable middle ground, and the user could override it at inference time via a command-line option.
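To see why the epsilon matters, here is a minimal pure-Python sketch of RMS-norm-style normalization (a simplification of what ggml_rms_norm computes, not the ggml code itself). For small activations, the choice of epsilon visibly changes the output scale:

```python
import math

def rms_norm(xs, eps):
    # Normalize as x / sqrt(mean(x^2) + eps); eps guards against
    # division by zero but also damps the scale for small activations.
    ms = sum(x * x for x in xs) / len(xs)
    scale = 1.0 / math.sqrt(ms + eps)
    return [x * scale for x in xs]

# With small inputs, mean(x^2) is comparable to eps, so 1e-5 vs 1e-12
# produces noticeably different outputs.
xs = [1e-3, -1e-3, 2e-3]
print(rms_norm(xs, 1e-5))
print(rms_norm(xs, 1e-12))
```

Here mean(x^2) is 2e-6, so an epsilon of 1e-5 dominates the denominator and shrinks the output by more than 2x relative to 1e-12.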

The difference between 1e-5 and 1e-12 is certainly more significant... if only we had benchmarks for this code.

Signed-off-by: Jared Van Bortel <[email protected]>
@cebtenzzre cebtenzzre merged commit 007d469 into main Feb 26, 2024
@cebtenzzre cebtenzzre deleted the fix-bert-norm-eps branch February 26, 2024 18:09