add varentropy #39

Occupying-Mars · 2025-08-11T09:26:25Z

(disclaimer: I haven't tested this by actually running it and checking the loss, because I'm GPU-poor)
basic pr

confidence-based rewards

Added varentropy to measure the confidence of the teacher LLM. If the teacher is less confident, the golden token is more important; if it is more confident, there can be more branched-out tokens among the top-k tokens.
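For reference, a minimal sketch of the quantity itself, using the standard definition of varentropy as the variance of the surprisal `-log p` under the next-token distribution. This is plain Python for illustration only, not the code in this PR; the function names are hypothetical, and how varentropy is turned into a confidence score is not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_and_varentropy(logits):
    """Return (entropy, varentropy) in nats for one next-token distribution.

    entropy     H = sum_i p_i * (-log p_i)
    varentropy  V = sum_i p_i * (-log p_i - H)^2
    """
    probs = softmax(logits)
    # Skip zero-probability entries (possible after underflow); they add nothing.
    pairs = [(p, -math.log(p)) for p in probs if p > 0.0]
    entropy = sum(p * s for p, s in pairs)
    varentropy = sum(p * (s - entropy) ** 2 for p, s in pairs)
    return entropy, varentropy


if __name__ == "__main__":
    # Hypothetical teacher logits for a single position.
    h, v = entropy_and_varentropy([2.0, 1.0, 0.0, -1.0])
    print(f"entropy={h:.4f} varentropy={v:.4f}")
```

A uniform distribution has zero varentropy because every token has the same surprisal, so varentropy is usually read together with entropy rather than on its own.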

Later we can add a dynamic top_k based on confidence: when the teacher is less confident, we can choose a larger top_k for the distribution it gives.
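A rough sketch of what the dynamic top_k idea could look like. Everything here is an assumption for illustration: the confidence score is taken as a number in [0, 1] (how it is derived from varentropy is left open), and the bounds `k_min` and `k_max` and the linear mapping are hypothetical choices, not part of this PR.

```python
def dynamic_top_k(confidence, k_min=4, k_max=32):
    """Map a confidence score in [0, 1] to a top_k value.

    Lower confidence gives a larger k, higher confidence a smaller k.
    k_min, k_max and the linear interpolation are illustrative assumptions.
    """
    c = min(1.0, max(0.0, confidence))  # clamp to [0, 1]
    return round(k_max - c * (k_max - k_min))


def top_k_indices(logits, k):
    """Indices of the k largest logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    logits = [0.1, 3.0, 2.0, -1.0, 0.7, 1.5]  # hypothetical teacher logits
    k = dynamic_top_k(0.9, k_min=1, k_max=4)
    print(k, top_k_indices(logits, k))
```

A linear mapping is only the simplest option; a step function or a schedule over training would slot into the same place.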

tokenbender · 2025-08-12T08:14:51Z

This is very cool, but I have some plans for varentropy: as an ablation, avataRL can just get rid of the critic and rank-cum-reward the top-k logits based on varentropy.

I think we can teach a model to be interesting like this.

add varentropy

5d3b2b3
