@Adamits I'll implement this, but I want to check with you whether it's worthwhile, since my domain is speech:
What are your thoughts on adding new positional embeddings to the Transformer models (particularly RoPE)? IIRC we're using the standard sinusoidal (cosine) ones, but they're a bit old-fashioned nowadays. Do you know of any arguments for or against?
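For context, the "standard cosine ones" are the fixed sin/cos table from the original Transformer paper; a minimal sketch (not the actual Yoyodyne code) is something like:

```python
import math

import torch


def sinusoidal_embedding(max_len: int, d_model: int) -> torch.Tensor:
    """Builds the fixed sin/cos positional table from "Attention Is All You Need"."""
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
    return pe  # added to the token embeddings, one row per position
```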
In general, I think the method we use is not considered the best one, but I have no idea how much it matters in our small-vocabulary character domains. It's actually an interesting problem, since I would think that in tasks with monotonic-ish alignments, position representations would be very important.
I definitely think we should implement alternatives. Funnily enough, a friend who works in this space and has used Yoyodyne for baselines texted me yesterday asking if we had RoPE-style embeddings, so it sounds like a great feature.
Kk, I'll implement them and we can see how they work off a fork. From some posts it looks like ~10 LOC, so not the craziest implementation (rough sketch after the list below). If we're seeing overfitting, we can try applying some context-window limiting or the like.
Main things I'm thinking:
- RoPE
- Absolute embeddings
- No embeddings (NoPE; some people have argued that PEs aren't really that important, so it seems like a worthwhile option)
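To give a sense of what the "~10 LOC" version looks like, here's a rough RoPE sketch in PyTorch (a hypothetical helper, not anything already in Yoyodyne): it rotates each (even, odd) pair of query/key dimensions by a position-dependent angle, so relative offsets fall out of the attention dot product.

```python
import torch


def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Applies RoPE to query or key vectors of shape (batch, seq_len, heads, head_dim).

    head_dim must be even; returns a tensor of the same shape.
    """
    _, seq_len, _, head_dim = x.shape
    # One frequency per 2D pair of feature dimensions.
    inv_freq = 1.0 / (
        base ** (torch.arange(0, head_dim, 2, dtype=x.dtype, device=x.device) / head_dim)
    )
    # Rotation angle for each (position, frequency) pair: (seq_len, head_dim // 2).
    angles = torch.arange(seq_len, dtype=x.dtype, device=x.device)[:, None] * inv_freq[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    # Rotate each (even, odd) pair by its position-dependent angle.
    rotated = torch.stack(
        (x_even * cos - x_odd * sin, x_even * sin + x_odd * cos), dim=-1
    )
    return rotated.flatten(-2)
```

In this scheme the rotation would be applied to the query and key projections inside each attention layer (not to the values), replacing the additive embedding at the model input.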