
Commit d27c3da
note
lucidrains committed Dec 21, 2024
1 parent 8b367e6 commit d27c3da
Showing 1 changed file with 1 addition and 1 deletion.
x_transformers/x_transformers.py (2 changes: 1 addition & 1 deletion)
@@ -1650,7 +1650,7 @@ def __init__(
         unet_skips = False,
         num_residual_streams = 1,
         reinject_input = False, # seen first in the DEQ paper https://arxiv.org/abs/1909.01377, but later used in a number of papers trying to achieve depthwise generalization https://arxiv.org/abs/2410.03020v1
-        add_value_residual = False, # resformer from Zhou et al - https://arxiv.org/abs/2410.17897v1
+        add_value_residual = False, # resformer from Zhou et al - https://arxiv.org/abs/2410.17897v1 - further corroborated by https://arxiv.org/abs/2412.15113 (faster emergence of ICL) - it looks like this setting may be becoming a necessity for every transformer soon
         learned_value_residual_mix = True, # seeing big improvements when the value residual mix is learned per token - credit goes to @faresobeid for taking the first step with a learned scalar mix, then @BlinkDL for taking it a step further with a data-dependent mix. here we use a per-token learned mix
         rel_pos_kwargs: dict = dict(),
         **kwargs
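For context on what `add_value_residual` and `learned_value_residual_mix` control, below is a minimal sketch of value residual learning with a per-token learned mix, as described in the comments above. This is not the x-transformers implementation; the module name and the `to_mix` projection are illustrative assumptions.

import torch
from torch import nn

class ValueResidualMix(nn.Module):
    # mixes each attention layer's values with the first layer's values,
    # using a per-token, data-dependent coefficient in [0, 1]
    def __init__(self, dim):
        super().__init__()
        self.to_mix = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # hypothetical projection

    def forward(self, x, values, first_values = None):
        # x: (batch, seq, dim) tokens entering the attention layer
        # values: (batch, seq, dim_v) values projected at this layer
        # first_values: values saved from the first layer, None at the first layer
        if first_values is None:
            return values
        mix = self.to_mix(x)  # (batch, seq, 1), broadcasts over the value dim
        return values * mix + first_values * (1. - mix)

With a fixed scalar mix this reduces to the non-learned variant; making the mix a per-token function of the input is what `learned_value_residual_mix = True` refers to.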
