Support Vararg Chain (Chain of Parallel) #2101
Closes FluxML#2100. As mentioned in FluxML#2100 (comment), this will break any code using `Chain()` as the identity function.
This reminded me that we discussed multi-input
This was also discussed in great detail in #1698 where there was a desire to remove A couple of details that are relevant from that discussion:
The issue (#2100) is a little ambiguous. You can insert `(m::Chain)(xs...) = _applychain(m.layers, xs)` would work. This still keeps
Is this documented anywhere? If not, I can make a PR. It's a crucial part for combining embeddings and inputs. In any case, then I agree with either the status quo, or the `(m::Chain)(xs...) = _applychain(m.layers, xs)` proposal.
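To make the proposal above concrete, here is a minimal sketch of how such a convenience method behaves. It does not use Flux itself: `MiniChain` and `applychain` are hypothetical stand-ins for `Chain` and `_applychain`, and they assume that layers are applied left to right and that an empty chain returns its input unchanged.

```julia
# Toy stand-in for a chain of layers; `MiniChain` and `applychain` are
# hypothetical names used for illustration, not Flux's implementation.
struct MiniChain{T<:Tuple}
    layers::T
end
MiniChain(layers...) = MiniChain{typeof(layers)}(layers)

# Apply the layers left to right; an empty chain returns its input unchanged.
applychain(::Tuple{}, x) = x
applychain(layers::Tuple, x) = applychain(Base.tail(layers), first(layers)(x))

# Single-input call.
(m::MiniChain)(x) = applychain(m.layers, x)

# Convenience method in the spirit of the proposal: several inputs are packed
# into one tuple, which the first layer receives as a single argument.
(m::MiniChain)(xs...) = applychain(m.layers, xs)

# The first layer takes a tuple of two inputs; the second doubles the result.
join2 = MiniChain(xs -> xs[1] + xs[2], y -> 2y)

join2(1, 2)      # equivalent to join2((1, 2))
MiniChain()(3)   # the empty chain still acts as the identity on one input
```

In this sketch only the call site changes: intermediate tuples are never splatted, so layers that pass tuples between themselves see exactly what they saw before.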
Surprisingly the My initial comment was just to surface prior discussions on the topic. Having had the chance to review those discussions in more detail, I think we can condense to two options:
In the case of (2), the (2a) is what we currently have, and I am okay with all options. Maybe the other maintainers can weigh in.
Ideally MIMO would just work, but unfortunately I think splatting intermediates would break models like
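The comment above is cut off before it names the affected models, so the following is only a hedged illustration of the general concern. The helpers `plainapply` and `splatapply` and the two toy layers are hypothetical. The sketch assumes a rule where any tuple produced by one layer is splatted into the next layer's arguments, and shows that a layer written to receive a tuple as one argument stops working under that rule.

```julia
# Hypothetical helpers, not Flux code.

# Plain rule: each layer receives the previous output as a single argument.
plainapply(layers::Tuple, x) = foldl((acc, f) -> f(acc), layers; init = x)

# Assumed splatting rule: a tuple output is spread into the next layer's arguments.
function splatapply(layers::Tuple, x)
    isempty(layers) && return x
    f = first(layers)
    y = x isa Tuple ? f(x...) : f(x)
    return splatapply(Base.tail(layers), y)
end

# The first layer returns a tuple; the second expects that tuple as ONE argument.
layers = (x -> (x, x + 1), pair -> pair[1] * pair[2])

plainapply(layers, 2)   # second layer receives (2, 3) as one argument

# Under the splatting rule the second layer is called with two arguments,
# which it does not accept, so the call throws a MethodError.
splat_failed = try
    splatapply(layers, 2)
    false
catch err
    err isa MethodError
end
```

Any existing layer that relies on receiving a tuple as a single value would be affected in the same way, which is why the discussion leans toward packing inputs only at the call site.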
@cstjean were you planning on opening another PR with the convenience method discussed above?
Sure, I can do that.
Closes #2100
As mentioned in #2100 (comment), this will break any code using `Chain()` as the identity function. I need a decision on whether this is acceptable, or if I should special-case it.
PR Checklist