
The second original Swish paper #1

Open
EliasHasle opened this issue Oct 31, 2018 · 4 comments

Comments

@EliasHasle

EliasHasle commented Oct 31, 2018

Here: https://www.semanticscholar.org/paper/Searching-for-Activation-Functions-Ramachandran-Zoph/c8c4ab59ac29973a00df4e5c8df3773a3c59995a

It was published on arXiv before your paper, so in my opinion it should be cited and discussed. Through their search, they found (or "found") the swish function with a beta factor inside the sigmoid, whereas you add one outside.

As far as I can see, for unconstrained weights a beta outside the sigmoid does exactly the same thing as scaling all the weights out of the node, so the network can represent exactly the same functions as pure swish (except that the last layer may have no outgoing weights). Likewise, a beta inside the sigmoid is equivalent to rescaling all the weights into the node (except that the first layer may have no incoming weights).
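For concreteness, here is a minimal numerical sketch of the first equivalence. The setup is a toy one I made up (a single hidden unit with an outgoing weight `w_out`; the function names are only illustrative), not code from either paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(x):
    return x * sigmoid(x)          # plain swish (beta = 1)

def e_swish(x, beta):
    return beta * x * sigmoid(x)   # beta applied outside the sigmoid

x = np.linspace(-5.0, 5.0, 11)     # pre-activations of a hidden unit
beta, w_out = 1.5, 0.7             # arbitrary values for the check

# e-swish feeding the original outgoing weight ...
a = w_out * e_swish(x, beta)
# ... matches plain swish feeding a rescaled outgoing weight.
b = (beta * w_out) * swish(x)

print(np.allclose(a, b))  # True
```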

So basically, the beta parameters only affect the learning process, and they will obviously interact with other training choices and with regularization. (Using SGD instead of Adam for a comparison based on another paper counts as such a choice.)

Please enlighten me if I am wrong.

@MichaelFomenko

He clearly doesn't understand anything about deep learning; he only published this paper to have a published paper for his career.

@hypnopump
Owner

Hi, @EliasHasle
I already cite the paper by Ramachandran et al. in my paper.
With respect to the concern about the beta parameter, the two are not the same:

  • As you increase the beta in swish, the function approaches ReLU.
  • As you increase the beta in e-swish, you scale up the x*sigmoid(x) function, amplifying its properties (see the sketch below).

I hope the image clarifies it!

[image: comparison of swish and e-swish as beta increases]
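A rough numerical sketch of the same point (my own toy comparison in plain NumPy with illustrative values, not the figure above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(x, beta):
    return x * sigmoid(beta * x)   # beta inside the sigmoid (Swish)

def e_swish(x, beta):
    return beta * x * sigmoid(x)   # beta outside the sigmoid (E-swish)

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-3.0, 3.0, 121)
for beta in (1.0, 10.0, 100.0):
    gap_swish = np.max(np.abs(swish(x, beta) - relu(x)))      # shrinks as beta grows
    gap_eswish = np.max(np.abs(e_swish(x, beta) - relu(x)))   # grows as beta grows
    print(f"beta={beta:6.1f}  max|swish-relu|={gap_swish:.4f}  max|e-swish-relu|={gap_eswish:.4f}")
```

With beta inside the sigmoid, swish approaches ReLU as beta grows, while e-swish with beta outside remains a scaled copy of x*sigmoid(x).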

@hypnopump
Owner

@MichaelFomenko Always glad to receive constructive criticism.

@MichaelFomenko

Sorry, EricAlcaide, to tell you the truth, but you clearly don't understand deep learning. If you understood deep learning, you would know that the beta in your E-Swish function is just a weight of the next layer. This means that mathematically there is no difference between your E-Swish and the Swish function.
