Continuous control #3

muupan · 2016-05-08T10:17:11Z

No description provided.

igrekun · 2016-05-14T19:40:17Z

I'm working on an LSTM implementation (neon-based) for the continuous case; sadly, I have not been able to get any response from the authors.

It is the variance and the entropy that puzzle me. Any thoughts on how they are implemented code-wise?
Currently it shows no signs of convergence on the MuJoCo domain for me, and most likely there are errors in the learnt variance of the Gaussian policy.

muupan · 2016-05-15T03:54:09Z

Thanks for the information. I haven't tried it yet, but the paper provides some details, quoted below. Did you find them insufficient?

µ is modeled by a linear layer and σ² by a SoftPlus operation, log(1 + exp(x)), as the activation computed as a function of the output of a linear layer.

we used a cost on the differential entropy of the normal distribution defined by the output of the actor network, −1/2 (log(2πσ²) + 1), we used a constant multiplier of 10⁻⁴ for this cost across all of the tasks examined.
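A minimal NumPy sketch of what the two quoted sentences describe, for illustration only: a linear layer for µ, a linear layer followed by SoftPlus for σ², and the entropy cost scaled by 10⁻⁴. The function and parameter names (`gaussian_head`, `w_mu`, `w_var`, `beta`, and so on) are hypothetical and do not come from the paper or this repository.

```python
import numpy as np


def softplus(x):
    # log(1 + exp(x)), written with logaddexp for numerical stability
    return np.logaddexp(0.0, x)


def gaussian_head(features, w_mu, b_mu, w_var, b_var):
    # mu: plain linear layer; sigma2: linear layer followed by SoftPlus,
    # which keeps the variance strictly positive
    mu = np.dot(features, w_mu) + b_mu
    sigma2 = softplus(np.dot(features, w_var) + b_var)
    return mu, sigma2


def entropy_cost(sigma2, beta=1e-4):
    # Quoted cost: -1/2 (log(2*pi*sigma^2) + 1), summed over action
    # dimensions and multiplied by the constant beta = 1e-4
    return beta * np.sum(-0.5 * (np.log(2.0 * np.pi * sigma2) + 1.0))


# Hypothetical usage: 4 input features, 2 action dimensions
rng = np.random.RandomState(0)
features = rng.randn(4)
mu, sigma2 = gaussian_head(
    features,
    rng.randn(4, 2), np.zeros(2),  # weights and bias for mu
    rng.randn(4, 2), np.zeros(2),  # weights and bias for sigma2
)
cost = entropy_cost(sigma2)
```

Minimizing this cost pushes σ² up, which is what keeps the policy from collapsing to a near-deterministic one too early.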

etienne87 · 2016-08-03T16:55:04Z

This is a bit vague to me, so I will try to summarize it so that I can be corrected: we need a fully connected layer outputting 2 values, with a softplus applied to the second value (so that the variance is > 0, I suppose); we sample from this Gaussian in each dimension of the action space (using numpy.random.randn() * sigma + mu?); and finally we send −1/2 (log(2πσ²) + 1) as the logprob instead of log(softmax)?
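As a hedged illustration of the steps in this summary, the NumPy sketch below keeps two quantities apart: the log-probability of the sampled action, which is the Gaussian log-density and depends on the action, and the entropy, which depends only on σ² and is the quantity in the quoted cost term. The function names are hypothetical and a diagonal Gaussian is assumed.

```python
import numpy as np


def sample_action(mu, sigma2, rng):
    # One draw per action dimension: mu + sigma * eps, eps ~ N(0, 1)
    return mu + np.sqrt(sigma2) * rng.randn(*mu.shape)


def log_prob(action, mu, sigma2):
    # Log-density of a diagonal Gaussian, summed over action dimensions.
    # This takes the place of log(softmax) in the discrete case.
    return np.sum(
        -0.5 * (np.log(2.0 * np.pi * sigma2) + (action - mu) ** 2 / sigma2)
    )


def entropy(sigma2):
    # Differential entropy 1/2 (log(2*pi*sigma^2) + 1), summed over
    # dimensions. It does not depend on the sampled action.
    return np.sum(0.5 * (np.log(2.0 * np.pi * sigma2) + 1.0))


rng = np.random.RandomState(0)
mu = np.array([0.1, -0.4])
sigma2 = np.array([0.5, 1.5])
action = sample_action(mu, sigma2, rng)
lp = log_prob(action, mu, sigma2)
h = entropy(sigma2)
```

Under this reading, the policy-gradient term would use `lp` weighted by the advantage, while `h` would enter the loss only as a separate regularizer.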

loofahcus · 2017-01-04T02:27:11Z

Hi @muupan, do you have a plan to implement continuous control? :)

etienne87 · 2017-02-06T18:35:38Z

Here is an example:

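import numpy as np
import chainer
import chainer.functions as F
# Assumption: cached_property comes from the cached_property package
from cached_property import cached_property

# PolicyOutput is this repository's base class (the one that
# SoftmaxPolicyOutput derives from); import it from wherever it is defined.


class GaussianPolicyOutput(PolicyOutput):
    def __init__(self, logits_mu, logits_var):
        self.logits_mu = logits_mu
        self.logits_var = logits_var
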
    @cached_property
    def action_indices(self):
        # Same name as in SoftmaxPolicyOutput so that a3c.py can call it
        # without changes; here it samples continuous actions from the
        # Gaussian instead of returning discrete indices.
        mu, sigma2 = self.activation
        # np.random.normal takes the standard deviation, hence the sqrt;
        # it broadcasts over every batch row and action dimension.
        action = np.random.normal(mu.data, np.sqrt(sigma2.data))
        return action.astype(np.float32)
    
    @cached_property
    def activation(self):
        mu = F.tanh(self.logits_mu)  # mean squashed into (-1, 1)
        sigma2 = F.softplus(self.logits_var)  # variance, strictly positive
        return mu, sigma2
        
    @cached_property
    def sampled_actions_log_probs(self):
        # Returns a chainer Variable holding the log-probability of the
        # sampled action.
        mu, sigma2 = self.activation

        # action_indices is cached, so this is the same sample the agent acts on
        action = self.action_indices

        # gaussian_nll expects the log-variance and returns the negative
        # log-likelihood summed over all elements, so negate it.
        return -F.gaussian_nll(chainer.Variable(action), mu, F.log(sigma2))
    
    @cached_property
    def entropy(self):
        mu, sigma2 = self.activation
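        # Differential entropy of the Gaussian, 0.5 * (log(2*pi*sigma^2) + 1),
        # summed to a scalar. It is computed on the Variable (not on .data)
        # so that gradients reach sigma2, and it carries a positive sign:
        # the paper's "cost" is the negative of this value.
        return F.sum(0.5 * (F.log(2 * np.pi * sigma2) + 1))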

I haven't tested this yet, so feel free to test and correct it.
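One way to test the entropy term of a class like the one above, independently of Chainer, is to compare the closed-form expression with a Monte Carlo estimate of −E[log p(x)]. The following is only a sketch under the assumption of a diagonal Gaussian; the helper names and the test values for `mu` and `sigma2` are made up for illustration.

```python
import numpy as np


def gaussian_entropy(sigma2):
    # Closed form: sum over dimensions of 0.5 * (log(2*pi*sigma^2) + 1)
    return np.sum(0.5 * (np.log(2.0 * np.pi * sigma2) + 1.0))


def monte_carlo_entropy(mu, sigma2, n_samples=200000, seed=0):
    # Estimate -E[log p(x)] from samples of the same Gaussian
    rng = np.random.RandomState(seed)
    x = mu + np.sqrt(sigma2) * rng.randn(n_samples, mu.shape[0])
    log_p = np.sum(
        -0.5 * (np.log(2.0 * np.pi * sigma2) + (x - mu) ** 2 / sigma2),
        axis=1,
    )
    return -np.mean(log_p)


mu = np.array([0.3, -0.7])
sigma2 = np.array([0.25, 2.0])
closed_form = gaussian_entropy(sigma2)
estimate = monte_carlo_entropy(mu, sigma2)
print(closed_form, estimate)  # the two values should agree closely
```

If the two numbers disagree in sign or magnitude, the entropy expression (or the variance fed into it) is the first place to look.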

loofahcus · 2017-02-07T04:20:40Z

Thanks! @etienne87
