a question in DDPG.train_on_batch #2

lbgitjp · 2022-10-25T05:11:20Z

I think the code below may have a problem:

Lines 85 to 90 in 0c5c2f9

    
    # train actor-critic by target loss
    self.actor_network.train(
        self.critic_network.train(
            y_batch, action_batch, state_batch
        )
    )

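The actor and critic gradients need to be computed separately because their loss functions differ: the critic is fit to the target y_batch, while the actor is trained to maximize the critic's value of its own actions.
I think it should be changed to the following: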

    # for critic
    self.critic_network.train(y_batch, action_batch, state_batch)

    # for actor
    actor_loss = -self.critic_network.critic(self.actor_network.actor_action(state_batch), state_batch).mean()
    self.actor_network.optimizer.zero_grad()
    actor_loss.backward()
    self.actor_network.optimizer.step()
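
To make the difference between the two losses concrete, here is a minimal sketch in plain Python. It assumes a hypothetical scalar linear critic Q(s, a) = w_s * s + w_a * a and a hypothetical linear actor mu(s) = theta * s, with hand-derived gradients standing in for autograd. None of these names or models come from the repository; they only illustrate that the critic step and the actor step minimize different objectives and update different parameters.

```python
# Toy stand-ins (assumptions, not the repository's networks):
#   critic: Q(s, a) = w_s * s + w_a * a
#   actor:  mu(s)   = theta * s


def critic_q(w_s, w_a, s, a):
    """Value of the hypothetical linear critic for one (state, action) pair."""
    return w_s * s + w_a * a


def critic_step(w_s, w_a, states, actions, targets, lr):
    """One gradient step on the critic loss mean((Q(s, a) - y)^2)."""
    n = len(states)
    grad_w_s = 0.0
    grad_w_a = 0.0
    for s, a, y in zip(states, actions, targets):
        err = critic_q(w_s, w_a, s, a) - y
        grad_w_s += 2.0 * err * s / n
        grad_w_a += 2.0 * err * a / n
    return w_s - lr * grad_w_s, w_a - lr * grad_w_a


def actor_step(theta, w_a, states, lr):
    """One gradient step on the actor loss -mean(Q(s, mu(s))).

    Only theta is updated; the critic weights are held fixed.
    d/dtheta of -(w_s * s + w_a * theta * s) is -w_a * s.
    """
    n = len(states)
    grad_theta = sum(-w_a * s for s in states) / n
    return theta - lr * grad_theta


# Made-up batch, purely for illustration.
states = [1.0, 2.0]
actions = [0.5, -0.5]
targets = [1.0, 0.0]

# Critic update first, using its own regression loss ...
w_s, w_a = critic_step(0.0, 1.0, states, actions, targets, lr=0.1)
# ... then a separate actor update, using the critic only as a scorer.
theta = actor_step(0.0, w_a, states, lr=0.1)
print(w_s, w_a, theta)
```

In this sketch the critic step never touches theta and the actor step never touches the critic weights, which mirrors the separation proposed above.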
Thanks!
