
Alteration to default summation used to define TD loss of ensemble Q functions #421

Open
joshuaspear opened this issue Sep 24, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@joshuaspear
Contributor

Is your feature request related to a problem? Please describe.
I have a feeling that the reduction used for computing the TD error over ensembles of Q functions should be a mean rather than a sum.
(https://github.com/takuseno/d3rlpy/blob/4b54bdde93d19f3915f3236367a5ec253ef99cee/d3rlpy/models/torch/q_functions/ensemble_q_function.py#L105C9-L105C30)

My understanding of the ensemble Q-function is that the critic loss is backpropagated to all the constituent Q-functions equally. As a result, I feel that a mean reduction would help control divergent networks and enable larger learning rates. Experimentally, I have found the need to use smaller learning rates with n_critics>1; I believe that when the TD loss is dominated by a divergent network, the high-magnitude updates caused by the summed TD loss can cause the other networks to also become divergent.

I'd happily be pushed back on this - FYI I couldn't find any literature on the use of ensemble Q-networks in this manner.

Describe the solution you'd like

# Proposed variant of the loop in ensemble_q_function.py:
# accumulate each critic's mean TD loss, then divide by the
# ensemble size so the overall reduction is a mean, not a sum.
assert target.ndim == 2
td_sum = torch.tensor(
    0.0,
    dtype=torch.float32,
    device=get_device(observations),
)
for forwarder in forwarders:
    loss = forwarder.compute_error(
        observations=observations,
        actions=actions,
        rewards=rewards,
        target=target,
        terminals=terminals,
        gamma=gamma,
        reduction="none",
    )
    td_sum += loss.mean()
# Proposed change: average over critics instead of summing.
return td_sum / len(forwarders)
@takuseno
Owner

Thanks for the issue. First of all, I intentionally use the sum of the TD loss over the ensemble networks. This is because addition/subtraction doesn't mix gradients between networks: by definition, each critic only receives the gradient of its own loss term. The advantage of sum over mean is that the gradient scale is not affected by the number of critics. With a mean, the gradient propagated to each individual network shrinks proportionally to the number of critics, which requires additional learning rate tuning.
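The gradient-scaling argument above can be checked with a small standalone PyTorch sketch (a toy setup, not d3rlpy code: the single-weight "critics", the data, and the loss are all made up for illustration). Under a sum reduction, each critic's gradient equals the gradient of its own loss alone; under a mean reduction, each gradient is divided by the number of critics.

```python
import torch

# Toy ensemble: each "critic" is a single scalar weight, all seeing
# the same data. Hypothetical minimal setup, not d3rlpy internals.
n_critics = 4
params = [torch.ones(1, requires_grad=True) for _ in range(n_critics)]
target = torch.tensor(3.0)

def td_losses():
    # Per-critic squared TD error against a shared target.
    return [(p * 2.0 - target) ** 2 for p in params]

# Sum reduction: the gradient on critic i is d(loss_i)/d(p_i),
# unaffected by how many other critics are in the ensemble.
loss_sum = torch.stack(td_losses()).sum()
grads_sum = torch.autograd.grad(loss_sum, params)

# Mean reduction: each per-critic gradient shrinks by 1/n_critics.
loss_mean = torch.stack(td_losses()).mean()
grads_mean = torch.autograd.grad(loss_mean, params)

print(grads_sum[0].item(), grads_mean[0].item())  # -4.0 vs -1.0
```

With the sum, the per-critic gradient stays at -4.0 regardless of ensemble size; with the mean it is -4.0 / n_critics, which is why switching reductions effectively rescales the learning rate.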
