Distributional Reinforcement Learning with Quantile Regression #3

yydxlv · 2018-03-31T04:09:44Z

Hi, what does "u" mean in the following code snippet? It does not seem to be defined anywhere in the code. Thanks!

huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss

hohoCode · 2018-04-10T03:04:29Z

I think probably it should be something like:

u = dist - expected_quant
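For reference, here is a minimal plain-Python sketch of what those three lines compute for a single scalar error. It assumes u is the difference between a target quantile value and a predicted one; the sign convention differs between comments in this thread, and here u = target - prediction, as in the QR-DQN paper. The function name and the numbers are made up for illustration and are not from the notebook:

```python
def quantile_huber(u, tau, k=1.0):
    """Quantile Huber loss for one scalar error u at quantile level tau."""
    abs_u = abs(u)
    clipped = min(abs_u, k)
    # Same two terms as the snippet: quadratic inside [-k, k], linear outside.
    huber = 0.5 * clipped ** 2 + k * (abs_u - clipped)
    # Asymmetric weight |tau - 1{u < 0}|.
    indicator = 1.0 if u < 0 else 0.0
    return abs(tau - indicator) * huber


# Hypothetical values: prediction 1.0 and target 3.0, so u = target - prediction.
under = quantile_huber(3.0 - 1.0, tau=0.9)  # prediction too low, u = 2.0
over = quantile_huber(1.0 - 3.0, tau=0.9)   # prediction too high, u = -2.0
print(under, over)
```

With tau = 0.9, an under-estimate costs nine times as much as an over-estimate of the same size, which is what pushes that output toward the 0.9 quantile.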

angmc · 2018-04-12T21:18:48Z

After adding u = dist - expected_quant, I get this error:

TypeError Traceback (most recent call last)
in ()
15
16 if len(replay_buffer) > batch_size:
---> 17 loss = compute_td_loss(batch_size)
18 losses.append(loss.data[0])
19

in compute_td_loss(batch_size)
17 huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
18 huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
---> 19 quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
20 loss = quantile_loss.sum() / num_quant
21

/home/--/anaconda2/envs/tensorflow4/lib/python2.7/site-packages/torch/tensor.pyc in __sub__(self, other)
310
311 def __sub__(self, other):
--> 312 return self.sub(other)
313
314 def __rsub__(self, other):

TypeError: sub received an invalid combination of arguments - got (Variable), but expected one of:

(float value)
didn't match because some of the arguments have invalid types: (Variable)
(torch.FloatTensor other)
didn't match because some of the arguments have invalid types: (Variable)
(float value, torch.FloatTensor other)

qfettes · 2018-06-03T16:23:30Z

Should be something like:

u = expected_dist.t().unsqueeze(-1) - dist
loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs()
loss = loss.mean(1).sum()
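A self-contained sketch of that broadcasting pattern, assuming PyTorch 0.4 or later (where tensors no longer need wrapping in a Variable), quantile midpoints for tau, and a Huber threshold k = 1. self.huber and self.tau are not shown in this thread, so the versions below are guesses at their intent. The function name, the batch-first layout, the reduction and the toy inputs are likewise assumptions for illustration:

```python
import torch


def quantile_huber_loss(dist, expected_dist, k=1.0):
    """dist: predicted quantiles (batch, N); expected_dist: target quantiles (batch, N)."""
    num_quant = dist.size(1)
    # Quantile midpoints tau_i = (2i + 1) / (2N), shape (N,).
    tau = (torch.arange(num_quant, dtype=dist.dtype) + 0.5) / num_quant
    # Pairwise errors u[b, j, i] = target_j - prediction_i, shape (batch, N, N).
    u = expected_dist.detach().unsqueeze(2) - dist.unsqueeze(1)
    abs_u = u.abs()
    clipped = abs_u.clamp(max=k)
    # Huber term: quadratic for |u| <= k, linear beyond.
    huber = 0.5 * clipped.pow(2) + k * (abs_u - clipped)
    # Asymmetric weight |tau_i - 1{u < 0}|, broadcast over the prediction axis.
    weight = (tau.view(1, 1, -1) - (u.detach() < 0).float()).abs()
    # Mean over targets j, sum over predicted quantiles i, mean over the batch.
    return (weight * huber).mean(dim=1).sum(dim=1).mean()


# Toy check: every target is 2.0 above every prediction.
dist = torch.zeros(1, 2, requires_grad=True)
expected_dist = torch.full((1, 2), 2.0)
loss = quantile_huber_loss(dist, expected_dist)
loss.backward()
print(loss.item(), dist.grad)
```

The target is detached so gradients flow only into the predicted quantiles, and the indicator is computed on a detached copy of u because the comparison itself carries no gradient.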

angmc · 2018-06-05T17:57:03Z

When I last looked at this, it ran after converting tau to a Variable:
u = expected_quant - dist
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (autograd.Variable(tau.cuda()) - (u < 0).float()).abs() * huber_loss
loss = quantile_loss.sum() / num_quant

LRiver-wut · 2023-04-22T11:41:51Z

Friend, this is a question.

LRiver-wut · 2023-04-22T11:42:08Z

It confused me.
