The author aims to embed a planning algorithm (value iteration) into a differentiable policy network, so the whole model can be trained end-to-end.
Value iteration: Q_n(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) V_n(s'),  V_{n+1}(s) = max_a Q_n(s,a)
If we treat the reward map R(s,a) as the input to a CNN, a convolution over it produces Q(s,a) (one channel per action), and max-pooling over the action channels gives V(s). This whole process mirrors the right-hand side of value iteration. Stacking this module K times looks like running K iterations of value iteration. This gives us a differentiable way to embed planning (value iteration) into our NN.
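A minimal sketch of how such a module could look, assuming a 2-D grid-world reward map and PyTorch; the class name, channel layout, and hyperparameters here are my own illustration, not the paper's reference code:

```python
# Sketch of a VIN-style value-iteration module (illustrative, not the authors' code).
import torch
import torch.nn as nn

class ValueIterationModule(nn.Module):
    """Approximates K steps of value iteration with a conv + channel-wise max."""
    def __init__(self, num_actions: int = 8, k: int = 10):
        super().__init__()
        self.k = k
        # Convolution maps [R; V] (2 channels) to Q with one channel per action;
        # its weights play the role of the (learned) transition/discount terms.
        self.q_conv = nn.Conv2d(2, num_actions, kernel_size=3, padding=1, bias=False)

    def forward(self, reward_map: torch.Tensor) -> torch.Tensor:
        # reward_map: (batch, 1, H, W)
        value = torch.zeros_like(reward_map)                        # V_0 = 0
        for _ in range(self.k):
            q = self.q_conv(torch.cat([reward_map, value], dim=1))  # Q_n(s, a)
            value, _ = torch.max(q, dim=1, keepdim=True)            # V_{n+1}(s) = max_a Q_n(s, a)
        return value

# Usage sketch: the resulting value map would be fed, together with the
# observation, into a reactive policy head and trained end-to-end.
vi = ValueIterationModule(num_actions=8, k=10)
values = vi(torch.randn(4, 1, 16, 16))  # hypothetical 16x16 grid-world rewards
print(values.shape)                     # torch.Size([4, 1, 16, 16])
```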
- embed value iteration into NN in a general way
- the advantage of planning: its behavior doesn't depend on whether the observation is novel or not
- can't directly generalize to continuous domains (the paper performs "high-level" planning on a discrete, coarse grid-world representation of the continuous domain)
- the author pays attention to the architecture of the policy network, which makes me think of the dueling network (happy to see that there's someone interested in this 😄)
- comments from @karpathy