Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor
-
The authors propose a Hierarchical Deep Reinforcement Learning Network (H-DRLN) that can reuse previously learned skills.
-
The architecture was tested on Minecraft.
-
The main idea is to learn skills by solving sub-problems in the environment. These sub-problems are usually much simpler than the final task and allow the system to understand the dynamics of the environment and to learn "basic" behaviors which can be very useful in other scenarios. Building a house, for example, can be decomposed into several sub-problems like chopping wood, collecting materials and finally assembling the house.
-
To learn reusable skills in a lifelong learning setting, the system must:
- Learn skills
- Learn when a skill should be used and reused
- Efficiently accumulate skills
The H-DRLN architecture is shown in the figure above. The controller learns to solve complex tasks by reusing skills in the form of pre-trained Deep Skill Networks (DSNs), trained a priori on various sub-tasks using DQN. These skills are retained by incorporating them into a Deep Skill module. Given an input state s and a skill index i, the Deep Skill module outputs an action a according to the corresponding DSN policy (a minimal interface sketch follows the list below). The authors proposed and tested two types of Deep Skill modules:
- DSN array: an array of pre-trained DSNs where each DSN is represented by a separate DQN
- Multi-skill distillation network: a single deep network that represents multiple DSNs. All the DSNs share the hidden layers while a separate output layer is trained for each DSN via policy distillation. This approach makes the architecture scalable with respect to the number of skills.
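To make the interface concrete, here is a minimal sketch of a DSN-array style Deep Skill module. The class and method names (`DeepSkillModule`, `predict`) are hypothetical and only illustrate the (state, skill index) → action mapping described above, not the authors' implementation.

```python
import numpy as np

class DeepSkillModule:
    """Hypothetical DSN-array skill module: one pre-trained DQN per skill."""

    def __init__(self, dsns):
        # dsns: list of pre-trained skill networks, each exposing
        # predict(state) -> Q-values over the skill's primitive actions
        self.dsns = dsns

    def act(self, state, skill_index):
        # Query DSN i and act greedily with respect to its Q-values,
        # i.e. follow the pre-trained policy of skill i in this state.
        q_values = self.dsns[skill_index].predict(state)
        return int(np.argmax(q_values))
```

The multi-skill distillation variant would keep the same `act` interface but share hidden layers across skills, selecting the output head by `skill_index`.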
As shown in the figure, the H-DRLN controller learns a policy that determines when to use primitive actions and when to reuse pre-learned skills. If a primitive action is chosen, that action is executed for one time step. If the controller chooses to execute skill i, then DSN i will execute its policy until the skill terminates.
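A rough sketch of that control flow, assuming a Gym-style environment and the hypothetical `DeepSkillModule` above (the `select` method and `terminated` predicate are illustrative placeholders, not the authors' code):

```python
GAMMA = 0.99  # illustrative discount factor

def hdrln_step(controller, skills, env, state, n_primitive_actions):
    """One controller decision: a primitive action runs for a single step,
    a skill runs its DSN policy until the skill's termination condition."""
    choice = controller.select(state)  # index over primitive actions + skills
    if choice < n_primitive_actions:
        # Primitive action: executed for exactly one time step.
        return env.step(choice)
    skill = choice - n_primitive_actions
    total_reward, discount, done, info = 0.0, 1.0, False, {}
    while not done and not skills.terminated(state, skill):
        action = skills.act(state, skill)             # skill i's DSN policy
        state, reward, done, info = env.step(action)
        total_reward += discount * reward             # discounted return of the skill
        discount *= GAMMA
    return state, total_reward, done, info
```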
The authors modify the DQN objective function to account for temporally extended skills and define an experience replay memory capable of storing skill experiences.
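Concretely, the skill-aware target has the flavour of SMDP Q-learning: a skill that ran for $k$ steps contributes its accumulated discounted reward before bootstrapping. This is a sketch of the idea under that assumption; see the paper for the exact loss:

$$
y_t =
\begin{cases}
r_t + \gamma \max_{a'} Q\left(s_{t+1}, a'; \theta^{-}\right) & \text{primitive action,}\\[4pt]
\sum_{j=0}^{k-1} \gamma^{j}\, r_{t+j} + \gamma^{k} \max_{a'} Q\left(s_{t+k}, a'; \theta^{-}\right) & \text{skill of duration } k,
\end{cases}
$$

with the skill replay memory storing the accumulated discounted reward and the state reached when the skill terminates.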
The authors trained DSNs in sub-domains of Minecraft by defining "simple" problems such as navigating from one point to another, picking up objects, or breaking stones. These setups were simple enough for a DQN to reach a 100% success rate in all of them.
Once the DSNs had been trained, the authors proceeded to train the H-DRLN agent on a complex task.
The success rate of H-DRLN on this complex task was significantly higher than that of a standard DQN policy trained on the same task.
Directions the authors leave for future work:
- Learning skills online instead of pre-training them on simple problems.
- Refining previously learned skills.
- Automatically adding new skills to the Deep Skill module as new behaviors are learned.