The Reinforcement-Learning-Related Papers of ICLR 2019
[1] Temporal Difference Variational Auto-Encoder Chinese Paper Note
[2] Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow Chinese Paper Note
[3] Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
[4] Composing Complex Skills by Learning Transition Policies with Proximity Reward Induction Chinese Paper Note
[5] Exploration by random network distillation
[6] Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning
[7] Learning to Navigate the Web Chinese Paper Note
[8] Variance Reduction for Reinforcement Learning in Input-Driven Environments
[9] ProMP: Proximal Meta-Policy Search
[10] Learning Self-Imitating Diverse Policies Chinese Paper Note
[11] Recurrent Experience Replay in Distributed Reinforcement Learning
[12] Large-Scale Study of Curiosity-Driven Learning
[13] Diversity is All You Need: Learning Skills without a Reward Function Chinese Paper Note
[14] Learning to Schedule Communication in Multi-agent Reinforcement Learning
[15] Episodic Curiosity through Reachability
[16] Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning Chinese Paper Note
[17] Knowledge Flow: Improve Upon Your Teachers
[18] Supervised Policy Update for Deep Reinforcement Learning
[19] DARTS: Differentiable Architecture Search
[20] Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL
[21] Information-Directed Exploration for Deep Reinforcement Learning
[22] Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search [Paper Note TBW]
[23] Solving the Rubik's Cube with Approximate Policy Iteration
[24] Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference
[25] Hindsight policy gradients Chinese Paper Note
[26] Optimal Control Via Neural Networks: A Convex Approach
[27] NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning
[28] CEM-RL: Combining evolutionary and gradient-based methods for policy search Chinese Paper Note
[30] Policy Transfer with Strategy Optimization
[31] Unsupervised Control Through Non-Parametric Discriminative Rewards Chinese Paper Note
[33] Emergent Coordination Through Competition
[34] Learning to Understand Goal Specifications by Modelling Reward
[35] Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
[36] Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
[37] Optimal Completion Distillation for Sequence Learning
[38] SNAS: stochastic neural architecture search
[39] GO Gradient for Expectation-Based Objectives
[40] Analyzing Inverse Problems with Invertible Neural Networks
[41] Deep reinforcement learning with relational inductive biases
[42] Attention, Learn to Solve Routing Problems!
[43] Recall Traces: Backtracking Models for Efficient Reinforcement Learning
[44] DOM-Q-NET: Grounded RL on Structured Language
[45] Graph HyperNetworks for Neural Architecture Search
[46] Value Propagation Networks
[47] Contingency-Aware Exploration in Reinforcement Learning
[48] Learning Finite State Representations of Recurrent Policy Networks
[49] Initialized Equilibrium Propagation for Backprop-Free Training
[51] Stable Opponent Shaping in Differentiable Games
[52] Relational Forward Models for Multi-Agent Learning
[53] Preferences Implicit in the State of the World
[54] Remember and Forget for Experience Replay
[55] Reward Constrained Policy Optimization
[56] Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards
[57] Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
[58] Learning To Simulate
[59] DHER: Hindsight Experience Replay for Dynamic Goals
[60] Neural Graph Evolution: Automatic Robot Design
[61] Hierarchical Visuomotor Control of Humanoids
[62] Information asymmetry in KL-regularized RL
[63] From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
[64] Soft Q-Learning with Mutual-Information Regularization
[65] M^3RL: Mind-aware Multi-agent Management Reinforcement Learning
[66] Modeling the Long Term Future in Model-Based Reinforcement Learning
[67] Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering
[68] Probabilistic Planning with Sequential Monte Carlo methods
[69] Learning what you can do before doing anything
[70] Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
[71] Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
[72] A new dog learns old tricks: RL finds classic optimization algorithms
[73] Generating Multi-Agent Trajectories using Programmatic Weak Supervision
[74] Competitive experience replay
[75] Bayesian Policy Optimization for Model Uncertainty
[76] Environment Probing Interaction Policies
[77] Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
[78] Learning Multi-Level Hierarchies with Hindsight
[79] Generative predecessor models for sample-efficient imitation learning
[81] Adversarial Imitation via Variational Inverse Reinforcement Learning
[82] Variance Networks: When Expectation Does Not Meet Your Expectations
[83] Success at any cost: value constrained model-free continuous control
[84] Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization
[85] Stochastic Prediction of Multi-Agent Interactions from Partial Observations
[1] Policy Generalization In Capacity-Limited Reinforcement Learning
[2] EMI: Exploration with Mutual Information Maximizing State and Action Embeddings
[3] Lyapunov-based Safe Policy Optimization
[4] Towards Consistent Performance on Atari using Expert Demonstrations
[5] TarMAC: Targeted Multi-Agent Communication
[6] Neural MMO: A massively multiplayer game environment for intelligent agents
[7] Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
[8] Uncovering Surprising Behaviors in Reinforcement Learning via Worst-case Analysis
[9] Reinforcement Learning with Perturbed Rewards
[10] On-Policy Trust Region Policy Optimisation with Replay Buffers
[11] Interactive Agent Modeling by Learning to Probe
[12] Learning Heuristics for Automated Reasoning through Reinforcement Learning
[13] Deep Imitative Models for Flexible Inference, Planning, and Control
[TBW]