Feng, Fei et al. 2020. “Provably Efficient Exploration for RL with Unsupervised Learning.” http://arxiv.org/abs/2003.06898.
Zhang, Zihan, Yuan Zhou, and Xiangyang Ji. 2020. “Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition.” http://arxiv.org/abs/2004.10019.
Tasse, Geraud Nangue, Steven James, and Benjamin Rosman. 2020. “A Boolean Task Algebra for Reinforcement Learning.” http://arxiv.org/abs/2001.01394.
Li, Jiachen, Quan Vuong, Shuang Liu, and Minghua Liu. 2020. “Multi-Task Batch Reinforcement Learning with Metric Learning.” In Advances in Neural Information Processing Systems (NeurIPS).
Zhang, Yiming, Quan Vuong, and Keith W. Ross. 2020. “First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning.” In International Conference on Machine Learning (ICML). http://arxiv.org/abs/2002.06506.
Khosla, Prannay et al. 2020. “Supervised Contrastive Learning.” http://arxiv.org/abs/2004.11362.
Ziyin, Liu, Tilman Hartwig, and Masahito Ueda. 2020. “Neural Networks Fail to Learn Periodic Functions and How to Fix It.” http://arxiv.org/abs/2006.08195.
Curi, Sebastian, Felix Berkenkamp, and Andreas Krause. 2020. “Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning.” http://arxiv.org/abs/2006.08684.
Kato, Masahiro, Masatoshi Uehara, and Shota Yasui. 2020. “Off-Policy Evaluation and Learning for External Validity under a Covariate Shift.” http://arxiv.org/abs/2002.11642.
Wang, Ruosong, Simon S. Du, Lin F. Yang, and Sham M. Kakade. 2020. “Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?” http://arxiv.org/abs/2005.00527.
Kumar, Aviral, and Sergey Levine. 2019. “Model Inversion Networks for Model-Based Optimization.” http://arxiv.org/abs/1912.13464.
Lazic, Nevena et al. 2020. “A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs.” http://arxiv.org/abs/2006.12620.
Vieillard, Nino, Olivier Pietquin, and Matthieu Geist. 2020. “Munchausen Reinforcement Learning.” http://arxiv.org/abs/2007.14430.
Zhou, Wei et al. 2020. “Online Meta-Critic Learning for Off-Policy Actor-Critic Methods.” http://arxiv.org/abs/2003.05334.
Kalweit, Gabriel, Maria Huegle, Moritz Werling, and Joschka Boedecker. 2020. “Deep Inverse Q-Learning with Constraints.” http://arxiv.org/abs/2008.01712.
Fujimoto, Scott, David Meger, and Doina Precup. 2020. “An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay.” http://arxiv.org/abs/2007.06049.
Li, Alexander C., Lerrel Pinto, and Pieter Abbeel. 2020. “Generalized Hindsight for Reinforcement Learning.” http://arxiv.org/abs/2002.11708.
Yue, Yuguang, Zhendong Wang, and Mingyuan Zhou. 2020. “Implicit Distributional Reinforcement Learning.” http://arxiv.org/abs/2007.06159.
Lee, Kuang-Huei et al. 2020. “Predictive Information Accelerates Learning in RL.” http://arxiv.org/abs/2007.12401.
Sonabend-W, Aaron et al. 2020. “Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation.” http://arxiv.org/abs/2006.13189.
Gulcehre, Caglar et al. 2020. “RL Unplugged: Benchmarks for Offline Reinforcement Learning.” http://arxiv.org/abs/2006.13888.
Lee, Alex X., Anusha Nagabandi, Pieter Abbeel, and Sergey Levine. 2019. “Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model.” http://arxiv.org/abs/1907.00953.
Liu, Yao, Adith Swaminathan, Alekh Agarwal, and Emma Brunskill. 2020. “Provably Good Batch Reinforcement Learning Without Great Exploration.” http://arxiv.org/abs/2007.08202.
Yang, Mengjiao et al. 2020. “Off-Policy Evaluation via the Regularized Lagrangian.” http://arxiv.org/abs/2007.03438.
D’Oro, Pierluca, and Wojciech Jaśkowski. 2020. “How to Learn a Useful Critic? Model-Based Action-Gradient-Estimator Policy Optimization.” http://arxiv.org/abs/2004.14309.
Kidambi, Rahul, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims. 2020. “MOReL: Model-Based Offline Reinforcement Learning.” http://arxiv.org/abs/2005.05951.
Kumar, Aviral, Abhishek Gupta, and Sergey Levine. 2020. “DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction.” http://arxiv.org/abs/2003.07305.