This repository was archived by the owner on Dec 11, 2022. It is now read-only.

Commit dea46ae

Update 4. Batch Reinforcement Learning.ipynb
1 parent 0633c32 commit dea46ae

File tree

1 file changed: +1 −1 lines changed


tutorials/4. Batch Reinforcement Learning.ipynb

+1 −1
@@ -18,7 +18,7 @@
     "\n",
     "Alternatively, what do we do if we don't have a simulator, but instead we can actually deploy our policy on that real-world environment, and would just like to separate the new data collection part from the learning part (i.e. if we have a system that can quite easily run inference, but is very hard to integrate a reinforcement learning framework with, such as Coach, for learning a new policy).\n",
     "\n",
-    "We will try to address these questions and more in this tutorial, demonstrating how to use [Batch Reinforcement Learning](http://tgabel.de/cms/fileadmin/user_upload/documents/Lange_Gabel_EtAl_RL-Book-12.pdf). \n",
+    "We will try to address these questions and more in this tutorial, demonstrating how to use [Batch Reinforcement Learning](https://link.springer.com/chapter/10.1007/978-3-642-27645-3_2). \n",
     "\n",
     "First, let's use a simple environment to collect the data to be used for learning a policy using Batch RL. In reality, we probably would already have a dataset of transitions of the form `<current_observation, action, reward, next_state>` to be used for learning a new policy. Ideally, we would also have, for each transition, $p(a|o)$ the probability of an action, given that transition's `current_observation`. "
 ]
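The transition tuple described in the notebook text above can be sketched as a small dataset collected from a behavior policy. This is an illustrative sketch only: the `Transition` class, the toy chain environment, and the uniform-random policy are assumptions for the example, not part of the Coach tutorial. It shows why storing $p(a|o)$ alongside each transition is cheap when the behavior policy is known.

```python
# Hypothetical sketch (not Coach API): collecting
# <current_observation, action, reward, next_state> transitions,
# together with p(a|o), from a uniform-random policy on a toy
# one-dimensional chain environment.
import random
from dataclasses import dataclass

@dataclass
class Transition:
    current_observation: int
    action: int
    reward: float
    next_state: int
    action_probability: float  # p(a|o) under the behavior policy

def collect_dataset(num_steps: int, num_actions: int = 2) -> list:
    """Roll out a uniform-random policy and record every transition."""
    dataset = []
    state = 0
    for _ in range(num_steps):
        action = random.randrange(num_actions)   # uniform-random behavior policy
        p_a_given_o = 1.0 / num_actions          # known exactly for this policy
        # Toy dynamics: action 1 moves right, action 0 moves left (floor at 0).
        next_state = state + 1 if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state > state else 0.0
        dataset.append(Transition(state, action, reward, next_state, p_a_given_o))
        state = next_state
    return dataset

dataset = collect_dataset(100)
```

With a logged dataset in this shape, the learning step can run entirely offline, which is the separation between data collection and policy learning that the notebook motivates.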

0 commit comments
