%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% LaTeX Template: Project Titlepage
%
% Source: http://www.howtotex.com
% Date: April 2011
%
% This is a title page template which can be used for articles & reports.
%
% Feel free to distribute this example, but please keep the referral
% to howtotex.com
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How to use writeLaTeX:
%
% You edit the source code here on the left, and the preview on the
% right shows you the result within a few seconds.
%
% Bookmark this page and share the URL with your co-authors. They can
% edit at the same time!
%
% You can upload figures, bibliographies, custom classes and
% styles using the files menu.
%
% If you're new to LaTeX, the wikibook is a great place to start:
% http://en.wikibooks.org/wiki/LaTeX
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% --------------------------------------------------------------------
% Preamble
% --------------------------------------------------------------------
\documentclass[paper=a4, fontsize=11pt,twoside]{scrartcl} % KOMA
\usepackage[a4paper,pdftex]{geometry} % A4paper margins
\setlength{\oddsidemargin}{5mm} % Remove 'twosided' indentation
\setlength{\evensidemargin}{5mm}
\usepackage[english]{babel}
\usepackage[protrusion=true,expansion=true]{microtype}
\usepackage{amsmath,amsfonts,amsthm,amssymb}
\usepackage{graphicx}
\usepackage{hyperref}
% --------------------------------------------------------------------
% Definitions (do not change this)
% --------------------------------------------------------------------
\newcommand{\HRule}[1]{\rule{\linewidth}{#1}} % Horizontal rule
\makeatletter % Title
\def\printtitle{%
{\centering \@title\par}}
\makeatother
\makeatletter % Author
\def\printauthor{%
{\centering \large \@author}}
\makeatother
\hypersetup{
colorlinks=true,
linkcolor=blue,
filecolor=magenta,
urlcolor=cyan,
}
% --------------------------------------------------------------------
% Metadata (Change this)
% --------------------------------------------------------------------
\title{ \normalsize \textsc{Informatics, UoE 2021} % Subtitle
\\[2.0cm] % 2cm spacing
\HRule{0.5pt} \\ % Upper rule
\LARGE \textbf{\uppercase{Reinforcement Learning Tutorial 1}} % Title
\HRule{2pt} \\ [0.5cm] % Lower rule + 0.5cm spacing
\normalsize \today % Today's date
}
\author{
Fazl Barez\\
\texttt{[email protected]} \\
}
\begin{document}
% ------------------------------------------------------------------------------
% Maketitle
% ------------------------------------------------------------------------------
\thispagestyle{empty} % Remove page numbering on this page
\printtitle % Print the title data as defined above
\vfill
\printauthor % Print the author data as defined above
\newpage
% ------------------------------------------------------------------------------
% Begin document
% ------------------------------------------------------------------------------
\setcounter{page}{1} % Set page numbering to begin on this page
\section*{Introduction}
In this document, students are encouraged to raise any questions or points from the tutorial.
The idea is to keep a backlog of all the questions raised during and after the tutorial, so that students can refer back to it later.
\section{Monte Carlo (MC) Policy Evaluation}
\textbf{First-visit MC}: average the returns following only the \textit{first time} state $s$ is visited in each episode.\\
\textbf{Every-visit MC}: average the returns following \textit{every time} state $s$ is visited.\\
Both methods converge asymptotically to the value function $v_\pi$ of the policy $\pi$ being evaluated, as the number of visits to each state goes to infinity.
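
As a sketch of the two estimators, using notation that is assumed here for illustration (rewards $R_t$, a discount factor $\gamma$, an episode ending at time $T$, and a visit count $N(s)$), the return from time $t$ and the resulting estimate for state $s$ can be written as
\begin{align*}
G_t &= R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{T-t-1} R_T, \\
V(s) &= \frac{1}{N(s)} \sum_{i=1}^{N(s)} G^{(i)}(s),
\end{align*}
where $G^{(i)}(s)$ is the return following the $i$-th counted visit to $s$, and $N(s)$ counts either the first visit per episode only or every visit.
For example, take a single hypothetical episode with $\gamma = 1$ in which $s$ is visited at $t = 0$ and $t = 2$, with rewards $R_1 = 1$, $R_2 = 0$, $R_3 = 2$.
Then $G_0 = 3$ and $G_2 = 2$, so first-visit MC gives $V(s) = 3$, while every-visit MC gives $V(s) = (3 + 2)/2 = 2.5$.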
\section*{Discussion}
\begin{itemize}
\item {In the \href{https://en.wikipedia.org/wiki/Trolley_problem}{\textbf{trolley problem}} setting, who should decide what the \textit{``correct''} decision is?}
\item {How can we use supervised learning in RL? \href{https://bair.berkeley.edu/blog/2020/10/13/supervised-rl/}{\textbf{Supervised RL?}}}
\item{Given that real-world applications of RL are very challenging, how do we get robots to do very complex tasks such as surgery?}
\item How is RL different from Control Theory?
\item {How does the \textit{train/test} split work in the context of RL?}
\item{RL vs.\ multi-armed bandits (MAB): \url{https://boliu68.github.io/2017/Reinforcement-Learning-versus-Bandit/#:~:text=For%20another%2C%20the%20agent%20explores,Markov%20Decision%20Process%20(POMDP).&text=Compared%20to%20one%2Dstate%20RL%2C%20i.e.%20multi%2Darmed%20bandit.}
}
\item{In SARSA, the next action $A'$ is chosen by following the policy derived from $Q$, so it is an on-policy method. In Q-learning, however, the update uses the action that maximises $Q$. Why does this make it off-policy? If we derived a policy from $Q$, would its action not be the one that maximises $Q$, so that we are still learning on-policy?}
\end{itemize}
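
The distinction raised in the last question can be seen in the standard one-step updates, sketched here with a step size $\alpha$ and a discount factor $\gamma$ (notation assumed for illustration, not defined in the text above):
\begin{align*}
\text{SARSA:} \quad & Q(S,A) \leftarrow Q(S,A) + \alpha \bigl[ R + \gamma Q(S',A') - Q(S,A) \bigr], \\
\text{Q-learning:} \quad & Q(S,A) \leftarrow Q(S,A) + \alpha \bigl[ R + \gamma \max_{a} Q(S',a) - Q(S,A) \bigr].
\end{align*}
In SARSA, $A'$ is the action the behaviour policy actually takes next (for example, $\varepsilon$-greedy with respect to $Q$), so exploratory actions enter the target.
In Q-learning, the target uses the greedy action regardless of which action is executed next, so the policy being learned (greedy) differs from the policy generating the data whenever the agent explores.
If the behaviour policy were itself purely greedy, the two update targets would coincide.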
% ------------------------------------------------------------------------------
% End document
% ------------------------------------------------------------------------------
\end{document}