\documentclass[a4paper, 11pt]{article}
\pdfoutput=1
% \usepackage{libertine}
\usepackage{tgtermes}
\usepackage{geometry}
\usepackage[round]{natbib} % For bibliography style
\renewcommand{\sfdefault}{lmss} % Computer Modern Sans
\renewcommand{\bfdefault}{bx}
% \renewcommand{\ttdefault}{ccr}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{listings}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.85,0.85,0.92}
\lstdefinestyle{mystyle}{
backgroundcolor=\color{backcolour},
commentstyle=\color{codegreen},
keywordstyle=\color{magenta},
numberstyle=\tiny\color{codegray},
stringstyle=\color{codepurple},
basicstyle=\footnotesize\ttfamily,
breakatwhitespace=false,
breaklines=true,
captionpos=b,
keepspaces=true,
numbers=left,
numbersep=5pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=2
}
\lstset{style=mystyle}
\usepackage{titlesec}
\titleformat{\section}{\bfseries\sffamily\large}{\thesection}{1em}{}
\titleformat{\subsection}{\bfseries\sffamily}{\thesubsection}{0.5em}{}
\titleformat{\subsubsection}{\bfseries\sffamily}{\thesubsubsection}{0.4em}{}
\usepackage[acronym,smallcaps,nowarn,section,nonumberlist]{glossaries}
\glsdisablehyper{}
\usepackage{scalefnt,letltxmacro}
\LetLtxMacro{\oldtextsc}{\textsc}
\renewcommand{\textsc}[1]{\oldtextsc{\scalefont{1.10}#1}}
\newacronym{MDP}{mdp}{Markov decision process}
\newacronym{RL}{rl}{Reinforcement learning}
%----------------------------------------------------------------------------
\usepackage[alpha]{mdpn} % Load the MDP notation style
%----------------------------------------------------------------------------
% TITLE SECTION
%----------------------------------------------------------------------------
\author{Billy Okal\\\\
University of Freiburg}
\title{\bf \sffamily Example Usage of MDPNv1 Notation}
\date{}
\begin{document}
\maketitle
\section{Introduction}
\label{sec:intro}
Many \gls{RL} research papers contain paragraphs that define \gls{MDP} and related concepts~\cite[see][for detailed exposition]{SuttonBarto}.
These paragraphs take up space that could otherwise be used to present new content.
In this paper we demonstrate a package implementing a recently proposed notation\footnote{\url{http://arxiv.org/abs/1512.09075}} for \glspl{MDP} that can be used as a common foundation. Declaring the use of this notation in a single sentence can replace several paragraphs of notational specifications in other papers.
\section{MDPs using MDPNv1 notation}
\label{sec:mdps}
Load the package with one of the notation options {\tt alpha}, {\tt beta}, or {\tt kappa}:
\begin{lstlisting}[language=Tex]
% ...
\usepackage[alpha]{mdpn} % Most verbose
%\usepackage[beta]{mdpn} % Compressed
%\usepackage[kappa]{mdpn} % Most compressed
% ...
\end{lstlisting}
The \gls{MDP} is then denoted by the tuple $\MDP$, and the policy by $\p$, where:
%
\begin{enumerate}
\item We use $\step$ to denote the time step, where $\NZ$ denotes the natural numbers {\em including zero}.
\item $\sset$ is the set of possible states that the agent can be in, and is called the {\em state set}. The state of the environment at time $t$ is a random variable that we denote by $\st{t}$. We will typically use $s$ to denote an element of the state set.
\item Similarly, $\aset$ is the set of possible actions the agent can perform. The action at time $t$ is denoted by $\at{t}$, while $a$ denotes an element of the action set.
\item $\rset$ is the set of possible rewards, defined as $\rsetdef$. The instantaneous reward at time $t$ is $\Rfun_t$. Elements of the reward set are denoted by $r$, while the infimum and supremum are $\rmin$ and $\rmax$ respectively.
\item $\Tdef$ is called the {\em transition function}. For all $(s,a,s',t) \in \sset \times \aset \times \sset \times \NZ$, let $\Tfun(s,a,s')\coloneqq\Pr(\st{t+1}=s' \mid \st{t}=s, \at{t}=a)$.
That is, $\Tfun$ characterizes the distribution over states at time $t+1$ given the state and action at time $t$.
We allow three alternative notations for $\Tfun$.
\begin{enumerate}
\item {\tt alpha}: $\T{s}{a}{s'} \coloneqq \Tfun(s,a,s')$.
This form takes approximately the same amount of space, but makes it clearer that $\Tfun$ is a conditional distribution over the next state given the current state and action.
\item {\tt beta}: $\Tfun_{s}^{a}(s') \coloneqq \Tfun(s,a,s')$.
This notation moves terms into subscripts and superscripts in order to save some space.
\item {\tt kappa}: $\Tfun_{s,s'}^{a} \coloneqq \Tfun(s,a,s')$.
This final form is particularly useful when space is limited.
\end{enumerate}
Once the author selects one of the three notation modes, consistency within each paper is ensured automatically.
\item The reward function is denoted by $\Rfun$, with the same three notation options provided.
\item $\isset$ is the initial distribution of states defined as $\issetdef$.
\item $\D$ is the discount factor defined as $\Ddef$.
\end{enumerate}
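As a quick check on the transition function defined above, and assuming for illustration that the state set $\sset$ is countable (an assumption made here, not a requirement stated by the package), the next-state probabilities sum to one for every state-action pair:
\[
\sum_{s' \in \sset} \Tfun(s,a,s') = \sum_{s' \in \sset} \Pr(\st{t+1}=s' \mid \st{t}=s, \at{t}=a) = 1 .
\]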
%
Further, $\p$ is the policy, which is defined as $\pdef$ ...
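As a rough sketch of how this might look in a paper's source, the fragment below writes a transition probability with the same macros used in this document. It assumes that \lstinline|\T| follows whichever of the three options was passed when loading the package, as the description of the notation modes suggests; the sentence itself is only illustrative.
\begin{lstlisting}[language=Tex]
% Illustrative fragment: written once, rendered according to the
% option (alpha, beta or kappa) chosen in the preamble.
The probability of moving from $s$ to $s'$ under action $a$
is $\T{s}{a}{s'}$, where $s, s' \in \sset$ and $a \in \aset$.
\end{lstlisting}
Switching the package option would then change how the expression is typeset without any edits to the body text.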
\section{Acknowledgments}
\label{sec:thanks}
I want to thank Phillip Thomas for fruitful discussions on the naming and contents of this package.
%-----------------------------------------------------------------------------
% REFERENCE LIST
%-----------------------------------------------------------------------------
\begin{thebibliography}{1}
\providecommand{\natexlab}[1]{#1}
\providecommand{\url}[1]{\texttt{#1}}
\expandafter\ifx\csname urlstyle\endcsname\relax
\providecommand{\doi}[1]{doi: #1}\else
\providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi
\bibitem[Sutton and Barto(1998)]{SuttonBarto}
R.~S. Sutton and A.~G. Barto.
\newblock \emph{Reinforcement Learning: {A}n Introduction}.
\newblock MIT Press, Cambridge, MA, 1998.
\end{thebibliography}
\bibliographystyle{abbrvnat} % Uses author initials (requires abbrvnat.bst)
%-----------------------------------------------------------------------------
\end{document}