Commit

Typos
adityam committed Jul 21, 2024
1 parent 078f974 commit 2694900
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions approx-mdps/model-approximation.qmd
@@ -538,7 +538,7 @@ In the analysis above, we assumed that the model $\ALPHABET M$ and the approxima

In addition, suppose we are given a surjective function $φ \colon \ALPHABET S \to \hat {\ALPHABET S}$. We can use $φ$ to **lift** any function $\hat f$ defined on $\hat {\ALPHABET S}$ to a function defined on $\ALPHABET S$ given by $f = \hat f \circ φ$, i.e.,
$$f(s) = \hat f(φ(s)), \quad \forall s \in \ALPHABET S.$$
-We can also use $φ$ to **project** any function $f$ defined on $\ALPHABET S$ to a function defined on $\ALPHABET S$. For this, we assume that we are given a measure $ν$ on $\ALPHABET S$ that has the following property:
+We can also use $φ$ to **project** any function $f$ defined on $\ALPHABET S$ to a function defined on $\hat {\ALPHABET S}$. For this, we assume that we are given a measure $ν$ on $\ALPHABET S$ that has the following property:
$$
ν(φ^{-1}(\hat s)) = 1, \quad \forall \hat s \in \hat {\ALPHABET S}.
$$
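
The lift and project operations can be illustrated on a small finite example. This is only a sketch: the four-state space, the map `phi`, the weights `nu`, and the helper names `lift`, `project`, and `check_measure` are all hypothetical, and the projection formula (a $ν$-weighted average of $f$ over each preimage $φ^{-1}(\hat s)$) is an assumption, since the definition is not visible in this hunk.

```python
# Hypothetical finite example: 4 states aggregated into 2 abstract states.
phi = [0, 0, 1, 1]           # phi[s] is the abstract state of s (surjective onto {0, 1})
nu = [0.25, 0.75, 0.5, 0.5]  # weights whose total mass is 1 on each preimage of phi


def check_measure(phi, nu, n_abstract, tol=1e-12):
    """Check the property nu(phi^{-1}(s_hat)) = 1 for every abstract state."""
    mass = [0.0] * n_abstract
    for s, s_hat in enumerate(phi):
        mass[s_hat] += nu[s]
    return all(abs(m - 1.0) <= tol for m in mass)


def lift(f_hat, phi):
    """Return f = f_hat composed with phi, as a list indexed by original states."""
    return [f_hat[s_hat] for s_hat in phi]


def project(f, phi, nu, n_abstract):
    """Assumed projection: nu-weighted average of f over each preimage of phi."""
    f_hat = [0.0] * n_abstract
    for s, s_hat in enumerate(phi):
        f_hat[s_hat] += nu[s] * f[s]
    return f_hat


f_hat = [1.0, 3.0]
f = lift(f_hat, phi)  # constant on each preimage of phi
```

Under this assumed definition, projecting a lifted function recovers the original function on the abstract space, because the weights sum to one on each preimage.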
@@ -615,7 +615,7 @@ The Bellman mismatch functionals can be used to bound the performance difference
For any (possibly randomized) policy $π$ in $\ALPHABET M$ and $\hat π$ in $\hat {\ALPHABET M}$, we have
$$
\NORM{V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ}_{∞} \le
-\frac{1}{1-γ}\min\bigl\{ \MISMATCH^{π}_{φ} V^{π}, \MISMATCH^{\hat π}_{φ} \hat V^{\hat π} \bigr\}.
+\frac{1}{1-γ}\min\bigl\{ \MISMATCH^{\hat π \circ φ}_{φ} V^{\hat π \circ φ}, \hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π} \bigr\}.
$$
:::

@@ -624,26 +624,26 @@ $$

The proof is similar to the proof of @prp-policy-error. The first bound is obtained as follows:
\begin{align}
-\| V^{π} - \hat V^{π \SQ φ} \circ φ \|_∞
+\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞
&=
-\| \BELLMAN^π V^π - (\hat {\ALPHABET B}^{π \SQ φ} \hat V^{π \SQ φ}) \circ φ \|_∞
+\| \BELLMAN^{\hat π \circ φ} V^{\hat π \circ φ} - (\hat {\ALPHABET B}^{\hat π} \hat V^{\hat π}) \circ φ \|_∞
\notag \\
&\le
-\| \BELLMAN^π V^π - (\hat {\ALPHABET B}^{π\SQ φ} (V^{π} \SQ φ)) \circ φ \|_∞
+\| \BELLMAN^{\hat π \circ φ} V^{\hat π \circ φ} - (\hat {\ALPHABET B}^{\hat π} (V^{\hat π \circ φ} \SQ φ)) \circ φ \|_∞
\notag \\
& \quad +
-\| (\hat {\BELLMAN}^{π\SQ φ} (V^π \SQ φ)) \circ φ - (\hat {\ALPHABET B}^{π \SQ φ} \hat V^{π\SQ φ}) \circ φ \|_∞
+\| (\hat {\BELLMAN}^{\hat π} (V^{\hat π \circ φ} \SQ φ)) \circ φ - (\hat {\ALPHABET B}^{\hat π} \hat V^{\hat π}) \circ φ \|_∞
\notag \\
&\le
-\MISMATCH^π_{φ} V^π + γ \| V^π \SQ φ - \hat V^π \|_∞
+\MISMATCH^{\hat π \circ φ}_{φ} V^{\hat π \circ φ} + γ \| V^{\hat π \circ φ} \SQ φ - \hat V^{\hat π} \|_∞
\label{eq:ineq-3-abstract}
\end{align}
where the first inequality follows from the triangle inequality, and the
second inequality follows from the definition of the Bellman mismatch functional,
the contraction property of Bellman operators, and the fact that $\NORM{f_1 \circ φ - f_2 \circ φ}_{∞} \le \NORM{f_1 - f_2}_{∞}$. Rearranging terms
in \\eqref{eq:ineq-3-abstract} gives us
\begin{equation}
-\| V^{π} - \hat V^{π} \circ φ \|_∞ \le \frac{ \MISMATCH^π_{φ} V^{π}}{1 - γ}.
+\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞ \le \frac{ \MISMATCH^{\hat π \circ φ}_{φ} V^{\hat π \circ φ}}{1 - γ}.
\label{eq:ineq-4-abstract}\end{equation}
This gives the first bound.
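
The rearrangement step can be spelled out as follows. This is a sketch under the assumptions that $γ \in [0, 1)$, that all norms involved are finite, and that the norm on the right-hand side of the last inequality is no larger than the norm on the left-hand side. The shorthand symbols $a$ and $m$ are introduced here only for illustration:
$$
a \le m + γ a
\quad\implies\quad
(1 - γ)\, a \le m
\quad\implies\quad
a \le \frac{m}{1 - γ},
$$
where $a := \| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞$ and $m := \MISMATCH^{\hat π \circ φ}_{φ} V^{\hat π \circ φ}$. Dividing by $1 - γ$ preserves the direction of the inequality because $1 - γ > 0$.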

@@ -664,13 +664,13 @@ The second bound is obtained as follows
&\le
γ \| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞
+
-\MISMATCH^{\hat π}_{φ} \hat V^π
+\hat \MISMATCH^{\hat π}_{φ} \hat V^π
\label{eq:ineq-13-abstract}
\end{align}
Rearranging terms in \\eqref{eq:ineq-13-abstract} gives
us
$$\begin{equation}
-\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞ \le \frac{ \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}}{1 - γ}.
+\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞ \le \frac{ \hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}}{1 - γ}.
\label{eq:ineq-14-abstract}\end{equation}$$
This gives the second bound.
:::
@@ -778,7 +778,7 @@ Similar to @thm-model-error-V-star, we now provide such a bound that depends on
The policy $\hat π^*$ is an $α$-optimal policy of $\ALPHABET M$ where
$$
α := \| V^* - V^{\hat π^* \circ φ} \|_∞ \le
-\frac{1}{1-γ} \MISMATCH^{\hat π^*}_{φ} V^*
+\frac{1}{1-γ} \MISMATCH^{\hat π^* \circ φ}_{φ} V^*
+
\frac{(1+γ)}{(1-γ)^2} \MISMATCH^*_{φ} V^* .
$$