Add notes for units 3 and 4 of procesos estocásticos (stochastic processes)

1 parent 2ebfe90 · commit c4ca304

Showing 19 changed files with 513 additions and 12 deletions.
**Mathematics/Discounted Policy Improvement Method for MDPs.md** (73 additions, 0 deletions)
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Policy Improvement Method for MDPs]] - [[Markov Decision Process]]

---
The [[Policy Improvement Method for MDPs|policy improvement method]] can be used to find the optimal [[Markov Decision Process Policy|policy]] for a given [[Markov Decision Process|MDP]].

When modelling certain phenomena, it may prove useful to take a **discount factor** into consideration when determining the optimal policy. Say, for instance, we want to take into consideration the [[Devaluation|devaluation]] of a [[Currency|currency]].

For these cases, we can follow a procedure that is practically identical to the policy improvement method, with just a few slight changes in the equations and expressions that are used. The resulting method is called the **discounted policy improvement method**.

Compared to the standard method, this one only uses the [[Expected Total Cost of an MDP Starting from a State|expected total costs starting from a state]] ($V_{i}\{r\}$), though this time they are a discounted variation thereof.
> [!info]- Changes Compared to the Standard Method
> The changes in the equations and expressions used for this discounted method, compared to the standard one, can be summarised as follows:
> - We don't have a $g\{r_{n}\}$ ([[Expected Long Term Cost for a Policy in an MDP|expected long term cost of the policy]])
> - We solve for every $V_{j}\{r_{n}\}$ instead, without setting the last one to $0$
> - We multiply the weighted sum of all the $V_{j}\{r_{n}\}$ by $\alpha$
> - The expressions on the right hand side don't subtract $V_{i}\{r_{n}\}$
>
> For the sake of completeness, and so that this note can stand on its own, I have rewritten all of the steps.

# Algorithm

This method is an iterative algorithm. We will use $n$ as the iteration number.

## Step 0

Before we can formally begin, we need to arbitrarily choose a viable policy $r_{1}$ as our starting policy ($n = 1$, first iteration).

Of course, we also have to define a **discount factor** $\alpha$. We can alternatively define an **interest rate** $i$, having:

$$
\alpha = \frac{1}{1+i} = (1+i)^{-1}
$$

## Step 1

The first step is to **solve the following linear equation system**:

$$
\begin{cases}
V_{i}\{r_{n}\} = C_{ik} + \alpha \sum_{j=0}^{m} p_{ij}\{k\}\ V_{j}\{r_{n}\} & \text{for } i = 0, 1, \dots, m
\end{cases}
$$

…where $C_{ik}$ is the cost of making the decision $k$ in the state $i$ (as defined by the policy) and $p_{ij}\{k\}$ the [[Transition Matrix|transition probability]] from $i$ to $j$ when making the decision $k$ (as defined by the policy).

We'll thus obtain every $V_{i}\{r_{n}\}$, the expected total cost starting from a state $i$ given the policy $r_{n}$.
## Step 2

The second step consists in **finding an alternative policy $r_{n+1}$** such that, in every state $i$, $d_{i}\{r_{n+1}\} = k$ is the optimal decision to make. To do so, we will use the values we previously obtained.

That is, **for every state $i$**, we will plug these values into the expressions:

$$
C_{ik} + \alpha \sum_{j=0}^{m} p_{ij}\{k\}\ V_{j}\{r_{n}\} \quad \text{for } k = 1, 2, \dots, K
$$

…having **one for every decision** that can be made in $i$. We will then pick the decision that yields the optimal result (the smallest if we deal with true costs, the largest if we deal with earnings). This will result in a new alternative policy $r_{n+1}$.
## Step 3

The third and last step is to **determine whether we've obtained the optimal policy** or not, continuing with another iteration if that is not the case.

If $r_{n+1} = r_{n}$, then we have the optimal policy. If not, then we shall make another iteration ($n \to n+1$ and back to step 1).
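
To make the three steps concrete, here is a minimal sketch in Python, assuming a small MDP described by a cost array `C[i, k]`, transition matrices `P[k][i, j]` and a discount factor `alpha`; these names and the array layout are illustrative assumptions, not part of the note.

```python
import numpy as np

def discounted_policy_improvement(C, P, alpha, minimise=True):
    """C: (m+1, K) costs, P: (K, m+1, m+1) transition matrices, 0 < alpha < 1."""
    n_states, n_decisions = C.shape
    policy = np.zeros(n_states, dtype=int)           # Step 0: arbitrary starting policy
    while True:
        # Step 1: solve V = C_r + alpha * P_r V  <=>  (I - alpha * P_r) V = C_r
        P_r = np.array([P[policy[i], i, :] for i in range(n_states)])
        C_r = np.array([C[i, policy[i]] for i in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - alpha * P_r, C_r)
        # Step 2: evaluate C_ik + alpha * sum_j p_ij{k} V_j for every state and decision
        Q = C + alpha * np.einsum('kij,j->ik', P, V)
        new_policy = Q.argmin(axis=1) if minimise else Q.argmax(axis=1)
        # Step 3: stop when the policy no longer changes
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```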
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Markov Decision Process]] - [[Optimisation]]

---

Given an [[Markov Decision Process|MDP]], we can find the optimal [[Markov Decision Process Policy|policy]] with the **exhaustive enumeration method**. This method is one of the simplest for this purpose.
# Algorithm

The exhaustive enumeration method consists of the following steps.

## Step 1

**Enumerating _exhaustively_ all of the _viable_ policies for the MDP.**

- When every decision (with $K$ total) can be made in every [[State Set|state]] (with $m+1$ total), then the total amount of viable policies is $K^{m+1}$ ([[Permutations with Repetitions|permutations with repetitions]]).
## Step 2

**Calculating the [[Expected Value|expected]] _long term_ cost for every policy $r$:**

$$
\mathbb{E}[C_{r}] = \sum_{i=0}^{m} C_{ik} \pi_{i}
$$

…where $C_{ik}$ denotes the cost incurred for making the decision $k$ (defined _according to the policy_) in the state $i$, while $\pi_{i}$ denotes the [[Steady State Probability|steady state probability]] of the state $i$ _given the policy_.

**Don't forget to take the policy into consideration in this step**. It defines the decision $k$ we'll be taking in each state $i$, as well as the [[Transition Matrix|transition matrix]] that is used to obtain the steady state probabilities. We can obtain the latter by (1) selecting, for each state $i$, the row that corresponds to $i$ in the transition matrix of the decision $k$ the policy assigns to that state, and then (2) stacking these rows together into a new matrix.

## Step 3

**Selecting the _optimal_ policy based on its cost.**

If we're dealing with classical costs, then we'll select the policy that has the smallest expected cost. If our "costs" are actually earnings, then we'll select the policy that has the largest expected "cost" (earning).
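
For concreteness, a minimal sketch of the three steps in Python, assuming the same illustrative `C[i, k]` and `P[k][i, j]` arrays as in the other notes (these names are assumptions, not part of the note):

```python
import itertools
import numpy as np

def steady_state(T):
    """Steady state distribution pi of a row-stochastic matrix T (pi T = pi, sum pi = 1)."""
    n = T.shape[0]
    A = np.vstack([T.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def exhaustive_enumeration(C, P, minimise=True):
    n_states, K = C.shape
    best_policy, best_cost = None, None
    # Step 1: every assignment of a decision to each state (K^(m+1) viable policies)
    for policy in itertools.product(range(K), repeat=n_states):
        # Step 2: build the policy's transition matrix (row i taken from P of its decision)
        T = np.array([P[policy[i], i, :] for i in range(n_states)])
        pi = steady_state(T)
        cost = sum(pi[i] * C[i, policy[i]] for i in range(n_states))
        # Step 3: keep the best policy found so far
        better = best_cost is None or (cost < best_cost if minimise else cost > best_cost)
        if better:
            best_policy, best_cost = policy, cost
    return best_policy, best_cost
```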
**Mathematics/Expected Long Term Cost for a Policy in an MDP.md** (22 additions, 0 deletions)
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Policy Improvement Method for MDPs]] - [[Markov Decision Process]]

---

_**(theorem)**_

In the context of an [[Markov Decision Process|MDP]], the **expected long term cost for a given [[Markov Decision Process Policy|policy]] $r$** can be expressed as:

$$
g\{r\} = \sum_{i=0}^{m} \pi_{i}\ C_{ik}
$$

…where $\pi_{i}$ denotes the [[Steady State Probability|steady state probability]] of the state $i$ and $C_{ik}$ the cost of making the decision $k$ in the state $i$.

**Do not forget** that the policy $r$ defines which decision $k$ is made in which state $i$.
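
As a toy illustration with made-up numbers (not taken from any exercise): suppose a policy $r$ for a two-state MDP yields steady state probabilities $\pi_{0} = 0.4$ and $\pi_{1} = 0.6$, with costs $C_{0k} = 2$ and $C_{1k} = 5$ under the decisions it prescribes. Then:

$$
g\{r\} = \pi_{0}\, C_{0k} + \pi_{1}\, C_{1k} = 0.4 \cdot 2 + 0.6 \cdot 5 = 3.8
$$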
**Mathematics/Expected Total Cost of an MDP Starting from a State.md** (27 additions, 0 deletions)
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Policy Improvement Method for MDPs]] - [[Markov Decision Process]]

---

_**(definition)**_

In the context of an [[Markov Decision Process|MDP]], let $V_{i}^{n}\{r\}$ be the **[[Expected Value|expected]] total cost of a system that starts in [[State Set|state]] $i$ and evolves over $n$ periods**, given a specific [[Markov Decision Process Policy|policy]] $r$. This will be:

$$
V_{i}^{n}\{r\} = C_{ik} + \alpha \sum_{j=0}^{m} p_{ij}\{k\} \ V_{j}^{n-1}\{r\}
$$

…where $C_{ik}$ is the cost of making the decision $k$ in the state $i$ (as defined by the policy $r$) and $\alpha$ is the discount factor. If there is no discount, then $\alpha = 1$, as is the case when using the standard [[Policy Improvement Method for MDPs|policy improvement method]] (cf. the [[Discounted Policy Improvement Method for MDPs|discounted version]]).

> [!tip]- Explanation
> Notice that this sum is basically the sum of:
> - $C_{ik}$, the cost of the first period, in which the system goes from $i$ to another state $j$.
> - $\sum_{j=0}^{m} p_{ij}\{k\} \ V_{j}^{n-1}\{r\}$, the weighted sum of the total costs that could be incurred by continuing from another state $j$ (and still following the same policy $r$).

**Do not forget** that these calculations take into consideration a given **policy $r$**, which defines which decision $k$ is made in which state $i$.
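
A minimal sketch of this recursion in Python, assuming a fixed policy given as `policy[i] = k`, the illustrative `C[i, k]` and `P[k][i, j]` arrays from the other notes, and the boundary condition $V_{j}^{0}\{r\} = 0$ (an assumption made only for this sketch):

```python
import numpy as np

def expected_total_cost(C, P, policy, n, alpha=1.0):
    """Return the vector (V_0^n{r}, ..., V_m^n{r}) for the given policy."""
    n_states = C.shape[0]
    V = np.zeros(n_states)                       # V_j^0{r} = 0 for every state j
    C_r = np.array([C[i, policy[i]] for i in range(n_states)])
    P_r = np.array([P[policy[i], i, :] for i in range(n_states)])
    for _ in range(n):                           # build V^1, V^2, ..., V^n
        V = C_r + alpha * P_r @ V                # V_i^t = C_ik + alpha * sum_j p_ij{k} V_j^{t-1}
    return V
```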
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Queuing Model]] - [[Birth-Death Markov Process]]

---

_**(definition)**_

An **$M/M/S$ queue** is a [[Queuing Model|queuing model]] where:

- The arrivals are determined by a [[Homogeneous Poisson Process|Poisson process]] (i.e. the number of arrivals in any given interval follows a [[Poisson Distribution|Poisson distribution]])
- The service times distribute [[Exponential Distribution|exponentially]]
- There are $S$ servers

_**(observation)**_

Notice that this is basically a [[Birth-Death Markov Process|birth-death Markov process]]: its arrivals (births) also follow a Poisson process and its times between deaths (service completions) also distribute exponentially.
# Arrival and Service Rates

> [!tip]- $\lambda$ and $\mu$
> In a queuing model, $\lambda_{n}$ is the [[Expected Value|expected]] rate of arrivals per unit of time when there are $n$ clients in the system, while $\mu_{n}$ stands for the expected rate at which clients are being serviced per unit of time when there are $n$ clients in the system.

_**(theorem)**_

An $M/M/S$ queue has the following rates:

- If $S = 1$, then:
    - $\lambda_{n} = \lambda$ for $n = 0, 1, 2, \dots$
    - $\mu_{n} = \mu$ for $n = 1, 2, 3, \dots$
- If $S > 1$, then:
    - $\lambda_{n} = \lambda$ for $n = 0, 1, 2, \dots$
    - $\mu_{n} = n\mu$ for $n = 1, 2, \dots, S$
    - $\mu_{n} = S\mu$ for $n = S, S+1, S+2, \dots$

We consider an $M/M/S$ queue stable when its utilisation factor $\rho = \frac{\lambda}{S\mu}$ satisfies $\rho < 1$ (i.e. $\lambda < S\mu$).
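
A small Python helper mirroring these rates and the stability condition; `lam`, `mu` and `S` are assumed example inputs, not values from the note:

```python
def arrival_rate(n, lam):
    """lambda_n = lambda for every n, regardless of how many clients are present."""
    return lam

def service_rate(n, S, mu):
    """mu_n = n*mu while at most S servers are busy, and S*mu once all servers are busy."""
    return min(n, S) * mu

def is_stable(lam, mu, S):
    """The queue is stable when the utilisation factor rho = lambda / (S*mu) is below 1."""
    return lam / (S * mu) < 1
```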

# Performance Measures for $M/M/1$

_**(theorem)**_

When $S = 1$, the model is $M/M/1$ and its [[Performance Measures for a Queuing Theory Model|performance measures]] are:

- $\rho = \frac{\lambda}{\mu}$ (utilisation factor)
- $C_{n} = \left( \frac{\lambda}{\mu} \right)^{n} = \rho^{n}$ for $n = 0, 1, 2, \dots$
- $P_{n} = \rho^{n}(1-\rho)$ for $n = 0, 1, 2, \dots$ (the probability of having $n$ clients in the system)
- $L = \frac{\lambda}{\mu-\lambda}$ (the [[Expected Value|expected]] amount of clients in the system)
- $L_{q} = \rho L = \frac{\lambda^{2}}{\mu(\mu-\lambda)}$ (the expected amount of clients _in the queue_)
- $W = \frac{1}{\mu-\lambda}$ (the expected wait time in the system)
- $W_{q} = \rho W = \frac{\lambda}{\mu(\mu-\lambda)}$ (the expected wait time _in the queue_)

Do note that this assumes that the system is stable (i.e. $\rho < 1$).
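
A minimal sketch computing these $M/M/1$ measures in Python, assuming a stable system; the inputs `lam` and `mu` are illustrative:

```python
def mm1_measures(lam, mu):
    """Performance measures for an M/M/1 queue with arrival rate lam and service rate mu."""
    assert lam < mu, "the system must be stable (rho < 1)"
    rho = lam / mu                        # utilisation factor
    L = lam / (mu - lam)                  # expected clients in the system
    Lq = lam**2 / (mu * (mu - lam))       # expected clients in the queue
    W = 1 / (mu - lam)                    # expected time in the system
    Wq = lam / (mu * (mu - lam))          # expected time in the queue
    P = lambda n: (1 - rho) * rho**n      # probability of exactly n clients in the system
    return {"rho": rho, "L": L, "Lq": Lq, "W": W, "Wq": Wq, "P": P}

# For example, mm1_measures(2, 5) gives rho = 0.4, L ≈ 0.67 and W ≈ 0.33.
```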
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Markov Decision Process]] - [[Linear Programming Problem]]

---

Given an [[Markov Decision Process|MDP]], we can find the optimal [[Markov Decision Process Policy|policy]] by formulating it as a [[Linear Programming Problem|linear programming problem]].
# Decision Variables

## As Binary

Let the following be our decision variables:

$$
D_{ik} =
\begin{cases}
1 & \text{if the decision } k \text{ is made in the state } i \\
0 & \text{otherwise}
\end{cases}
$$

Once the problem is solved, the optimal values of these variables form a matrix that characterises the optimal policy.
## As Continuous

Binary linear programming problems aren't as straightforward to solve as continuous ones, so we can reinterpret these decision variables as:

$$
D_{ik} = \mathbb{P}(k \mid i)
$$

Id est, the probability of making the decision $k$ given that the system is in the [[State Set|state]] $i$.

Under this reinterpretation, we'll actually be using other variables in the [[Objective Function|objective function]]: $Y_{ik}$. These represent the [[Unconditional Transition Probability|unconditional]] [[Steady State Probability|steady state probability]] of being in the state $i$ _and_ taking the decision $k$, satisfying:

$$
D_{ik} = \frac{Y_{ik}}{\sum_{k=1}^{K} Y_{ik}}
$$
# Model

Finally, with these considerations, the linear programming model for an MDP is:

$$
\begin{cases}
&\text{optimise } \mathbb{E}[C] = Z = \sum_{i=0}^{m} \sum_{k=1}^{K} C_{ik} Y_{ik} & \\[1em]
& \text{s.t.} & \\[0.5em]
&\sum_{i=0}^{m} \sum_{k=1}^{K} Y_{ik} = 1 \\[0.5em]
&\sum_{k=1}^{K} Y_{jk} - \sum_{i=0}^{m} \sum_{k=1}^{K} Y_{ik}\ p_{ij} \{k\} = 0 \quad \text{for } j = 0, 1, \dots, m \\[0.5em]
&Y_{ik} \geq 0 \quad \text{for every } i, k
\end{cases}
$$

…where:

- $C_{ik}$ denotes the cost of making the decision $k$ in the state $i$
- $p_{ij}\{k\}$ denotes the [[Transition Matrix|transition probability]] from state $i$ to $j$ when making the decision $k$

Evidently, we can find the optimal $Y_{ik}$ by solving this linear programming problem through any of the applicable methods ([[Simplex Method|simplex]], etc.). Then we can determine the optimal policy by calculating the $D_{ik}$ as established above.
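
As an illustration, a minimal sketch that assembles and solves this model with `scipy.optimize.linprog` (used here as an assumed off-the-shelf LP solver); `C[i, k]` and `P[k][i, j]` are the same illustrative arrays as in the other notes:

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(C, P):
    """Minimise sum_ik C_ik Y_ik subject to the constraints of the model above."""
    n_states, K = C.shape
    c = C.flatten()                                  # objective coefficients over Y_ik
    A_eq = [np.ones(n_states * K)]                   # sum_i sum_k Y_ik = 1
    b_eq = [1.0]
    for j in range(n_states):                        # one balance equation per state j
        row = np.zeros((n_states, K))
        row[j, :] += 1.0                             # + sum_k Y_jk
        for i in range(n_states):
            for k in range(K):
                row[i, k] -= P[k, i, j]              # - sum_i sum_k Y_ik p_ij{k}
        A_eq.append(row.flatten())
        b_eq.append(0.0)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
    Y = res.x.reshape(n_states, K)
    row_sums = Y.sum(axis=1, keepdims=True)          # D_ik = Y_ik / sum_k Y_ik
    D = np.divide(Y, row_sums, out=np.zeros_like(Y), where=row_sums > 0)
    return D, res.fun
```

If the MDP's values are earnings rather than costs, the objective coefficients would simply be negated, since `linprog` minimises by default. Rows of `D` that come out as all zeros correspond to states with zero steady state probability under the optimal policy, for which the LP leaves the decision unspecified.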
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
aliases:
- "policy"
---

**Topics:** [[Markov Decision Process]]

---

_**(definition)**_

In the context of a [[Markov Decision Process|Markov decision process]] (MDP), a **policy** is a [[Function|mapping]] of every one of the MDP's [[State Set|states]] to a specific decision.

In other words, a **policy** tells us which decision is taken in each state.

Policies are normally denoted by $r$, while the set of possible policies in a given MDP is denoted by $R$.

We can denote the decision taken in a state $i$ according to the policy $r$ with $d_{i}\{r\}$:

$$
r = \left(d_{0}\{r\}, d_{1}\{r\}, \dots \right)
$$
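
As a toy illustration (the states and decisions below are made up), a policy can be represented directly as such a mapping in Python:

```python
# A policy r for an MDP with states 0, 1, 2: r = (d_0{r}, d_1{r}, d_2{r}).
r = (1, 3, 2)           # d_0{r} = 1, d_1{r} = 3, d_2{r} = 2

def decision(r, i):
    """Return the decision d_i{r} that the policy r takes in the state i."""
    return r[i]

print(decision(r, 1))   # -> 3
```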