Add notes for units 3 and 4 of procesos estocásticos (stochastic processes)

1 parent 2ebfe90 · commit c4ca304

Showing 19 changed files with 513 additions and 12 deletions.
**Mathematics/Discounted Policy Improvement Method for MDPs.md** (73 additions, 0 deletions)
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Policy Improvement Method for MDPs]] - [[Markov Decision Process]]

---
The [[Policy Improvement Method for MDPs|policy improvement method]] can be used to find the optimal [[Markov Decision Process Policy|policy]] for a given [[Markov Decision Process|MDP]].

When modelling certain phenomena, it may prove useful to take a **discount factor** into consideration when determining the optimal policy. Say, for instance, we want to take into consideration the [[Devaluation|devaluation]] of a [[Currency|currency]].

For these cases, we can follow a procedure that is practically identical to the policy improvement method, with just a few slight changes in the equations and expressions that are used. The resulting method is called the **discounted policy improvement method**.

Compared to the standard method, this one only uses the [[Expected Total Cost of an MDP Starting from a State|expected total costs starting from a state]] ($V_{i}\{r\}$), though this time they are a discounted variation thereof.
> [!info]- Changes Compared to the Standard Method
> The changes in the equations and expressions used for this discounted method, compared to the standard one, can be summarised as follows:
> - We don't have a $g\{r_{n}\}$ ([[Expected Long Term Cost for a Policy in an MDP|expected long term cost of the policy]])
> - We solve for every $V_{j}\{r_{n}\}$ instead, without setting the last one to $0$
> - We multiply the weighted sum of all the $V_{j}\{r_{n}\}$ by $\alpha$
> - The expressions on the right hand side don't subtract $V_{i}\{r_{n}\}$
>
> For the sake of completeness, and so that this note can stand on its own, I have rewritten all of the steps.

# Algorithm

This method is an iterative algorithm. We will use $n$ as the iteration number.

## Step 0

Before we can formally begin, we need to arbitrarily choose a viable policy $r_{1}$ as our starting policy ($n = 1$, first iteration).

Of course, we also have to define a **discount factor** $\alpha$. We can alternatively define an **interest rate** $i$, having:

$$
\alpha = \frac{1}{1+i} = (1+i)^{-1}
$$

## Step 1

The first step is to **solve the following linear equation system**:

$$
\begin{cases}
V_{i}\{r_{n}\} = C_{ik} + \alpha \sum_{j=0}^{m} p_{ij}\{k\}\ V_{j}\{r_{n}\} & \text{for } i = 0, 1, \dots, m
\end{cases}
$$

…where $C_{ik}$ is the cost of making the decision $k$ in the state $i$ (as defined by the policy) and $p_{ij}\{k\}$ the [[Transition Matrix|transition probability]] from $i$ to $j$ when making the decision $k$ (as defined by the policy).

We'll thus obtain every $V_{i}\{r_{n}\}$, the expected total cost starting from a state $i$ given the policy $r_{n}$.
## Step 2

The second step consists in **finding an alternative policy $r_{n+1}$** such that, in every state $i$, $d_{i}\{r_{n+1}\} = k$ is the optimal decision to make. To do so, we will use the values we previously obtained.

That is, **for every state $i$**, we will plug these values into the expressions:

$$
C_{ik} + \alpha \sum_{j=0}^{m} p_{ij}\{k\}\ V_{j}\{r_{n}\} \quad \text{for } k = 1, 2, \dots, K
$$

…having **one for every decision** that can be made in $i$. We will then pick the decision that yields the optimal result (the smallest if we deal with true costs, the largest if we deal with earnings). This will result in a new alternative policy $r_{n+1}$.
## Step 3

The third and last step is to **determine whether we've obtained the optimal policy** or not, continuing with another iteration if that is not the case.

If $r_{n+1} = r_{n}$, then we have the optimal policy. If not, then we shall make another iteration ($n \to n+1$ and back to step 1).
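
To make the three steps concrete, here is a minimal sketch in Python, assuming a small MDP described by a cost array `C[i, k]`, transition matrices `P[k][i, j]` and a discount factor `alpha`; these names and the array layout are illustrative assumptions, not part of the note.

```python
import numpy as np

def discounted_policy_improvement(C, P, alpha, minimise=True):
    """C: (m+1, K) costs, P: (K, m+1, m+1) transition matrices, 0 < alpha < 1."""
    n_states, n_decisions = C.shape
    policy = np.zeros(n_states, dtype=int)           # Step 0: arbitrary starting policy
    while True:
        # Step 1: solve V = C_r + alpha * P_r V  <=>  (I - alpha * P_r) V = C_r
        P_r = np.array([P[policy[i], i, :] for i in range(n_states)])
        C_r = np.array([C[i, policy[i]] for i in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - alpha * P_r, C_r)
        # Step 2: evaluate C_ik + alpha * sum_j p_ij{k} V_j for every state and decision
        Q = C + alpha * np.einsum('kij,j->ik', P, V)
        new_policy = Q.argmin(axis=1) if minimise else Q.argmax(axis=1)
        # Step 3: stop when the policy no longer changes
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```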
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Markov Decision Process]] - [[Optimisation]]

---

Given an [[Markov Decision Process|MDP]], we can find the optimal [[Markov Decision Process Policy|policy]] with the **exhaustive enumeration method**. This method is one of the simplest for this purpose.
# Algorithm

The exhaustive enumeration method consists of the following steps.

## Step 1

**Enumerating _exhaustively_ all of the _viable_ policies for the MDP.**

- When every decision (with $K$ total) can be made in every [[State Set|state]] (with $m+1$ total), then the total amount of viable policies is $K^{m+1}$ ([[Permutations with Repetitions|permutations with repetitions]]).
## Step 2

**Calculating the [[Expected Value|expected]] _long term_ cost for every policy $r$:**

$$
\mathbb{E}[C_{r}] = \sum_{i=0}^{m} C_{ik} \pi_{i}
$$

…where $C_{ik}$ denotes the cost incurred for making the decision $k$ (defined _according to the policy_) in the state $i$, while $\pi_{i}$ denotes the [[Steady State Probability|steady state probability]] of the state $i$ _given the policy_.

**Don't forget to take the policy into consideration in this step**. It defines the decision $k$ we'll be taking in each state $i$, as well as the [[Transition Matrix|transition matrix]] that is used to obtain the steady state probabilities. We can obtain the latter by (1) selecting, for each state $i$, the row that corresponds to $i$ in the transition matrix of the decision $k$ the policy assigns to that state, and then (2) stacking these rows together into a new matrix.

## Step 3

**Selecting the _optimal_ policy based on its cost.**

If we're dealing with classical costs, then we'll select the policy that has the smallest expected cost. If our "costs" are actually earnings, then we'll select the policy that has the largest expected "cost" (earning).
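
For concreteness, a minimal sketch of the three steps in Python, assuming the same illustrative `C[i, k]` and `P[k][i, j]` arrays as in the other notes (these names are assumptions, not part of the note):

```python
import itertools
import numpy as np

def steady_state(T):
    """Steady state distribution pi of a row-stochastic matrix T (pi T = pi, sum pi = 1)."""
    n = T.shape[0]
    A = np.vstack([T.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def exhaustive_enumeration(C, P, minimise=True):
    n_states, K = C.shape
    best_policy, best_cost = None, None
    # Step 1: every assignment of a decision to each state (K^(m+1) viable policies)
    for policy in itertools.product(range(K), repeat=n_states):
        # Step 2: build the policy's transition matrix (row i taken from P of its decision)
        T = np.array([P[policy[i], i, :] for i in range(n_states)])
        pi = steady_state(T)
        cost = sum(pi[i] * C[i, policy[i]] for i in range(n_states))
        # Step 3: keep the best policy found so far
        better = best_cost is None or (cost < best_cost if minimise else cost > best_cost)
        if better:
            best_policy, best_cost = policy, cost
    return best_policy, best_cost
```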
**Mathematics/Expected Long Term Cost for a Policy in an MDP.md** (22 additions, 0 deletions)
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Policy Improvement Method for MDPs]] - [[Markov Decision Process]]

---

_**(theorem)**_

In the context of an [[Markov Decision Process|MDP]], the **expected long term cost for a given [[Markov Decision Process Policy|policy]] $r$** can be expressed as:

$$
g\{r\} = \sum_{i=0}^{m} \pi_{i}\ C_{ik}
$$

…where $\pi_{i}$ denotes the [[Steady State Probability|steady state probability]] of the state $i$ and $C_{ik}$ the cost of making the decision $k$ in the state $i$.

**Do not forget** that the policy $r$ defines which decision $k$ is made in which state $i$.
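
As a toy illustration with made-up numbers (not taken from any exercise): suppose a policy $r$ for a two-state MDP yields steady state probabilities $\pi_{0} = 0.4$ and $\pi_{1} = 0.6$, with costs $C_{0k} = 2$ and $C_{1k} = 5$ under the decisions it prescribes. Then:

$$
g\{r\} = \pi_{0}\, C_{0k} + \pi_{1}\, C_{1k} = 0.4 \cdot 2 + 0.6 \cdot 5 = 3.8
$$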
**Mathematics/Expected Total Cost of an MDP Starting from a State.md** (27 additions, 0 deletions)
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Policy Improvement Method for MDPs]] - [[Markov Decision Process]]

---

_**(definition)**_

In the context of an [[Markov Decision Process|MDP]], let $V_{i}^{n}\{r\}$ be the **[[Expected Value|expected]] total cost of a system that starts in [[State Set|state]] $i$ and evolves over $n$ periods**, given a specific [[Markov Decision Process Policy|policy]] $r$. This will be:

$$
V_{i}^{n}\{r\} = C_{ik} + \alpha \sum_{j=0}^{m} p_{ij}\{k\} \ V_{j}^{n-1}\{r\}
$$

…where $C_{ik}$ is the cost of making the decision $k$ in the state $i$ (as defined by the policy $r$) and $\alpha$ is the discount factor. If there is no discount, then $\alpha = 1$, as is the case when using the standard [[Policy Improvement Method for MDPs|policy improvement method]] (cf. the [[Discounted Policy Improvement Method for MDPs|discounted version]]).

> [!tip]- Explanation
> Notice that this sum is basically the sum of:
> - $C_{ik}$, the cost of the first period, in which the system goes from $i$ to another state $j$.
> - $\sum_{j=0}^{m} p_{ij}\{k\} \ V_{j}^{n-1}\{r\}$, the weighted sum of the total costs that could be incurred by continuing from another state $j$ (and still following the same policy $r$).

**Do not forget** that these calculations take into consideration a given **policy $r$**, which defines which decision $k$ is made in which state $i$.
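
A minimal sketch of this recursion in Python, assuming a fixed policy given as `policy[i] = k`, the illustrative `C[i, k]` and `P[k][i, j]` arrays from the other notes, and the boundary condition $V_{j}^{0}\{r\} = 0$ (an assumption made only for this sketch):

```python
import numpy as np

def expected_total_cost(C, P, policy, n, alpha=1.0):
    """Return the vector (V_0^n{r}, ..., V_m^n{r}) for the given policy."""
    n_states = C.shape[0]
    V = np.zeros(n_states)                       # V_j^0{r} = 0 for every state j
    C_r = np.array([C[i, policy[i]] for i in range(n_states)])
    P_r = np.array([P[policy[i], i, :] for i in range(n_states)])
    for _ in range(n):                           # build V^1, V^2, ..., V^n
        V = C_r + alpha * P_r @ V                # V_i^t = C_ik + alpha * sum_j p_ij{k} V_j^{t-1}
    return V
```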
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Queuing Model]] - [[Birth-Death Markov Process]]

---

_**(definition)**_

An **$M/M/S$ queue** is a [[Queuing Model|queuing model]] where:

- The arrivals are determined by a [[Homogeneous Poisson Process|Poisson process]] (i.e. the number of arrivals in any given interval follows a [[Poisson Distribution|Poisson distribution]])
- The service times distribute [[Exponential Distribution|exponentially]]
- There are $S$ servers

_**(observation)**_

Notice that this is basically a [[Birth-Death Markov Process|birth-death Markov process]]: its arrivals (births) also follow a Poisson process and its times between deaths (service completions) also distribute exponentially.
# Arrival and Service Rates

> [!tip]- $\lambda$ and $\mu$
> In a queuing model, $\lambda_{n}$ is the [[Expected Value|expected]] rate of arrivals per unit of time when there are $n$ clients in the system, while $\mu_{n}$ stands for the expected rate at which clients are being serviced per unit of time when there are $n$ clients in the system.

_**(theorem)**_

An $M/M/S$ queue has the following rates:

- If $S = 1$, then:
    - $\lambda_{n} = \lambda$ for $n = 0, 1, 2, \dots$
    - $\mu_{n} = \mu$ for $n = 1, 2, 3, \dots$
- If $S > 1$, then:
    - $\lambda_{n} = \lambda$ for $n = 0, 1, 2, \dots$
    - $\mu_{n} = n\mu$ for $n = 1, 2, \dots, S$
    - $\mu_{n} = S\mu$ for $n = S, S+1, S+2, \dots$

We consider an $M/M/S$ queue stable when its utilisation factor $\rho = \frac{\lambda}{S\mu}$ satisfies $\rho < 1$ (i.e. $\lambda < S\mu$).
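
A small Python helper mirroring these rates and the stability condition; `lam`, `mu` and `S` are assumed example inputs, not values from the note:

```python
def arrival_rate(n, lam):
    """lambda_n = lambda for every n, regardless of how many clients are present."""
    return lam

def service_rate(n, S, mu):
    """mu_n = n*mu while at most S servers are busy, and S*mu once all servers are busy."""
    return min(n, S) * mu

def is_stable(lam, mu, S):
    """The queue is stable when the utilisation factor rho = lambda / (S*mu) is below 1."""
    return lam / (S * mu) < 1
```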

# Performance Measures for $M/M/1$

_**(theorem)**_

When $S = 1$, the model is $M/M/1$ and its [[Performance Measures for a Queuing Theory Model|performance measures]] are:

- $\rho = \frac{\lambda}{\mu}$ (utilisation factor)
- $C_{n} = \left( \frac{\lambda}{\mu} \right)^{n} = \rho^{n}$ for $n = 0, 1, 2, \dots$
- $P_{n} = \rho^{n}(1-\rho)$ for $n = 0, 1, 2, \dots$ (the probability of having $n$ clients in the system)
- $L = \frac{\lambda}{\mu-\lambda}$ (the [[Expected Value|expected]] amount of clients in the system)
- $L_{q} = \rho L = \frac{\lambda^{2}}{\mu(\mu-\lambda)}$ (the expected amount of clients _in the queue_)
- $W = \frac{1}{\mu-\lambda}$ (the expected wait time in the system)
- $W_{q} = \rho W = \frac{\lambda}{\mu(\mu-\lambda)}$ (the expected wait time _in the queue_)

Do note that this assumes that the system is stable (i.e. $\rho < 1$).
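
A minimal sketch computing these $M/M/1$ measures in Python, assuming a stable system; the inputs `lam` and `mu` are illustrative:

```python
def mm1_measures(lam, mu):
    """Performance measures for an M/M/1 queue with arrival rate lam and service rate mu."""
    assert lam < mu, "the system must be stable (rho < 1)"
    rho = lam / mu                        # utilisation factor
    L = lam / (mu - lam)                  # expected clients in the system
    Lq = lam**2 / (mu * (mu - lam))       # expected clients in the queue
    W = 1 / (mu - lam)                    # expected time in the system
    Wq = lam / (mu * (mu - lam))          # expected time in the queue
    P = lambda n: (1 - rho) * rho**n      # probability of exactly n clients in the system
    return {"rho": rho, "L": L, "Lq": Lq, "W": W, "Wq": Wq, "P": P}

# For example, mm1_measures(2, 5) gives rho = 0.4, L ≈ 0.67 and W ≈ 0.33.
```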
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
---

**Topics:** [[Markov Decision Process]] - [[Linear Programming Problem]]

---

Given an [[Markov Decision Process|MDP]], we can find the optimal [[Markov Decision Process Policy|policy]] by formulating it as a [[Linear Programming Problem|linear programming problem]].
# Decision Variables

## As Binary

Let the following be our decision variables:

$$
D_{ik} =
\begin{cases}
1 & \text{if the decision } k \text{ is made in the state } i \\
0 & \text{otherwise}
\end{cases}
$$

Once the problem is solved, the optimal values of these variables form a matrix that characterises the optimal policy.
## As Continuous

Binary linear programming problems aren't as straightforward to solve as continuous ones, so we can reinterpret these decision variables as:

$$
D_{ik} = \mathbb{P}(k \mid i)
$$

Id est, the probability of making the decision $k$ given that the system is in the [[State Set|state]] $i$.

Under this reinterpretation, we'll actually be using other variables in the [[Objective Function|objective function]]: $Y_{ik}$. These represent the [[Unconditional Transition Probability|unconditional]] [[Steady State Probability|steady state probability]] of being in the state $i$ _and_ taking the decision $k$, satisfying:

$$
D_{ik} = \frac{Y_{ik}}{\sum_{k=1}^{K} Y_{ik}}
$$
# Model

Finally, with these considerations, the linear programming model for an MDP is:

$$
\begin{cases}
&\text{optimise } \mathbb{E}[C] = Z = \sum_{i=0}^{m} \sum_{k=1}^{K} C_{ik} Y_{ik} & \\[1em]
& \text{s.t.} & \\[0.5em]
&\sum_{i=0}^{m} \sum_{k=1}^{K} Y_{ik} = 1 \\[0.5em]
&\sum_{k=1}^{K} Y_{jk} - \sum_{i=0}^{m} \sum_{k=1}^{K} Y_{ik}\ p_{ij} \{k\} = 0 \quad \text{for } j = 0, 1, \dots, m \\[0.5em]
&Y_{ik} \geq 0 \quad \text{for every } i, k
\end{cases}
$$

…where:

- $C_{ik}$ denotes the cost of making the decision $k$ in the state $i$
- $p_{ij}\{k\}$ denotes the [[Transition Matrix|transition probability]] from state $i$ to $j$ when making the decision $k$

Evidently, we can find the optimal $Y_{ik}$ by solving this linear programming problem through any of the applicable methods ([[Simplex Method|simplex]], etc.). Then we can determine the optimal policy by calculating the $D_{ik}$ as established above.
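
As an illustration, a minimal sketch that assembles and solves this model with `scipy.optimize.linprog` (used here as an assumed off-the-shelf LP solver); `C[i, k]` and `P[k][i, j]` are the same illustrative arrays as in the other notes:

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(C, P):
    """Minimise sum_ik C_ik Y_ik subject to the constraints of the model above."""
    n_states, K = C.shape
    c = C.flatten()                                  # objective coefficients over Y_ik
    A_eq = [np.ones(n_states * K)]                   # sum_i sum_k Y_ik = 1
    b_eq = [1.0]
    for j in range(n_states):                        # one balance equation per state j
        row = np.zeros((n_states, K))
        row[j, :] += 1.0                             # + sum_k Y_jk
        for i in range(n_states):
            for k in range(K):
                row[i, k] -= P[k, i, j]              # - sum_i sum_k Y_ik p_ij{k}
        A_eq.append(row.flatten())
        b_eq.append(0.0)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
    Y = res.x.reshape(n_states, K)
    row_sums = Y.sum(axis=1, keepdims=True)          # D_ik = Y_ik / sum_k Y_ik
    D = np.divide(Y, row_sums, out=np.zeros_like(Y), where=row_sums > 0)
    return D, res.fun
```

If the MDP's values are earnings rather than costs, the objective coefficients would simply be negated, since `linprog` minimises by default. Rows of `D` that come out as all zeros correspond to states with zero steady state probability under the optimal policy, for which the LP leaves the decision unspecified.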
---
date: 2024-05-18
type: 🧠
tags:
- MAC/6/PE
aliases:
- "policy"
---

**Topics:** [[Markov Decision Process]]

---

_**(definition)**_

In the context of a [[Markov Decision Process|Markov decision process]] (MDP), a **policy** is a [[Function|mapping]] of every one of the MDP's [[State Set|states]] to a specific decision.

In other words, a **policy** tells us which decision is taken in each state.

Policies are normally denoted by $r$, while the set of possible policies in a given MDP is denoted by $R$.

We can denote the decision taken in a state $i$ according to the policy $r$ with $d_{i}\{r\}$:

$$
r = \left(d_{0}\{r\}, d_{1}\{r\}, \dots \right)
$$
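
As a toy illustration (the states and decisions below are made up), a policy can be represented directly as such a mapping in Python:

```python
# A policy r for an MDP with states 0, 1, 2: r = (d_0{r}, d_1{r}, d_2{r}).
r = (1, 3, 2)           # d_0{r} = 1, d_1{r} = 3, d_2{r} = 2

def decision(r, i):
    """Return the decision d_i{r} that the policy r takes in the state i."""
    return r[i]

print(decision(r, 1))   # -> 3
```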