Update two vignettes: manual addition of section numbering
aursiber committed Oct 1, 2024
1 parent 0a25d1d commit 26ef761
Showing 2 changed files with 68 additions and 68 deletions.
42 changes: 21 additions & 21 deletions vignettes/Optimalgo.Rmd
@@ -24,15 +24,15 @@ options(digits = 3)
```


# Quick overview of main optimization methods
# 1. Quick overview of main optimization methods

We briefly present the main optimization methods.
Please refer to **Numerical Optimization (Nocedal \& Wright, 2006)**
or **Numerical Optimization: theoretical and practical aspects
(Bonnans, Gilbert, Lemarechal \& Sagastizabal, 2006)** for a good introduction.
We consider the following problem $\min_x f(x)$ for $x\in\mathbb{R}^n$.

## Derivative-free optimization methods
## 1.1. Derivative-free optimization methods
The Nelder-Mead method is one of the best-known derivative-free methods;
it uses only values of $f$ to search for the minimum.
It consists of building a simplex of $n+1$ points and moving/shrinking
@@ -67,12 +67,12 @@ this simplex into the good direction.
The Nelder-Mead method is available in `optim`.
By default, in `optim`, $\alpha=1$, $\beta=1/2$, $\gamma=2$ and $\sigma=1/2$.
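
As a quick illustration (ours, not part of the vignette's diff), here is a minimal sketch of calling the Nelder-Mead method through `optim` on a toy function; the function `fr` and the starting point are arbitrary choices.

```{r}
# Minimal sketch: derivative-free minimization of a Rosenbrock-type function
# with the Nelder-Mead method; only values of fr are used, no derivatives.
fr <- function(x) (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2
optim(par = c(-1.2, 1), fn = fr, method = "Nelder-Mead")$par
```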

## Hessian-free optimization methods
## 1.2. Hessian-free optimization methods

For a smooth non-linear function, the following approach is generally used:
a local method combined with a line search, working on the scheme $x_{k+1} = x_k + t_k d_{k}$, where the local method specifies the direction $d_k$ and the line search specifies the step size $t_k \in \mathbb{R}$.

### Computing the direction $d_k$
### 1.2.1. Computing the direction $d_k$
A desirable property of $d_k$ is that it ensures descent, i.e. $f(x_{k+1}) < f(x_{k})$.
Newton methods are such that $d_k$ minimizes a local quadratic approximation of $f$ based on a Taylor expansion, that is $q_f(d) = f(x_k) + g(x_k)^Td +\frac{1}{2} d^T H(x_k) d$ where $g$ denotes the gradient and $H$ denotes the Hessian.
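
For concreteness, a small sketch (ours, with a hand-picked quadratic $f$) of the Newton direction $d_k = -H(x_k)^{-1} g(x_k)$, which minimizes the quadratic approximation $q_f$ when $H(x_k)$ is positive definite:

```{r}
# Sketch: Newton direction for f(x1, x2) = x1^2 + 2*x2^2 + x1*x2.
g <- function(x) c(2 * x[1] + x[2], 4 * x[2] + x[1])  # gradient of f
H <- matrix(c(2, 1, 1, 4), nrow = 2)                  # Hessian of f (constant here)
xk <- c(1, 1)
dk <- -solve(H, g(xk))                                # Newton direction d_k
xk + dk                                               # one Newton step reaches the minimizer (0, 0)
```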

@@ -121,7 +121,7 @@ See Yuan (2006) for other well-known schemes such as Hestenses-Stiefel, Dixon or
The three updates (Fletcher-Reeves, Polak-Ribiere, Beale-Sorenson) of the (non-linear) conjugate gradient are available in `optim`.
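
For example (a sketch of ours, not from the vignette), the update formula used by `optim`'s conjugate gradient method is selected through the `type` component of `control`: 1 for Fletcher-Reeves, 2 for Polak-Ribiere, 3 for Beale-Sorenson.

```{r}
# Sketch: the three conjugate gradient updates in optim on a toy function.
fr <- function(x) (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2
gr <- function(x) c(-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
                    200 * (x[2] - x[1]^2))
sapply(1:3, function(type)
  optim(c(-1.2, 1), fr, gr, method = "CG", control = list(type = type))$par)
```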


### Computing the stepsize $t_k$
### 1.2.2. Computing the stepsize $t_k$

Let $\phi_k(t) = f(x_k + t d_k)$ for a given direction/iterate $(d_k, x_k)$.
We need conditions that yield a satisfactory stepsize $t_k$. In the literature, we consider the descent condition: $\phi_k'(0) < 0$
@@ -136,7 +136,7 @@ Nocedal \& Wright (2006) presents a backtracking (or geometric) approach satisfy
This backtracking linesearch is available in `optim`.
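
As an illustration (our own sketch, assuming an Armijo-type sufficient-decrease condition; the function names are ours), a geometric backtracking line search simply halves the stepsize until the condition holds:

```{r}
# Sketch: backtracking (geometric) line search with an Armijo-type condition
# f(x + t*d) <= f(x) + c1 * t * g(x)' d, halving t until it is satisfied.
backtrack <- function(f, g, x, d, t0 = 1, c1 = 1e-4, rho = 1/2)
{
  t <- t0
  while (f(x + t * d) > f(x) + c1 * t * sum(g(x) * d))
    t <- rho * t
  t
}
f <- function(x) sum(x^2)
g <- function(x) 2 * x
backtrack(f, g, x = c(2, -1), d = -g(c(2, -1)))  # returns the accepted stepsize
```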


## Benchmark
## 1.3. Benchmark

To simplify the benchmarking of optimization methods, we create a `fitbench` function that runs
the desired estimation method with each optimization method.
@@ -152,12 +152,12 @@ fitbench <- fitdistrplus:::fitbench



# Numerical illustration with the beta distribution
# 2. Numerical illustration with the beta distribution


## Log-likelihood function and its gradient for beta distribution
## 2.1. Log-likelihood function and its gradient for beta distribution

### Theoretical value
### 2.1.1. Theoretical value
The density of the beta distribution is given by
$$
f(x; \delta_1,\delta_2) = \frac{x^{\delta_1-1}(1-x)^{\delta_2-1}}{\beta(\delta_1,\delta_2)},
@@ -179,7 +179,7 @@ $$
where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function;
see the NIST Handbook of Mathematical Functions, https://dlmf.nist.gov/.
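
To make this concrete, here is a minimal sketch (ours, not the package implementation) of the log-likelihood of a beta sample and its gradient written with the digamma function; `obs` is assumed to be a vector of observations in $(0,1)$.

```{r}
# Sketch: log-likelihood of a beta sample and its gradient w.r.t. (delta1, delta2),
# using psi = digamma.
lnlbeta <- function(par, obs)
  sum(dbeta(obs, shape1 = par[1], shape2 = par[2], log = TRUE))
grlnlbeta.sketch <- function(par, obs)
  c(sum(log(obs))     - length(obs) * (digamma(par[1]) - digamma(par[1] + par[2])),
    sum(log(1 - obs)) - length(obs) * (digamma(par[2]) - digamma(par[1] + par[2])))
```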

### `R` implementation
### 2.1.2. `R` implementation
As in the `fitdistrplus` package, we minimize the opposite of the log-likelihood:
we implement the opposite of the gradient in `grlnL`. Neither the log-likelihood nor its gradient
is exported.
@@ -191,7 +191,7 @@ grlnlbeta <- fitdistrplus:::grlnlbeta



## Random generation of a sample
## 2.2. Random generation of a sample

```{r, fig.height=4, fig.width=4}
#(1) beta distribution
@@ -204,7 +204,7 @@ curve(dbeta(x, 3, 3/4), col="green", add=TRUE)
legend("topleft", lty=1, col=c("red","green"), legend=c("empirical", "theoretical"), bty="n")
```

## Fit Beta distribution
## 2.3. Fit Beta distribution

Define control parameters.
```{r}
@@ -243,7 +243,7 @@ numerically approximated one).



## Results of the numerical investigation
## 2.4. Results of the numerical investigation
Results are displayed in the following tables:
(1) the original parametrization without specifying the gradient (`-B` stands for bounded version),
(2) the original parametrization with the (true) gradient (`-B` stands for bounded version and `-G` for gradient),
@@ -289,12 +289,12 @@ plot(b1, trueval = c(3, 3/4))
```


# Numerical illustration with the negative binomial distribution
# 3. Numerical illustration with the negative binomial distribution


## Log-likelihood function and its gradient for negative binomial distribution
## 3.1. Log-likelihood function and its gradient for negative binomial distribution

### Theoretical value
### 3.1.1. Theoretical value
The p.m.f. of the negative binomial distribution is given by
$$
f(x; m,p) = \frac{\Gamma(x+m)}{\Gamma(m)x!} p^m (1-p)^x,
@@ -325,7 +325,7 @@ $$
where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function;
see the NIST Handbook of Mathematical Functions, https://dlmf.nist.gov/.
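
For illustration, a minimal sketch (ours, kept in the $(m, p)$ parametrization written above; not the vignette's own code) of the log-likelihood of a negative binomial sample and its gradient using `digamma`:

```{r}
# Sketch: log-likelihood of a negative binomial sample in the (m, p)
# parametrization above, and its gradient using psi = digamma.
lnlNB <- function(par, obs)
  sum(dnbinom(obs, size = par[1], prob = par[2], log = TRUE))
grlnlNB.sketch <- function(par, obs)
{
  m <- par[1]; p <- par[2]; n <- length(obs)
  c(sum(digamma(obs + m)) - n * digamma(m) + n * log(p),  # d/dm
    n * m / p - sum(obs) / (1 - p))                       # d/dp
}
```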

### `R` implementation
### 3.1.2. `R` implementation
As in the `fitdistrplus` package, we minimize the opposite of the log-likelihood: we implement the opposite of the gradient in `grlnL`.
```{r}
grlnlNB <- function(x, obs, ...)
@@ -342,7 +342,7 @@ grlnlNB <- function(x, obs, ...)



## Random generation of a sample
## 3.2. Random generation of a sample

```{r, fig.height=4, fig.width=4}
#(2) negative binomial distribution
@@ -358,7 +358,7 @@ legend("topright", lty = 1, col = c("red", "green"),
legend = c("empirical", "theoretical"), bty="n")
```

## Fit a negative binomial distribution
## 3.3. Fit a negative binomial distribution

Define control parameters and run the benchmark.
```{r}
@@ -399,7 +399,7 @@ to minimize and its gradient (whether it is the theoretical gradient or the
numerically approximated one).


## Results of the numerical investigation
## 3.4. Results of the numerical investigation
Results are displayed in the following tables:
(1) the original parametrization without specifying the gradient (`-B` stands for bounded version),
(2) the original parametrization with the (true) gradient (`-B` stands for bounded version and `-G` for gradient),
@@ -447,7 +447,7 @@ plot(b1, trueval=trueval[c("size", "mu")])



# Conclusion
# 4. Conclusion

Based on the two previous examples, we observe that all methods converge to the same
point. This is reassuring.