diff --git a/vignettes/Optimalgo.Rmd b/vignettes/Optimalgo.Rmd
index bbaeb2b..a0c3c65 100644
--- a/vignettes/Optimalgo.Rmd
+++ b/vignettes/Optimalgo.Rmd
@@ -24,7 +24,7 @@
 options(digits = 3)
 ```
 
-# Quick overview of main optimization methods
+# 1. Quick overview of main optimization methods
 
 We briefly present the main optimization methods.
 Please refer to **Numerical Optimization (Nocedal \& Wright, 2006)**
@@ -32,7 +32,7 @@
 or **Numerical Optimization: theoretical and practical aspects
 (Bonnans, Gilbert, Lemarechal \& Sagastizabal, 2006)** for a good introduction.
 We consider the following problem $\min_x f(x)$ for $x\in\mathbb{R}^n$.
 
-## Derivative-free optimization methods
+## 1.1. Derivative-free optimization methods
 
 The Nelder-Mead method is one of the best-known derivative-free methods,
 using only values of $f$ to search for the minimum.
 It consists in building a simplex of $n+1$ points and moving/shrinking
@@ -67,12 +67,12 @@
 this simplex in the right direction.
 
 The Nelder-Mead method is available in `optim`.
 By default, in `optim`, $\alpha=1$, $\beta=1/2$, $\gamma=2$ and $\sigma=1/2$.
 
-## Hessian-free optimization methods
+## 1.2. Hessian-free optimization methods
 
 For smooth non-linear functions, the following method is generally used:
 a local method combined with a line search working on the scheme
 $x_{k+1} = x_k + t_k d_k$, where the local method specifies the direction $d_k$
 and the line search specifies the step size $t_k \in \mathbb{R}$.
 
-### Computing the direction $d_k$
+### 1.2.1. Computing the direction $d_k$
 
 A desirable property for $d_k$ is that $d_k$ ensures a descent,
 i.e. $f(x_{k+1}) < f(x_{k})$.
 Newton methods are such that $d_k$ minimizes a local quadratic approximation
 of $f$ based on a Taylor expansion, that is
 $q_f(d) = f(x_k) + g(x_k)^T d + \frac{1}{2} d^T H(x_k) d$,
 where $g$ denotes the gradient and $H$ denotes the Hessian.
@@ -121,7 +121,7 @@
 See Yuan (2006) for other well-known schemes such as Hestenes-Stiefel or Dixon.
 The three updates (Fletcher-Reeves, Polak-Ribiere, Beale-Sorenson) of the
 (non-linear) conjugate gradient are available in `optim`.
 
-### Computing the stepsize $t_k$
+### 1.2.2. Computing the stepsize $t_k$
 
 Let $\phi_k(t) = f(x_k + t d_k)$ for a given direction/iterate $(d_k, x_k)$.
 We need conditions ensuring a satisfactory stepsize $t_k$.
 In the literature, we consider the descent condition: $\phi_k'(0) < 0$
@@ -136,7 +136,7 @@
 Nocedal \& Wright (2006) presents a backtracking (or geometric) approach
 satisfying these conditions.
 This backtracking line search is available in `optim`.
 
-## Benchmark
+## 1.3. Benchmark
 
 To simplify the benchmark of optimization methods, we create a `fitbench`
 function that computes the desired estimation method for all optimization
 methods.
@@ -152,12 +152,12 @@
 fitbench <- fitdistrplus:::fitbench
 ```
 
-# Numerical illustration with the beta distribution
+# 2. Numerical illustration with the beta distribution
 
-## Log-likelihood function and its gradient for beta distribution
+## 2.1. Log-likelihood function and its gradient for beta distribution
 
-### Theoretical value
+### 2.1.1. Theoretical value
 
 The density of the beta distribution is given by
 $$
 f(x; \delta_1,\delta_2) = \frac{x^{\delta_1-1}(1-x)^{\delta_2-1}}{\beta(\delta_1,\delta_2)},
@@ -179,7 +179,7 @@
 $$
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function, see the
 NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 2.1.2. `R` implementation
 
 As in the `fitdistrplus` package, we minimize the opposite of the
 log-likelihood: we implement the opposite of the gradient in `grlnL`.
 Neither the log-likelihood nor its gradient is exported.
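+For illustration, the opposite log-likelihood and its gradient can be
+transcribed directly from the standard beta formulas; the following is a
+minimal sketch (the names `lnlbeta0` and `grlnlbeta0` are illustrative, not
+the package's internal functions):
+
+```{r}
+lnlbeta0 <- function(par, obs) # opposite of the beta log-likelihood
+  -sum(dbeta(obs, par[1], par[2], log = TRUE))
+grlnlbeta0 <- function(par, obs) { # opposite of its gradient
+  n <- length(obs)
+  psi12 <- digamma(par[1] + par[2])
+  c(-sum(log(obs))     + n * (digamma(par[1]) - psi12),
+    -sum(log(1 - obs)) + n * (digamma(par[2]) - psi12))
+}
+```
+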
@@ -191,7 +191,7 @@
 grlnlbeta <- fitdistrplus:::grlnlbeta
 ```
 
-## Random generation of a sample
+## 2.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(1) beta distribution
@@ -204,7 +204,7 @@
 curve(dbeta(x, 3, 3/4), col="green", add=TRUE)
 legend("topleft", lty=1, col=c("red","green"), legend=c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit Beta distribution
+## 2.3. Fit Beta distribution
 
 Define control parameters.
 ```{r}
@@ -243,7 +243,7 @@
 numerically approximated one).
 
-## Results of the numerical investigation
+## 2.4. Results of the numerical investigation
 
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient
 (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient
 (`-B` stands for bounded version and `-G` for gradient),
@@ -289,12 +289,12 @@
 plot(b1, trueval = c(3, 3/4))
 ```
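+As a sanity check, one cell of this benchmark can be reproduced by hand with
+`optim`; a minimal sketch, assuming the simulated sample of Section 2.2 is
+stored in `x` (the object name in that chunk may differ) and reusing the
+illustrative `lnlbeta0`/`grlnlbeta0` above. The bounded `L-BFGS-B` method
+corresponds to the `-B` variants:
+
+```{r, eval=FALSE}
+optim(par = c(1, 1), fn = lnlbeta0, gr = grlnlbeta0, obs = x,
+      method = "L-BFGS-B", lower = 1e-6)$par # shape parameters must stay > 0
+```
+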
-# Numerical illustration with the negative binomial distribution
+# 3. Numerical illustration with the negative binomial distribution
 
-## Log-likelihood function and its gradient for negative binomial distribution
+## 3.1. Log-likelihood function and its gradient for negative binomial distribution
 
-### Theoretical value
+### 3.1.1. Theoretical value
 
 The p.m.f. of the negative binomial distribution is given by
 $$
 f(x; m,p) = \frac{\Gamma(x+m)}{\Gamma(m)x!} p^m (1-p)^x,
@@ -325,7 +325,7 @@
 $$
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function, see the
 NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 3.1.2. `R` implementation
 
 As in the `fitdistrplus` package, we minimize the opposite of the
 log-likelihood: we implement the opposite of the gradient in `grlnL`.
 ```{r}
 grlnlNB <- function(x, obs, ...)
@@ -342,7 +342,7 @@
 ```
 
-## Random generation of a sample
+## 3.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(2) negative binomial distribution
@@ -358,7 +358,7 @@
 legend("topright", lty = 1, col = c("red", "green"),
        legend = c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit a negative binomial distribution
+## 3.3. Fit a negative binomial distribution
 
 Define control parameters and make the benchmark.
 ```{r}
@@ -399,7 +399,7 @@
 to minimize and its gradient (whether it is the theoretical gradient or the
 numerically approximated one).
 
-## Results of the numerical investigation
+## 3.4. Results of the numerical investigation
 
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient
 (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient
 (`-B` stands for bounded version and `-G` for gradient),
@@ -447,7 +447,7 @@
 plot(b1, trueval=trueval[c("size", "mu")])
 ```
 
-# Conclusion
+# 4. Conclusion
 
 Based on the two previous examples, we observe that all methods converge to
 the same point. This is reassuring.
diff --git a/vignettes/starting-values.Rmd b/vignettes/starting-values.Rmd
index c332650..4621b68 100644
--- a/vignettes/starting-values.Rmd
+++ b/vignettes/starting-values.Rmd
@@ -34,25 +34,25 @@
 distributions and refer to the bibliography section for details.
 
-# Discrete distributions
+# 1. Discrete distributions
 
-## Base R distribution
+## 1.1. Base R distribution
 
-### Geometric distribution
+### 1.1.1. Geometric distribution
 
 The MME $\hat p=1/(1+m_1)$ is used.
 
-### Negative binomial distribution
+### 1.1.2. Negative binomial distribution
 
 The MME $\hat n = m_1^2/(\mu_2-m_1)$ is used.
 
-### Poisson distribution
+### 1.1.3. Poisson distribution
 
 Both the MME and the MLE are given by $\hat \lambda = m_1$.
 
-### Binomial distribution
+### 1.1.4. Binomial distribution
 
 The MME is used
 $$
@@ -68,7 +68,7 @@
 $$
 
-## logarithmic distribution
+## 1.2. Logarithmic distribution
 
 The expectation simplifies for small values of $p$
 $$
@@ -82,18 +82,18 @@
 $$
 \hat p = 1-1/m_1.
 $$
 
-## Zero truncated distributions
+## 1.3. Zero-truncated distributions
 
 These distributions are the distributions of $X\vert X>0$ when $X$ follows a
 given discrete distribution. Hence the initial estimates are the ones used
 for base R distributions on the sample $x-1$.
 
-## Zero modified distributions
+## 1.4. Zero-modified distributions
 
 The MLE of the probability parameter is the empirical mass at 0,
 $\hat p_0=\frac1n \sum_i 1_{x_i=0}$.
 For the other parameters, we use the classical estimators with probability
 parameter $1-\hat p_0$.
 
-## Poisson inverse Gaussian distribution
+## 1.5. Poisson inverse Gaussian distribution
 
 The first two moments are
 $$
@@ -107,17 +107,17 @@
 $$
 
-# Continuous distributions
+# 2. Continuous distributions
 
-## Normal distribution
+## 2.1. Normal distribution
 
 The MLE is the MME, so we use the empirical mean and variance.
 
-## Lognormal distribution
+## 2.2. Lognormal distribution
 
 The log sample follows a normal distribution, so we proceed as for the normal
 distribution on the log sample.
 
-## Beta distribution (of the first kind)
+## 2.3. Beta distribution (of the first kind)
 
 The density function for a beta $\mathcal Be(a,b)$ is
 $$
@@ -130,13 +130,13 @@
 The initial estimate is the MME
 \begin{equation}
 \hat a = m_1 \frac{m_1(1-m_1)-\mu_2}{\mu_2},~
 \hat b = (1-m_1)\frac{m_1(1-m_1)-\mu_2}{\mu_2}
 (\#eq:betaguessestimator)
 \end{equation}
 
-## Other continuous distribution in `actuar`
+## 2.4. Other continuous distributions in `actuar`
 
-### Log-gamma
+### 2.4.1. Log-gamma
 
 Use the gamma initial values on the sample $\log(x)$.
 
-### Gumbel
+### 2.4.2. Gumbel
 
 The distribution function is
 $$
@@ -166,7 +166,7 @@
 $$
 \hat\alpha = \hat\theta\log(\log(2)) + q_2.
 $$
 
-### Inverse Gaussian distribution
+### 2.4.3. Inverse Gaussian distribution
 
 The moments of this distribution are
 $$
@@ -175,7 +175,7 @@
 Var[X] = \mu^3\phi.
 $$
 Hence the initial estimates are $\hat\mu=m_1$, $\hat\phi=\mu_2/m_1^3$.
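+A direct transcription of these two moment equations gives starting values; a
+minimal sketch (`start_invgauss` is an illustrative name, and the
+`mean`/`dispersion` labels assume the parametrization used by `actuar`):
+
+```{r}
+start_invgauss <- function(x) {
+  m1  <- mean(x)          # first raw moment: E[X] = mu
+  mu2 <- mean((x - m1)^2) # empirical variance: Var[X] = mu^3 * phi
+  list(mean = m1, dispersion = mu2 / m1^3)
+}
+```
+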
 
-### Generalized beta
+### 2.4.4. Generalized beta
 
 This is the distribution of $\theta X^{1/\tau}$ when $X$ is beta distributed
 $\mathcal Be(a,b)$. The moments are
 $$
@@ -205,7 +205,7 @@
 then we use the beta initial estimates on the sample
 $(\frac{x_i}{\hat\theta})^{\hat\tau}$.
 
-## Feller-Pareto family
+## 2.5. Feller-Pareto family
 
 The Feller-Pareto distribution is the distribution of
 $X=\mu+\theta(1/B-1)^{1/\gamma}$ when
@@ -314,7 +314,7 @@
 Neglecting the value of $\gamma$ on the right-hand side, we obtain
 \end{equation}
 
-### Transformed beta
+### 2.5.1. Transformed beta
 
 This is the Feller-Pareto with $\mu=0$.
 So the first component of \@ref(eq:fellerparetogradient) simplifies to
@@ -358,7 +358,7 @@
 Similar to Feller-Pareto, we set
 (\#eq:fellerparetogammahat)
 \end{equation}
 
-### Generalized Pareto
+### 2.5.2. Generalized Pareto
 
 This is the Feller-Pareto with $\mu=0$, $\gamma=1$.
 So the first component of \@ref(eq:fellerparetogradient) simplifies to
@@ -387,7 +387,7 @@
 $$
 with initial values of a beta distribution based on the MME
 (\@ref(eq:betaguessestimator)).
 
-### Burr
+### 2.5.3. Burr
 
 Burr is a Feller-Pareto distribution with $\mu=0$, $\tau=1$.
@@ -430,7 +430,7 @@
 We use for $\hat\gamma$ \@ref(eq:fellerparetogammahat) with $\tau=1$ and
 $\alpha=\hat\alpha$.
 
-### Loglogistic
+### 2.5.4. Loglogistic
 
 Loglogistic is a Feller-Pareto distribution with $\mu=0$, $\tau=1$, $\alpha=1$.
 The survival function is
 $$
@@ -464,7 +464,7 @@
 $$
 (\#eq:llogisthetahat)
 \end{equation}
 
-### Paralogistic
+### 2.5.5. Paralogistic
 
 Paralogistic is a Feller-Pareto distribution with $\mu=0$, $\tau=1$,
 $\alpha=\gamma$.
 The survival function is
 $$
@@ -533,19 +533,19 @@
 Initial estimators are
 \end{equation}
 
-### Inverse Burr
+### 2.5.6. Inverse Burr
 
 Use the Burr estimates on the sample $1/x$.
 
-### Inverse paralogistic
+### 2.5.7. Inverse paralogistic
 
 Use the paralogistic estimates on the sample $1/x$.
 
-### Inverse pareto
+### 2.5.8. Inverse Pareto
 
 Use the Pareto estimates on the sample $1/x$.
 
-### Pareto IV
+### 2.5.9. Pareto IV
 
 The survival function is
 $$
@@ -625,7 +625,7 @@
 $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$
 and $\hat\gamma$.
 
-### Pareto III
+### 2.5.10. Pareto III
 
 Pareto III corresponds to Pareto IV with $\alpha=1$.
 \begin{equation}
@@ -661,7 +661,7 @@
 $\hat\gamma$ from \@ref(eq:pareto3gammarelation),
 $\hat\theta$ from \@ref(eq:pareto3thetarelation) with $\hat\gamma$.
 
-### Pareto II
+### 2.5.11. Pareto II
 
 Pareto II corresponds to Pareto IV with $\gamma=1$.
@@ -682,7 +682,7 @@
 $\hat\theta$ from \@ref(eq:pareto4thetarelation) with $\alpha^\star$ and
 $\gamma=1$,
 $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$
 and $\gamma=1$.
 
-### Pareto I
+### 2.5.12. Pareto I
 
 Pareto I corresponds to Pareto IV with $\gamma=1$, $\mu=\theta$.
@@ -705,7 +705,7 @@
 $\hat\alpha$ from \@ref(eq:pareto1alphamurelation).
 
-### Pareto
+### 2.5.13. Pareto
 
 Pareto corresponds to Pareto IV with $\gamma=1$, $\mu=0$.
 \begin{equation}
@@ -727,9 +727,9 @@
 where $m_i$ is the empirical raw moment of order $i$,
 $\hat\theta$ from \@ref(eq:pareto4thetarelation) with $\alpha^\star$ and
 $\gamma=1$,
 $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\mu=0$, $\hat\theta$
 and $\gamma=1$.
 
-## Transformed gamma family
+## 2.6. Transformed gamma family
 
-### Transformed gamma distribution
+### 2.6.1. Transformed gamma distribution
 
 The log-likelihood is given by
 $$
@@ -760,7 +760,7 @@
 $$
 \hat\tau = \frac{\psi(\hat\alpha)}{\frac1n\sum_i \log(x_i/\hat\theta)}.
 $$
 
-### gamma distribution
+### 2.6.2. Gamma distribution
 
 Transformed gamma with $\tau=1$.
 
 We compute the moment-estimator given by
@@ -772,7 +772,7 @@
 \end{equation}
 
-### Weibull distribution
+### 2.6.3. Weibull distribution
 
 Transformed gamma with $\alpha=1$.
@@ -803,18 +803,18 @@
 with $p_1=1/4$, $p_2=3/4$, $p_3=1/2$ and $x_i$ the corresponding empirical
 quantiles.
 Initial parameters are $\tilde\tau$ and $\tilde\theta$ unless the empirical
 quantiles satisfy $x_1=x_2$, in which case we use $\hat\tau$, $\hat\theta$.
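+The quantile-matching idea can be made concrete; the following is an
+illustrative sketch of one standard variant, matching
+$\log(-\log(1-p)) = \tau\log(x/\theta)$ at the quantiles above, and not
+necessarily the exact expressions used by the package (`start_weibull` is a
+hypothetical name):
+
+```{r}
+start_weibull <- function(x, p1 = 1/4, p2 = 3/4, p3 = 1/2) {
+  q <- quantile(x, probs = c(p1, p2, p3), names = FALSE)
+  tau   <- log(log(1 - p1) / log(1 - p2)) / log(q[1] / q[2])
+  theta <- q[3] / (-log(1 - p3))^(1 / tau) # matches the median when p3 = 1/2
+  c(shape = tau, scale = theta)
+}
+```
+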
 
-### Exponential distribution
+### 2.6.4. Exponential distribution
 
 The MLE is the MME, $\hat\lambda = 1/m_1$.
 
-## Inverse transformed gamma family
+## 2.7. Inverse transformed gamma family
 
-### Inverse transformed gamma distribution
+### 2.7.1. Inverse transformed gamma distribution
 
 Same as the transformed gamma distribution with $(1/x_i)_i$.
 
-### Inverse gamma distribution
+### 2.7.2. Inverse gamma distribution
 
 We compute the moment-estimator as
 $$
@@ -822,31 +822,31 @@
 \hat\theta= m_1m_2/(m_2-m_1^2).
 $$
 
-### Inverse Weibull distribution
+### 2.7.3. Inverse Weibull distribution
 
 We use the QME.
 
-### Inverse exponential
+### 2.7.4. Inverse exponential
 
 Same as the exponential distribution with $(1/x_i)_i$.
 
-# Bibliography
+# 3. Bibliography
 
-## General books
+## 3.1. General books
 
 - N. L. Johnson, S. Kotz, N. Balakrishnan (1994). Continuous univariate distributions, Volume 1, Wiley.
 - N. L. Johnson, S. Kotz, N. Balakrishnan (1995). Continuous univariate distributions, Volume 2, Wiley.
 - N. L. Johnson, A. W. Kemp, S. Kotz (2008). Univariate discrete distributions, Wiley.
 - G. Wimmer (1999). Thesaurus of univariate discrete probability distributions.
 
-## Books dedicated to a distribution family
+## 3.2. Books dedicated to a distribution family
 
 - M. Ahsanullah, B. M. Golam Kibria, M. Shakil (2014). Normal and Student's t Distributions and Their Applications, Springer.
 - B. C. Arnold (2010). Pareto Distributions, Chapman and Hall.
 - A. Azzalini (2013). The Skew-Normal and Related Families.
 - N. Balakrishnan (2014). Handbook of the Logistic Distribution, CRC Press.
 
-## Books with applications
+## 3.3. Books with applications
 
 - C. Forbes, M. Evans, N. Hastings, B. Peacock (2011). Statistical Distributions, Wiley.
 - Z. A. Karian, E. J. Dudewicz, K. Shimizu (2010). Handbook of Fitting Statistical Distributions with R, CRC Press.