diff --git a/vignettes/Optimalgo.Rmd b/vignettes/Optimalgo.Rmd
index bbaeb2b..a0c3c65 100644
--- a/vignettes/Optimalgo.Rmd
+++ b/vignettes/Optimalgo.Rmd
@@ -24,7 +24,7 @@
 options(digits = 3)
 ```
 
-# Quick overview of main optimization methods
+# 1. Quick overview of main optimization methods
 
 We briefly present the main optimization methods.
 Please refer to **Numerical Optimization (Nocedal \& Wright, 2006)**
@@ -32,7 +32,7 @@
 or **Numerical Optimization: theoretical and practical aspects
 (Bonnans, Gilbert, Lemarechal \& Sagastizabal, 2006)** for a good introduction.
 We consider the following problem $\min_x f(x)$ for $x\in\mathbb{R}^n$.
 
-## Derivative-free optimization methods
+## 1.1. Derivative-free optimization methods
 
 The Nelder-Mead method is one of the best-known derivative-free methods,
 using only values of $f$ to search for the minimum.
 It consists in building a simplex of $n+1$ points and moving/shrinking
@@ -67,12 +67,12 @@
 this simplex in the right direction.
 
 The Nelder-Mead method is available in `optim`.
 By default, in `optim`, $\alpha=1$, $\beta=1/2$, $\gamma=2$ and $\sigma=1/2$.
 
-## Hessian-free optimization methods
+## 1.2. Hessian-free optimization methods
 
 For smooth non-linear functions, the following method is generally used:
 a local method combined with a line search working on the scheme
 $x_{k+1} = x_k + t_k d_k$, where the local method specifies the direction $d_k$
 and the line search specifies the step size $t_k \in \mathbb{R}$.
 
-### Computing the direction $d_k$
+### 1.2.1. Computing the direction $d_k$
 
 A desirable property for $d_k$ is that $d_k$ ensures a descent,
 i.e. $f(x_{k+1}) < f(x_{k})$.
 Newton methods are such that $d_k$ minimizes a local quadratic approximation
 of $f$ based on a Taylor expansion, that is
 $q_f(d) = f(x_k) + g(x_k)^T d + \frac{1}{2} d^T H(x_k) d$,
 where $g$ denotes the gradient and $H$ denotes the Hessian.
@@ -121,7 +121,7 @@
 See Yuan (2006) for other well-known schemes such as Hestenes-Stiefel or Dixon.
 The three updates (Fletcher-Reeves, Polak-Ribiere, Beale-Sorenson) of the
 (non-linear) conjugate gradient are available in `optim`.
 
-### Computing the stepsize $t_k$
+### 1.2.2. Computing the stepsize $t_k$
 
 Let $\phi_k(t) = f(x_k + t d_k)$ for a given direction/iterate $(d_k, x_k)$.
 We need conditions ensuring a satisfactory stepsize $t_k$.
 In the literature, we consider the descent condition: $\phi_k'(0) < 0$
@@ -136,7 +136,7 @@
 Nocedal \& Wright (2006) presents a backtracking (or geometric) approach
 satisfying these conditions.
 This backtracking line search is available in `optim`.
 
-## Benchmark
+## 1.3. Benchmark
 
 To simplify the benchmark of optimization methods, we create a `fitbench`
 function that computes the desired estimation method for all optimization
 methods.
@@ -152,12 +152,12 @@
 fitbench <- fitdistrplus:::fitbench
 ```
 
-# Numerical illustration with the beta distribution
+# 2. Numerical illustration with the beta distribution
 
-## Log-likelihood function and its gradient for beta distribution
+## 2.1. Log-likelihood function and its gradient for beta distribution
 
-### Theoretical value
+### 2.1.1. Theoretical value
 
 The density of the beta distribution is given by
 $$
 f(x; \delta_1,\delta_2) = \frac{x^{\delta_1-1}(1-x)^{\delta_2-1}}{\beta(\delta_1,\delta_2)},
@@ -179,7 +179,7 @@
 $$
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function, see the
 NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 2.1.2. `R` implementation
 
 As in the `fitdistrplus` package, we minimize the opposite of the
 log-likelihood: we implement the opposite of the gradient in `grlnL`.
 Neither the log-likelihood nor its gradient is exported.
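+For illustration, the opposite log-likelihood and its gradient can be
+transcribed directly from the standard beta formulas; the following is a
+minimal sketch (the names `lnlbeta0` and `grlnlbeta0` are illustrative, not
+the package's internal functions):
+
+```{r}
+lnlbeta0 <- function(par, obs) # opposite of the beta log-likelihood
+  -sum(dbeta(obs, par[1], par[2], log = TRUE))
+grlnlbeta0 <- function(par, obs) { # opposite of its gradient
+  n <- length(obs)
+  psi12 <- digamma(par[1] + par[2])
+  c(-sum(log(obs))     + n * (digamma(par[1]) - psi12),
+    -sum(log(1 - obs)) + n * (digamma(par[2]) - psi12))
+}
+```
+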
@@ -191,7 +191,7 @@
 grlnlbeta <- fitdistrplus:::grlnlbeta
 ```
 
-## Random generation of a sample
+## 2.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(1) beta distribution
@@ -204,7 +204,7 @@
 curve(dbeta(x, 3, 3/4), col="green", add=TRUE)
 legend("topleft", lty=1, col=c("red","green"), legend=c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit Beta distribution
+## 2.3. Fit Beta distribution
 
 Define control parameters.
 ```{r}
@@ -243,7 +243,7 @@
 numerically approximated one).
 
-## Results of the numerical investigation
+## 2.4. Results of the numerical investigation
 
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient
 (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient
 (`-B` stands for bounded version and `-G` for gradient),
@@ -289,12 +289,12 @@
 plot(b1, trueval = c(3, 3/4))
 ```
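+As a sanity check, one cell of this benchmark can be reproduced by hand with
+`optim`; a minimal sketch, assuming the simulated sample of Section 2.2 is
+stored in `x` (the object name in that chunk may differ) and reusing the
+illustrative `lnlbeta0`/`grlnlbeta0` above. The bounded `L-BFGS-B` method
+corresponds to the `-B` variants:
+
+```{r, eval=FALSE}
+optim(par = c(1, 1), fn = lnlbeta0, gr = grlnlbeta0, obs = x,
+      method = "L-BFGS-B", lower = 1e-6)$par # shape parameters must stay > 0
+```
+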
-# Numerical illustration with the negative binomial distribution
+# 3. Numerical illustration with the negative binomial distribution
 
-## Log-likelihood function and its gradient for negative binomial distribution
+## 3.1. Log-likelihood function and its gradient for negative binomial distribution
 
-### Theoretical value
+### 3.1.1. Theoretical value
 
 The p.m.f. of the negative binomial distribution is given by
 $$
 f(x; m,p) = \frac{\Gamma(x+m)}{\Gamma(m)x!} p^m (1-p)^x,
@@ -325,7 +325,7 @@
 $$
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function, see the
 NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 3.1.2. `R` implementation
 
 As in the `fitdistrplus` package, we minimize the opposite of the
 log-likelihood: we implement the opposite of the gradient in `grlnL`.
 ```{r}
 grlnlNB <- function(x, obs, ...)
@@ -342,7 +342,7 @@
 ```
 
-## Random generation of a sample
+## 3.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(2) negative binomial distribution
@@ -358,7 +358,7 @@
 legend("topright", lty = 1, col = c("red", "green"),
        legend = c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit a negative binomial distribution
+## 3.3. Fit a negative binomial distribution
 
 Define control parameters and make the benchmark.
 ```{r}
@@ -399,7 +399,7 @@
 to minimize and its gradient (whether it is the theoretical gradient or the
 numerically approximated one).
 
-## Results of the numerical investigation
+## 3.4. Results of the numerical investigation
 
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient
 (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient
 (`-B` stands for bounded version and `-G` for gradient),
@@ -447,7 +447,7 @@
 plot(b1, trueval=trueval[c("size", "mu")])
 ```
 
-# Conclusion
+# 4. Conclusion
 
 Based on the two previous examples, we observe that all methods converge to
 the same point. This is reassuring.
diff --git a/vignettes/starting-values.Rmd b/vignettes/starting-values.Rmd
index c332650..4621b68 100644
--- a/vignettes/starting-values.Rmd
+++ b/vignettes/starting-values.Rmd
@@ -34,25 +34,25 @@
 distributions and refer to the bibliography section for details.
 
-# Discrete distributions
+# 1. Discrete distributions
 
-## Base R distribution
+## 1.1. Base R distribution
 
-### Geometric distribution
+### 1.1.1. Geometric distribution
 
 The MME $\hat p=1/(1+m_1)$ is used.
 
-### Negative binomial distribution
+### 1.1.2. Negative binomial distribution
 
 The MME $\hat n = m_1^2/(\mu_2-m_1)$ is used.
 
-### Poisson distribution
+### 1.1.3. Poisson distribution
 
 Both the MME and the MLE are given by $\hat \lambda = m_1$.
 
-### Binomial distribution
+### 1.1.4. Binomial distribution
 
 The MME is used
 $$
@@ -68,7 +68,7 @@
 $$
 
-## logarithmic distribution
+## 1.2. Logarithmic distribution
 
 The expectation simplifies for small values of $p$
 $$
@@ -82,18 +82,18 @@
 $$
 \hat p = 1-1/m_1.
 $$
 
-## Zero truncated distributions
+## 1.3. Zero-truncated distributions
 
 These distributions are the distributions of $X\vert X>0$ when $X$ follows a
 given discrete distribution. Hence the initial estimates are the ones used
 for base R distributions on the sample $x-1$.
 
-## Zero modified distributions
+## 1.4. Zero-modified distributions
 
 The MLE of the probability parameter is the empirical mass at 0,
 $\hat p_0=\frac1n \sum_i 1_{x_i=0}$.
 For the other parameters, we use the classical estimators with probability
 parameter $1-\hat p_0$.
 
-## Poisson inverse Gaussian distribution
+## 1.5. Poisson inverse Gaussian distribution
 
 The first two moments are
 $$
@@ -107,17 +107,17 @@
 $$
 
-# Continuous distributions
+# 2. Continuous distributions
 
-## Normal distribution
+## 2.1. Normal distribution
 
 The MLE is the MME, so we use the empirical mean and variance.
 
-## Lognormal distribution
+## 2.2. Lognormal distribution
 
 The log sample follows a normal distribution, so we proceed as for the normal
 distribution on the log sample.
 
-## Beta distribution (of the first kind)
+## 2.3. Beta distribution (of the first kind)
 
 The density function for a beta $\mathcal Be(a,b)$ is
 $$
@@ -130,13 +130,13 @@
 The initial estimate is the MME
 \begin{equation}
 \hat a = m_1 \frac{m_1(1-m_1)-\mu_2}{\mu_2},~
 \hat b = (1-m_1)\frac{m_1(1-m_1)-\mu_2}{\mu_2}
 (\#eq:betaguessestimator)
 \end{equation}
 
-## Other continuous distribution in `actuar`
+## 2.4. Other continuous distributions in `actuar`
 
-### Log-gamma
+### 2.4.1. Log-gamma
 
 Use the gamma initial values on the sample $\log(x)$.
 
-### Gumbel
+### 2.4.2. Gumbel
 
 The distribution function is
 $$
@@ -166,7 +166,7 @@
 $$
 \hat\alpha = \hat\theta\log(\log(2)) + q_2.
 $$
 
-### Inverse Gaussian distribution
+### 2.4.3. Inverse Gaussian distribution
 
 The moments of this distribution are
 $$
@@ -175,7 +175,7 @@
 Var[X] = \mu^3\phi.
 $$
 Hence the initial estimates are $\hat\mu=m_1$, $\hat\phi=\mu_2/m_1^3$.
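+A direct transcription of these two moment equations gives starting values; a
+minimal sketch (`start_invgauss` is an illustrative name, and the
+`mean`/`dispersion` labels assume the parametrization used by `actuar`):
+
+```{r}
+start_invgauss <- function(x) {
+  m1  <- mean(x)          # first raw moment: E[X] = mu
+  mu2 <- mean((x - m1)^2) # empirical variance: Var[X] = mu^3 * phi
+  list(mean = m1, dispersion = mu2 / m1^3)
+}
+```
+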
 
-### Generalized beta
+### 2.4.4. Generalized beta
 
 This is the distribution of $\theta X^{1/\tau}$ when $X$ is beta distributed
 $\mathcal Be(a,b)$. The moments are
 $$
@@ -205,7 +205,7 @@
 then we use the beta initial estimates on the sample
 $(\frac{x_i}{\hat\theta})^{\hat\tau}$.
 
-## Feller-Pareto family
+## 2.5. Feller-Pareto family
 
 The Feller-Pareto distribution is the distribution of
 $X=\mu+\theta(1/B-1)^{1/\gamma}$ when
@@ -314,7 +314,7 @@
 Neglecting the value of $\gamma$ on the right-hand side, we obtain
 \end{equation}
 
-### Transformed beta
+### 2.5.1. Transformed beta
 
 This is the Feller-Pareto with $\mu=0$.
 So the first component of \@ref(eq:fellerparetogradient) simplifies to
@@ -358,7 +358,7 @@
 Similar to Feller-Pareto, we set
 (\#eq:fellerparetogammahat)
 \end{equation}
 
-### Generalized Pareto
+### 2.5.2. Generalized Pareto
 
 This is the Feller-Pareto with $\mu=0$, $\gamma=1$.
 So the first component of \@ref(eq:fellerparetogradient) simplifies to
@@ -387,7 +387,7 @@
 $$
 with initial values of a beta distribution based on the MME
 (\@ref(eq:betaguessestimator)).
 
-### Burr
+### 2.5.3. Burr
 
 Burr is a Feller-Pareto distribution with $\mu=0$, $\tau=1$.
@@ -430,7 +430,7 @@
 We use for $\hat\gamma$ \@ref(eq:fellerparetogammahat) with $\tau=1$ and
 $\alpha=\hat\alpha$.
 
-### Loglogistic
+### 2.5.4. Loglogistic
 
 Loglogistic is a Feller-Pareto distribution with $\mu=0$, $\tau=1$, $\alpha=1$.
 The survival function is
 $$
@@ -464,7 +464,7 @@
 $$
 (\#eq:llogisthetahat)
 \end{equation}
 
-### Paralogistic
+### 2.5.5. Paralogistic
 
 Paralogistic is a Feller-Pareto distribution with $\mu=0$, $\tau=1$,
 $\alpha=\gamma$.
 The survival function is
 $$
@@ -533,19 +533,19 @@
 Initial estimators are
 \end{equation}
 
-### Inverse Burr
+### 2.5.6. Inverse Burr
 
 Use the Burr estimates on the sample $1/x$.
 
-### Inverse paralogistic
+### 2.5.7. Inverse paralogistic
 
 Use the paralogistic estimates on the sample $1/x$.
 
-### Inverse pareto
+### 2.5.8. Inverse Pareto
 
 Use the Pareto estimates on the sample $1/x$.
 
-### Pareto IV
+### 2.5.9. Pareto IV
 
 The survival function is
 $$
@@ -625,7 +625,7 @@
 $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$
 and $\hat\gamma$.
 
-### Pareto III
+### 2.5.10. Pareto III
 
 Pareto III corresponds to Pareto IV with $\alpha=1$.
 \begin{equation}
@@ -661,7 +661,7 @@
 $\hat\gamma$ from \@ref(eq:pareto3gammarelation),
 $\hat\theta$ from \@ref(eq:pareto3thetarelation) with $\hat\gamma$.
 
-### Pareto II
+### 2.5.11. Pareto II
 
 Pareto II corresponds to Pareto IV with $\gamma=1$.
@@ -682,7 +682,7 @@
 $\hat\theta$ from \@ref(eq:pareto4thetarelation) with $\alpha^\star$ and
 $\gamma=1$,
 $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$
 and $\gamma=1$.
 
-### Pareto I
+### 2.5.12. Pareto I
 
 Pareto I corresponds to Pareto IV with $\gamma=1$, $\mu=\theta$.
@@ -705,7 +705,7 @@
 $\hat\alpha$ from \@ref(eq:pareto1alphamurelation).
 
-### Pareto
+### 2.5.13. Pareto
 
 Pareto corresponds to Pareto IV with $\gamma=1$, $\mu=0$.
 \begin{equation}
@@ -727,9 +727,9 @@
 where $m_i$ is the empirical raw moment of order $i$,
 $\hat\theta$ from \@ref(eq:pareto4thetarelation) with $\alpha^\star$ and
 $\gamma=1$,
 $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\mu=0$, $\hat\theta$
 and $\gamma=1$.
 
-## Transformed gamma family
+## 2.6. Transformed gamma family
 
-### Transformed gamma distribution
+### 2.6.1. Transformed gamma distribution
 
 The log-likelihood is given by
 $$
@@ -760,7 +760,7 @@
 $$
 \hat\tau = \frac{\psi(\hat\alpha)}{\frac1n\sum_i \log(x_i/\hat\theta)}.
 $$
 
-### gamma distribution
+### 2.6.2. Gamma distribution
 
 Transformed gamma with $\tau=1$.
 
 We compute the moment-estimator given by
@@ -772,7 +772,7 @@
 \end{equation}
 
-### Weibull distribution
+### 2.6.3. Weibull distribution
 
 Transformed gamma with $\alpha=1$.
@@ -803,18 +803,18 @@
 with $p_1=1/4$, $p_2=3/4$, $p_3=1/2$ and $x_i$ the corresponding empirical
 quantiles.
 Initial parameters are $\tilde\tau$ and $\tilde\theta$ unless the empirical
 quantiles satisfy $x_1=x_2$, in which case we use $\hat\tau$, $\hat\theta$.
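+The quantile-matching idea can be made concrete; the following is an
+illustrative sketch of one standard variant, matching
+$\log(-\log(1-p)) = \tau\log(x/\theta)$ at the quantiles above, and not
+necessarily the exact expressions used by the package (`start_weibull` is a
+hypothetical name):
+
+```{r}
+start_weibull <- function(x, p1 = 1/4, p2 = 3/4, p3 = 1/2) {
+  q <- quantile(x, probs = c(p1, p2, p3), names = FALSE)
+  tau   <- log(log(1 - p1) / log(1 - p2)) / log(q[1] / q[2])
+  theta <- q[3] / (-log(1 - p3))^(1 / tau) # matches the median when p3 = 1/2
+  c(shape = tau, scale = theta)
+}
+```
+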
 
-### Exponential distribution
+### 2.6.4. Exponential distribution
 
 The MLE is the MME, $\hat\lambda = 1/m_1$.
 
-## Inverse transformed gamma family
+## 2.7. Inverse transformed gamma family
 
-### Inverse transformed gamma distribution
+### 2.7.1. Inverse transformed gamma distribution
 
 Same as the transformed gamma distribution with $(1/x_i)_i$.
 
-### Inverse gamma distribution
+### 2.7.2. Inverse gamma distribution
 
 We compute the moment-estimator as
 $$
@@ -822,31 +822,31 @@
 \hat\theta= m_1m_2/(m_2-m_1^2).
 $$
 
-### Inverse Weibull distribution
+### 2.7.3. Inverse Weibull distribution
 
 We use the QME.
 
-### Inverse exponential
+### 2.7.4. Inverse exponential
 
 Same as the exponential distribution with $(1/x_i)_i$.
 
-# Bibliography
+# 3. Bibliography
 
-## General books
+## 3.1. General books
 
 - N. L. Johnson, S. Kotz, N. Balakrishnan (1994). Continuous univariate distributions, Volume 1, Wiley.
 - N. L. Johnson, S. Kotz, N. Balakrishnan (1995). Continuous univariate distributions, Volume 2, Wiley.
 - N. L. Johnson, A. W. Kemp, S. Kotz (2008). Univariate discrete distributions, Wiley.
 - G. Wimmer (1999). Thesaurus of univariate discrete probability distributions.
 
-## Books dedicated to a distribution family
+## 3.2. Books dedicated to a distribution family
 
 - M. Ahsanullah, B. M. Golam Kibria, M. Shakil (2014). Normal and Student's t Distributions and Their Applications, Springer.
 - B. C. Arnold (2010). Pareto Distributions, Chapman and Hall.
 - A. Azzalini (2013). The Skew-Normal and Related Families.
 - N. Balakrishnan (2014). Handbook of the Logistic Distribution, CRC Press.
 
-## Books with applications
+## 3.3. Books with applications
 
 - C. Forbes, M. Evans, N. Hastings, B. Peacock (2011). Statistical Distributions, Wiley.
 - Z. A. Karian, E. J. Dudewicz, K. Shimizu (2010). Handbook of Fitting Statistical Distributions with R, CRC Press.