From e63d5494f2733bb793e31b9ed64023fa168d0cec Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Thu, 26 Sep 2024 10:45:16 -0400 Subject: [PATCH 1/6] Start documentation of sum_to_zero_vector --- src/reference-manual/expressions.qmd | 15 ++++---- src/reference-manual/transforms.qmd | 45 ++++++++++++++++++++++- src/reference-manual/types.qmd | 29 ++++++++++++++- src/stan-users-guide/regression.qmd | 55 +++++----------------------- src/theming/stan.xml | 1 + 5 files changed, 88 insertions(+), 57 deletions(-) diff --git a/src/reference-manual/expressions.qmd b/src/reference-manual/expressions.qmd index a9e39e28c..d04eff572 100644 --- a/src/reference-manual/expressions.qmd +++ b/src/reference-manual/expressions.qmd @@ -149,9 +149,9 @@ any of the following. ``` int, real, complex, vector, simplex, unit_vector, -ordered, positive_ordered, row_vector, matrix, -cholesky_factor_corr, cholesky_factor_cov, -corr_matrix, cov_matrix, array +sum_to_zero_vector, ordered, positive_ordered, +row_vector, matrix, cholesky_factor_corr, +cholesky_factor_cov, corr_matrix, cov_matrix, array ``` The following built in functions are also reserved and @@ -810,9 +810,9 @@ In addition to single integer indexes, as described in [the language indexing section](#language-indexing.section), Stan supports multiple indexing. Multiple indexes can be integer arrays of indexes, lower bounds, upper bounds, lower and upper bounds, or simply shorthand for -all of the indexes. If the upper bound is smaller than the lower bound, -the range is empty (unlike, e.g., in R). The upper bound and lower bound can be -expressions that evaluate to integer. A complete list of index types is +all of the indexes. If the upper bound is smaller than the lower bound, +the range is empty (unlike, e.g., in R). The upper bound and lower bound can be +expressions that evaluate to integer. A complete list of index types is given in the following table. ##### Indexing Options Table {- #index-types-table} @@ -1078,6 +1078,7 @@ the following table shows the mapping from types to their primitive types. | `vector` | `vector` | | `simplex` | `vector` | | `unit_vector` | `vector` | + | `sum_to_zero_vector` | `vector` | | `ordered` | `vector` | | `positive_ordered` | `vector` | | `row_vector` | `row_vector` | @@ -1378,7 +1379,7 @@ model { } ``` -Algebraically, +Algebraically, [the distribution statement](statements.qmd#distribution-statements.section) in the model could be reduced to diff --git a/src/reference-manual/transforms.qmd b/src/reference-manual/transforms.qmd index 679e589d9..5e3120a28 100644 --- a/src/reference-manual/transforms.qmd +++ b/src/reference-manual/transforms.qmd @@ -15,7 +15,7 @@ constrained to be ordered, positive ordered, or simplexes. Matrices may be constrained to be correlation matrices or covariance matrices. This chapter provides a definition of the transforms used for each type of variable. For examples of how to declare and define these -variables in a Stan program, see section +variables in a Stan program, see section [Variable declaration](types.qmd#variable-declaration.section). Stan converts models to C++ classes which define probability @@ -39,7 +39,7 @@ the exact arithmetic: rounded to the boundary. This may cause unexpected warnings or errors, if in other parts of the code the boundary value is invalid. For example, we may observe floating-point value 0 for - a variance parameter that has been declared to be larger than 0. + a variance parameter that has been declared to be larger than 0. See more about [Floating point Arithmetic in Stan user's guide](../stan-users-guide/floating-point.qmd)). - CmdStan stores the output to CSV files with 6 significant digits accuracy by default, but the constraints are checked with 8 @@ -462,6 +462,47 @@ p_Y(y) $$ + +## Zero sum vector + +Vectors that are constrained to sum to zero are useful +in regression tasks where nonidentifiability is an issue +or for ability or difficulty parameters (but not both) in an IRT model. + +In constraint terms, a zero sum $K$-vector $x \in \mathbb{R}^K$ satisfies + +$$ +\sum_{k=1}^K x_k = 0. +$$ + + + +### Zero sum transform {-} + +TODO: Document isometric log ratio transform + +Egozcue, Juan Jose; Pawlowsky-Glahn, Vera; Mateu-Figueras, Gloria; +Barcelo-Vidal, Carles (2003), "Isometric logratio transformations for +compositional data analysis", Mathematical Geology, 35 (3): 279–300, +doi:10.1023/A:1023818214614, S2CID 122844634 + +See also, description of the same using "pivot coordinates" in + +Filzmoser, P., Hron, K., Templ, M. (2018). Geometrical Properties of +Compositional Data. In: Applied Compositional Data Analysis. Springer Series +in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-96422-5_3 + + +### Zero sum inverse transform {-} + +TODO: Document inverse isometric log ratio transform + + +### Absolute Jacobian determinant of the zero sum inverse transform {-} + +The inverse transform is a linear operation, leading to a constant Jacobian +determinant which is therefore not included. + ## Unit simplex {#simplex-transform.section} Variables constrained to the unit simplex show up in multivariate diff --git a/src/reference-manual/types.qmd b/src/reference-manual/types.qmd index b113778b8..ae0b6ee1d 100644 --- a/src/reference-manual/types.qmd +++ b/src/reference-manual/types.qmd @@ -117,9 +117,10 @@ vector[3] rho; ``` There are also special data types for structured vectors and -matrices. There are four constrained vector data types, `simplex` +matrices. There are five constrained vector data types, `simplex` for unit simplexes, `unit_vector` for unit-length vectors, -`ordered` for ordered vectors of scalars and +`sum_to_zero_vector` for vectors that sum to zero, +`ordered` for ordered vectors of scalars, and `positive_ordered` for vectors of positive ordered scalars. There are specialized matrix data types `corr_matrix` and `cov_matrix` for correlation matrices (symmetric, positive @@ -692,6 +693,26 @@ unit vectors, this is only done up to a statically specified accuracy threshold $\epsilon$ to account for errors arising from floating-point imprecision. +### Vectors that sum to zero {-} + +A zero-sum vector is constrained such that the +sum of its elements is always $0$. These are sometimes useful +for resolving identifiability issues in regression models. +While the underlying vector has only $N - 1$ degrees of freedom, +zero sum vectors are declared with their full dimensionality. +For instance, `beta` is declared to be a zero-sum $5$-vector (4 DoF) by + +```stan +sum_to_zero_vector[5] beta; +``` + +Zero sum vectors are implemented as vectors and may be assigned to other +vectors and vice-versa. Zero sum vector variables, like other constrained +variables, are validated to ensure that they are indeed unit length; for +zero sum vectors, this is only done up to a statically specified accuracy +threshold $\epsilon$ to account for errors arising from floating-point +imprecision. + ### Ordered vectors {-} An ordered vector type in Stan represents a vector whose entries are @@ -1636,6 +1657,8 @@ dimensions `matrix[K, K]` types. +---------------------+-------------------------+----------------------------------------------+ | | | `unit_vector[N]` | +---------------------+-------------------------+----------------------------------------------+ +| | | `sum_to_zero_vector[N]` | ++---------------------+-------------------------+----------------------------------------------+ | `row_vector` | `row_vector[N]` | `row_vector[N]` | +---------------------+-------------------------+----------------------------------------------+ | | | `row_vector[N]` | @@ -1700,6 +1723,8 @@ dimensions `matrix[K, K]` types. +---------------------+-------------------------+----------------------------------------------+ | | | `array[M] unit_vector[N]` | +---------------------+-------------------------+----------------------------------------------+ +| | | `array[M] sum_to_zero_vector[N]` | ++---------------------+-------------------------+----------------------------------------------+ diff --git a/src/stan-users-guide/regression.qmd b/src/stan-users-guide/regression.qmd index ab82b2b7a..42f49b37a 100644 --- a/src/stan-users-guide/regression.qmd +++ b/src/stan-users-guide/regression.qmd @@ -537,30 +537,23 @@ details). ### $K-1$ degrees of freedom {-} -There is more than one way to enforce a sum-to-zero constraint on a -parameter vector, the most efficient of which is to define the $K$-th -element as the negation of the sum of the elements $1$ through $K-1$. +As of Stan 2.36, there is a built in `sum_to_zero_vector` +type which constrains $K-1$ free parameters into a length-$K$ +vector that sums to zero. This is using a more sophisticated +transform than the previously recommended form of setting +the final element of the vector to the negative sum of the previous elements. ```stan parameters { - vector[K - 1] beta_raw; - // ... -} -transformed parameters { - vector[K] beta = append_row(beta_raw, -sum(beta_raw)); + sum_to_zero_vector[K] beta; // ... } ``` -Placing a prior on `beta_raw` in this parameterization leads to +Placing a prior on `beta` in this parameterization leads to a subtly different posterior than that resulting from the same prior on `beta` in the original parameterization without the -sum-to-zero constraint. Most notably, a simple prior on each -component of `beta_raw` produces different results than putting -the same prior on each component of an unconstrained $K$-vector -`beta`. For example, providing a $\textsf{normal}(0,5)$ prior -on `beta` will produce a different posterior mode than placing -the same prior on `beta_raw`. +sum-to-zero constraint. #### Marginal distribution of sum-to-zero components {-} @@ -576,43 +569,13 @@ model { ``` The components are not independent, as they must sum zero. No -Jacobian is required because summation and negation are linear +Jacobian is required because the transform uses only linear operations (and thus have constant Jacobians). To generate distributions with marginals other than standard normal, the resulting `beta` may be scaled by some factor `sigma` and translated to some new location `mu`. - -### QR decomposition {-} - -Aaron Goodman, on the Stan forums, also provided this approach, which -calculates a QR decomposition in the transformed data block, then uses -it to transform to a sum-to-zero parameter `x`, - -```stan -transformed data{ - matrix[K, K] A = diag_matrix(rep_vector(1, K)); - matrix[K, K - 1] A_qr; - for (i in 1:K - 1) A[K, i] = -1; - A[K, K] = 0; - A_qr = qr_Q(A)[ , 1:(K - 1)]; -} -parameters { - vector[K - 1] beta_raw; -} -transformed parameters{ - vector[K] beta = A_qr * beta_raw; -} -model { - beta_raw ~ normal(0, inv(sqrt(1 - inv(K)))); -} -``` - -This produces a marginal standard normal distribution on the values of -`beta`, which will sum to zero by construction of the QR decomposition. - - ### Translated and scaled simplex {-} An alternative approach that's less efficient, but amenable to a diff --git a/src/theming/stan.xml b/src/theming/stan.xml index d41c7cc06..73777de01 100644 --- a/src/theming/stan.xml +++ b/src/theming/stan.xml @@ -41,6 +41,7 @@ positive_ordered simplex unit_vector + sum_to_zero_vector row_vector matrix array From 7b3d6c7841b9d6aa37951048517f6f63f9efdcec Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Tue, 1 Oct 2024 16:14:00 -0400 Subject: [PATCH 2/6] Update grammar --- src/reference-manual/grammar.txt | 9 +++++++++ src/reference-manual/syntax.qmd | 6 ++++++ 2 files changed, 15 insertions(+) diff --git a/src/reference-manual/grammar.txt b/src/reference-manual/grammar.txt index c18fea74c..657eb8b7c 100644 --- a/src/reference-manual/grammar.txt +++ b/src/reference-manual/grammar.txt @@ -24,6 +24,7 @@ ::= IDENTIFIER | TRUNCATE + | JACOBIAN ::= | @@ -57,10 +58,13 @@ | POSITIVEORDERED | SIMPLEX | UNITVECTOR + | SUMTOZERO | CHOLESKYFACTORCORR | CHOLESKYFACTORCOV | CORRMATRIX | COVMATRIX + | STOCHASTICCOLUMNMATRIX + | STOCHASTICROWMATRIX | PRINT | REJECT | FATAL_ERROR @@ -165,11 +169,16 @@ | POSITIVEORDERED LBRACK RBRACK | SIMPLEX LBRACK RBRACK | UNITVECTOR LBRACK RBRACK + | SUMTOZERO LBRACK RBRACK | CHOLESKYFACTORCORR LBRACK RBRACK | CHOLESKYFACTORCOV LBRACK [COMMA ] RBRACK | CORRMATRIX LBRACK RBRACK | COVMATRIX LBRACK RBRACK + | STOCHASTICCOLUMNMATRIX LBRACK COMMA + RBRACK + | STOCHASTICROWMATRIX LBRACK COMMA + RBRACK ::= [LABRACK RABRACK] | LABRACK RABRACK diff --git a/src/reference-manual/syntax.qmd b/src/reference-manual/syntax.qmd index d8bfad955..276a80348 100644 --- a/src/reference-manual/syntax.qmd +++ b/src/reference-manual/syntax.qmd @@ -113,6 +113,7 @@ The raw output is available [here](https://raw.githubusercontent.com/stan-dev/do ``` ::= IDENTIFIER | TRUNCATE + | JACOBIAN ::= @@ -175,11 +176,16 @@ The raw output is available [here](https://raw.githubusercontent.com/stan-dev/do | POSITIVEORDERED LBRACK RBRACK | SIMPLEX LBRACK RBRACK | UNITVECTOR LBRACK RBRACK + | SUMTOZERO LBRACK RBRACK | CHOLESKYFACTORCORR LBRACK RBRACK | CHOLESKYFACTORCOV LBRACK [COMMA ] RBRACK | CORRMATRIX LBRACK RBRACK | COVMATRIX LBRACK RBRACK + | STOCHASTICCOLUMNMATRIX LBRACK COMMA + RBRACK + | STOCHASTICROWMATRIX LBRACK COMMA + RBRACK ::= [LABRACK RABRACK] | LABRACK RABRACK From ab05de715553b784d72a8ec76e36dcf99b9c401f Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Fri, 4 Oct 2024 17:15:12 -0400 Subject: [PATCH 3/6] sum to zero reference doc --- src/bibtex/all.bib | 18 ++++++++ src/reference-manual/transforms.qmd | 72 +++++++++++++++++++++-------- 2 files changed, 72 insertions(+), 18 deletions(-) diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib index aa7eb4975..5e38e543d 100644 --- a/src/bibtex/all.bib +++ b/src/bibtex/all.bib @@ -1845,3 +1845,21 @@ @article{Timonen+etal:2023:ODE-PSIS pages = {e614} } +@article{egozcue+etal:2003, + title={Isometric logratio transformations for compositional data analysis}, + author={Egozcue, Juan Jos{\'e} and Pawlowsky-Glahn, Vera and Mateu-Figueras, Gl{\`o}ria and Barcelo-Vidal, Carles}, + journal={Mathematical Geology}, + volume={35}, + number={3}, + pages={279--300}, + year={2003} +} + +book{filzmoser+etal:2018, + title={Geometrical properties of compositional data}, + author={Filzmoser, Peter and Hron, Karel and Templ, Matthias}, + booktitle={Applied Compositional Data Analysis: With Worked Examples in R}, + pages={35--68}, + year={2018}, + publisher={Springer} +} \ No newline at end of file diff --git a/src/reference-manual/transforms.qmd b/src/reference-manual/transforms.qmd index 5e3120a28..ae63ce10e 100644 --- a/src/reference-manual/transforms.qmd +++ b/src/reference-manual/transforms.qmd @@ -465,44 +465,80 @@ $$ ## Zero sum vector -Vectors that are constrained to sum to zero are useful -in regression tasks where nonidentifiability is an issue -or for ability or difficulty parameters (but not both) in an IRT model. - -In constraint terms, a zero sum $K$-vector $x \in \mathbb{R}^K$ satisfies +Vectors that are constrained to sum to zero are useful for, among +other things, additive varying effects, such as varying slopes or +intercepts in a regression model (e.g., for income deciles). +A zero sum $K$-vector $x \in \mathbb{R}^K$ satisfies the constraint $$ \sum_{k=1}^K x_k = 0. $$ +For the transform, Stan uses the first part of an isometric log ratio +transform; see [@egozcue+etal:2003] for the basic +definitions and Chapter 3 of [@filzmoser+etal:2018] for the pivot +coordinate version used here. ### Zero sum transform {-} -TODO: Document isometric log ratio transform - -Egozcue, Juan Jose; Pawlowsky-Glahn, Vera; Mateu-Figueras, Gloria; -Barcelo-Vidal, Carles (2003), "Isometric logratio transformations for -compositional data analysis", Mathematical Geology, 35 (3): 279–300, -doi:10.1023/A:1023818214614, S2CID 122844634 - -See also, description of the same using "pivot coordinates" in - -Filzmoser, P., Hron, K., Templ, M. (2018). Geometrical Properties of -Compositional Data. In: Applied Compositional Data Analysis. Springer Series -in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-96422-5_3 +The (unconstraining) transform is defined iteratively. Given an $x \in +\mathbb{R}^{N + 1}$ that sums to zero (i.e., $\sum_{n=1}^{N+1} x_n = +0$), the transform proceeds as follows to produce an unconstrained $y +\in \mathbb{R}^N$. +The transform is initialized by setting +$$ +S_N = 0 +$$ +and +$$ +y_N = -x_{N + 1} \cdot \frac{\sqrt{N \cdot (N + 1)}{N}}. +$$ +The for each $n$ from $N - 1$ down to $1$, let +$$ +w_{n + 1} = \frac{y_{n + 1}}{\sqrt{(n + 1) \cdot (n + 2)}}, +$$ +$$ +S_n = S_{n + 1} + w_{n + 1}, +$$ +and +$$ +y_n = (S_n - x_{n + 1}) \cdot \frac{\sqrt{n \cdot (n + 1)}}{n}. +$$ ### Zero sum inverse transform {-} -TODO: Document inverse isometric log ratio transform +The inverse (constraining) transform follows the isometric logratio tranform. +It maps an unconstrained vector $y \in \mathbb{R}^N$ to a zero-sum vector $x \in +\mathbb{R}^{N + 1}$ such that +$$ +\sum_{n=1}^{N + 1} x_n = 0. +$$ +The values are defined inductively, starting with +$$ +x_1 = \sum_{n=1}^N \frac{y_n}{\sqrt{n \cdot (n + 1)}} +$$ +and then setting +$$ +x_{n + 1} = \sum_{i = n + 1}^N \frac{\sqrt{y_i}}{\sqrt{i \cdot (i + 1)}} +- n \cdot \frac{y_n}{\sqrt{n \cdot (n + 1)}}. +$$ +for $n \in 1{:}N$. +The definition is such that +$$ +\sum_{n = 1}^{N + 1} x_n = 0 +$$ +by construction, because each of the terms added to $x_{n}$ is then +subtracted from $x_{n + 1}$ the number of times it shows up in earlier terms. ### Absolute Jacobian determinant of the zero sum inverse transform {-} The inverse transform is a linear operation, leading to a constant Jacobian determinant which is therefore not included. + ## Unit simplex {#simplex-transform.section} Variables constrained to the unit simplex show up in multivariate From 71cea94b1a2a01103ab0e76ef4097e85f77e216a Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Fri, 4 Oct 2024 17:24:27 -0400 Subject: [PATCH 4/6] sorted reference links --- src/bibtex/all.bib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib index 5e38e543d..999519190 100644 --- a/src/bibtex/all.bib +++ b/src/bibtex/all.bib @@ -1855,7 +1855,7 @@ @article{egozcue+etal:2003 year={2003} } -book{filzmoser+etal:2018, +@book{filzmoser+etal:2018, title={Geometrical properties of compositional data}, author={Filzmoser, Peter and Hron, Karel and Templ, Matthias}, booktitle={Applied Compositional Data Analysis: With Worked Examples in R}, From 0daba60d07b697f584049af9c5d0b7fe1c17e699 Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Fri, 15 Nov 2024 15:14:22 -0500 Subject: [PATCH 5/6] updated sum-to-zero doc --- src/bibtex/all.bib | 8 ++ src/reference-manual/transforms.qmd | 12 ++- src/stan-users-guide/regression.qmd | 142 +++++++++++++++------------- 3 files changed, 95 insertions(+), 67 deletions(-) diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib index 999519190..422727d55 100644 --- a/src/bibtex/all.bib +++ b/src/bibtex/all.bib @@ -1862,4 +1862,12 @@ @book{filzmoser+etal:2018 pages={35--68}, year={2018}, publisher={Springer} +} + +@misc{seyboldt:2024, + author="Seyboldt, Adrian", + title="Add ZeroSumNormal distribution", + note="pyro-ppl GitHub repository issue \#1751", + year = "2024", + url ="https://github.com/pyro-ppl/numpyro/pull/1751#issuecomment-1980569811" } \ No newline at end of file diff --git a/src/reference-manual/transforms.qmd b/src/reference-manual/transforms.qmd index ae63ce10e..5f6ab8b27 100644 --- a/src/reference-manual/transforms.qmd +++ b/src/reference-manual/transforms.qmd @@ -475,9 +475,15 @@ $$ $$ For the transform, Stan uses the first part of an isometric log ratio -transform; see [@egozcue+etal:2003] for the basic -definitions and Chapter 3 of [@filzmoser+etal:2018] for the pivot -coordinate version used here. +transform; see [@egozcue+etal:2003] for the basic definitions and +Chapter 3 of [@filzmoser+etal:2018] for the pivot coordinate version +used here. Stan uses the isometric log ratio transform because it +induces a geometry with zero correlation among the dimensions, making +it easier for HMC to explore than simpler alternatives such as setting +the final element to the negative sum of the first elements; see, e.g., +[@seyboldt:2024]. + + ### Zero sum transform {-} diff --git a/src/stan-users-guide/regression.qmd b/src/stan-users-guide/regression.qmd index 42f49b37a..232c89f50 100644 --- a/src/stan-users-guide/regression.qmd +++ b/src/stan-users-guide/regression.qmd @@ -521,27 +521,43 @@ centered around zero, as is typical for regression coefficients. ## Parameterizing centered vectors -It is often convenient to define a parameter vector $\beta$ that is -centered in the sense of satisfying the sum-to-zero constraint, -$$ -\sum_{k=1}^K \beta_k = 0. -$$ - -Such a parameter vector may be used to identify a multi-logit -regression parameter vector (see the [multi-logit -section](#multi-logit.section) for details), or may be used for -ability or difficulty parameters (but not both) in an IRT model (see -the [item-response model section](#item-response-models.section) for -details). - - -### $K-1$ degrees of freedom {-} - -As of Stan 2.36, there is a built in `sum_to_zero_vector` -type which constrains $K-1$ free parameters into a length-$K$ -vector that sums to zero. This is using a more sophisticated -transform than the previously recommended form of setting -the final element of the vector to the negative sum of the previous elements. +When there are varying effects in a regression, the resulting +likelihood is not identified unless further steps are taken. For +example, we might have a global intercept $\alpha$ and then a varying +effect $\beta_k$ for age group $k$ to make a linear predictor $\alpha + +\beta_k$. With this predictor, we can add a constant to $\alpha$ and +subtract from each $\beta_k$ and get exactly the same likelihood. + +The traditional approach to identifying such a model is to pin the +first varing effect to zero, i.e., $\beta_1 = 0$. With one of the +varying effects fixed, you can no longer add a constant to all of them +and the model's likelihood is identified. In addition to the +difficulty in specifying such a model in Stan, it is awkward to +formulate priors because the other coefficients are all interpreted +relative to $\beta_1$. + +In a Bayesian setting, a proper prior on each of the $\beta$ is enough +to identify the model. Unfortunately, this can lead to inefficiency +during sampling as the model is still only weakly identified through +the prior---there is a very simple example of the difference in +the discussion of collinearity in @collinearity.section. + +An alternative identification strategy that allows a symmetric prior +is to enforce a sum-to-zero constraint on the varying effects, i.e., +$\sum_{k=1}^K \beta_k = 0.$ + +A parameter vector constrained to sum to zero may also be used to +identify a multi-logit regression parameter vector (see the +[multi-logit section](#multi-logit.section) for details), or may be +used for ability or difficulty parameters (but not both) in an IRT +model (see the [item-response model +section](#item-response-models.section) for details). + + +### Built-in sum-to-zero vector {-} + +As of Stan 2.36, there is a built in `sum_to_zero_vector` type, which +can be used as follows. ```stan parameters { @@ -550,10 +566,34 @@ parameters { } ``` -Placing a prior on `beta` in this parameterization leads to -a subtly different posterior than that resulting from the same prior -on `beta` in the original parameterization without the -sum-to-zero constraint. +This produces a vector of size `K` such that `sum(beta) = 0`. In the +unconstrained representation requires only `K - 1` values because the +last is determined by the first `K - 1`. + +Placing a prior on `beta` in this parameterization, for example, + +```stan + beta ~ normal(0, 1); +``` + +leads to a subtly different posterior than what you would get with the +same prior on an unconstrained size-`K` vector. As explained below, +the variance is reduced. + +The sum-to-zero constraint can be implemented naively by setting the +last element to the negative sum of the first elements, i.e., $\beta_K += -\sum_{k=1}^{K-1} \beta_k.$ But that leads to high correlation among +the $\beta_k$. + +The transform used in Stan eliminates these correlations by +constructing an orthogonal basis and applying it to the +zero-sum-constraint; @seyboldt:2024 provides an explanation. The +*Stan Reference Manual* provides the details in the chapter on +transforms. Although any orthogonal basis can be used, Stan uses the +inverse isometric log transform because it is convenient to describe +and the transform simplifies to efficient scalar operations rather +than more expensive matrix operations. + #### Marginal distribution of sum-to-zero components {-} @@ -568,53 +608,27 @@ model { } ``` -The components are not independent, as they must sum zero. No -Jacobian is required because the transform uses only linear -operations (and thus have constant Jacobians). +The scale component can be multiplied by `sigma` to produce a +`normal(0, sigma)` prior marginally. To generate distributions with marginals other than standard normal, the resulting `beta` may be scaled by some factor `sigma` and translated to some new location `mu`. -### Translated and scaled simplex {-} - -An alternative approach that's less efficient, but amenable to a -symmetric prior, is to offset and scale a simplex. - -```stan -parameters { - simplex[K] beta_raw; - real beta_scale; - // ... -} -transformed parameters { - vector[K] beta; - beta = beta_scale * (beta_raw - inv(K)); - // ... -} -``` - -Here `inv(K)` is just a short way to write `1.0 / K`. Given -that `beta_raw` sums to 1 because it is a simplex, the -elementwise subtraction of `inv(K)` is guaranteed to sum to zero. -Because the magnitude of the elements of the simplex is bounded, a -scaling factor is required to provide `beta` with $K$ degrees of -freedom necessary to take on every possible value that sums to zero. - -With this parameterization, a Dirichlet prior can be placed on -`beta_raw`, perhaps uniform, and another prior put on -`beta_scale`, typically for "shrinkage." - ### Soft centering {-} -Adding a prior such as $\beta \sim \textsf{normal}(0,\sigma)$ will provide a kind -of soft centering of a parameter vector $\beta$ by preferring, all -else being equal, that $\sum_{k=1}^K \beta_k = 0$. This approach is only -guaranteed to roughly center if $\beta$ and the elementwise addition $\beta + c$ -for a scalar constant $c$ produce the same likelihood (perhaps by -another vector $\alpha$ being transformed to $\alpha - c$, as in the -IRT models). This is another way of achieving a symmetric prior. +Adding a prior such as $\beta \sim \textsf{normal}(0,\epsilon)$ for a +small $\epsilon$ will provide a kind of soft centering of a parameter +vector $\beta$ by preferring, all else being equal, that $\sum_{k=1}^K +\beta_k = 0$. This approach is only guaranteed to roughly center if +$\beta$ and the elementwise addition $\beta + c$ for a scalar constant +$c$ produce the same likelihood (perhaps by another vector $\alpha$ +being transformed to $\alpha - c$, as in the IRT models). This is +another way of achieving a symmetric prior, though it requires +choosing an $\epsilon$. If $\epsilon$ is too large, there won't be a +strong enough centering effect and if it is too small, it will add +high curvature to the target density and impede sampling. ## Ordered logistic and probit regression {#ordered-logistic.section} From 9096b5afc8618afda667135994304a8cb348a6c7 Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Fri, 15 Nov 2024 16:11:39 -0500 Subject: [PATCH 6/6] remove redundant space --- src/reference-manual/types.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/reference-manual/types.qmd b/src/reference-manual/types.qmd index ae0b6ee1d..999751551 100644 --- a/src/reference-manual/types.qmd +++ b/src/reference-manual/types.qmd @@ -707,7 +707,7 @@ sum_to_zero_vector[5] beta; ``` Zero sum vectors are implemented as vectors and may be assigned to other -vectors and vice-versa. Zero sum vector variables, like other constrained +vectors and vice-versa. Zero sum vector variables, like other constrained variables, are validated to ensure that they are indeed unit length; for zero sum vectors, this is only done up to a statically specified accuracy threshold $\epsilon$ to account for errors arising from floating-point