From e63d5494f2733bb793e31b9ed64023fa168d0cec Mon Sep 17 00:00:00 2001
From: Brian Ward <bward@flatironinstitute.org>
Date: Thu, 26 Sep 2024 10:45:16 -0400
Subject: [PATCH 1/6] Start documentation of sum_to_zero_vector

---
 src/reference-manual/expressions.qmd | 15 ++++----
 src/reference-manual/transforms.qmd  | 45 ++++++++++++++++++++++-
 src/reference-manual/types.qmd       | 29 ++++++++++++++-
 src/stan-users-guide/regression.qmd  | 55 +++++-----------------------
 src/theming/stan.xml                 |  1 +
 5 files changed, 88 insertions(+), 57 deletions(-)

diff --git a/src/reference-manual/expressions.qmd b/src/reference-manual/expressions.qmd
index a9e39e28c..d04eff572 100644
--- a/src/reference-manual/expressions.qmd
+++ b/src/reference-manual/expressions.qmd
@@ -149,9 +149,9 @@ any of the following.
 
 ```
 int, real, complex, vector, simplex, unit_vector,
-ordered, positive_ordered, row_vector, matrix,
-cholesky_factor_corr, cholesky_factor_cov,
-corr_matrix, cov_matrix, array
+sum_to_zero_vector, ordered, positive_ordered,
+row_vector, matrix, cholesky_factor_corr,
+cholesky_factor_cov, corr_matrix, cov_matrix, array
 ```
 
 The following built in functions are also reserved and
@@ -810,9 +810,9 @@ In addition to single integer indexes, as described in
 [the language indexing section](#language-indexing.section), Stan supports multiple indexing.
 Multiple indexes can be integer arrays of indexes, lower
 bounds, upper bounds, lower and upper bounds, or simply shorthand for
-all of the indexes.  If the upper bound is smaller than the lower bound, 
-the range is empty (unlike, e.g., in R). The upper bound and lower bound can be 
-expressions that evaluate to integer. A complete list of index types is 
+all of the indexes.  If the upper bound is smaller than the lower bound,
+the range is empty (unlike, e.g., in R). The upper bound and lower bound can be
+expressions that evaluate to integer. A complete list of index types is
 given in the following table.
 
 ##### Indexing Options Table {- #index-types-table}
@@ -1078,6 +1078,7 @@ the following table shows the mapping from types to their primitive types.
    | `vector`               | `vector`             |
    | `simplex`              | `vector`             |
    | `unit_vector`          | `vector`             |
+   | `sum_to_zero_vector`   | `vector`             |
    | `ordered`              | `vector`             |
    | `positive_ordered`     | `vector`             |
    | `row_vector`           | `row_vector`         |
@@ -1378,7 +1379,7 @@ model {
 }
 ```
 
-Algebraically, 
+Algebraically,
 [the distribution statement](statements.qmd#distribution-statements.section)
 in the model could be reduced to
 
diff --git a/src/reference-manual/transforms.qmd b/src/reference-manual/transforms.qmd
index 679e589d9..5e3120a28 100644
--- a/src/reference-manual/transforms.qmd
+++ b/src/reference-manual/transforms.qmd
@@ -15,7 +15,7 @@ constrained to be ordered, positive ordered, or simplexes.  Matrices
 may be constrained to be correlation matrices or covariance matrices.
 This chapter provides a definition of the transforms used for each
 type of variable.   For examples of how to declare and define these
-variables in a Stan program, see section 
+variables in a Stan program, see section
 [Variable declaration](types.qmd#variable-declaration.section).
 
 Stan converts models to C++ classes which define probability
@@ -39,7 +39,7 @@ the exact arithmetic:
     rounded to the boundary. This may cause unexpected warnings or
     errors, if in other parts of the code the boundary value is
     invalid. For example, we may observe floating-point value 0 for
-    a variance parameter that has been declared to be larger than 0. 
+    a variance parameter that has been declared to be larger than 0.
 	See more about [Floating point Arithmetic in Stan user's guide](../stan-users-guide/floating-point.qmd)).
   - CmdStan stores the output to CSV files with 6 significant digits
     accuracy by default, but the constraints are checked with 8
@@ -462,6 +462,47 @@ p_Y(y)
 $$
 
 
+
+## Zero sum vector
+
+Vectors that are constrained to sum to zero are useful
+in regression tasks where nonidentifiability is an issue
+or for ability or difficulty parameters (but not both) in an IRT model.
+
+In constraint terms, a zero sum $K$-vector $x \in \mathbb{R}^K$ satisfies
+
+$$
+\sum_{k=1}^K x_k = 0.
+$$
+
+
+
+### Zero sum transform {-}
+
+TODO: Document isometric log ratio transform
+
+Egozcue, Juan Jose; Pawlowsky-Glahn, Vera; Mateu-Figueras, Gloria;
+Barcelo-Vidal, Carles (2003), "Isometric logratio transformations for
+compositional data analysis", Mathematical Geology, 35 (3): 279–300,
+doi:10.1023/A:1023818214614, S2CID 122844634
+
+See also, description of the same using "pivot coordinates" in
+
+Filzmoser, P., Hron, K., Templ, M. (2018). Geometrical Properties of
+Compositional Data. In: Applied Compositional Data Analysis. Springer Series
+in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-96422-5_3
+
+
+### Zero sum inverse transform {-}
+
+TODO: Document inverse isometric log ratio transform
+
+
+### Absolute Jacobian determinant of the zero sum inverse transform {-}
+
+The inverse transform is a linear operation, leading to a constant Jacobian
+determinant which is therefore not included.
+
 ## Unit simplex {#simplex-transform.section}
 
 Variables constrained to the unit simplex show up in multivariate
diff --git a/src/reference-manual/types.qmd b/src/reference-manual/types.qmd
index b113778b8..ae0b6ee1d 100644
--- a/src/reference-manual/types.qmd
+++ b/src/reference-manual/types.qmd
@@ -117,9 +117,10 @@ vector<lower=-1, upper=1>[3] rho;
 ```
 
 There are also special data types for structured vectors and
-matrices. There are four constrained vector data types, `simplex`
+matrices. There are five constrained vector data types, `simplex`
 for unit simplexes, `unit_vector` for unit-length vectors,
-`ordered` for ordered vectors of scalars and
+`sum_to_zero_vector` for vectors that sum to zero,
+`ordered` for ordered vectors of scalars, and
 `positive_ordered` for vectors of positive ordered
 scalars. There are specialized matrix data types `corr_matrix`
 and `cov_matrix` for correlation matrices (symmetric, positive
@@ -692,6 +693,26 @@ unit vectors, this is only done up to a statically specified accuracy
 threshold $\epsilon$ to account for errors arising from floating-point
 imprecision.
 
+### Vectors that sum to zero {-}
+
+A zero-sum vector is constrained such that the
+sum of its elements is always $0$. These are sometimes useful
+for resolving identifiability issues in regression models.
+While the underlying vector has only $N - 1$ degrees of freedom,
+zero sum vectors are declared with their full dimensionality.
+For instance, `beta` is declared to be a zero-sum $5$-vector (4 DoF) by
+
+```stan
+sum_to_zero_vector[5] beta;
+```
+
+Zero sum vectors are implemented as vectors and may be assigned to other
+vectors and vice-versa.  Zero sum  vector variables, like other constrained
+variables, are validated to ensure that they are indeed unit length; for
+zero sum  vectors, this is only done up to a statically specified accuracy
+threshold $\epsilon$ to account for errors arising from floating-point
+imprecision.
+
 ### Ordered vectors {-}
 
 An ordered vector type in Stan represents a vector whose entries are
@@ -1636,6 +1657,8 @@ dimensions `matrix[K, K]` types.
 +---------------------+-------------------------+----------------------------------------------+
 |                     |                         | `unit_vector[N]`                             |
 +---------------------+-------------------------+----------------------------------------------+
+|                     |                         | `sum_to_zero_vector[N]`                      |
++---------------------+-------------------------+----------------------------------------------+
 | `row_vector`        | `row_vector[N]`         | `row_vector[N]`                              |
 +---------------------+-------------------------+----------------------------------------------+
 |                     |                         | `row_vector[N]<lower=L>`                     |
@@ -1700,6 +1723,8 @@ dimensions `matrix[K, K]` types.
 +---------------------+-------------------------+----------------------------------------------+
 |                     |                         | `array[M] unit_vector[N]`                    |
 +---------------------+-------------------------+----------------------------------------------+
+|                     |                         | `array[M] sum_to_zero_vector[N]`             |
++---------------------+-------------------------+----------------------------------------------+
 
 <a name="id:constrained-types.figure"></a>
 
diff --git a/src/stan-users-guide/regression.qmd b/src/stan-users-guide/regression.qmd
index ab82b2b7a..42f49b37a 100644
--- a/src/stan-users-guide/regression.qmd
+++ b/src/stan-users-guide/regression.qmd
@@ -537,30 +537,23 @@ details).
 
 ### $K-1$ degrees of freedom {-}
 
-There is more than one way to enforce a sum-to-zero constraint on a
-parameter vector, the most efficient of which is to define the $K$-th
-element as the negation of the sum of the elements $1$ through $K-1$.
+As of Stan 2.36, there is a built in `sum_to_zero_vector`
+type which constrains $K-1$ free parameters into a length-$K$
+vector that sums to zero. This is using a more sophisticated
+transform than the previously recommended form of setting
+the final element of the vector to the negative sum of the previous elements.
 
 ```stan
 parameters {
-  vector[K - 1] beta_raw;
-  // ...
-}
-transformed parameters {
-  vector[K] beta = append_row(beta_raw, -sum(beta_raw));
+  sum_to_zero_vector[K] beta;
   // ...
 }
 ```
 
-Placing a prior on `beta_raw` in this parameterization leads to
+Placing a prior on `beta` in this parameterization leads to
 a subtly different posterior than that resulting from the same prior
 on `beta` in the original parameterization without the
-sum-to-zero constraint.  Most notably, a simple prior on each
-component of `beta_raw` produces different results than putting
-the same prior on each component of an unconstrained $K$-vector
-`beta`.  For example, providing a $\textsf{normal}(0,5)$ prior
-on `beta` will produce a different posterior mode than placing
-the same prior on `beta_raw`.
+sum-to-zero constraint.
 
 #### Marginal distribution of sum-to-zero components {-}
 
@@ -576,43 +569,13 @@ model {
 ```
 
 The components are not independent, as they must sum zero.  No
-Jacobian is required because summation and negation are linear
+Jacobian is required because the transform uses only linear
 operations (and thus have constant Jacobians).
 
 To generate distributions with marginals other than standard normal,
 the resulting `beta` may be scaled by some factor `sigma` and
 translated to some new location `mu`.
 
-
-### QR decomposition {-}
-
-Aaron Goodman, on the Stan forums, also provided this approach, which
-calculates a QR decomposition in the transformed data block, then uses
-it to transform to a sum-to-zero parameter `x`,
-
-```stan
-transformed data{
-  matrix[K, K] A = diag_matrix(rep_vector(1, K));
-  matrix[K, K - 1] A_qr;
-  for (i in 1:K - 1) A[K, i] = -1;
-  A[K, K] = 0;
-  A_qr = qr_Q(A)[ , 1:(K - 1)];
-}
-parameters {
-  vector[K - 1] beta_raw;
-}
-transformed parameters{
-   vector[K] beta =  A_qr * beta_raw;
-}
-model {
-  beta_raw ~ normal(0, inv(sqrt(1 - inv(K))));
-}
-```
-
-This produces a marginal standard normal distribution on the values of
-`beta`, which will sum to zero by construction of the QR decomposition.
-
-
 ### Translated and scaled simplex {-}
 
 An alternative approach that's less efficient, but amenable to a
diff --git a/src/theming/stan.xml b/src/theming/stan.xml
index d41c7cc06..73777de01 100644
--- a/src/theming/stan.xml
+++ b/src/theming/stan.xml
@@ -41,6 +41,7 @@
       <item>positive_ordered</item>
       <item>simplex</item>
       <item>unit_vector</item>
+      <item>sum_to_zero_vector</item>
       <item>row_vector</item>
       <item>matrix</item>
       <item>array</item>

From 7b3d6c7841b9d6aa37951048517f6f63f9efdcec Mon Sep 17 00:00:00 2001
From: Brian Ward <bward@flatironinstitute.org>
Date: Tue, 1 Oct 2024 16:14:00 -0400
Subject: [PATCH 2/6] Update grammar

---
 src/reference-manual/grammar.txt | 9 +++++++++
 src/reference-manual/syntax.qmd  | 6 ++++++
 2 files changed, 15 insertions(+)

diff --git a/src/reference-manual/grammar.txt b/src/reference-manual/grammar.txt
index c18fea74c..657eb8b7c 100644
--- a/src/reference-manual/grammar.txt
+++ b/src/reference-manual/grammar.txt
@@ -24,6 +24,7 @@
 
 <identifier> ::= IDENTIFIER
                | TRUNCATE
+               | JACOBIAN
 
 <decl_identifier> ::= <identifier>
                     | <reserved_word>
@@ -57,10 +58,13 @@
                   | POSITIVEORDERED
                   | SIMPLEX
                   | UNITVECTOR
+                  | SUMTOZERO
                   | CHOLESKYFACTORCORR
                   | CHOLESKYFACTORCOV
                   | CORRMATRIX
                   | COVMATRIX
+                  | STOCHASTICCOLUMNMATRIX
+                  | STOCHASTICROWMATRIX
                   | PRINT
                   | REJECT
                   | FATAL_ERROR
@@ -165,11 +169,16 @@
                  | POSITIVEORDERED LBRACK <expression> RBRACK
                  | SIMPLEX LBRACK <expression> RBRACK
                  | UNITVECTOR LBRACK <expression> RBRACK
+                 | SUMTOZERO LBRACK <expression> RBRACK
                  | CHOLESKYFACTORCORR LBRACK <expression> RBRACK
                  | CHOLESKYFACTORCOV LBRACK <expression> [COMMA <expression>]
                    RBRACK
                  | CORRMATRIX LBRACK <expression> RBRACK
                  | COVMATRIX LBRACK <expression> RBRACK
+                 | STOCHASTICCOLUMNMATRIX LBRACK <expression> COMMA
+                   <expression> RBRACK
+                 | STOCHASTICROWMATRIX LBRACK <expression> COMMA <expression>
+                   RBRACK
 
 <type_constraint> ::= [LABRACK <range> RABRACK]
                     | LABRACK <offset_mult> RABRACK
diff --git a/src/reference-manual/syntax.qmd b/src/reference-manual/syntax.qmd
index d8bfad955..276a80348 100644
--- a/src/reference-manual/syntax.qmd
+++ b/src/reference-manual/syntax.qmd
@@ -113,6 +113,7 @@ The raw output is available [here](https://raw.githubusercontent.com/stan-dev/do
 ```
 <identifier> ::= IDENTIFIER
                | TRUNCATE
+               | JACOBIAN
 
 <decl_identifier> ::= <identifier>
 
@@ -175,11 +176,16 @@ The raw output is available [here](https://raw.githubusercontent.com/stan-dev/do
                  | POSITIVEORDERED LBRACK <expression> RBRACK
                  | SIMPLEX LBRACK <expression> RBRACK
                  | UNITVECTOR LBRACK <expression> RBRACK
+                 | SUMTOZERO LBRACK <expression> RBRACK
                  | CHOLESKYFACTORCORR LBRACK <expression> RBRACK
                  | CHOLESKYFACTORCOV LBRACK <expression> [COMMA <expression>]
                    RBRACK
                  | CORRMATRIX LBRACK <expression> RBRACK
                  | COVMATRIX LBRACK <expression> RBRACK
+                 | STOCHASTICCOLUMNMATRIX LBRACK <expression> COMMA
+                   <expression> RBRACK
+                 | STOCHASTICROWMATRIX LBRACK <expression> COMMA <expression>
+                   RBRACK
 
 <type_constraint> ::= [LABRACK <range> RABRACK]
                     | LABRACK <offset_mult> RABRACK

From ab05de715553b784d72a8ec76e36dcf99b9c401f Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Fri, 4 Oct 2024 17:15:12 -0400
Subject: [PATCH 3/6] sum to zero reference doc

---
 src/bibtex/all.bib                  | 18 ++++++++
 src/reference-manual/transforms.qmd | 72 +++++++++++++++++++++--------
 2 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib
index aa7eb4975..5e38e543d 100644
--- a/src/bibtex/all.bib
+++ b/src/bibtex/all.bib
@@ -1845,3 +1845,21 @@ @article{Timonen+etal:2023:ODE-PSIS
   pages = {e614} 
 }
 
+@article{egozcue+etal:2003,
+  title={Isometric logratio transformations for compositional data analysis},
+  author={Egozcue, Juan Jos{\'e} and Pawlowsky-Glahn, Vera and Mateu-Figueras, Gl{\`o}ria and Barcelo-Vidal, Carles},
+  journal={Mathematical Geology},
+  volume={35},
+  number={3},
+  pages={279--300},
+  year={2003}
+}
+
+book{filzmoser+etal:2018,
+  title={Geometrical properties of compositional data},
+  author={Filzmoser, Peter and Hron, Karel and Templ, Matthias},
+  booktitle={Applied Compositional Data Analysis: With Worked Examples in R},
+  pages={35--68},
+  year={2018},
+  publisher={Springer}
+}
\ No newline at end of file
diff --git a/src/reference-manual/transforms.qmd b/src/reference-manual/transforms.qmd
index 5e3120a28..ae63ce10e 100644
--- a/src/reference-manual/transforms.qmd
+++ b/src/reference-manual/transforms.qmd
@@ -465,44 +465,80 @@ $$
 
 ## Zero sum vector
 
-Vectors that are constrained to sum to zero are useful
-in regression tasks where nonidentifiability is an issue
-or for ability or difficulty parameters (but not both) in an IRT model.
-
-In constraint terms, a zero sum $K$-vector $x \in \mathbb{R}^K$ satisfies
+Vectors that are constrained to sum to zero are useful for, among
+other things, additive varying effects, such as varying slopes or
+intercepts in a regression model (e.g., for income deciles).
 
+A zero sum $K$-vector $x \in \mathbb{R}^K$ satisfies the constraint
 $$
 \sum_{k=1}^K x_k = 0.
 $$
 
+For the transform, Stan uses the first part of an isometric log ratio
+transform;  see [@egozcue+etal:2003] for the basic
+definitions and Chapter 3 of [@filzmoser+etal:2018] for the pivot
+coordinate version used here.
 
 
 ### Zero sum transform {-}
 
-TODO: Document isometric log ratio transform
-
-Egozcue, Juan Jose; Pawlowsky-Glahn, Vera; Mateu-Figueras, Gloria;
-Barcelo-Vidal, Carles (2003), "Isometric logratio transformations for
-compositional data analysis", Mathematical Geology, 35 (3): 279–300,
-doi:10.1023/A:1023818214614, S2CID 122844634
-
-See also, description of the same using "pivot coordinates" in
-
-Filzmoser, P., Hron, K., Templ, M. (2018). Geometrical Properties of
-Compositional Data. In: Applied Compositional Data Analysis. Springer Series
-in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-96422-5_3
+The (unconstraining) transform is defined iteratively.  Given an $x \in
+\mathbb{R}^{N + 1}$ that sums to zero (i.e., $\sum_{n=1}^{N+1} x_n =
+0$), the transform proceeds as follows to produce an unconstrained $y
+\in \mathbb{R}^N$.
 
+The transform is initialized by setting
+$$
+S_N = 0
+$$
+and
+$$
+y_N = -x_{N + 1} \cdot \frac{\sqrt{N \cdot (N + 1)}{N}}.
+$$
+The for each $n$ from $N - 1$ down to $1$, let
+$$
+w_{n + 1} = \frac{y_{n + 1}}{\sqrt{(n + 1) \cdot (n + 2)}},
+$$
+$$
+S_n = S_{n + 1} + w_{n + 1},
+$$
+and
+$$
+y_n = (S_n - x_{n + 1}) \cdot \frac{\sqrt{n \cdot (n + 1)}}{n}.
+$$
 
 ### Zero sum inverse transform {-}
 
-TODO: Document inverse isometric log ratio transform
+The inverse (constraining) transform follows the isometric logratio tranform.
+It maps an unconstrained vector $y \in \mathbb{R}^N$ to a zero-sum vector $x \in
+\mathbb{R}^{N + 1}$ such that
+$$
+\sum_{n=1}^{N + 1} x_n = 0.
+$$
+The values are defined inductively, starting with 
+$$
+x_1 = \sum_{n=1}^N \frac{y_n}{\sqrt{n \cdot (n + 1)}}
+$$
+and then setting
+$$
+x_{n + 1} = \sum_{i = n + 1}^N \frac{\sqrt{y_i}}{\sqrt{i \cdot (i + 1)}}
+- n \cdot \frac{y_n}{\sqrt{n \cdot (n + 1)}}.
+$$
+for $n \in 1{:}N$.
 
+The definition is such that
+$$
+\sum_{n = 1}^{N + 1} x_n = 0
+$$
+by construction, because each of the terms added to $x_{n}$ is then
+subtracted from $x_{n + 1}$ the number of times it shows up in earlier terms. 
 
 ### Absolute Jacobian determinant of the zero sum inverse transform {-}
 
 The inverse transform is a linear operation, leading to a constant Jacobian
 determinant which is therefore not included.
 
+
 ## Unit simplex {#simplex-transform.section}
 
 Variables constrained to the unit simplex show up in multivariate

From 71cea94b1a2a01103ab0e76ef4097e85f77e216a Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Fri, 4 Oct 2024 17:24:27 -0400
Subject: [PATCH 4/6] sorted reference links

---
 src/bibtex/all.bib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib
index 5e38e543d..999519190 100644
--- a/src/bibtex/all.bib
+++ b/src/bibtex/all.bib
@@ -1855,7 +1855,7 @@ @article{egozcue+etal:2003
   year={2003}
 }
 
-book{filzmoser+etal:2018,
+@book{filzmoser+etal:2018,
   title={Geometrical properties of compositional data},
   author={Filzmoser, Peter and Hron, Karel and Templ, Matthias},
   booktitle={Applied Compositional Data Analysis: With Worked Examples in R},

From 0daba60d07b697f584049af9c5d0b7fe1c17e699 Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Fri, 15 Nov 2024 15:14:22 -0500
Subject: [PATCH 5/6] updated sum-to-zero doc

---
 src/bibtex/all.bib                  |   8 ++
 src/reference-manual/transforms.qmd |  12 ++-
 src/stan-users-guide/regression.qmd | 142 +++++++++++++++-------------
 3 files changed, 95 insertions(+), 67 deletions(-)

diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib
index 999519190..422727d55 100644
--- a/src/bibtex/all.bib
+++ b/src/bibtex/all.bib
@@ -1862,4 +1862,12 @@ @book{filzmoser+etal:2018
   pages={35--68},
   year={2018},
   publisher={Springer}
+}
+
+@misc{seyboldt:2024,
+  author="Seyboldt, Adrian",
+  title="Add ZeroSumNormal distribution",
+  note="pyro-ppl GitHub repository issue \#1751",
+  year = "2024",
+  url ="https://github.com/pyro-ppl/numpyro/pull/1751#issuecomment-1980569811"
 }
\ No newline at end of file
diff --git a/src/reference-manual/transforms.qmd b/src/reference-manual/transforms.qmd
index ae63ce10e..5f6ab8b27 100644
--- a/src/reference-manual/transforms.qmd
+++ b/src/reference-manual/transforms.qmd
@@ -475,9 +475,15 @@ $$
 $$
 
 For the transform, Stan uses the first part of an isometric log ratio
-transform;  see [@egozcue+etal:2003] for the basic
-definitions and Chapter 3 of [@filzmoser+etal:2018] for the pivot
-coordinate version used here.
+transform; see [@egozcue+etal:2003] for the basic definitions and
+Chapter 3 of [@filzmoser+etal:2018] for the pivot coordinate version
+used here.  Stan uses the isometric log ratio transform because it
+induces a geometry with zero correlation among the dimensions, making
+it easier for HMC to explore than simpler alternatives such as setting
+the final element to the negative sum of the first elements; see, e.g.,
+[@seyboldt:2024].
+
+
 
 
 ### Zero sum transform {-}
diff --git a/src/stan-users-guide/regression.qmd b/src/stan-users-guide/regression.qmd
index 42f49b37a..232c89f50 100644
--- a/src/stan-users-guide/regression.qmd
+++ b/src/stan-users-guide/regression.qmd
@@ -521,27 +521,43 @@ centered around zero, as is typical for regression coefficients.
 
 ## Parameterizing centered vectors
 
-It is often convenient to define a parameter vector $\beta$ that is
-centered in the sense of satisfying the sum-to-zero constraint,
-$$
-\sum_{k=1}^K \beta_k = 0.
-$$
-
-Such a parameter vector may be used to identify a multi-logit
-regression parameter vector (see the [multi-logit
-section](#multi-logit.section) for details), or may be used for
-ability or difficulty parameters (but not both) in an IRT model (see
-the [item-response model section](#item-response-models.section) for
-details).
-
-
-### $K-1$ degrees of freedom {-}
-
-As of Stan 2.36, there is a built in `sum_to_zero_vector`
-type which constrains $K-1$ free parameters into a length-$K$
-vector that sums to zero. This is using a more sophisticated
-transform than the previously recommended form of setting
-the final element of the vector to the negative sum of the previous elements.
+When there are varying effects in a regression, the resulting
+likelihood is not identified unless further steps are taken.  For
+example, we might have a global intercept $\alpha$ and then a varying
+effect $\beta_k$ for age group $k$ to make a linear predictor $\alpha +
+\beta_k$.  With this predictor, we can add a constant to $\alpha$ and
+subtract from each $\beta_k$ and get exactly the same likelihood.
+
+The traditional approach to identifying such a model is to pin the
+first varing effect to zero, i.e., $\beta_1 = 0$.  With one of the
+varying effects fixed, you can no longer add a constant to all of them
+and the model's likelihood is identified.  In addition to the
+difficulty in specifying such a model in Stan, it is awkward to
+formulate priors because the other coefficients are all interpreted
+relative to $\beta_1$.  
+
+In a Bayesian setting, a proper prior on each of the $\beta$ is enough
+to identify the model.  Unfortunately, this can lead to inefficiency
+during sampling as the model is still only weakly identified through
+the prior---there is a very simple example of the difference in
+the discussion of collinearity in @collinearity.section.
+
+An alternative identification strategy that allows a symmetric prior
+is to enforce a sum-to-zero constraint on the varying effects, i.e.,
+$\sum_{k=1}^K \beta_k = 0.$
+
+A parameter vector constrained to sum to zero may also be used to
+identify a multi-logit regression parameter vector (see the
+[multi-logit section](#multi-logit.section) for details), or may be
+used for ability or difficulty parameters (but not both) in an IRT
+model (see the [item-response model
+section](#item-response-models.section) for details).
+
+
+### Built-in sum-to-zero vector {-}
+
+As of Stan 2.36, there is a built in `sum_to_zero_vector` type, which
+can be used as follows.
 
 ```stan
 parameters {
@@ -550,10 +566,34 @@ parameters {
 }
 ```
 
-Placing a prior on `beta` in this parameterization leads to
-a subtly different posterior than that resulting from the same prior
-on `beta` in the original parameterization without the
-sum-to-zero constraint.
+This produces a vector of size `K` such that `sum(beta) = 0`.  In the
+unconstrained representation requires only `K - 1` values because the
+last is determined by the first `K - 1`.  
+
+Placing a prior on `beta` in this parameterization, for example,
+
+```stan
+  beta ~ normal(0, 1);
+```
+
+leads to a subtly different posterior than what you would get with the
+same prior on an unconstrained size-`K` vector.  As explained below,
+the variance is reduced.
+
+The sum-to-zero constraint can be implemented naively by setting the
+last element to the negative sum of the first elements, i.e., $\beta_K
+= -\sum_{k=1}^{K-1} \beta_k.$ But that leads to high correlation among
+the $\beta_k$.
+
+The transform used in Stan eliminates these correlations by
+constructing an orthogonal basis and applying it to the
+zero-sum-constraint; @seyboldt:2024 provides an explanation.  The
+*Stan Reference Manual* provides the details in the chapter on
+transforms.  Although any orthogonal basis can be used, Stan uses the
+inverse isometric log transform because it is convenient to describe
+and the transform simplifies to efficient scalar operations rather
+than more expensive matrix operations.
+
 
 #### Marginal distribution of sum-to-zero components {-}
 
@@ -568,53 +608,27 @@ model {
 }
 ```
 
-The components are not independent, as they must sum zero.  No
-Jacobian is required because the transform uses only linear
-operations (and thus have constant Jacobians).
+The scale component can be multiplied by `sigma` to produce a
+`normal(0, sigma)` prior marginally.
 
 To generate distributions with marginals other than standard normal,
 the resulting `beta` may be scaled by some factor `sigma` and
 translated to some new location `mu`.
 
-### Translated and scaled simplex {-}
-
-An alternative approach that's less efficient, but amenable to a
-symmetric prior, is to offset and scale a simplex.
-
-```stan
-parameters {
-  simplex[K] beta_raw;
-  real beta_scale;
-  // ...
-}
-transformed parameters {
-  vector[K] beta;
-  beta = beta_scale * (beta_raw - inv(K));
-  // ...
-}
-```
-
-Here `inv(K)` is just a short way to write `1.0 / K`.  Given
-that `beta_raw` sums to 1 because it is a simplex, the
-elementwise subtraction of `inv(K)` is guaranteed to sum to zero.
-Because the magnitude of the elements of the simplex is bounded, a
-scaling factor is required to provide `beta` with $K$ degrees of
-freedom necessary to take on every possible value that sums to zero.
-
-With this parameterization, a Dirichlet prior can be placed on
-`beta_raw`, perhaps uniform, and another prior put on
-`beta_scale`, typically for "shrinkage."
-
 
 ### Soft centering {-}
 
-Adding a prior such as $\beta \sim \textsf{normal}(0,\sigma)$ will provide a kind
-of soft centering of a parameter vector $\beta$ by preferring, all
-else being equal, that $\sum_{k=1}^K \beta_k = 0$.  This approach is only
-guaranteed to roughly center  if $\beta$ and the elementwise addition $\beta + c$
-for a scalar constant $c$ produce the same likelihood (perhaps by
-another vector $\alpha$ being transformed to $\alpha - c$, as in the
-IRT models).  This is another way of achieving a symmetric prior.
+Adding a prior such as $\beta \sim \textsf{normal}(0,\epsilon)$ for a
+small $\epsilon$ will provide a kind of soft centering of a parameter
+vector $\beta$ by preferring, all else being equal, that $\sum_{k=1}^K
+\beta_k = 0$.  This approach is only guaranteed to roughly center if
+$\beta$ and the elementwise addition $\beta + c$ for a scalar constant
+$c$ produce the same likelihood (perhaps by another vector $\alpha$
+being transformed to $\alpha - c$, as in the IRT models).  This is
+another way of achieving a symmetric prior, though it requires
+choosing an $\epsilon$.  If $\epsilon$ is too large, there won't be a
+strong enough centering effect and if it is too small, it will add
+high curvature to the target density and impede sampling.
 
 
 ## Ordered logistic and probit regression {#ordered-logistic.section}

From 9096b5afc8618afda667135994304a8cb348a6c7 Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Fri, 15 Nov 2024 16:11:39 -0500
Subject: [PATCH 6/6] remove redundant space

---
 src/reference-manual/types.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/reference-manual/types.qmd b/src/reference-manual/types.qmd
index ae0b6ee1d..999751551 100644
--- a/src/reference-manual/types.qmd
+++ b/src/reference-manual/types.qmd
@@ -707,7 +707,7 @@ sum_to_zero_vector[5] beta;
 ```
 
 Zero sum vectors are implemented as vectors and may be assigned to other
-vectors and vice-versa.  Zero sum  vector variables, like other constrained
+vectors and vice-versa.  Zero sum vector variables, like other constrained
 variables, are validated to ensure that they are indeed unit length; for
 zero sum  vectors, this is only done up to a statically specified accuracy
 threshold $\epsilon$ to account for errors arising from floating-point