
Commit 189a727

Commit message: "update"
1 parent 002e570


55 files changed: +738 −3106 lines

OPM_application_nominal.ipynb (+502)
Large diffs are not rendered by default.

OPM_application_nominal.qmd (+16 −12)

@@ -11,7 +11,11 @@ jupyter: ir

 ```{r}
 source('code/tplotfunctions.R')
-source('code/optimal_predictor_machine-nominal.R')
+source('code/OPM-nominal/buildP.R')
+source('code/OPM-nominal/forecastP.R')
+source('code/OPM-nominal/guessmetadata.R')
+source('code/OPM-nominal/plotFsamples1D.R')
+source('code/OPM-nominal/rF.R')
 ```
 ```{r}
 #| echo: false
@@ -71,7 +75,7 @@ Our agent at the moment doesn't know anything at all, not even about the existen
 Let us give it the basic background information about the population: the variates' names and domains. We do this through the function `finfo()`: it has a `data` argument, which we omit for the moment, and a `metadata` argument. The latter can simply be the name of the file containing the metadata (NB: this file must have a specific format):

 ```{r}
-priorknowledge <- buildK(metadata='datasets/glass_metadata-4_lev.csv')
+priorknowledge <- buildP(metadata='datasets/glass_metadata-4_lev.csv')
 ```

 The agent now possesses this basic background knowledge, encoded in the `priorknowledge` object. The encoding uses a particular mathematical representation which, however, is of no interest to us^[If you're curious you can have a glimpse at it with the command `str(priorknowledge)`, which displays structural information about an object.]. Other representations could also be used, but the knowledge would be the same. Think of this as encoding an image into a `png` or other lossless format: the representation of the image would be different, but the image would be the same.
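A minimal sketch for readers following along, based only on the calls visible in this diff: it builds the prior-knowledge object with the renamed `buildP()` and inspects it with `str()`, as the footnote above suggests. The file paths are the ones used in the commit; `data.table` is assumed to be installed, since `buildP.R` declares it as a requirement.

```r
## Sketch, not part of the commit: build background knowledge from metadata
## and peek at its internal representation (an implementation detail).
library(data.table)                  # required by buildP()
source('code/OPM-nominal/buildP.R')

priorknowledge <- buildP(metadata='datasets/glass_metadata-4_lev.csv')
str(priorknowledge)                  # display the object's structure
```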
@@ -86,7 +90,7 @@ Let's ask the agent: what is the marginal frequency distribution for the variate
 This probability distribution for the $\vType$ variate is calculated by the function `fmarginal()`. It has arguments `finfo`: the agent's information; and `variates`: the names of the variates of which we want the marginal frequencies:

 ```{r}
-priorknowledge_type <- fprobability(K=priorknowledge, marginal='Type', Kout=TRUE)
+priorknowledge_type <- forecastP(P=priorknowledge, marginal='Type', Kout=TRUE)

 ```

@@ -99,7 +103,7 @@ The function `plotsamples1D()` does this kind of visual representation. It has a
 How do you think this probability distribution will look? What kind of marginal frequencies do we expect in the full population?

 ```{r}
-plotsamples1D(K=priorknowledge_type, n=100, predict=FALSE)
+plotsamples1D(P=priorknowledge_type, n=100, predict=FALSE)
 ```

 You see that anything goes: some frequency distributions give frequency almost `1` to a specific value and almost `0` to the others; other frequency distributions spread the frequencies out more evenly, with some peaks here or there.
@@ -120,7 +124,7 @@ Before continuing, ask yourself the same question: which probabilities would you
 The agent's answer this time is a probability distribution over seven values, which we can draw faithfully. The function `plotsamples1D()` can draw this probability as well, if we give the argument `predict=TRUE` (the default):

 ```{r}
-plotsamples1D(K=priorknowledge_type)
+plotsamples1D(P=priorknowledge_type)
 ```

 This plot shows the [probability distribution]{.blue} for the next unit in [blue]{.blue}, together with a sample of 100 possible frequency distributions for the $\vType$ variate over the full population. Note that samples are drawn anew every time, so they can look somewhat different from time to time.^[To have reproducible plots, use `set.seed(314)` (or any integer you like) before calling the plot function.]
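A short sketch of the reproducibility tip from the footnote above, using the renamed `P` argument from this commit (function and object names are taken from the surrounding diff; the seed value is arbitrary):

```r
## Sketch, not part of the commit: fix the random seed so the sampled
## frequency distributions come out identical on every render.
set.seed(314)
plotsamples1D(P=priorknowledge_type)
```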
@@ -139,10 +143,10 @@ Inspect the agent's inferences for other variates.

 ### Learning from the sample data

-Now let's give the agent the data from the sample of 214 glass fragments. This is done again with the `buildK()` function, but providing the `data` argument, which can be the name of the data file:
+Now let's give the agent the data from the sample of 214 glass fragments. This is done again with the `buildP()` function, but providing the `data` argument, which can be the name of the data file:

 ```{r}
-postknowledge <- buildK(data='datasets/glass_data-4_lev.csv', metadata='datasets/glass_metadata-4_lev.csv')
+postknowledge <- buildP(data='datasets/glass_data-4_lev.csv', metadata='datasets/glass_metadata-4_lev.csv')
 ```

 The `postknowledge` object contains the agent's knowledge from the metadata and the sample data. This object can be used in the same way as the object representing the agent's background knowledge.
@@ -154,9 +158,9 @@ Now that the agent has learned from the data, we can ask it again what is the ma
 We calculate the probability for the possible marginal frequency distributions, and then plot it as a set of 100 representative samples:

 ```{r}
-postknowledge_type <- fprobability(K=postknowledge, marginal='Type', Kout=TRUE)
+postknowledge_type <- forecastP(P=postknowledge, marginal='Type', Kout=TRUE)

-plotsamples1D(K=postknowledge_type, predict=FALSE)
+plotsamples1D(P=postknowledge_type, predict=FALSE)
 ```

 This plot shows two important aspects of this probability distribution and of the agent's current state of knowledge:
@@ -185,7 +189,7 @@ Finally we ask the agent what $\vType$ value we should observe in the next glass
 ```{r}
 #| label: fig-unconditional-glass
 #| fig-cap: "[Frequency distributions for full population]{.grey}, and [probability distribution for next unit]{.blue}"
-plotsamples1D(K=postknowledge_type)
+plotsamples1D(P=postknowledge_type)
 ```

@@ -213,7 +217,7 @@ The detectives would like to know what's the possible origin of this fragment, t
 First, the agent can calculate the probability distribution over the *conditional frequencies* ([§ @sec-conditional-freqs]) of the $\vType$ values for the subpopulation ([§ @sec-subpopulations]) of units having the specific variate values above. This calculation is done with the function `fconditional()`, with arguments `finfo`: the agent's current knowledge, and `unitdata`: the partial data obtained from the unit.

 ```{r}
-condknowledge_type <- fprobability(K=postknowledge, marginal='Type', conditional=newfragment, Kout=TRUE)
+condknowledge_type <- forecastP(P=postknowledge, marginal='Type', conditional=newfragment, Kout=TRUE)
 ```

 The `condknowledge` object contains the agent's knowledge conditional on the variates given; this knowledge is about the remaining variates, which in this case are the single variate $\vType$ (so the `fmarginal()` calculation is actually redundant in this case).
@@ -225,7 +229,7 @@ Both inferences can be visualized in the usual way:
 ```{r}
 #| label: fig-conditional-glass
 #| fig-cap: "[Conditional frequency distributions for full population]{.grey}, and [conditional probability distribution for next unit]{.blue}"
-plotsamples1D(K=condknowledge_type)
+plotsamples1D(P=condknowledge_type)
 ```

 The agent thus gives a probability of around $80\%$ to the fragment's being of $\cat{T1}$ type, around $10\%$ to its being of $\cat{T2}$ type, and around $5\%$ to its being of $\cat{T5}$ type. It also shows that further training data could change these probabilities by $\pm 10\%$ or even $\pm 15\%$.
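For readers reproducing the conditional inference in this file, a hedged sketch of the `newfragment` object, which the diff uses but does not show: `forecastP()` matches `names(conditional)` against the variate names, so a named list of observed values is assumed here. The variate names and values below are purely illustrative placeholders, not taken from the actual glass dataset or metadata.

```r
## Sketch, not part of the commit: hypothetical partial data for the new
## fragment; the names must match variate names in the metadata file.
newfragment <- list(RI='high', Na='low')
```

This object is then passed as `conditional=newfragment`, as in the chunk above.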

OPM_application_nominal_files/execute-results/html.json (+11)
Large diffs are not rendered by default.

_quarto.yml (+1 −1)

@@ -107,7 +107,7 @@ book:
     chapters:
       - dirichlet-mixture.qmd
       - prototype_code.qmd
-      - OPM_application_nominal.qmd
+      # - OPM_application_nominal.qmd

   - part: "[**Decision theory**]{.lightblue}"
     chapters:

code/OPM-nominal/buildK.R → code/OPM-nominal/buildP.R (+1 −1)

@@ -1,4 +1,4 @@
-buildK <- function(metadata, data=NULL, alphas=NULL){
+buildP <- function(metadata, data=NULL, alphas=NULL){
     #### Build object encoding background knowledge and learned knowledge
     #### Requires 'data.table'
     ##

code/OPM-nominal/forecastK.R → code/OPM-nominal/forecastP.R (+8 −8)

@@ -1,7 +1,7 @@
-forecastK <- function(K, conditional=NULL){
+forecastP <- function(P, conditional=NULL){
     #### Calculate conditional or unconditional probability
-    variates <- names(dimnames(K[['freqs']]))
-    M <- length(K[['freqs']])
+    variates <- names(dimnames(P[['freqs']]))
+    M <- length(P[['freqs']])
     ## Selection of conditional values
     ## select subarray of freqs corresponding to the conditional values
     if(!is.null(conditional)){
@@ -18,23 +18,23 @@ forecastK <- function(K, conditional=NULL){
         iconditional <- match(names(conditional), variates)
         totake <- as.list(rep(TRUE, length(variates)))
         totake[iconditional] <- conditional
-        freqs <- do.call(`[`, c(list(K[['freqs']]), totake))
+        freqs <- do.call(`[`, c(list(P[['freqs']]), totake))
         if(is.null(dim(freqs))){
             dim(freqs) <- length(freqs)
-            dimnames(freqs) <- dimnames(K[['freqs']])[-iconditional]
+            dimnames(freqs) <- dimnames(P[['freqs']])[-iconditional]
         }
     }else{
-        freqs <- K[['freqs']]
+        freqs <- P[['freqs']]
    }
     ##
     ## create an array of forecast variates and alphas
     freqs <- aperm(
-        sapply(K[['alphas']], function(alpha){
+        sapply(P[['alphas']], function(alpha){
             log(M*alpha + freqs)
         }, simplify='array'),
         c(length(dim(freqs))+1, 1:length(dim(freqs)))
     )
-    freqs <- freqs - max(freqs) + K[['valphas']]
+    freqs <- freqs - max(freqs) + P[['valphas']]
     ##
     temp <- dimnames(freqs)[-1]
     freqs <- colSums(exp(freqs))
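A hedged usage sketch of the renamed function, based only on the signature visible in this file: `P` is an object produced by `buildP()`, and `conditional` is a named list of observed variate values (the names and values below are illustrative placeholders). The `.qmd` calls shown earlier also pass `marginal` and `Kout` arguments, which do not appear in this hunk, so they are omitted here.

```r
## Sketch, not part of the commit.
source('code/OPM-nominal/buildP.R')
source('code/OPM-nominal/forecastP.R')

postknowledge <- buildP(data='datasets/glass_data-4_lev.csv',
                        metadata='datasets/glass_metadata-4_lev.csv')

## Unconditional forecast over all variates.
forecastP(P=postknowledge)

## Forecast conditional on hypothetical observed values
## (names must match variate names from the metadata).
forecastP(P=postknowledge, conditional=list(RI='high', Na='low'))
```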

code/OPM-nominal/plotFsamples1D.R (+4 −4)

@@ -1,10 +1,10 @@
-plotFsamples1D <- function(K, n=100, predict=TRUE, file=NULL){
+plotFsamples1D <- function(P, n=100, predict=TRUE, file=NULL){
     #### Plot samples of full-population freq. distributions for one variate
     #### Requires 'png' to plot png
-    if(length(dim(K[['freqs']])) > 1){
+    if(length(dim(P[['freqs']])) > 1){
         stop('State of knowledge comprises more than one variate.')
     }
-    samples <- rF(n=n, K=K)
+    samples <- rF(n=n, P=P)
     if(!is.null(file)){
         filext <- sub(".*\\.|.*", "", file, perl=TRUE)
         if(filext == 'pdf'){
@@ -21,7 +21,7 @@ plotFsamples1D <- function(K, n=100, predict=TRUE, file=NULL){
          lty=1, lwd=1, pch=16, col=7, alpha=0.5, cex=0.75
     )
     if(predict){
-        fmean <- fprobability(K=K, Kout=F)
+        fmean <- fprobability(P=P, Kout=F)
         tplot(y=fmean, x=1:ncol(samples), type='b',
              lty=1, lwd=4, pch=18, col=1, alpha=0.25, cex=1, add=T
         )
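A hedged usage sketch of the renamed plotting function, based on the signature and the `file` handling visible in this diff. The knowledge object `postknowledge_type` is taken from the `.qmd` above and must describe a single variate, as the `stop()` check enforces; the output file name is an arbitrary placeholder, and the function's header notes that the 'png' package is needed for PNG output.

```r
## Sketch, not part of the commit: plot 100 sampled frequency distributions
## plus the predictive distribution (predict=TRUE is the default) and save
## the figure to a PNG file. The helper scripts it relies on are sourced first.
source('code/tplotfunctions.R')
source('code/OPM-nominal/rF.R')
source('code/OPM-nominal/plotFsamples1D.R')

plotFsamples1D(P=postknowledge_type, n=100, predict=TRUE, file='type_forecast.png')
```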

code/OPM-nominal/rF.R (+6 −6)

@@ -1,10 +1,10 @@
-rF <- function(n=1, K){
+rF <- function(n=1, P){
     #### Return a sample of full-population frequency
     #### Requires 'extraDistr'
-    alphasample <- sample(rep(K[['alphas']],2), size=n, replace=T,
-                          prob=rep(K[['palphas']],2))
-    ff <- extraDistr::rdirichlet(n, alpha=outer(alphasample, c(K[['freqs']]), `+`))
-    dim(ff) <- c(n,dim(K[['freqs']]))
-    dimnames(ff) <- c(list(sample=NULL), dimnames(K[['freqs']]))
+    alphasample <- sample(rep(P[['alphas']],2), size=n, replace=T,
+                          prob=rep(P[['palphas']],2))
+    ff <- extraDistr::rdirichlet(n, alpha=outer(alphasample, c(P[['freqs']]), `+`))
+    dim(ff) <- c(n,dim(P[['freqs']]))
+    dimnames(ff) <- c(list(sample=NULL), dimnames(P[['freqs']]))
     ff
 }
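A hedged sketch of drawing frequency samples directly with the renamed `rF()`, mirroring how `plotFsamples1D()` calls it above. The knowledge object `postknowledge_type` is taken from the `.qmd`; for a single-variate object, each row of the result is one sampled full-population frequency distribution. The 'extraDistr' package is assumed to be installed, as the function's header requires.

```r
## Sketch, not part of the commit: draw 100 full-population frequency
## distributions for the Type variate; rows index samples, columns index
## the variate's values.
source('code/OPM-nominal/rF.R')

samples <- rF(n=100, P=postknowledge_type)
dim(samples)
```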
