sonarqube_issues.Rmd

---
title: "SonarQube Issues Model"
author: Hampus Broman & William Levén
date: 2021-05
output: 
  html_document: 
    pandoc_args: [ "-o", "docs/sonarqube_issues.html" ]
---

```{r include-setup, include=FALSE}
# Load setup file
source(knitr::purl('setup.Rmd', output = tempfile()))
```

## Looking at the data {.tabset}
it looks like there might be a difference between the high and low debt groups, the variance is decidedly higher in the high debt group.

```{r plot-1}
d.both_completed %>%
  ggplot(aes(x=sonarqube_issues, fill=high_debt_version)) + 
  geom_boxplot() +
  labs(
    title = "Number of issuess for the different debt levels",
    x ="Number of issues"
  ) +
  scale_y_continuous(breaks = NULL) +
  scale_fill_manual(
    name = "Debt level", 
    labels = c("High debt", "Low debt"), 
    values = c("#7070FF", "lightblue"), 
    guide = guide_legend(reverse = TRUE)
  ) 
```

## Descriptive Statistics:
```{r descriptive-statistics}
d.both_completed %>%
  pull(sonarqube_issues) %>% 
  summary()

sprintf("Variance: %.2f", var(pull(d.both_completed, sonarqube_issues)))
```

## Initial model
Variable names are modeled using the negative binomial family rather than poisson since the variance is greater than the mean.

We include `high_debt_verison` as a predictor in our model as this variable represent the very effect we want to measure.
We also include a varying intercept for each individual to prevent the model from learning too much from single participants with extreme measurements.

### Selecting priors {.tabset}

We iterate over the model until we have sane priors, in this case a prior that reasonably cna fit our data without being too restrictive.

#### Base model with priors

```{r initial-model-definition, class.source = 'fold-show'}
sonarqube_issues.with <- extendable_model(
  base_name = "sonarqube_issues",
  base_formula = "sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
  base_priors = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  base_control = list(adapt_delta = 0.98)
)
```

#### Default priors

```{r default-priors}
prior_summary(sonarqube_issues.with(only_priors= TRUE))
```
#### Selected priors

```{r selected-priors}
prior_summary(sonarqube_issues.with(sample_prior = "only"))
```

#### Prior predictive check

```{r priors-check, message=FALSE, warning=FALSE}
pp_check(sonarqube_issues.with(sample_prior = "only"), nsamples = 400, type = "bars")  + xlim(-1, 15)
```

#### Beta parameter influence

We choose a beta prior that allows for large effects (+-10 issues) but is skeptical to any effects larger than +-4 issues.

```{r priors-beta, warning=FALSE}
sim.size <- 1000
sim.intercept <- rnorm(sim.size, 1.5, 1)
sim.beta <- rnorm(sim.size, 0, 1)
sim.beta.diff <- exp(sim.intercept + sim.beta) - exp(sim.intercept)
sim.beta.diff.min <- sim.beta.diff

data.frame(x = sim.beta.diff.min) %>%
  ggplot(aes(x)) +
  geom_density() +
  xlim(-15, 15) +
  labs(
    title = "Beta parameter prior influence",
    x = "Issues difference",
    y = "Density"
  )

```

### Model fit  {.tabset}
We check the posterior distribution and can see that the model seems to have been able to fit the data well
Sampling seems to also have worked well as Rhat values are close to 1 and the sampling plots look nice.

#### Posterior predictive check

```{r base-pp-check, message=FALSE, warning=FALSE}
pp_check(sonarqube_issues.with(), nsamples = 200, type = "bars") + xlim(-1, 15)
```

#### Summary

```{r base-summary}
summary(sonarqube_issues.with())
```

#### Sampling plots

```{r base-plot}
plot(sonarqube_issues.with(), ask = FALSE)
```

## Model predictor extenstions {.tabset}

```{r mo-priors}
# default prior for monotonic predictor
edlvl_prior <- prior(dirichlet(2), class = "simo", coef = "moeducation_level1")
```

### One variable {.tabset}

```{r model-extension-1, warning=FALSE, class.source = 'fold-show'}
loo_result <- loo(
  # Benchmark model(s)
  sonarqube_issues.with(),
  # New model(s)
  sonarqube_issues.with("modified_lines"),
  sonarqube_issues.with("work_domain"),
  sonarqube_issues.with("work_experience_programming.s"),
  sonarqube_issues.with("work_experience_java.s"),
  sonarqube_issues.with("education_field"),
  sonarqube_issues.with("mo(education_level)", edlvl_prior),
  sonarqube_issues.with("workplace_peer_review"),
  sonarqube_issues.with("workplace_td_tracking"),
  sonarqube_issues.with("workplace_pair_programming"),
  sonarqube_issues.with("workplace_coding_standards"),
  sonarqube_issues.with("scenario"),
  sonarqube_issues.with("group")
)


```

#### Comparison

```{r model-extension-1-sum, warning=FALSE}
loo_result[2]
```

#### Diagnostics

```{r model-extension-1-dig, warning=FALSE}
loo_result[1]
```

## Candidate models {.tabset}

We inspect some of our top performing models. 

All models seems to have sampled nicely (rhat = 1 and fluffy plots) they also have about the same fit to the data and similar estimates for the high_debt_version beta parameter.

### sonarqube_issues0  {.tabset}

We select the simplest model as a baseline.

```{r sonarqube_issues0, class.source = 'fold-show'}
sonarqube_issues0 <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  control = list(adapt_delta = 0.97),
  file = "fits/sonarqube_issues0",
  file_refit = "on_change",
  seed = 20210421
)

```

#### Summary

```{r sonarqube_issues0-sum}
summary(sonarqube_issues0)
```

#### Random effects

```{r sonarqube_issues0-raneff}
ranef(sonarqube_issues0)
```

#### Sampling plots

```{r sonarqube_issues0-plot}
plot(sonarqube_issues0, ask = FALSE)
```

#### Posterior predictive check

```{r sonarqube_issues0-pp, message=FALSE, warning=FALSE}
pp_check(sonarqube_issues0, nsamples = 200, type = "bars")  + xlim(-1, 15)
```

### sonarqube_issues1  {.tabset}
We select the best performing model with one variable.
  
```{r sonarqube_issues1, class.source = 'fold-show'}
sonarqube_issues1 <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session) + group",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  control = list(adapt_delta = 0.97),
  file = "fits/sonarqube_issues1",
  file_refit = "on_change",
  seed = 20210421
)

```

#### Summary

```{r sonarqube_issues1-sum}
summary(sonarqube_issues1)
```

#### Random effects

```{r sonarqube_issues1-raneff}
ranef(sonarqube_issues1)
```

#### Sampling plots

```{r sonarqube_issues1-plot}
plot(sonarqube_issues1, ask = FALSE)
```

#### Posterior predictive check

```{r sonarqube_issues1-pp, message=FALSE, warning=FALSE}
pp_check(sonarqube_issues1, nsamples = 200, type = "bars") + xlim(-1, 15)
```

## Final model 
All candidate models look nice, none is significantly better than the others, we will proceed the simplest model: `sonarqube_issues0`

### Variations {.tabset}
We will try a few different variations of the selected candidate model.

#### All data points {.tabset}

Some participants did only complete one scenario. Those has been excluded from the initial dataset to improve sampling of the models. We do however want to use all data we can and will therefore try to fit the model with the complete dataset.

```{r variation.all, class.source = 'fold-show'}
sonarqube_issues0.all <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.97),
  file = "fits/sonarqube_issues0.all",
  file_refit = "on_change",
  seed = 20210421
)
```

##### Summary

```{r variation.all-sum}
summary(sonarqube_issues0.all)
```

##### Random effects

```{r variation.all-raneff}
ranef(sonarqube_issues0.all)
```

##### Sampling plots

```{r variation.all-plot}
plot(sonarqube_issues0.all, ask = FALSE)
```

##### Posterior predictive check

```{r variation.all-pp, message=FALSE, warning=FALSE}
pp_check(sonarqube_issues0.all, nsamples = 200, type = "bars") + xlim(-1, 15)
```

#### With experience predictor {.tabset}

As including all data points didn't harm the model we will create this variant with all data points as well.

This variation includes `work_experience_programming.s` predictors as it can give further insight into how experience play a factor in the effect we try to measure. This is especially important as our sampling shewed towards containing less experienced developer than the population at large.

```{r variation.all.exp, class.source = 'fold-show'}

sonarqube_issues0.all.exp <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session) + work_experience_programming.s",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.99),
  file = "fits/sonarqube_issues0.all.exp",
  file_refit = "on_change",
  seed = 20210421
)

```

##### Summary

```{r variation.all.exp-sum}
summary(sonarqube_issues0.all.exp)
```

##### Random effects

```{r variation.all.exp-raneff}
ranef(sonarqube_issues0.all.exp)
```

##### Loo comparison

```{r variation.all.exp-loo, warning=FALSE}
loo(
  sonarqube_issues0.all,
  sonarqube_issues0.all.exp
)
```

##### Sampling plots

```{r variation.all.exp-plot}
plot(sonarqube_issues0.all.exp, ask = FALSE)
```

##### Posterior predictive check

```{r variation.all.exp-pp, message=FALSE, warning=FALSE}
pp_check(sonarqube_issues0.all.exp, nsamples = 200, type = "bars") + xlim(-1, 15)
```


### Final model
* Fitting the model to all data point did not significantly damage the model and will be used as is a more fair representation of reality.
* Adding the experience predictors did not significantly damage the model and will be used as it provides useful insight.

This means that our final model, with all data points and experience predictors, is `sonarqube_issues0.all.exp`

## Interpreting the model
To begin interpreting the model we look at how it's parameters were estimated. As our research is focused on how the outcome of the model is effected we will mainly analyze the $\beta$ parameters.

### $\beta$ parameters
```{r interpret-beta-plot, warning=FALSE, message=FALSE}
mcmc_areas(sonarqube_issues0.all.exp, pars = c("b_high_debt_versionfalse", "b_work_experience_programming.s"), prob = 0.95) + scale_y_discrete() +
  scale_y_discrete(labels=c("High debt version: false", "Professional programming experience")) +
  ggtitle("Beta parameters densities in sonarqube issues model", subtitle = "Shaded region marks 95% of the density. Line marks the median")
```


### Effects sizes

```{r effect-size-1, message=FALSE, warning=FALSE}
scale_programming_experience <- function(x) {
  (x - mean(d.completed$work_experience_programming))/ sd(d.completed$work_experience_programming)
}
unscale_programming_experience <- function(x) {
  x * sd(d.completed$work_experience_programming) + mean(d.completed$work_experience_programming)
}

post_settings <- expand.grid(
  high_debt_version = c("false", "true"),
  session = NA,
  work_experience_programming.s = sapply(c(0, 3, 10, 25, 40), scale_programming_experience)
)

post <- posterior_predict(sonarqube_issues0.all.exp, newdata = post_settings) %>%
  melt(value.name = "estimate", varnames = c("sample_number", "settings_id")) %>%
  left_join(
    rowid_to_column(post_settings, var= "settings_id"),
    by = "settings_id"
  ) %>%
  mutate(work_experience_programming = unscale_programming_experience(work_experience_programming.s)) %>%
  select(
    estimate,
    high_debt_version,
    work_experience_programming
  )%>%
  mutate(estimate = estimate)

ggplot(post, aes(x=estimate, fill = high_debt_version)) +
  geom_bar(position = "dodge2") +
  scale_fill_manual(
    name = "Debt version",
    labels = c("Low debt", "High debt"),
      values = c("lightblue", "darkblue")
  ) +
  facet_grid(rows = vars(work_experience_programming)) +
  labs(
    title = "SonarQube issues introduced / years of programming experience",
    subtitle = "Estimated for five different experience levels",
    x = "Issued introduced",
    y = "Incidence rate"
  ) + 
  xlim(-1, 10) + 
  scale_x_continuous(limits = c(-1,7), breaks = c(0,1,2,3,4,5,6,7), labels = c("0","1","2","3","4","5","6","7")) +
  scale_y_continuous(limits = NULL, breaks = sapply(c(0.1, 0.3, 0.5), function(x) x*nrow(post) / 10), labels = c("10%","30%","50%")) + 
  theme(legend.position = "top")

```


```{r effect-size-diff, warning=FALSE, message=FALSE}
post.diff <- post %>% filter(high_debt_version == "true")
post.diff$estimate = post.diff$estimate -  filter(post, high_debt_version == "false")$estimate

ggplot(post.diff, aes(x=estimate)) +
  geom_boxplot(quantile_lines = TRUE, quantile_fun = hdi, vline_linetype = 2) +
  xlim(-7, 7) +
  facet_grid(rows = vars(work_experience_programming)) +
  labs(
    title = "SonarQube issues introduced difference / years of programming experience",
    subtitle = "Difference as: high debt issues - low debt issues",
    x = "Issues # difference"
  ) +
  scale_y_continuous(breaks = NULL)
```

We can then proceed to calculate some likelihoods:

```{r es-a, class.source = 'fold-show'}
d <- post
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Given all the simulated cases we find that they introduce `r scales::label_percent()(x - 1)` more issues in the high debt version.

```{r es-10, class.source = 'fold-show'}
d <- post %>% filter(work_experience_programming == 10)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Considering developers with 10 years of professional programming experience we find that they introduce `r scales::label_percent()(x - 1)` more issues in the high debt version.

```{r es-0, class.source = 'fold-show'}
d <- post %>% filter(work_experience_programming == 0)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Considering developers with no of professional programming experience we find that they introduce `r scales::label_percent()(x - 1)` more issues in the high debt version.


```{r es-25, class.source = 'fold-show'}
d <- post %>% filter(work_experience_programming == 25)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Considering developers with 25 years of professional programming experience we find that they introduce `r scales::label_percent()(x - 1)` more issues in the high debt version.