---
title: "08 Regression models"
output: 
  html_document:
    keep_md: true
    toc: true
    toc_depth: 3
    toc_float: true
    code_folding: hide
    df_print: paged   
---


Regression analysis of hydrographic and plankton variables.

To do: consider the `dynlm` package for including lagged variables, see https://stackoverflow.com/a/13096824/1734247
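Here is what a lagged predictor means in practice, without `dynlm`. The sketch below builds a one-year lag by hand on a simulated annual series and fits it with `lm()`. The data frame `toy`, its columns and the one-year lag are hypothetical. With the real data, the same idea could apply to, e.g., a lagged `Winter_NAO`.

```r
# Hypothetical annual series in which the response depends on last year's predictor
set.seed(1)
toy <- data.frame(Year = 1992:2016, x = rnorm(25))

# One-year lag built by hand: shift x down one row, NA for the first year
toy$x_lag1 <- c(NA, head(toy$x, -1))
toy$y <- 1 + 2 * toy$x_lag1 + rnorm(25, sd = 0.1)

# na.action is given explicitly because this notebook later sets
# options(na.action = "na.fail") for dredge(); the first year (NA) is dropped
fit_lag <- lm(y ~ x_lag1, data = toy, na.action = na.omit)
coef(fit_lag)
nobs(fit_lag)
```

`dynlm` instead lets you write the lag inside the formula, as `L(x, 1)`, for `ts`/`zoo` data. The hand-built column above keeps the data frame workflow used in the rest of this document.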

## 0. Libraries
```{r}
# to check which library paths are set
#.libPaths()
#.libPaths("C:/Data/R/R-3.5.1/library")
library(tidyverse)   # also attaches ggplot2, dplyr and tidyr
library(readxl)
library(MASS)        # stepAIC(); NB! masks dplyr::select(), hence dplyr::select() below
library(visreg)      # effect plots for lm/glm
library(vegan)       # rda(), envfit()
library(corrplot)    # corrplot(), corrplot.mixed() used in PART TWO
library(lme4) # for mixed effect models
#install.packages("standardize")
library(standardize) # standardize data for modelling
#install.packages("afex")
library(afex) # adding p-values to lmer output
#install.packages("sjPlot") 
library(sjPlot) # plot_model() and tab_model() for lm/lmer
# library(pander)
#install.packages("glmulti")
#library(glmulti)
#install.packages("rJava") # NB! I needed to install Java for Windows x64 here: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

#install.packages("MuMIn") 
library(MuMIn) # dredge(), get.models()
#citation ("MuMIn")
#citation ("ggplot2")
#citation ("rkt")
#citation ()

#library(rJava)
```

## 1. Data  
### a. Read
```{r}
dat_a <- read.csv("Data_produced/06_dat_a.csv")
dat_q <- read.csv("Data_produced/06_dat_q.csv")
```

### b. Check variables
```{r}
unique(dat_a$Variable)
```

### c. Convert data to wide format
```{r}
dat <- 
  dat_a %>% 
  spread(Variable, Value)
```


### d. Add NAO data
```{r}
df_nao <- read.table("https://climatedataguide.ucar.edu/sites/default/files/nao_station_djfm.txt", skip = 1, header = FALSE)
colnames(df_nao) <- c("Year", "Winter_NAO")

dat <- left_join(dat, df_nao, by = "Year")   # explicit join key

```

### e. Wind data (to do): Dag, you mentioned you had files for Færder and Torungen?


### f. Check for missing years in the dataset
* NB! TSM has been interpolated for 2012-2013. We decided not to interpolate the `_Deep` nutrients and POM, because these gaps fall at the end of the time series.
```{r}

str(dat)

check <- dat %>%
  summarise_all(~ sum(is.na(.)))   # number of missing values per variable

complete.cases(dat)
# There are missing values in the first five years (1990-1994) and the last four years (2013-2016):
  # River TOC missing in 1990 and 1991
  # P measurements from hydro missing in 1990
  # Plankton group data start in 1994
  # River_Si starts in 1995
  # DIN_deep, POC_deep, PON_deep and POP_deep are missing from 2013-2016

# Suggestion: 
  # 1. Cut 1990 and 1991 from the dataset (many missing variables)
  # 2. Cut variables that have more than one year missing (kept separate below so we can test)

# NB! Keep in mind that deep POC and PON are the variables with the strongest upward trends in the Mann-Kendall test.
# A possible proxy is TSM_deep, which is sampled over the whole time period and is fairly highly correlated with POC (0.63).
# There are also high correlations between the deep and intermediate values for POC (0.75), PON (0.74) and DIN (0.72).
plot (dat$Hydro_TSM_Deep, dat$Hydro_POC_Deep)
plot (dat$Hydro_TSM_Deep, dat$Hydro_PON_Deep)
plot (dat$Hydro_POC_Deep, dat$Hydro_POC_Intermediate)
plot (dat$Hydro_DIN_Deep, dat$Hydro_DIN_Intermediate)
cor(dat$Hydro_POC_Surface, dat$Hydro_POC_Intermediate, use = "complete.obs")
#cor(dat$Hydro_TSM_Deep, dat$Hydro_POC_Deep, use = "complete.obs")
#cor(dat$Hydro_PON_Deep, dat$Hydro_PON_Intermediate, use = "complete.obs")
#cor(dat$Hydro_DIN_Deep, dat$Hydro_DIN_Intermediate, use = "complete.obs")

# filter the datasets according to suggestion above
cols <- c("Hydro_DIN_Deep", "Hydro_POC_Deep", "Hydro_PON_Deep", "Hydro_POP_Deep", "River_Si")

dat_sel <- dat %>% 
  filter(Year != 1990 & Year != 1991) %>% 
  dplyr::select(-starts_with('Plank')) %>% 
  dplyr::select(-one_of(cols))

```
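Suggestion 2 above drops variables with more than one missing year. That rule could also be applied programmatically, instead of by listing names in `cols`. Below is a base-R sketch on a small hypothetical data frame. `drop_sparse_vars()` is a hypothetical helper, not part of the original workflow, and its threshold `max_na = 1` mirrors the suggestion.

```r
# Hypothetical helper: keep only columns with at most `max_na` missing values
drop_sparse_vars <- function(df, max_na = 1) {
  n_missing <- colSums(is.na(df))            # missing values per column
  df[, n_missing <= max_na, drop = FALSE]
}

# Toy data: one column complete, one with a single gap, one with three gaps
toy <- data.frame(
  Year = 2010:2015,
  A = c(1, 2, 3, 4, 5, 6),
  B = c(1, NA, 3, 4, 5, 6),
  C = c(1, 2, 3, NA, NA, NA)
)
kept <- drop_sparse_vars(toy)
names(kept)
```

With the real data, the resulting column set should be compared against the manual `cols` selection before replacing it.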


# PART ONE - Hydrographic data

**Background**

* The overall aim is to examine the relationship between climate drivers (temperature, river discharge and transport) and the observed responses in hydrography.

* Suggested approach:
  + response variables chosen from a priori knowledge and Mann-Kendall trends
  + predictor variables for each response chosen from the ordination plots (section 1, used to reduce the number of highly correlated predictors) and expert judgement


## 1. PCA of environmental variables
```{r environmental PCA, fig.height = 10, fig.width = 10}
#tail(dat)

# only complete cases for now, see 1f above for the variable selection
complete.cases(dat_sel)
dat.pca <- dat_sel[complete.cases(dat_sel), ]

##### on all vars in dat_sel
# look at the environmental variable space
var.pca <- rda(dat.pca[,-1], scale = TRUE)
envvar  <- data.frame(scores(var.pca, display = "species"))
#head(envvar)
years   <- data.frame(scores(var.pca, display = "sites"))
# make dataset for ggplot
pca_all <- bind_rows(envvar, years)
pca_all$obj <- c(row.names(envvar), dat.pca$Year)
pca_all$type <- c(rep("env", nrow(envvar)), rep("yr", nrow(years)))
# create ggplot
envvarplot_all <- pca_all %>% 
  ggplot(aes(PC1, PC2)) +
  geom_point(data = pca_all[pca_all$type=="yr", ], aes(color = as.numeric(as.character(obj)))) +
  labs(color = "Year") +
  geom_segment(data = pca_all[pca_all$type=="env", ], aes(x=0, xend=PC1*10, y=0, yend=PC2*10), 
                 linetype="dashed", 
                 arrow = arrow(length = unit(0.2, "cm"), type="closed"),
                 size = 0.1, col = "dark gray") +
  geom_text(data = pca_all[pca_all$type=="env", ], aes(x = PC1*10, y = PC2*10, label = obj),  color = "red", size = 3)

envvarplot_all
ggsave ("Figures_rapp/Ordination_hydro_rivers.png", width = 8, height = 8, dpi=500)

##### on all hydro vars
str (dat.pca)
hydro.pca <- dat.pca %>% 
  dplyr::select(starts_with("Yea"), starts_with("Hydro"))

# look at the environmental variable space
var.pca <- rda(hydro.pca[,-1], scale = TRUE)
envvar  <- data.frame(scores(var.pca, display = "species"))
#head(envvar)
years   <- data.frame(scores(var.pca, display = "sites"))
# make dataset for ggplot
pca_all <- bind_rows(envvar, years)
pca_all$obj <- c(row.names(envvar), hydro.pca$Year)
pca_all$type <- c(rep("env", nrow(envvar)), rep("yr", nrow(years)))
# create ggplot
envvarplot_hydro <- pca_all %>% 
  ggplot(aes(PC1, PC2)) +
  geom_point(data = pca_all[pca_all$type=="yr", ], aes(color = as.numeric(as.character(obj)))) +
  labs(color = "Year") +
  geom_segment(data = pca_all[pca_all$type=="env", ], aes(x=0, xend=PC1*10, y=0, yend=PC2*10), 
                 linetype="dashed", 
                 arrow = arrow(length = unit(0.2, "cm"), type="closed"),
                 size = 0.1, col = "dark gray") +
  geom_text(data = pca_all[pca_all$type=="env", ], aes(x = PC1*10, y = PC2*10, label = obj),  color = "red", size = 3)

  
envvarplot_hydro
ggsave ("Figures_rapp/Ordination_hydro.png", width = 8, height = 8, dpi=500)

##### on all river vars
#str (dat.pca)
river.pca <- dat.pca %>% 
  dplyr::select(starts_with("Yea"), starts_with("River"))

# look at the environmental variable space
var.pca <- rda(river.pca[,-1], scale = TRUE)
envvar  <- data.frame(scores(var.pca, display = "species"))
#head(envvar)
years   <- data.frame(scores(var.pca, display = "sites"))
# make dataset for ggplot
pca_all <- bind_rows(envvar, years)
pca_all$obj <- c(row.names(envvar), river.pca$Year)
pca_all$type <- c(rep("env", nrow(envvar)), rep("yr", nrow(years)))
# create ggplot
envvarplot_river <- pca_all %>% 
  ggplot(aes(PC1, PC2)) +
  geom_point(data = pca_all[pca_all$type=="yr", ], aes(color = as.numeric(as.character(obj)))) +
  labs(color = "Year") +
  geom_segment(data = pca_all[pca_all$type=="env", ], aes(x=0, xend=PC1*3, y=0, yend=PC2*3), 
                 linetype="dashed", 
                 arrow = arrow(length = unit(0.2, "cm"), type="closed"),
                 size = 0.1, col = "dark gray") +
  geom_text(data = pca_all[pca_all$type=="env", ], aes(x = PC1*3, y = PC2*3, label = obj),  color = "red", size = 3)

  
envvarplot_river
ggsave ("Figures_rapp/Ordination_rivers.png", width = 8, height = 8, dpi=500)


```
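The biplots above do not show how much of the total variance PC1 and PC2 capture. The sketch below uses base R's `prcomp()` with `scale. = TRUE`. Like `rda(..., scale = TRUE)`, it runs a PCA on standardized variables, i.e. on the correlation matrix. The simulated data frame `toy` is hypothetical. For the `rda` objects above, vegan's `summary()` should report the same proportions in its "Importance of components" table.

```r
set.seed(42)
# Hypothetical data: three correlated variables plus one independent one
n <- 25
z <- rnorm(n)
toy <- data.frame(
  v1 = z + rnorm(n, sd = 0.3),
  v2 = z + rnorm(n, sd = 0.3),
  v3 = -z + rnorm(n, sd = 0.3),
  v4 = rnorm(n)
)

pca <- prcomp(toy, scale. = TRUE)   # PCA on standardized variables

# Eigenvalues of the correlation matrix = variances of the principal components
eig <- pca$sdev^2
prop_explained <- eig / sum(eig)
round(prop_explained, 2)

# Axis labels carrying the explained variance, e.g. for labs(x = ..., y = ...)
axis_labels <- sprintf("PC%d (%.0f%%)", 1:2, 100 * prop_explained[1:2])
axis_labels
```

The labels could be passed to the ggplot biplots with `labs(x = axis_labels[1], y = axis_labels[2])`.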

## 2. Regression analyses

### Testing assumptions
```{r}

str (dat_sel)

# linear relationship between variables?
scatter.smooth(x=dat_sel$River_TOC, y=dat_sel$Hydro_TSM_Deep, main="TSM ~ River TOC")

# normal distribution?
hist (dat_sel$Hydro_TSM_Deep)
hist (dat_sel$Hydro_POC_Surface)
hist (dat_sel$Hydro_POC_Intermediate)
hist (dat_sel$Hydro_Temperature_Deep)

## Have a look at the densities
plot(density(dat_sel$Hydro_TSM_Deep));plot(density (dat_sel$River_TOC))
## Perform the test
shapiro.test(dat_sel$Hydro_TSM_Deep); shapiro.test(dat_sel$River_TOC)
## Plot using a qqplot
qqnorm(dat_sel$Hydro_TSM_Deep);qqline(dat_sel$Hydro_TSM_Deep, col = 2)
qqnorm(dat_sel$River_TOC);qqline(dat_sel$River_TOC, col = 2)
qqnorm(dat_sel$Hydro_POC_Surface);qqline(dat_sel$Hydro_POC_Surface, col = 2)

```
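The checks above test normality of the raw variables. The normality assumption of a linear model, however, concerns the residuals, conditional on the predictors. The sketch below checks the residuals instead, on simulated data where the response is skewed only because its predictor is skewed. `x`, `y` and `fit` are hypothetical.

```r
set.seed(7)
n <- 100
# Skewed predictor, normal errors: y is skewed, but the model residuals are not
x <- rexp(n, rate = 1)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)

shapiro.test(y)                           # raw response: typically rejects normality
res_test <- shapiro.test(residuals(fit))  # residuals: the check relevant for lm
res_test

qqnorm(residuals(fit)); qqline(residuals(fit), col = 2)
```

The model diagnostics further down (`plot(model_...)`, `plot_model(..., type = "diag")`) look at residuals in the same way.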


### Linear models - selected responses using BIC
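`dredge(full_model, rank = "BIC")` in the chunk below fits every subset of the predictors in the full model and ranks the subsets by $\mathrm{BIC} = -2\log L + k\log n$. Here $k$ counts the estimated parameters; for `lm` in `stats::BIC`, that is the coefficients plus the residual variance. The base-R sketch below reproduces the idea on simulated data so the ranking can be inspected by hand. `all_subsets_bic()` and `toy` are hypothetical, and this is not MuMIn's implementation.

```r
set.seed(123)
n <- 40
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 1.5 * toy$x1 - 1 * toy$x2 + rnorm(n, sd = 0.5)   # x3 has no effect

# Hypothetical helper: fit all 2^p subsets of `predictors` with lm() and rank by BIC
all_subsets_bic <- function(response, predictors, data) {
  # Empty set (intercept-only model) plus all subsets of size 1..p
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(predictors),
                             function(k) combn(predictors, k, simplify = FALSE)),
                      recursive = FALSE))
  rows <- lapply(subsets, function(vars) {
    rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
    fit <- lm(as.formula(paste(response, "~", rhs)), data = data)
    data.frame(model = rhs, df = attr(logLik(fit), "df"), BIC = BIC(fit),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out$delta <- out$BIC - min(out$BIC)   # same role as delta in the dredge table
  out[order(out$BIC), ]
}

ranking <- all_subsets_bic("y", c("x1", "x2", "x3"), toy)
ranking
subset(ranking, delta <= 2)   # near-equivalent models, cf. subset(min_model, delta <= 2)

# BIC by hand for the top-ranked model: -2 logL + k log(n)
best <- lm(as.formula(paste("y ~", ranking$model[1])), data = toy)
ll <- logLik(best)
bic_by_hand <- -2 * as.numeric(ll) + attr(ll, "df") * log(nobs(best))
all.equal(bic_by_hand, BIC(best))
```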
```{r}
options(na.action = "na.fail") # required by dredge(): all submodels must be fitted to the same data
# ?dredge

# tab_model(BB05NQI_lm, BB35NQI_lm, show.est = FALSE, show.intercept = FALSE)

save_plots <- TRUE

### 1a POC Surface
full_model <- lm(Hydro_POC_Surface ~ River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Chla_Surface + Hydro_Secchi_Surface + Hydro_Temperature_Surface, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
# All models with dBIC < 2
#subset(min_model, delta <= 2)
model_POCsurf <- get.models(min_model, 1)[[1]]
summary (model_POCsurf)
# Plot model
# model diagnostics
plot(model_POCsurf)
plot_model(model_POCsurf, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_POCsurf, sort.est = TRUE)
# effect plot
plot_model(model_POCsurf, type = "slope", show.data = TRUE)
par(mfrow = c(1,3), mar = c(4,5,2,1))
visreg(model_POCsurf,  points = list(cex = 1))

tab_model (model_POCsurf)

#ggsave ("Figures_rapp/Regressions_POC_Surf.png", width = 8, height = 6, dpi=500)

if(save_plots){
  png("Figures_rapp/Regressions_Hydro_POCsurf.png", width = 18, height = 15, unit = "cm", res = 400)
  par(mfrow = c(2,2), mar = c(4,5,2,1), oma = c(0,3,0,0))
  visreg(model_POCsurf,  points = list(cex = 1), ylab = "")
  mtext("Surface POC concentration", 2, line = 1, outer = TRUE)
  dev.off()
}

### 1a POC Surface - DETRENDED (Year included as a predictor and kept fixed in dredge)
full_model <- lm(Hydro_POC_Surface ~ River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Chla_Surface + Hydro_Secchi_Surface + Hydro_Temperature_Surface + Year, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, fixed = "Year", rank = "BIC") 
# All models with dBIC < 2
#subset(min_model, delta <= 2)
model_POCsurf_yr <- get.models(min_model, 1)[[1]]
summary (model_POCsurf_yr)
tab_model (model_POCsurf_yr)

### 1b POC Intermediate
full_model <- lm(Hydro_POC_Intermediate ~ Hydro_POC_Surface + River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Intermediate + Hydro_DIN_Intermediate + Hydro_PO4_Intermediate + Hydro_TotP_Intermediate + Hydro_TotN_Intermediate + Hydro_Chla_Intermediate + Hydro_Secchi_Surface, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
model_POCmed <- get.models(min_model, 1)[[1]]
summary (model_POCmed)
# Plot model
# model diagnostics
plot(model_POCmed)
plot_model(model_POCmed, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_POCmed, sort.est = TRUE)
# effect plot
plot_model(model_POCmed, type = "slope", show.data = TRUE)
par(mfrow = c(1,2), mar = c(4,5,2,1))
visreg(model_POCmed,  points = list(cex = 1))

#ggsave ("Figures_rapp/Regressions_POC_Intermed.png", width = 8, height = 6, dpi=500)

tab_model (model_POCmed)

if(save_plots){
  png("Figures_rapp/Regressions_Hydro_POCmed.png", width = 18, height = 7.5, unit = "cm", res = 400)
  par(mfrow = c(1,3), mar = c(4,5,2,1), oma = c(0,3,0,0))
  visreg (model_POCmed,  points = list(cex = 1), ylab = "")
  mtext("Intermediate POC concentration", 2, line = 1, outer = TRUE)
  dev.off()
}

### 1b POC Intermediate - DETRENDED (Year included as a predictor and kept fixed in dredge)
# same predictors as the non-detrended 1b model above, plus Year
full_model <- lm(Hydro_POC_Intermediate ~ Hydro_POC_Surface + River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Intermediate + Hydro_DIN_Intermediate + Hydro_PO4_Intermediate + Hydro_TotP_Intermediate + Hydro_TotN_Intermediate + Hydro_Chla_Intermediate + Hydro_Secchi_Surface + Year, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, fixed = "Year", rank = "BIC") 
# All models with dBIC < 2
#subset(min_model, delta <= 2)
model_POCmed_yr <- get.models(min_model, 1)[[1]]
summary (model_POCmed_yr)
tab_model (model_POCmed_yr)

### 2a PON Surface
full_model <- lm(Hydro_PON_Surface ~ River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Chla_Surface + Hydro_Secchi_Surface + Hydro_Temperature_Surface, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
# All models with dBIC < 2
#subset(min_model, delta <= 2)
model_PONsurf <- get.models(min_model, 1)[[1]]
summary (model_PONsurf)
# Plot model
# model diagnostics
plot(model_PONsurf)
plot_model(model_PONsurf, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_PONsurf, sort.est = TRUE)
# effect plot
plot_model(model_PONsurf, type = "slope", show.data = TRUE)
par(mfrow = c(1,3), mar = c(4,5,2,1))
visreg(model_PONsurf,  points = list(cex = 1))

tab_model (model_PONsurf)

if(save_plots){
  png("Figures_rapp/Regressions_Hydro_PONsurf.png", width = 18, height = 7.5, unit = "cm", res = 400)
  par(mfrow = c(1,3), mar = c(4,5,2,1), oma = c(0,3,0,0))
  visreg (model_PONsurf,  points = list(cex = 1), ylab = "")
  mtext("Surface PON concentration", 2, line = 1, outer = TRUE)
  dev.off()
}

### 2b PON intermed
full_model <- lm(Hydro_PON_Intermediate ~ Hydro_PON_Surface + River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Intermediate + Hydro_DIN_Intermediate + Hydro_PO4_Intermediate + Hydro_TotP_Intermediate + Hydro_TotN_Intermediate + Hydro_Chla_Intermediate + Hydro_Secchi_Surface, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
# All models with dBIC < 2
#subset(min_model, delta <= 2)
model_PONmed <- get.models(min_model, 1)[[1]]
summary (model_PONmed)

# Plot model
# model diagnostics
plot(model_PONmed)
plot_model(model_PONmed, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_PONmed, sort.est = TRUE)
# effect plot
plot_model(model_PONmed, type = "slope", show.data = TRUE)
par(mfrow = c(1,3), mar = c(4,5,2,1))
visreg(model_PONmed,  points = list(cex = 1))

tab_model (model_PONmed)

### 3a TSM Surface
full_model <- lm(Hydro_TSM_Surface ~ River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Chla_Surface + Hydro_Secchi_Surface + Hydro_Temperature_Surface, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
model_TSMsurf <- get.models(min_model, 1)[[1]]
summary (model_TSMsurf)
# Plot model
# model diagnostics
plot(model_TSMsurf)
plot_model(model_TSMsurf, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_TSMsurf, sort.est = TRUE)
# effect plot
plot_model(model_TSMsurf, type = "slope", show.data = TRUE)
par(mfrow = c(1,2), mar = c(4,5,2,1))
visreg(model_TSMsurf,  points = list(cex = 1))

#ggsave ("Figures_rapp/Regressions_TSM_Surf.png", width = 8, height = 6, dpi=500)

tab_model (model_TSMsurf)

if(save_plots){
  png("Figures_rapp/Regressions_Hydro_TSMsurf.png", width = 18, height = 7.5, unit = "cm", res = 400)
  par(mfrow = c(1,3), mar = c(4,5,2,1), oma = c(0,3,0,0))
  visreg (model_TSMsurf,  points = list(cex = 1), ylab = "")
  mtext("Surface TSM concentration", 2, line = 1, outer = TRUE)
  dev.off()
}

### 3a TSM Surface - DETRENDED (Year included as a predictor and kept fixed in dredge)
full_model <- lm(Hydro_TSM_Surface ~ River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Chla_Surface + Hydro_Secchi_Surface + Hydro_Temperature_Surface + Year, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, fixed = "Year", rank = "BIC") 
# All models with dBIC < 2
#subset(min_model, delta <= 2)
model_TSMsurf_yr <- get.models(min_model, 1)[[1]]
summary (model_TSMsurf_yr)
tab_model (model_TSMsurf_yr)

### 3b TSM Deep
full_model <- lm(Hydro_TSM_Deep ~ Hydro_TSM_Surface + River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Deep + Hydro_PO4_Deep + Hydro_TotP_Deep + Hydro_TotN_Deep + Hydro_Chla_Deep + Hydro_Secchi_Surface + Hydro_Temperature_Deep, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
model_TSMdeep <- get.models(min_model, 1)[[1]]
summary (model_TSMdeep)
# Plot model
# model diagnostics
plot(model_TSMdeep)
plot_model(model_TSMdeep, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_TSMdeep, sort.est = TRUE)
# effect plot
plot_model(model_TSMdeep, type = "slope", show.data = TRUE)
par(mfrow = c(1,3), mar = c(4,5,2,1))
visreg(model_TSMdeep,  points = list(cex = 1))

#ggsave ("Figures_rapp/Regressions_TSM_Deep.png", width = 8, height = 6, dpi=500)

tab_model (model_TSMdeep)

if(save_plots){
  png("Figures_rapp/Regressions_Hydro_TSMdeep.png", width = 18, height = 7.5, unit = "cm", res = 400)
  par(mfrow = c(1,3), mar = c(4,5,2,1), oma = c(0,3,0,0))
  visreg (model_TSMdeep,  points = list(cex = 1), ylab = "")
  mtext("Deep TSM concentration", 2, line = 1, outer = TRUE)
  dev.off()
}


### 4a Salinity Surface
full_model <- lm(Hydro_Salinity_Surface ~ River_Discharge + Winter_NAO, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
model_SALsurf <- get.models(min_model, 1)[[1]]
summary (model_SALsurf)
# Plot model
# model diagnostics
plot(model_SALsurf)
plot_model(model_SALsurf, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_SALsurf, sort.est = TRUE)
# effect plot
plot_model(model_SALsurf, type = "slope", show.data = TRUE)
par(mfrow = c(1,2), mar = c(4,5,2,1))
visreg(model_SALsurf,  points = list(cex = 1))

#ggsave ("Figures_rapp/Regressions_Hydro_Salinity_SALsurf.png", width = 8, height = 6, dpi=500)

tab_model (model_SALsurf)
# ?mtext
if(save_plots){
  png("Figures_rapp/Regressions_Hydro_SALsurf.png", width = 13, height = 7.5, unit = "cm", res = 400)
  par(mfrow = c(1,3), mar = c(4,5,2,1), oma = c(0,3,0,0))
  visreg (model_SALsurf,  points = list(cex = 1), ylab = "")
  mtext("Surface Salinity", 2,line = 1, outer = TRUE)
  dev.off()
}


### 4b Salinity Deep
full_model <- lm(Hydro_Salinity_Deep ~ River_Discharge + Winter_NAO, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
model_SALdeep <- get.models(min_model, 1)[[1]]
summary (model_SALdeep)
# Plot model
# model diagnostics
plot(model_SALdeep)
plot_model(model_SALdeep, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(model_SALdeep, sort.est = TRUE)
# effect plot
plot_model(model_SALdeep, type = "slope", show.data = TRUE)
par(mfrow = c(1,3), mar = c(4,5,2,1))
visreg(model_SALdeep,  points = list(cex = 1))

tab_model (model_SALdeep)

### 5 Chla Surface
full_model <- lm(Hydro_Chla_Surface ~ Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_Si_Surface + River_TOC + River_SPM + Winter_NAO + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Secchi_Surface, data = dat_sel) 
summary(full_model)
# find the best model (lowest BIC)
min_model <- dredge (full_model, rank = "BIC") 
model_CHLAsurf <- get.models(min_model, 1)[[1]]
summary (model_CHLAsurf)
# Plot model
# model diagnostics
plot(model_CHLAsurf)
plot_model(model_CHLAsurf, type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model (model_CHLAsurf, sort.est = TRUE)
# effect plot
plot_model(model_CHLAsurf, type = "slope", show.data = TRUE)
par(mfrow = c(1,3), mar = c(4,5,2,1))
visreg(model_CHLAsurf,  points = list(cex = 1))

tab_model (model_CHLAsurf)

### 6 Temperature Deep (Gamma GLM, selected with stepAIC)
# Wind would be relevant to add here (see 1e)
# Does river discharge really have more explanatory power than NAO, or do the two just change in the same direction?
full_model <- glm(Hydro_Temperature_Deep ~ Winter_NAO + River_Discharge, data = dat_sel, family = Gamma(link = "identity"))
summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
par(mfrow = c(2,2), mar = c(4,5,2,1))
plot(step_model)
visreg(step_model, points = list(cex = 1))

```
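The DETRENDED models above keep `Year` as a fixed predictor, and this amounts to detrending. By the Frisch-Waugh-Lovell theorem, the coefficient of a predictor in `lm(y ~ x + Year)` equals the slope you get after removing the linear Year trend from both `y` and `x` and regressing the residuals on each other. The base-R check below uses simulated data (`toy` is hypothetical). Note that this removes only a linear trend.

```r
set.seed(2019)
toy <- data.frame(Year = 1992:2016)
toy$x <- 0.05 * (toy$Year - 1992) + rnorm(nrow(toy))               # trending predictor
toy$y <- 0.10 * (toy$Year - 1992) + 0.8 * toy$x + rnorm(nrow(toy))

# Coefficient of x with Year held in the model (as in the DETRENDED models)
b_joint <- coef(lm(y ~ x + Year, data = toy))[["x"]]

# Same coefficient from explicitly detrended series
y_detr <- residuals(lm(y ~ Year, data = toy))
x_detr <- residuals(lm(x ~ Year, data = toy))
b_detr <- coef(lm(y_detr ~ x_detr))[["x_detr"]]

all.equal(b_joint, b_detr)   # equal up to floating-point error
```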
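### Linear models - OLD, using stepAIC
```{r, eval = FALSE}
# NB! Not evaluated: the variable names below (e.g. TotN_Surface, River_Local_Discharge) differ from the Hydro_/River_ names used in dat_sel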
### b. Surface Total N
full_model <- lm(TotN_Surface ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotN + River_Distant_TotN, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)

### c. Surface Total P
full_model <- lm(TotP_Surface ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotP + River_Distant_TotP, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)

### d. Surface TSM
full_model <- lm(TSM_Surface ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TOC + River_Distant_TOC, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
par(mfrow = c(2,2), mar = c(4,5,2,1))
plot(step_model)
visreg(step_model, points = list(cex = 1))

### e. Secchi depth
full_model <- lm(Secchi_Surface ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotN + River_Distant_TotN + River_Local_TotP + River_Distant_TotP + 
                 River_Local_TOC + River_Distant_TOC, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
visreg(step_model, points = list(cex = 1))

## 3. Hydrographic data, intermediate

### a. Intermediate salinity
full_model <- lm(Salinity_Intermediate ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)

### b. Intermediate Total N
full_model <- lm(TotN_Intermediate ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotN + River_Distant_TotN, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)

### c. Intermediate Total P
full_model <- lm(TotP_Intermediate ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotP + River_Distant_TotP, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)

### d. Intermediate TSM
full_model <- lm(TSM_Intermediate ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TOC + River_Distant_TOC, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
# par(mfrow = c(2,2), mar = c(4,5,2,1))
# plot(step_model)
par(mfrow = c(2,2), mar = c(4,5,2,1))
visreg(step_model, points = list(cex = 1))

## 4. Hydrographic data, deep

### a. Deep salinity
full_model <- lm(Salinity_Deep ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)

### b. Deep Total N
full_model <- lm(TotN_Deep ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotN + River_Distant_TotN, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)

### c. Deep Total P
full_model <- lm(TotP_Deep ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotP + River_Distant_TotP, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
par(mfrow = c(2,2), mar = c(4,5,2,1))
visreg(step_model, points = list(cex = 1))

### d. Deep TSM
full_model <- lm(TSM_Deep ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TOC + River_Distant_TOC, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
# par(mfrow = c(2,2), mar = c(4,5,2,1))
# plot(step_model)
visreg(step_model, points = list(cex = 1))
```
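The old models above use `stepAIC()` with its default AIC penalty (`k = 2`), while the current models rank by BIC. A stepwise search can use the BIC penalty instead by setting `k = log(n)`, an argument that both `MASS::stepAIC()` and base R's `step()` accept. It remains a greedy search, unlike the exhaustive ranking in `dredge()`. Below is a sketch with `step()` on simulated data (`toy` is hypothetical).

```r
set.seed(11)
n <- 30
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 + 1.2 * toy$x1 + rnorm(n, sd = 0.5)   # only x1 has an effect

full <- lm(y ~ x1 + x2 + x3, data = toy)

# k = 2 gives AIC (the stepAIC default); k = log(n) gives BIC
step_aic <- step(full, direction = "both", trace = 0)
step_bic <- step(full, direction = "both", k = log(nobs(full)), trace = 0)

formula(step_aic)
formula(step_bic)
```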

### GLM models - using glmulti
* We decided to go with lm (above) because glm gave no real improvement in the diagnostics
```{r}
library(glmulti) # requires rJava, see 0. Libraries
str(dat_sel)

# Taken example from here:
# https://rstudio-pubs-static.s3.amazonaws.com/2897_9220b21cfc0c43a396ff9abf122bb351.html


#?glmulti
# explanation of terms
#level = 1,# No interaction considered
#method = "h",# Exhaustive approach
#crit = "bic",# BIC as criteria
#confsetsize = 3,# Keep 3 best models
#plotty = F, report = F,  # No plot or interim reports
#fitfunction = "glm",     # glm function
#family = Gamma(link = "identity"))# gamma family

# using AIC (the best model includes terms that are not significant...)
POC_aic <-
  glmulti(Hydro_POC_Surface ~ River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Chla_Surface + Hydro_Secchi_Surface,
          data = dat_sel, level = 1,method = "h", crit = "aic",confsetsize = 3,plotty = F, report = F,fitfunction = "glm", family = Gamma(link = "identity"))

# using BIC (fewer terms retained in the best model)
POC_bic <-
  glmulti(Hydro_POC_Surface ~ River_TOC + River_SPM + Winter_NAO + Hydro_Salinity_Surface + Hydro_DIN_Surface + Hydro_PO4_Surface + Hydro_TotP_Surface + Hydro_TotN_Surface + Hydro_Chla_Surface + Hydro_Secchi_Surface,
          data = dat_sel, level = 1,method = "h", crit = "bic",confsetsize = 3,plotty = F, report = F,fitfunction = "glm", family = Gamma(link = "identity"))

## Show result for the best model
#summary(POC_aic@objects[[1]])
summary(POC_bic@objects[[1]])

#### NB! The diagnostics do not look very good, and neither does the effect plot.
# Is a glm with a Gamma distribution not the right model here, or can you see any errors in how the models above are specified?
# diagnostic plot
plot_model(POC_bic@objects[[1]], type = "diag")
# plot model estimates (neutral line; vertical intercept that indicates no effect)
plot_model(POC_bic@objects[[1]], sort.est = TRUE)
# effect plot
plot_model(POC_bic@objects[[1]], type = "slope", show.data = TRUE)

#ggsave ("Figures_rapp/.png", width = 10, height = 10, dpi=500)


```
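On the open question of why the Gamma GLM looks no better, one thing worth checking is the link function. The identity link requires all fitted means to stay positive and can fail to converge. The log link (`Gamma(link = "log")`) is a common alternative that keeps fitted means positive. Because `lm()` and a Gamma `glm()` are both fitted to the untransformed response, their AIC values can be compared. The sketch below uses simulated positive, right-skewed data (`toy` is hypothetical). It makes no claim about which model suits the POC data.

```r
set.seed(3)
n <- 60
toy <- data.frame(x = runif(n, 0, 2))
mu <- exp(0.5 + 0.8 * toy$x)                    # positive mean on the log scale
toy$y <- rgamma(n, shape = 5, rate = 5 / mu)    # Gamma noise with mean mu

fit_lm    <- lm(y ~ x, data = toy)
fit_gamma <- glm(y ~ x, family = Gamma(link = "log"), data = toy)

AIC(fit_lm, fit_gamma)   # same response scale, so the AIC values are comparable

# Deviance residuals are the natural scale for GLM diagnostics
summary(residuals(fit_gamma, type = "deviance"))
```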


# PART TWO - Biological data

We're trying to find meaningful ways to describe the relationship between biological parameters and environmental variables, and to connect this to changes that have happened over time.

**Tentative approach:**

* Plot environmental vectors on ordination plots
* Use this to generate hypotheses about causal links
* Test by using models 
  + Probably different models for every system
  + Mixed effects for hard bottom (station as random)
  + GLM for soft bottom (one model per station for DCA at least) and plankton
* Probably makes sense to use the same set of explanatory variables in model selection for every response within a system
* Should perhaps add some variables as factors.

## Correlations between environmental variables
```{r correlations envvar, fig.height=15, fig.width=15}
# check correlations Hydro
dat_sel %>%
#  dplyr::select(-ends_with("_Deep")) %>%  # remove Deep 
  dplyr::select(starts_with("Hydro")) %>% 
  rename_at(.vars = vars(starts_with("Hydro_")), .funs = ~ sub("Hydro_", "", .)) %>%
  cor() %>% 
  corrplot.mixed(tl.pos = "l", tl.cex = 0.75, number.cex = 0.75)

dat_sel %>%
#  dplyr::select(-ends_with("_Deep")) %>%  # remove Deep 
  dplyr::select(starts_with("Hydro")) %>% 
  rename_at(.vars = vars(starts_with("Hydro_")), .funs = ~ sub("Hydro_", "", .)) %>%
  cor() %>% 
  corrplot(type="upper", tl.col = "gray17", tl.cex = 1.2)

```

```{r, fig.height=5, fig.width=5}
# check correlations River
dat_sel %>%
#  dplyr::select(-ends_with("_Deep")) %>%  # remove Deep 
  dplyr::select(starts_with("River")) %>% 
  rename_at(.vars = vars(starts_with("River_")), .funs = ~ sub("River_", "", .)) %>%
  cor() %>% 
  corrplot.mixed(tl.pos = "d", tl.cex=0.7, tl.col="gray17")
```


```{r select envvars}
# remove one parameter from each pair with correlation > 0.8 (typically choosing intermediate over surface values)
utgar <- c("Hydro_DIN_Surface", "Hydro_POC_Surface", "Hydro_PON_Surface", "Hydro_Si_Surface", "Hydro_Si_Deep", "Hydro_Temperature_Surface", "Hydro_Temperature_Deep", "Hydro_TotP_Deep", "Hydro_TotN_Surface","Hydro_O2_Intermediate") #, "River_TOC", "River_TotN"
# PO4 intermediate and deep should perhaps also be removed...
# Remove River TOC and River TotN, which are strongly correlated with Discharge, among others
# Also remove SM because of its correlation with Discharge
# Or remove all River variables

dat_sel_mod <-
  dat_sel %>% 
  dplyr::select(-one_of(utgar)) %>% 
  dplyr::select(-contains("PO4")) %>%
  dplyr::select(-contains("River")) 
#  dplyr::select(-contains("TSM")) # include this once the River variables are removed
  
names(dat_sel_mod)
```
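The manual selection above drops one variable from each pair with |r| > 0.8. Below is a base-R sketch that lists those pairs from a correlation matrix, so the choice can be documented. `high_cor_pairs()` is a hypothetical helper, and the toy data are simulated. With the real data, it could be applied to `dplyr::select(dat_sel, starts_with("Hydro"))`.

```r
# Hypothetical helper: pairs of variables with |r| above a threshold
high_cor_pairs <- function(df, threshold = 0.8) {
  cm <- cor(df, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r = cm[idx],
    stringsAsFactors = FALSE
  )
  out[order(-abs(out$r)), ]
}

set.seed(5)
z <- rnorm(30)
toy <- data.frame(a = z, b = z + rnorm(30, sd = 0.1), c = rnorm(30))
pairs_found <- high_cor_pairs(toy)
pairs_found
```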

## Hard bottom
### Environmental variables in relation to ordination
```{r fitting environmental variables to ordination}
dim(dat_sel_mod)

# retrieving HB parameters
HBdf <- readr::read_csv2("Datasett/Hardbunn_KOPI/HBanalysesett.csv")
names(HBdf)

# coupling the two using dplyr, by Year
HBdat <- # join and remove phytoplankton from the mix
  HBdf %>% 
    filter(Site %in% c(407, 410)) %>% # filter on stations 407 and 410
    left_join(dat_sel_mod, by = "Year") %>% # join
    dplyr::select(-starts_with("Plankton_")) # remove phytopl

DCAhb <- HBdat %>% 
  dplyr::select(DCA1:DCA4)

envHB <- HBdat %>% 
  dplyr::select(Hydro_Chla_Deep:Winter_NAO)

envfitHB <- envfit(DCAhb, envHB, na.rm = TRUE, permutations = 999, choices = 1)
envfitHB # indicates which parameters may be important for community responses along DCA1; 12 observations are dropped because of NA
str(envfitHB)

#plot(DCAhb[1:2])
#plot(envfitHB, p.max = 0.05, add = TRUE) # plotting parameters with p-value < 0.05 in relation to the ordination. Can make this into a ggplot if needed

HBpar <- names(which(envfitHB$vectors$pvals < 0.05)) # names of parameters with p-value < 0.05
# HBpar <- names(envHB) # all env

# check selection for missing values
HBdat %>%
  summarise_all(~ sum(is.na(.)))

```
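`envfit()` fits each environmental variable as a vector onto the ordination and assesses its r² by permutation. With `choices = 1` (one axis), the fitted r² should reduce to the squared correlation between the variable and the DCA1 scores. This is my assumption, worth verifying against `envfitHB$vectors$r`. The base-R sketch below illustrates the permutation logic for a single variable on simulated data. `perm_r2()` and `toy` are hypothetical, and vegan's implementation differs in detail.

```r
set.seed(99)
n <- 24
toy <- data.frame(DCA1 = rnorm(n))
toy$env <- 0.7 * toy$DCA1 + rnorm(n, sd = 0.7)

# Hypothetical helper: permutation test of r^2 between one variable and one axis
perm_r2 <- function(axis, env, permutations = 999) {
  r2_obs <- cor(axis, env)^2
  r2_perm <- replicate(permutations, cor(axis, sample(env))^2)
  # The observed value counts as one of the permutations, hence the +1 terms
  p <- (sum(r2_perm >= r2_obs) + 1) / (permutations + 1)
  c(r2 = r2_obs, p = p)
}

res_perm <- perm_r2(toy$DCA1, toy$env)
res_perm
```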

### DCA - possible causal links
```{r fitting model for DCA}

# For use as random factor
HBdat$Site <- as.character(HBdat$Site)

HBfullformula <- as.formula(paste("DCA1 ~ ", paste(HBpar, collapse = " + "), "+ (1|Site)", sep = ""))
HBfullformula

# Standardizing
# https://cran.r-project.org/web/packages/standardize/vignettes/using-standardize.html 

sHB <- standardize(HBfullformula, HBdat)

# Full model
HBlmer <- lmer(sHB$formula, sHB$data) # with standardized data
# HBlmer <- lmer(HBfullformula, data = HBdat) # without standardization

# Model selection
# https://cran.r-project.org/web/packages/lmerTest/lmerTest.pdf 

HBlmer_select <- lmerTest::step(HBlmer)
str(HBlmer_select)

HBlmer_best <- lmerTest::get_model(HBlmer_select)

plot(HBlmer_best)

# a bit worried about this approach... Dag?

# Visualisation
# https://cran.r-project.org/web/packages/sjPlot/vignettes/plot_model_estimates.html
# https://cran.r-project.org/web/packages/sjPlot/vignettes/plot_marginal_effects.html

# Model plots
plot_model(HBlmer_best, type = "diag")
plot_model(HBlmer_best, type="est", sort.est = TRUE)
ggsave ("Figures_rapp/ForestEffect_HBmodDCA.png", width = 6, height = 4, dpi=500)
# the effect plot follows in the next chunk
# maybe we need to consider interactions?
```
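The `standardize()` step puts the predictors on a common scale so the estimates in the forest plot can be compared directly. Assuming the Gaussian default (`scale = 1`) scales the response and continuous predictors to mean 0 and SD 1, a standardized slope is the raw slope multiplied by SD(x)/SD(y). A small check with `lm()` on simulated data (the same rescaling should carry over to the fixed effects in `lmer()`):

```r
# Sketch: standardized vs raw slopes in a Gaussian linear model.
# Assumption: simulated data, with every column scaled to mean 0 and SD 1,
# which is how we read standardize()'s default (scale = 1) for gaussian.
set.seed(42)
n <- 40
d <- data.frame(x1 = rnorm(n, mean = 10, sd = 3),
                x2 = rnorm(n, mean = 0, sd = 0.2))
d$y <- 2 + 0.5 * d$x1 - 4 * d$x2 + rnorm(n)

raw_fit <- lm(y ~ x1 + x2, data = d)
std_fit <- lm(y ~ x1 + x2, data = as.data.frame(scale(d)))

# Standardized slope = raw slope * SD(x) / SD(y)
manual <- coef(raw_fit)[-1] * sapply(d[c("x1", "x2")], sd) / sd(d$y)
cbind(manual = manual, standardized = coef(std_fit)[-1])
```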

```{r effect plot, fig.height = 10}
plot_model(HBlmer_best, type = "slope", show.data = TRUE)
ggsave ("Figures_rapp/SlopeEffect_HBmodDCA.png", width = 10, height = 10, dpi=500) # own file name so the forest plot above is not overwritten
```

```{r effect table, fig.height = 10}
tab_model(HBlmer_best, show.est = FALSE, show.intercept = FALSE)
```

```{r visualization of effects}
# extract parameter estimates (scaled)
fixedHB <- fixef(HBlmer_best)
df.fixedHB <- data.frame(fixedHB, index = seq_along(fixedHB))
rm(fixedHB)

df.fixedHB <- df.fixedHB[-1, ] # drop the intercept here rather than filtering it out in the plot below

# plot
DCAplot_effects <- df.fixedHB %>%
  ggplot(aes(index, fixedHB)) +
    geom_segment(data = df.fixedHB[df.fixedHB$fixedHB < 0, ], aes(x=index, xend=index, y=0, yend=fixedHB), 
                 linetype="solid", 
                 arrow = arrow(length = unit(0.2, "cm"), type="closed"),
                 size = 0.5, col = "tomato2") +
    geom_segment(data = df.fixedHB[df.fixedHB$fixedHB > 0, ], aes(x=index, xend=index, y=0, yend=fixedHB), 
                 linetype="solid", 
                 arrow = arrow(length = unit(0.2, "cm"), type="closed"),
                 size = 0.5, col = "cadetblue") +
    geom_text(data = df.fixedHB, 
              aes(x = index, y = fixedHB*1.1, label = row.names(df.fixedHB)),  
                color = "gray17", size = 3) +
    labs(y = "Effects") +
    xlim(-3, 12) +
    theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank())

DCAplot_dots <- HBdat %>% 
  ggplot(aes(Year, DCA1, color = as.character(Site))) +
    geom_point(alpha = 0.5) +
    geom_smooth() +
    scale_color_hue(labels = c("HT113", "HR104")) +
    labs(color = "Station") +
    theme(axis.title.x=element_blank(),
#        axis.text.x=element_blank(),
#        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank()) 

gridExtra::grid.arrange(DCAplot_dots, DCAplot_effects, nrow=1, widths = c(1,1))

```
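The effect plot above builds the coefficient table by position (`[-1, ]` to drop the intercept) and draws positive and negative arrows in two separate layers. A sketch of a name-based alternative on simulated data; the columns `term`, `estimate` and `direction` are illustrative choices:

```r
# Sketch: coefficient table for an effect-arrow plot, dropping the intercept
# by name. Simulated data; column names are illustrative choices.
set.seed(7)
d <- data.frame(a = rnorm(25), b = rnorm(25), c = rnorm(25))
d$y <- 1 + 2 * d$a - 1.5 * d$b + rnorm(25, sd = 0.3)
fit <- lm(y ~ a + b + c, data = d)

est <- coef(fit)                  # fixef() plays this role for lmer fits
coef_df <- data.frame(term = names(est), estimate = unname(est))
coef_df <- coef_df[coef_df$term != "(Intercept)", ]
coef_df$index <- seq_len(nrow(coef_df))
coef_df$direction <- ifelse(coef_df$estimate < 0, "negative", "positive")
coef_df
```

With ggplot2, `direction` could then be mapped in a single layer, e.g. `geom_segment(aes(x = index, xend = index, y = 0, yend = estimate, colour = direction))` together with `scale_colour_manual(values = c(negative = "tomato2", positive = "cadetblue"))`.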


## Soft bottom
### Environmental variables in relation to ordination
```{r fitting environmental variables to ordination}

# retrieving BB ordinations
BB05df <- readr::read_csv2("Data_produced/BBordSites_05.csv")
names(BB05df)
BB35df <- readr::read_csv2("Data_produced/BBordSites_35.csv")
names(BB35df)

# coupling the two using dplyr, by Year
BB05_dat <- BB05df %>% # join the environmental variables by Year
    left_join(dat_sel_mod, by = "Year")

DCA05 <- BB05_dat %>% 
  dplyr::select(DCA1:DCA4)

DCA35 <- BB35df %>% 
  dplyr::select(DCA1:DCA4)

envBB <- BB05_dat %>% 
  dplyr::select(Hydro_Chla_Deep:Winter_NAO)


BB35_dat <- bind_cols(BB35df[,-1], envBB)

envfitBB05 <- envfit(DCA05, envBB, na.rm = TRUE, permutations = 999, choices = 1:4)
envfitBB05 # gives an indication of which parameters may be important for community responses 
# 2 observations are dropped because of NA
envfitBB35 <- envfit(DCA35, envBB, na.rm = TRUE, permutations = 999, choices = 1:4)
envfitBB35 
# 2 observations are dropped because of NA

#plot(DCA05[1:2])
#plot(envfitBB05, p.max = 0.05, add = TRUE)
#plot(DCA35[1:2])
#plot(envfitBB35, p.max = 0.05, add = TRUE)

# plotting parameters with p-value < 0.05 in relation to ordination. Can make this into a ggplot if needed

BB05par <- names(which(envfitBB05$vectors$pvals < 0.05)) # names of parameters with p-value < 0.05
BB35par <- names(which(envfitBB35$vectors$pvals < 0.05)) # names of parameters with p-value < 0.05

# BB05par <- names(envBB) # all parameters
# BB35par <- names(envBB) # all parameters

```
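Unlike the hard-bottom fit, the soft-bottom `envfit()` calls use `choices = 1:4`. As we read vegan's vector fitting, r² is then the R² of regressing each variable on all four axes, so a variable can pass the `pvals < 0.05` filter through DCA2 to DCA4 even if it is only weakly related to DCA1, which is the response in the models below. A base-R sketch on simulated scores (all names are made up):

```r
# Sketch: r^2 against four axes vs against DCA1 alone.
# Assumption: simulated axis scores and one simulated variable.
set.seed(3)
n <- 25
axes <- data.frame(DCA1 = rnorm(n), DCA2 = rnorm(n),
                   DCA3 = rnorm(n), DCA4 = rnorm(n))
# variable driven by axes 2 and 3, not by DCA1
env_var <- 0.6 * axes$DCA2 - 0.4 * axes$DCA3 + rnorm(n, sd = 0.5)

r2_multi <- summary(lm(env_var ~ ., data = axes))$r.squared  # like choices = 1:4
r2_axis1 <- cor(env_var, axes$DCA1)^2                         # like choices = 1
c(all_axes = r2_multi, dca1_only = r2_axis1)
```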

### DCA - possible causal links
```{r fitting model for DCA soft bottom}

# Standardizing
# https://cran.r-project.org/web/packages/standardize/vignettes/using-standardize.html 

BB05_fullformula <- as.formula(paste("DCA1 ~ ", paste(BB05par, collapse = " + "), sep = ""))
BB35_fullformula <- as.formula(paste("DCA1 ~ ", paste(BB35par, collapse = " + "), sep = ""))

# with lag terms: L(x, 2) is x two time steps earlier
BB05_fullformula_dyn <- as.formula(paste("DCA1 ~ L(", paste(BB05par, collapse = ", 2) + L("), ", 2)", sep = ""))
BB35_fullformula_dyn <- as.formula(paste("DCA1 ~ L(", paste(BB35par, collapse = ", 2) + L("), ", 2)", sep = ""))


BB05_fullformula

sBB05 <- standardize(BB05_fullformula, BB05_dat)
sBB35 <- standardize(BB35_fullformula, BB35_dat)

# Model and selection (k = log(n) gives BIC; n = number of rows actually used in the fit)
BB05_lm <- lm(sBB05$formula, sBB05$data)
BB05_select <- stepAIC(BB05_lm, direction = "both", trace = 0, k = log(nobs(BB05_lm))) # BIC
BB05_select

BB35_lm <- lm(sBB35$formula, sBB35$data)
BB35_select <- stepAIC(BB35_lm, direction = "both", trace = 0, k = log(nobs(BB35_lm))) # BIC
BB35_select

# With lag effects? Separate object names keep BB05_lm/BB35_lm (used in the plots below) intact
BB05_dynlm <- dynlm::dynlm(BB05_fullformula_dyn, BB05_dat)
BB05_dynselect <- stepAIC(BB05_dynlm, direction = "both", trace = 0, k = log(nobs(BB05_dynlm))) # BIC
BB05_dynselect

BB35_dynlm <- dynlm::dynlm(BB35_fullformula_dyn, BB35_dat)
BB35_dynselect <- stepAIC(BB35_dynlm, direction = "both", trace = 0, k = log(nobs(BB35_dynlm))) # BIC
BB35_dynselect

# Model plots BB05
plot_model(BB05_lm, type = "diag")
plot_model(BB05_lm, sort.est = TRUE, title = "DCA - BR1")
ggsave ("Figures_rapp/ForestEffect_BB05DCA.png", width = 6, height = 4, dpi=500)

# Model plots BB35
plot_model(BB35_lm, type = "diag")
plot_model(BB35_lm, sort.est = TRUE, title = "DCA - BT44")
ggsave ("Figures_rapp/ForestEffect_BB35DCA.png", width = 6, height = 4, dpi=500)

# the effect plot follows in the next chunk
# maybe we need to consider interactions?
```
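`stepAIC(..., k = log(n))` ranks models by BIC. Since `lm()` silently drops rows with NA, `n` is better taken from `nobs()` than from `length()` of the response, and candidate models should be compared on the same rows. A sketch on simulated data with two missing values, checking that `extractAIC()` with `k = log(n)` differs from `BIC()` only by a constant, so model differences agree:

```r
# Sketch: BIC via k = log(n), with n taken from the fitted model.
# Assumption: simulated data with two NAs in one predictor.
set.seed(11)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + 0.8 * d$x1 + rnorm(30)
d$x2[c(4, 17)] <- NA                 # lm() silently drops these rows

full <- lm(y ~ x1 + x2, data = d)
c(rows_in_data = nrow(d), rows_in_fit = nobs(full))

# Fit the smaller model on the same rows so the comparison is fair
reduced <- update(full, . ~ . - x2, data = model.frame(full))
k_bic <- log(nobs(full))

# extractAIC(k = log(n)) and BIC() differ by a constant for equal n
delta_extract <- extractAIC(reduced, k = k_bic)[2] - extractAIC(full, k = k_bic)[2]
delta_bic <- BIC(reduced) - BIC(full)
c(delta_extract = delta_extract, delta_bic = delta_bic)
```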

```{r effect plot soft bottom, fig.height = 10}
plot_model(BB05_lm, type = "slope", show.data = TRUE, title = "DCA - BR1")
ggsave ("Figures_rapp/SlopeEffect_BB05DCA.png", width = 10, height = 10, dpi=500)

plot_model(BB35_lm, type = "slope", show.data = TRUE, title = "DCA - BT44")
ggsave ("Figures_rapp/SlopeEffect_BB35DCA.png", width = 10, height = 10, dpi=500)
```

```{r effect table soft bottom, fig.height = 10}
tab_model(BB05_lm, BB35_lm)
```

```{r visualization of effects soft bottom}
# extract parameter estimates (scaled)
fixedBB <- coefficients(BB05_lm)
df.fixedBB <- data.frame(fixedBB, index = seq_along(fixedBB))
rm(fixedBB)

df.fixedBB <- df.fixedBB[-1, ] # drop the intercept here rather than filtering it out in the plot below

# plot
DCAplot_effects <- df.fixedBB %>%
  ggplot(aes(index, fixedBB)) +
    geom_segment(data = df.fixedBB[df.fixedBB$fixedBB < 0, ], aes(x=index, xend=index, y=0, yend=fixedBB), 
                 linetype="solid", 
                 arrow = arrow(length = unit(0.2, "cm"), type="closed"),
                 size = 0.5, col = "tomato2") +
    geom_segment(data = df.fixedBB[df.fixedBB$fixedBB > 0, ], aes(x=index, xend=index, y=0, yend=fixedBB), 
                 linetype="solid", 
                 arrow = arrow(length = unit(0.2, "cm"), type="closed"),
                 size = 0.5, col = "cadetblue") +
    geom_text(data = df.fixedBB, 
              aes(x = index, y = fixedBB*1.1, label = row.names(df.fixedBB)),  
                color = "gray17", size = 3) +
    labs(y = "Effects") +
    xlim(-3, 12) +
    theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank())

DCAplot_dots <- BB05_dat %>% 
  ggplot(aes(Year, DCA1)) +
    geom_point(alpha = 0.5) +
    geom_smooth() +
    theme(axis.title.x=element_blank(),
#        axis.text.x=element_blank(),
#        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank()) 

gridExtra::grid.arrange(DCAplot_dots, DCAplot_effects, nrow=1, widths = c(1,1))

```

### NQI?
```{r NQI}
df_BB_indeks <- read_excel("Datasett/Bløtbunn/Klimaoverblikk bløtbunn_data til Helene og Dag.xlsx", sheet = "indekser_sedimentparametere")

head(df_BB_indeks)

df_NQI_mean <- df_BB_indeks %>% 
  group_by(STAS, Year) %>%
  summarise(Mean = mean(NQI1, na.rm = TRUE))

head(df_NQI_mean)

ggplot(df_NQI_mean, aes(Year, Mean, color = STAS))+
  geom_point()

# possibly two different models (one per station)

names(envBB)

BB05_datNQI <- df_NQI_mean %>% 
  filter(STAS=="B05") %>% 
  left_join(dat_sel_mod, by = "Year")

BB35_datNQI <- df_NQI_mean %>% 
  filter(STAS=="B35") %>% 
  left_join(dat_sel_mod, by = "Year")

# Standardizing - entering all parameters in BB05par/BB35par here (the envfit-selected sets unless the "all parameters" lines above are enabled)
BB05_NQI_fullformula <- as.formula(paste("Mean ~ ", paste(BB05par, collapse = " + "), sep = ""))
BB35_NQI_fullformula <- as.formula(paste("Mean ~ ", paste(BB35par, collapse = " + "), sep = ""))

sBB05_NQI <- standardize(BB05_NQI_fullformula, BB05_datNQI)
sBB35_NQI <- standardize(BB35_NQI_fullformula, BB35_datNQI)

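# Model and selection (k = log(n) gives BIC; n = number of rows actually used in the fit)
BB05NQI_lm <- lm(sBB05_NQI$formula, sBB05_NQI$data)
BB05NQI_select <- stepAIC(BB05NQI_lm, direction = "both", trace = 0, k = log(nobs(BB05NQI_lm))) # BIC
BB05NQI_select

BB35NQI_lm <- lm(sBB35_NQI$formula, sBB35_NQI$data)
BB35NQI_select <- stepAIC(BB35NQI_lm, direction = "both", trace = 0, k = log(nobs(BB35NQI_lm))) # BIC
BB35NQI_select

# With lag effects: the formulas need explicit L() terms (lag of 2, as for DCA above);
# separate object names keep BB05NQI_lm/BB35NQI_lm (used in the plots below) intact
BB05_NQI_formula_dyn <- as.formula(paste("Mean ~ L(", paste(BB05par, collapse = ", 2) + L("), ", 2)", sep = ""))
BB35_NQI_formula_dyn <- as.formula(paste("Mean ~ L(", paste(BB35par, collapse = ", 2) + L("), ", 2)", sep = ""))

BB05NQI_dynlm <- dynlm::dynlm(BB05_NQI_formula_dyn, BB05_datNQI)
BB05NQI_dynselect <- stepAIC(BB05NQI_dynlm, direction = "both", trace = 0, k = log(nobs(BB05NQI_dynlm))) # BIC
BB05NQI_dynselect

BB35NQI_dynlm <- dynlm::dynlm(BB35_NQI_formula_dyn, BB35_datNQI)
BB35NQI_dynselect <- stepAIC(BB35NQI_dynlm, direction = "both", trace = 0, k = log(nobs(BB35NQI_dynlm))) # BIC
BB35NQI_dynselect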


# Model plots BB05
plot_model(BB05NQI_lm, type = "diag")
plot_model(BB05NQI_lm, sort.est = TRUE, title = "NQI - BR1")
ggsave ("Figures_rapp/ForestEffect_BB05NQI.png", width = 6, height = 4, dpi=500)

# Model plots BB35
plot_model(BB35NQI_lm, type = "diag")
plot_model(BB35NQI_lm, sort.est = TRUE, title = "NQI - BT44")
ggsave ("Figures_rapp/ForestEffect_BB35NQI.png", width = 6, height = 4, dpi=500)

# the effect table follows in the next chunk
# maybe we need to consider interactions?
```
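As far as we can tell, dynlm's `L(x, 2)` lags by position in the series; if the data are a plain data frame ordered by Year (as assumed here), a missing year would pair a value with the wrong year. A sketch of an explicit alternative that matches on `Year`; `lag_by_year()` is a hypothetical helper, not part of dynlm, and the data are made up:

```r
# Sketch: k-year lag by matching on Year, so gaps in the series give NA.
# lag_by_year() is a hypothetical helper; the data below are made up.
lag_by_year <- function(df, var, k) {
  idx <- match(df$Year - k, df$Year)  # row holding the value from Year - k
  df[[var]][idx]
}

d <- data.frame(Year = c(2000, 2001, 2002, 2004, 2005),  # 2003 missing
                Temp = c(5.1, 5.4, 5.0, 6.2, 5.8))
d$Temp_lag2 <- lag_by_year(d, "Temp", 2)
d
```

A purely positional lag of 2 would instead give 2004 the value from 2001.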

```{r effect table NQI, fig.height = 10}
tab_model(BB05NQI_lm, BB35NQI_lm, show.est = FALSE, show.intercept = FALSE)
```

## Plankton - Bloom 2
### Environmental variables in relation to ordination
```{r fitting environmental variables to ordination plankton}

# retrieving plankton (P02) ordinations
P02df <- readr::read_csv2("Data_produced/P02ord.csv")
names(P02df)


# coupling the two using dplyr, by Year
P02_dat <- # joining and removing phyto from the mix
  P02df %>% 
    left_join(dat_sel, by = "Year") %>% # join
    dplyr::select(-ends_with("_med")) # remove phytoplankton group medians (*_med columns)

DCA02 <- P02_dat %>% 
  dplyr::select(DCA1:DCA4)

envP02 <- P02_dat %>% 
  dplyr::select(Hydro_Chla_Surface:Winter_NAO)

envfitP02 <- envfit(DCA02, envP02, na.rm = TRUE, permutations = 999, choices = 1:4)
envfitP02 # gives an indication of which parameters may be important for community responses 

plot(DCA02[1:2])
plot(envfitP02, p.max = 0.05, add = TRUE)

# plotting parameters with p-value < 0.05 in relation to ordination. Can make this into a ggplot if needed
# for plankton it looks like we might need to do something about the ordination

P02par <- names(which(envfitP02$vectors$pvals < 0.05)) # names of parameters with p-value < 0.05
P02_allpar <- names(envP02)

```

### DCA - possible causal links
```{r fitting model for DCA plankton}

# Standardizing
# https://cran.r-project.org/web/packages/standardize/vignettes/using-standardize.html 

P02_fullformula <- as.formula(paste("DCA1 ~ ", paste(P02_allpar, collapse = " + "), sep = ""))

sP02 <- standardize(P02_fullformula, P02_dat)

# Model and selection
P02_lm <- lm(sP02$formula, sP02$data)
P02_select <- stepAIC(P02_lm, direction = "both") # check envvar for missing values - will this be fixed?
P02_select

summary(P02_select)
summary(P02_lm) # no significant relationships

# I think we need to restart model selection from scratch on the whole parameter set. I somewhat doubt we will get anything out of it, but it is probably worth a try? Dag?

```
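The note on `stepAIC()` above points at a practical issue: if predictors have missing values in different rows, dropping a term changes which rows are used, and `stepAIC()` stops with an error about the number of rows in use having changed. One way round this, sketched on simulated data (names are made up), is to keep only complete cases for the response and all candidate predictors before fitting:

```r
# Sketch: complete cases for response + candidates before model selection,
# so every candidate model is fitted to the same rows. Simulated data.
library(MASS)  # stepAIC()

set.seed(5)
d <- data.frame(DCA1 = rnorm(30), v1 = rnorm(30), v2 = rnorm(30), v3 = rnorm(30))
d$DCA1 <- d$DCA1 + 0.9 * d$v1
d$v2[c(2, 9)] <- NA
d$v3[21] <- NA

vars <- c("v1", "v2", "v3")
d_cc <- d[complete.cases(d[c("DCA1", vars)]), ]

full <- lm(reformulate(vars, response = "DCA1"), data = d_cc)
best <- stepAIC(full, direction = "both", trace = 0)
c(rows_total = nrow(d), rows_used = nrow(d_cc))
formula(best)
```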

### Dag's earlier models with plankton groups
Which type of model can be used here? A GLM as well, with a different distribution?
```{r}
### a. Diatoms (Kiselalger_med)
full_model <- lm(Kiselalger_med ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotN + River_Distant_TotN + River_Local_TotP + River_Distant_TotP + 
                 River_Local_TOC + River_Distant_TOC, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
par(mfrow = c(2,2), mar = c(4,5,2,1))
visreg(step_model, points = list(cex = 1))


### b. Dinoflagellates (Dinoflagellater_med)
full_model <- lm(Dinoflagellater_med ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotN + River_Distant_TotN + River_Local_TotP + River_Distant_TotP + 
                 River_Local_TOC + River_Distant_TOC, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
par(mfrow = c(2,2), mar = c(4,5,2,1))
visreg(step_model, points = list(cex = 1))

### c. Flagellates (assumes the column is named Flagellater_med, following the other *_med columns)
full_model <- lm(Flagellater_med ~ River_Local_Discharge + River_Distant_Discharge + Winter_NAO +
                 River_Local_TotN + River_Distant_TotN + River_Local_TotP + River_Distant_TotP + 
                 River_Local_TOC + River_Distant_TOC, data = dat)
# summary(full_model)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
par(mfrow = c(2,2), mar = c(4,5,2,1))
visreg(step_model, points = list(cex = 1))
```
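The three models above share the same right-hand side, and copying it by hand makes it easy to leave the wrong response in place. A sketch that loops over response names with `reformulate()`; the data are simulated and only a subset of the predictors is used:

```r
# Sketch: same candidate predictors for several plankton responses.
# Assumption: simulated data; names mirror the chunk above, values are made up.
library(MASS)  # stepAIC()

set.seed(9)
n <- 25
dat_sim <- data.frame(River_Local_Discharge = rnorm(n),
                      Winter_NAO = rnorm(n),
                      River_Local_TotN = rnorm(n))
dat_sim$Kiselalger_med <- 0.7 * dat_sim$Winter_NAO + rnorm(n, sd = 0.5)
dat_sim$Dinoflagellater_med <- rnorm(n)

predictors <- c("River_Local_Discharge", "Winter_NAO", "River_Local_TotN")
responses <- c("Kiselalger_med", "Dinoflagellater_med")

# one stepwise-selected model per response, named by response
step_models <- lapply(setNames(responses, responses), function(resp) {
  full <- lm(reformulate(predictors, response = resp), data = dat_sim)
  stepAIC(full, direction = "both", trace = FALSE)
})
lapply(step_models, formula)
```

Each element can then go to `summary()` or `visreg()` as in the chunk above.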