02_Organise_soft_bottom_for_PCA.Rmd

---
title: "Ordinations for soft-bottom, hard-bottom and plankton data"
output: 
  html_document:
    keep_md: true
    toc: true
    toc_depth: 3
    toc_float: true
    code_folding: hide
    df_print: paged    
---


## 0. Libraries
```{r}
library(readxl)
#install.packages("tidyverse")
library(tidyverse)
#install.packages("vegan")
library(vegan)
require(pander)
#install.packages("RCurl")
library(RCurl)

require(patchwork)

```

### Function for finding text in R files  
```{r}

search_text <- function(string, dir = ".", extension = c("R","Rmd"), pattern = NULL, fixed = FALSE, ignore.case = TRUE, deep = FALSE){
  if (!is.null(extension) & !is.null(pattern))
    warning("Pattern will take predence over extension")
  if (is.null(pattern)) {
    ext <- paste(extension, collapse = "|")
    pattern <- paste0("\\.(", paste(extension, collapse = '|'), ")$")
	}
  fn <- list.files(path = dir, pattern = pattern, full.names = TRUE, recursive = deep)
  search <- plyr::alply(fn, 1, function(x) grep(string,  readLines(x, warn = FALSE), fixed = fixed, ignore.case = ignore.case), .progress = "text")
  fn[plyr::laply(search, length) > 0]	
  }
```

## 1. Check organisation/analysis of hard-bottom data from Guri's scripts   
### a. Content of folders (see code)  
```{r, eval = FALSE}
dir("Datasett/hardbunn_kopi")
dir("Datasett/hardbunn_kopi/r workspace")
#
# Used this to explore where different data sets occur:
# search_text("ord1.df", "Datasett/hardbunn_kopi/r workspace")
# 
```

### b. Read 'transekt.df' which Guri used for ordination  
Does not include ord1.df, which is used in 'Klimaoverblikk_hardbunn.Rmd', but does include a lot of other stuff...  
Order of Guri's scripts:  
1) HBdata.R: HBdata -> Hbuse  
2) HBOrdinasjon.R: Hbuse -> HBagg -> transekt.df -> SiteSpec -> spec.m -> ordinasjon -> ord1.df  
```{r, eval = FALSE}
load("Datasett/hardbunn_kopi/r workspace/.RData")

# Let us delete everything except transekt.df
obj <- ls()
rm(list = obj[!obj %in% c("transekt.df")])

str(transekt.df)
```

### c. Create data for ordination ('spec.m') and perform ordination   
From Guri's code in 'HBOrdinasjon.R', except the first part     
* 'transekt.df' is in 'long format'  
* Is reshaped to broad format ('SiteSpec'). *Note* that this one is later combined with DCA output to create 'ord1.df'   
* For the ordination, the year and site columns are removed and NA -> 0 ('spec.m')   
```{r, eval = FALSE}
# table(transekt.df$Sitename)
transekt.df <- transekt.df %>%
  mutate(Site = substr(Sitename, 1,3), Year = as.numeric(substr(Sitename, 5,8))) %>%
  select(Site, Year, Species, Value)
# names(transekt.df) = c("Site", "Year", "Species", "Value")

# gjor om til vidt format
SiteSpec = reshape(transekt.df, idvar = c("Site", "Year"), timevar = "Species", direction = "wide")
# names(SiteSpec)

# velger kun stasjonene med lengre tidsserie (02.11.2018) 407 og 410
SiteSpec = subset(SiteSpec, SiteSpec$Site %in% c("407", "410"))

# Endrer navnene
artsnavn = names(SiteSpec)[-c(1:2)]
artsnavn = gsub("Value.", "", artsnavn)
artsnavn = gsub(" \n", "", artsnavn)
# artsnavn

# artsmatrise
spec.m = SiteSpec[, -c(1:2)]
names(spec.m) = artsnavn

# må erstatte NA med 0
spec.m[is.na(spec.m)] = 0

cat("Dimensions of 'spec.m':\n", dim(spec.m))
```

### d. Perform ordination
```{r, eval = FALSE}
# Transekt
ord1 <- decorana(spec.m, iweigh = 1) # vekter ned sjeldne arter

# Pretty longish output
# summary(ord1) # viser arts- og rutescorer

# Short output
summary(ord1, display = "none")

# Innledende NMDS - litt usikker på utfallet 
ord2 <- metaMDS(spec.m)


ord2

```
### e. Test plot 1
```{r, eval = FALSE}
plot(ord1) # sorte sirkler er sites, røde pluss er arter
```

### f. Test plot 2
```{r, eval = FALSE}
plot(ord1, display = "sites", type = "n")
points(ord1, display = "sites", pch = 21, col = "red", bg = "yellow")
```

### g. Make data set for the plots done in 'Klimaoverblikk_hardbunn.Rmd'  
```{r, eval = FALSE}
# trekker ut akseskorer så det kan kobles til stasjon og år... i.e. Sitename
ord1.sites <- data.frame(scores(ord1, display = "sites"))
ord1.df <- data.frame(SiteSpec[,c(1:2)], ord1.sites)

#str(ord1.df)

# NMDS
# trekker ut akseskorer så det kan kobles til stasjon og år... i.e. Sitename
ord2.sites <- data.frame(scores(ord2, display = "sites"))
ord2.df <- data.frame(SiteSpec[,c(1:2)], ord2.sites) 

```

### h. Redone plots done in 'Klimaoverblikk_hardbunn.Rmd'   
* *BUT* they look a bit different... so the input data are probably not identical  
```{r, eval = FALSE}
# Innledende DCA - litt usikker på utfallet  
DCAplot.1 <- ggplot(ord1.df, aes(DCA1, DCA2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")

DCAplot.2 <- ggplot(ord1.df, aes(DCA3, DCA4, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")

DCAplot.1
DCAplot.2

# DCA aksene mot år
DCAplot_3 <- ord1.df %>%
  gather("DCA_axis", "Value", DCA1:DCA4) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(Site~DCA_axis)

DCAplot_3

# Innledende NMDS

# NMDS aksene mot hverandre, fargekodet for år
NMDSplot_1 <- ggplot(ord2.df, aes(NMDS1, NMDS2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")


NMDSplot_1

# NMDS aksene mot år
NMDSplot_2 <- ord2.df %>%
  gather("NMDS_axis", "Value", NMDS1:NMDS2) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(Site~NMDS_axis)

NMDSplot_2
```

### i. FOR USE: Loading data from HBanalysesett.csv and plotting. Faster, and that way the NMDS-results won't be different each time the script is run  
```{r}
HBdata <- read.csv("Datasett/Hardbunn_KOPI/HBanalysesett.csv", sep=";", stringsAsFactors = FALSE, dec=",")
HBdata_long <- subset(HBdata, HBdata$Site %in% c(407,410))

# DCA - transektdata, fargekodet for år  
DCAplot.1 <- ggplot(HBdata_long, aes(DCA1, DCA2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")

DCAplot.2 <- ggplot(HBdata_long, aes(DCA3, DCA4, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")

DCAplot.1
DCAplot.2

# DCA aksene mot år
DCAplot_3 <- HBdata_long %>%
  gather("DCA_axis", "Value", DCA1:DCA4) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(Site~DCA_axis)

DCAplot_3

# DCA1 til rapport
DCA1plot <- 
  ggplot(HBdata_long, aes(Year, DCA1, color = as.character(Site))) +
    geom_point() +
    geom_smooth() +
    scale_color_hue(labels = c("HT113", "HR104")) +
    labs(color = "Station")

DCA1plot
ggsave("Figures_rapp/DCA1_hard.png", DCA1plot, width = 8, height = 6, dpi=500)

# Innledende NMDS

# NMDS aksene mot hverandre, fargekodet for år
NMDSplot_1 <- ggplot(HBdata_long, aes(NMDS1, NMDS2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")


NMDSplot_1

```
### j. Species in relation to ordination
```{r DCA-species, fig.height = 8, fig.width = 6}
# Arter ift DCA 
ord.alger <- read.csv("Datasett/Hardbunn_KOPI/HB_AlgerORD.csv", sep=";", stringsAsFactors = FALSE, dec=",")
ord.dyr <- read.csv("Datasett/Hardbunn_KOPI/HB_DyrORD.csv", sep=";", stringsAsFactors = FALSE, dec=",")

# boundaries
bound.alger <- quantile(ord.alger$DCA1, probs = c(0.1, 0.9))
bound.dyr <- quantile(ord.dyr$DCA1, probs = c(0.1, 0.9))

# subset on boundaries
ord.sub.alger <- ord.alger[ord.alger$DCA1 <= bound.alger[[1]] |ord.alger$DCA1 >= bound.alger[[2]], ]
ord.sub.dyr   <- ord.dyr[ord.dyr$DCA1 <= bound.dyr[[1]] |ord.dyr$DCA1 >= bound.dyr[[2]], ]

# subset on boundaries for site scores (subjective)
ord.year.alger <- ord.alger[ord.alger$DCA1 >= -0.8 & ord.alger$DCA1 <= -0.3 |ord.alger$DCA1 >= 0.3 & ord.alger$DCA1 <= 0.8, ]
ord.year.dyr <- ord.dyr[ord.dyr$DCA1 >= -0.8 & ord.dyr$DCA1 <= -0.3 |ord.dyr$DCA1 >= 0.3 & ord.dyr$DCA1 <= 0.8, ]

# alger 90 tallet vs nå
ggplot(ord.year.alger, aes(x = IDnavn, y = DCA1, color = CAT2)) + 
  geom_point(size=3) +   # Draw points
  scale_colour_manual(values = c("tan3", "darkolivegreen2", "tomato2"), 
                      guide = guide_legend(title = "")) + # choose colors
  geom_segment(aes(   x = IDnavn, 
                   xend = IDnavn, 
                      y = min(DCA1), 
                   yend = max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines
  labs(title="Makroalger vs DCA1", 
       caption="source: Alger (DCA [-0.8,-0.2] (90-tallet) DCA [0.2,0.8] (nå))",
       x="") +  
  coord_flip() +
  theme_classic()

# dyr 90 tallet vs nå
ggplot(ord.year.dyr, aes(x=IDnavn, y=DCA1, color = CAT2)) + 
  geom_point(size=3) +   # Draw points
  scale_colour_brewer(palette = "Set2", guide = guide_legend(title = "")) + # choose colors
  geom_segment(aes(x=IDnavn, 
                   xend=IDnavn, 
                   y=min(DCA1), 
                   yend=max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines

  labs(title="Dyr vs DCA1", 
       caption="source: Alger (DCA [-0.8,-0.2] (90-tallet) DCA [0.2,0.8] (nå))",
       x="") +  
  coord_flip() +
  theme_classic()

# alger ekstremer
HardFloraDCA <-
  ord.sub.alger %>%
  arrange(DCA1) %>%
  mutate(IDnavn = factor(IDnavn, unique(IDnavn))) %>%
  ggplot(aes(x = IDnavn, y = DCA1, color = CAT2)) + 
    geom_point(size=3) +   # Draw points
    scale_colour_manual(values = c("tan3", "darkolivegreen2", "tomato2"), 
                        guide = guide_legend(title = ""),
                        labels=c("Brown", "Green", "Red")) + # choose colors etc
    geom_segment(aes(   x = IDnavn, 
                        xend = IDnavn, 
                        y = min(DCA1), 
                        yend = max(DCA1)),  
                 linetype="dashed", 
                 size=0.1) +   # Draw dashed lines
    labs(title="Macroalgae vs DCA1", 
         caption="source: Macroalgae subset (smallest and largest 10 % of DCA-scores)",
         x="") +  
    coord_flip() +
    theme_classic()

HardFloraDCA  
ggsave ("Figures_rapp/HardFloraDCA.png", HardFloraDCA, width = 6, height = 10, dpi=500)

# dyr ekstremer

HardFaunaDCA <-
  ord.sub.dyr %>%
  arrange(DCA1) %>%
  mutate(IDnavn = factor(IDnavn, unique(IDnavn))) %>%
  ggplot(aes(x = IDnavn, y = DCA1, color = CAT2)) + 
    geom_point(size=3) +   # Draw points
    scale_colour_brewer(palette = "Set2", 
                        guide = guide_legend(title = ""),
                        labels=c("Filterfeeders", "Carnivorous")) + # choose colors etc
    geom_segment(aes(x=IDnavn, 
                     xend=IDnavn, 
                     y=min(DCA1), 
                     yend=max(DCA1)), 
                 linetype="dashed", 
                 size=0.1) +   # Draw dashed lines
    
    labs(title="Animals vs DCA1", 
         caption="source: Animal subset (smallest and largest 10 % of DCA-scores)",
         x="") +  
    coord_flip() +
    theme_classic()

HardFaunaDCA
ggsave ("Figures_rapp/HardFaunaDCA.png", HardFaunaDCA, width = 6, height = 10, dpi=500)
```
Arters "optimum" (vid tolkning) langs DCA-aksen. Sier noe om hvor man mest sannsynlig finner de ulike artene. Har plottet arter som ligger i området der punktene fra tidlig 90-tall og nåtid befinner seg. Har også plottet ekstremene. (Legg derfor merke til skala på DCA1)


## 2. Reorganise soft-bottom fauna data and test ordination
### a. Data
```{r}
df_blot_b35 <- read_excel("Datasett/Bløtbunn/Klimaoverblikk bløtbunn_data til Helene og Dag.xlsx", sheet = "B35_artsliste")
colnames(df_blot_b35)[1] <- "Species"

df_blot_b05 <- read_excel("Datasett/Bløtbunn/Klimaoverblikk bløtbunn_data til Helene og Dag.xlsx", sheet = "B05_artsliste")
colnames(df_blot_b05)[1] <- "Species"

df_blot_ind <- read_excel("Datasett/Bløtbunn/Klimaoverblikk bløtbunn_data til Helene og Dag.xlsx", sheet = "indekser_sedimentparametere")

cat("b35, number of species:", nrow(df_blot_b35), ", Number of years:", ncol(df_blot_b35), "\n")
cat("b05, number of species:", nrow(df_blot_b05), ", Number of years:", ncol(df_blot_b05), "\n")

head(df_blot_b05)

```

### b. Put on data long format, combine, and extract Site and Year separately
```{r}
df_long_1 <- df_blot_b05 %>%
  gather("Siteyear", "Value", B05_1990:B05_2016)
df_long_2 <- df_blot_b35 %>%
  gather("Siteyear", "Value", B35_1990:B35_2016)

df_long <- bind_rows(df_long_1, df_long_2) %>%
  mutate(Site = substr(Siteyear, 1,3), Year = as.numeric(substr(Siteyear, 5,8))) %>%
  dplyr::select(Site, Year, Species, Value)

head(df_long)

str(df_long)
head (df_long)
# look at data
tb <- xtabs(~Year + Site, df_long)
pandoc.table(tb, style = "rmarkdown")

```

### c. Reshape for ordination  
Reshape to wide format the tidyr way (instead of using reshape()) - gives us nice column names right away  
```{r}
SiteSpec <- df_long %>%
  spread(Species, Value)

head(SiteSpec)

# må erstatte NA med 0
SiteSpec[is.na(SiteSpec)] = 0

# gjør det på gamlemåten siden jeg er usikker på hvordan det gjøres i dplyr
ant.reg  <- colSums(SiteSpec != 0) 
filter = names(which(ant.reg < 13))
SiteSpec <- SiteSpec[ ,-which(names(SiteSpec) %in% filter)] 
dim(SiteSpec)

# lager separate artsmatriser for de to ulike stasjonene (representerer ulike system)
spec_05 <- filter (SiteSpec, Site == "B05")
spec_35 <- filter (SiteSpec, Site == "B35")

# artsmatrise, fjerner år og stasjon
spec.m_05 = spec_05[, -c(1:2)]
spec.m_35 = spec_35[, -c(1:2)]

cat("Dimensions of 'spec.m_05':\n", dim(spec.m_05))

```

### d. Perform ordination
```{r}
# Innledende DCA - litt usikker på utfallet 
ord1_05 <- decorana(spec.m_05, iweigh = 1)
ord1_35 <- decorana(spec.m_35, iweigh = 1)

# Tungeeffekter?
plot(ord1_05, display = "sites")
plot(ord1_35, display = "sites") # kanskje litt her...

# Pretty longish output
# summary(ord1) # viser arts- og rutescorer

# Short output
summary(ord1_05, display = "none")
summary(ord1_35, display = "none") 

# Envfit for å kunne legge på år

# Innledende NMDS - litt usikker på utfallet 
ord2_05 <- metaMDS(spec.m_05)
ord2_35 <- metaMDS(spec.m_35)

ord2_05
ord2_35 

```
### e. Test plot 1
```{r}
# sorte sirkler er sites, røde pluss er arter

# DCA
plot(ord1_05) 
plot(ord1_35) 
# NMDS
plot(ord2_05)
plot(ord2_35)

```

### f. Test plot 2
```{r}
# DCA
plot(ord1_05, display = "sites", type = "n")
points(ord1_05, display = "sites", pch = 21, col = "red", bg = "yellow")

plot(ord1_35, display = "sites", type = "n")
points(ord1_35, display = "sites", pch = 21, col = "red", bg = "yellow")

# NMDS
plot(ord2_05, display = "sites", type = "n")
points(ord2_05, display = "sites", pch = 21, col = "red", bg = "yellow")

plot(ord2_35, display = "sites", type = "n")
points(ord2_35, display = "sites", pch = 21, col = "red", bg = "yellow")

```

### g. Make data set for ggplot
```{r}
# DCA
# trekker ut akseskorer så det kan kobles til stasjon og år... i.e. Sitename
ord1_05.sites <- data.frame(scores(ord1_05, display = "sites"))
ord1_05.df <- data.frame(spec_05[,c(1:2)], ord1_05.sites)

ord1_35.sites <- data.frame(scores(ord1_35, display = "sites"))
ord1_35.df <- data.frame(spec_35[,c(1:2)], ord1_35.sites)

# skriver til csv
write.csv2(ord1_05.df, row.names = FALSE, file = "Data_produced/BBordSites_05.csv")
write.csv2(ord1_35.df, row.names = FALSE, file = "Data_produced/BBordSites_35.csv")

# Ønske om å trekke ut andel forklart av hovedakse
# MEN!!! Dette har Jari Oksanen skrevet:
# The concept of total inertia does not exist in DCA. Alternative software use the total inertia from other ordination methods such as orthogonal correspondence analysis. Just call cca() for your data to get the total inertia of orthogonal CA. However, that really has no relevance for DCA, although that statistics is commonly used and ritually reported in papers.

# Bruk DCA 1 som en constraining variable i en cca for å beregne andel forklart.
cca.05 <- cca(spec.m_05 ~ DCA1, data = ord1_05.df) # 21 %
cca.35 <- cca(spec.m_35 ~ DCA1, data = ord1_35.df) # 43 %

# trekker ut species scores så de kan relateres til aksene
ord1_05.species <- data.frame(scores(ord1_05, display = "species"))
ord1_05.species$Art = rownames(ord1_05.species)
ord1_35.species <- data.frame(scores(ord1_35, display = "species"))
ord1_35.species$Art = rownames(ord1_35.species)

str(ord1_05.df)
str(ord1_35.df)
str(ord1_05.species)
str(ord1_35.species)

# legger DCA-aksene for de to stasjonene sammen igjen (etter at ordinasjon er gjort separat)
ord1_both <- bind_rows(ord1_05.df, ord1_35.df)

# NMDS
# trekker ut akseskorer så det kan kobles til stasjon og år... i.e. Sitename
ord2_05.sites <- data.frame(scores(ord2_05, display = "sites"))
ord2_05.df <- data.frame(spec_05[,c(1:2)], ord2_05.sites) 

ord2_35.sites <- data.frame(scores(ord2_35, display = "sites"))
ord2_35.df <- data.frame(spec_35[,c(1:2)], ord2_35.sites)

# legger NMDS-aksene for de to stasjonene sammen igjen (etter at ordinasjon er gjort separat)
ord2_both <- bind_rows(ord2_05.df, ord2_35.df)

```

### h. Redone ggplots similar to 'Klimaoverblikk_hardbunn.Rmd'   
```{r}

# DCA aksene mot hverandre, fargekodet for år
DCAplot_1 <- ggplot(ord1_both, aes(DCA1, DCA2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")

DCAplot_2 <- ggplot(ord1_both, aes(DCA3, DCA4, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")

DCAplot_1
DCAplot_2

# DCA aksene mot år
DCAplot_3 <- ord1_both %>%
  gather("DCA_axis", "Value", DCA1:DCA4) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(Site~DCA_axis)

DCAplot_3

NEWlabels = c(B05 = "BR1", B35 = "BT44")

# DCA1 til rapport
DCA1plot_bb <- 
  ggplot(ord1_both, aes(Year, DCA1)) +
    geom_point() +
    geom_smooth() +
    facet_wrap(~Site, labeller = labeller(Site = NEWlabels))
DCA1plot_bb
ggsave("Figures_rapp/DCA1_soft.png", DCA1plot, width = 8, height = 6, dpi=500)

# Innledende NMDS

# NMDS aksene mot hverandre, fargekodet for år
NMDSplot_1 <- ggplot(ord2_both, aes(NMDS1, NMDS2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  facet_wrap(~ Site, nrow=2) +
  labs(color = "År")


NMDSplot_1

# NMDS aksene mot år
NMDSplot_2 <- ord2_both %>%
  gather("NMDS_axis", "Value", NMDS1:NMDS2) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(Site~NMDS_axis)

NMDSplot_2

# for both DCA and NMDS the time correlated changes in species comp seems to be captured by the first axis. Note that there is opposite directions for the two stations for DCA1, does that matter?  

```

### i. Species in relation to ordination
```{r, fig.height = 8, fig.width = 6}
# boundaries
bound_05 <- quantile(ord1_05.species$DCA1, probs = c(0.1, 0.9))
bound_35 <- quantile(ord1_35.species$DCA1, probs = c(0.1, 0.9))

# subset on boundaries
bound_05.sub <- ord1_05.species[ord1_05.species$DCA1 <= bound_05[[1]] |ord1_05.species$DCA1 >= bound_05[[2]], ]
bound_35.sub <- ord1_35.species[ord1_35.species$DCA1 <= bound_35[[1]] |ord1_35.species$DCA1 >= bound_35[[2]], ]

# subset on boundaries for site scores (subjective) - må justeres
#ord.year_05 <- ord1_05.species[ord1_05.species$DCA1 >= -0.8 & ord1_05.species$DCA1 <= -0.25 |ord1_05.species$DCA1 >= 0.25 & ord1_05.species$DCA1 <= 0.7, ]
#ord.year_35 <- ord1_35.species[ord1_35.species$DCA1 >= -0.6 & ord1_35.species$DCA1 <= -0.25 |ord1_35.species$DCA1 >= 0.3 & ord1_35.species$DCA1 <= 1.5, ]


write.csv2(bound_05.sub, file = "Data_produced/BBord1_05_sub.csv")
write.csv2(bound_35.sub, file = "Data_produced/BBord1_35_sub.csv")
write.csv2(ord1_05.species, file = "Data_produced/BBord1_05_full.csv")
write.csv2(ord1_35.species, file = "Data_produced/BBord1_35_full.csv")

# hvor eventuelt subjektive skiller for å dele opp årsperioder skal gå bør nok fagansvarlig vurdere (dersom det trengs i tolkningene vel og merke)

# disse blir litt mer komplisert dersom vi deler opp i grupper
BBartOrd_05 <- 
  bound_05.sub %>%
  arrange(DCA1) %>%
  mutate(Art = factor(Art, unique(Art))) %>%
  ggplot(aes(x = reorder(Art, desc(Art)), y = DCA1)) + 
  geom_point(size=3) +   # Draw points
  geom_segment(aes(   x = Art, 
                   xend = Art, 
                      y = min(DCA1), 
                   yend = max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines
  labs(title="Soft bottom fauna vs DCA1", 
       caption="source: Site BR1 (smallest and largest 10 % av DCA-scores)",
       x="") +  
  coord_flip() +
  scale_y_reverse() +
  theme_classic()

BBartOrd_35 <- 
  bound_35.sub %>%
  arrange(DCA1) %>%
  mutate(Art = factor(Art, unique(Art))) %>%
  ggplot(aes(x = Art, y = DCA1)) +
  geom_point(size=3) +   # Draw points
  geom_segment(aes(   x = Art, 
                   xend = Art, 
                      y = min(DCA1), 
                   yend = max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines
  labs(title="Soft bottom fauna vs DCA1", 
       caption="source: Site BT44 (smallest and largest 10 % av DCA-scores)",
       x="") +  
  coord_flip() +
  theme_classic()

BBartOrd_35
BBartOrd_05

ggsave ("Figures_rapp/SoftFaunaDCA_05.png", BBartOrd_05, width = 6, height = 10, dpi=500)
ggsave ("Figures_rapp/SoftFaunaDCA_35.png", BBartOrd_35, width = 6, height = 10, dpi=500)

# skip these - very difficult to set boundaries subjectively
  ggplot(ord.year_35, aes(x = Art, y = DCA1)) + 
  geom_point(size=3) +   # Draw points
  geom_segment(aes(   x = Art, 
                   xend = Art, 
                      y = min(DCA1), 
                   yend = max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines
  labs(title="Soft bottom fauna vs DCA1", 
       caption="source: Site BT44",
       x="") +  
  coord_flip() +
  theme_classic()
  
  ggplot(ord.year_05, aes(x = Art, y = DCA1)) + 
  geom_point(size=3) +   # Draw points
  geom_segment(aes(   x = Art, 
                   xend = Art, 
                      y = min(DCA1), 
                   yend = max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines
  labs(title="Soft bottom fauna vs DCA1", 
       caption="source: Site BR1",
       x="") +  
  coord_flip() +
  theme_classic()

```

### j. Combine DCA main axis plots for hard and soft bottom
```{r DCA plots publication}
DCA1plot_hb <- HBdata_long %>% mutate(Dummy = "H", Station = ifelse(Site == 407, "HT113", "HR104")) %>% 
  ggplot(aes(Year, DCA1, color = Station, fill = Station)) +
    geom_point(size = 0.9) +
    geom_smooth(size = 0.7, aes(linetype = Station))+
    facet_wrap(~ Dummy, strip.position = "right") +
  theme_bw()

DCA1plot_bb2 <- 
  ggplot(ord1_both, aes(Year, DCA1)) +
    geom_point(size = 0.9) +
    geom_smooth(size = 0.7) +
    facet_wrap(~Site, labeller = labeller(Site = NEWlabels), ncol = 1, strip.position = "right") +
  labs(x = "") +
  theme_bw()

DCA1plot_bb2 + DCA1plot_hb +
  plot_layout(heights = c(2,1), ncol =1)

ggsave ("Figures_publ/DCAmain.png", width = 8, height = 6, dpi=500)
```


## 3. Reorganise phytoplankton data and test ordination
### a. Data
Use extraction of phytoplankton data during springbloom produced in "04_Get_plankton_bloom_data.Rmd"
HFR comment: agreed with Lars that we use Bloom_02
```{r}

require(RCurl)

# Bloom_01: phytoplankton data extracted when Chl a (at 5m) over absolute threshold (1.1) during Feb to Apr  
bloom_01_median <-read.csv(text=getURL("https://raw.githubusercontent.com/NIVANorge/Klimaoverblikk/master/Data_produced/df_plank_bloom01_median.csv"), header=T)

# Bloom_02: phytoplankton data extrated during max Chla (at 5m) during Feb to Apr
bloom_02 <-read.csv(text=getURL("https://raw.githubusercontent.com/NIVANorge/Klimaoverblikk/master/Data_produced/df_plank_bloom02.csv"), header=T)

# bloom_02 <- read.csv2("Data_produced/df_plank_bloom02.csv", sep = ",")

```

### c. Reshape for ordination  
```{r}

#dette datasettet er jo "wide" allerede
#SiteSpec <- df_long %>%
  #spread(Species, Value)

# artsmatrise: beholder kun artsdata (klassenivå)
# NB! artsdata er aggregert til 15 klasser
spec.m_phyto_01 = bloom_01_median[, -c(1,17:20)]
spec.m_phyto_02 = bloom_02[, -c(1:4,20:26)]

cat("Dimensions of 'spec.m_phyto':\n", dim(spec.m_phyto_01))
cat("Dimensions of 'spec.m_phyto':\n", dim(spec.m_phyto_02))

```

### d. Perform ordination
```{r}
# må erstatte NA med 0
spec.m_phyto_02[is.na(spec.m_phyto_02)] = 0
spec.m_phyto_01[is.na(spec.m_phyto_01)] = 0

# Innledende DCA - litt usikker på utfallet 
ord_phyto01 <- decorana(spec.m_phyto_01)
ord_phyto02 <- decorana(spec.m_phyto_02, iweigh = 1)

# Innledende NMDS - litt usikker på utfallet 
nmds_phyto01 <- metaMDS(spec.m_phyto_01)
nmds_phyto02 <- metaMDS(spec.m_phyto_02)

 
```
### e. Test plot 1
```{r}

# DCA
plot(ord_phyto01) 
plot(ord_phyto02) 

# NMDS
plot(nmds_phyto01)
plot(nmds_phyto02)

```

### f. Test plot 2
```{r}
# DCA
plot(ord_phyto01, display = "sites", type = "n")
points(ord_phyto01, display = "sites", pch = 21, col = "red", bg = "yellow")
plot(ord_phyto02, display = "sites", type = "n")
points(ord_phyto02, display = "sites", pch = 21, col = "red", bg = "yellow")

# NMDS
plot(nmds_phyto01, display = "sites", type = "n")
points(nmds_phyto01, display = "sites", pch = 21, col = "red", bg = "yellow")
plot(nmds_phyto02, display = "sites", type = "n")
points(nmds_phyto02, display = "sites", pch = 21, col = "red", bg = "yellow")

```

### g. Make data set for ggplot
```{r}
# DCA
# trekker ut akseskorer så det kan kobles til år... i.e. Sitename
ord_phyto01.sites <- data.frame(scores(ord_phyto01, display = "sites"))
ord_phyto01.df <- data.frame(Year = bloom_01_median$Year, ord_phyto01.sites)
ord_phyto02.sites <- data.frame(scores(ord_phyto02, display = "sites"))
ord_phyto02.df <- data.frame(Year = bloom_02$Year, ord_phyto02.sites)

# skriver til csv
write.csv2(ord_phyto01.df, row.names = FALSE, file = "Data_produced/P01ord.csv")
write.csv2(ord_phyto02.df, row.names = FALSE, file = "Data_produced/P02ord.csv")

str(ord_phyto02.df)

# NMDS
# trekker ut akseskorer så det kan kobles til stasjon og år... i.e. Sitename
nmds_phyto01.sites <- data.frame(scores(nmds_phyto01, display = "sites"))
nmds_phyto01.df <- data.frame(Year = bloom_01_median$Year, nmds_phyto01.sites)
nmds_phyto02.sites <- data.frame(scores(nmds_phyto02, display = "sites"))
nmds_phyto02.df <- data.frame(Year = bloom_02$Year, nmds_phyto02.sites)

str(nmds_phyto02.df)

# trekker ut species scores så de kan relateres til aksene
ord_phyto01.species <- data.frame(scores(ord_phyto01, display = "species"))
ord_phyto01.species$Art = rownames(ord_phyto01.species)
ord_phyto02.species <- data.frame(scores(ord_phyto02, display = "species"))
ord_phyto02.species$Art = rownames(ord_phyto02.species)

```

### h. Redone ggplots similar to 'Klimaoverblikk_hardbunn.Rmd'
#### On bloom01:

**GSA comment:**

* Since DCA and NMDS diagrams are more difficult to compare here, we could do a procrustes test... Haven't done it in several years, but it's still available in the vegan package.
```{r}
# Innledende DCA 
DCAplot.1 <- ggplot(ord_phyto01.df, aes(DCA1, DCA2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  labs(color = "Year")

DCAplot.2 <- ggplot(ord_phyto01.df, aes(DCA3, DCA4, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  labs(color = "Year")

DCAplot.1
DCAplot.2

# DCA aksene mot år
DCAplot_3 <- ord_phyto01.df %>%
  gather("DCA_axis", "Value", DCA1:DCA4) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(~DCA_axis)

DCAplot_3

# DCA1 til rapport
DCA1plot <- 
  ggplot(ord_phyto01.df, aes(Year, DCA1)) +
    geom_point() +
    geom_smooth()

DCA1plot
ggsave("Figures_rapp/DCA1_bloom1.png", DCA1plot, width = 8, height = 6, dpi=500)


# Innledende NMDS 
NMDSplot.1 <- ggplot(nmds_phyto01.df, aes(NMDS1, NMDS2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  labs(color = "Year")

NMDSplot.1

# NMDS aksene mot år
NMDSplot_2 <- nmds_phyto01.df %>%
  gather("NMDS_axis", "Value", NMDS1:NMDS2) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(~NMDS_axis)

NMDSplot_2

```
#### On bloom02
```{r}
# Innledende DCA 
DCAplot.1 <- ggplot(ord_phyto02.df, aes(DCA1, DCA2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  labs(color = "Year")

DCAplot.2 <- ggplot(ord_phyto02.df, aes(DCA3, DCA4, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  labs(color = "Year")

DCAplot.1
DCAplot.2

# DCA aksene mot år
DCAplot_3 <- ord_phyto02.df %>%
  gather("DCA_axis", "Value", DCA1:DCA4) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(~DCA_axis)

DCAplot_3

# DCA1 til rapport
DCA1plot_2 <- 
  ggplot(ord_phyto02.df, aes(Year, DCA1)) +
    geom_point() +
    geom_smooth()

DCA1plot_2
ggsave("Figures_rapp/DCA1_bloom2.png", DCA1plot_2, width = 8, height = 6, dpi=500)


# Innledende NMDS 
NMDSplot.1 <- ggplot(nmds_phyto02.df, aes(NMDS1, NMDS2, color = as.numeric(as.character(Year)))) + 
  geom_point() + 
  labs(color = "Year")

NMDSplot.1

# NMDS aksene mot år
NMDSplot_2 <- nmds_phyto02.df %>%
  gather("NMDS_axis", "Value", NMDS1:NMDS2) %>%
  ggplot(aes(Year, Value)) +
    geom_line() +
    facet_grid(~NMDS_axis)

NMDSplot_2

```

### i. Species in relation to ordination

**Comment GSA:**

* remember that the relationship between year and axis is not as straight forward as for the hard and soft bottom datasets. But the main axis may be coupled to some other variable and be interpreted in relation to that.

```{r, fig.height = 6, fig.width = 6}
# skipping boundaries since there are so few groups

PlanktonOrd_01 <- 
  ord_phyto01.species %>%
  arrange(DCA1) %>%
  mutate(Art = factor(Art, unique(Art))) %>%
  ggplot(aes(x = Art, y = DCA1)) + 
  geom_point(size=3) +   # Draw points
  geom_segment(aes(   x = Art, 
                   xend = Art, 
                      y = min(DCA1), 
                   yend = max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines
  labs(title="Phytoplankton vs DCA1", 
       caption="source: phytoplankton data extracted when Chl a (at 5m) over absolute threshold (1.1) during Feb to Apr",
       x="") +  
  coord_flip() +
  theme_classic()

PlanktonOrd_02 <- 
  ord_phyto02.species %>%
  arrange(DCA1) %>%
  mutate(Art = factor(Art, unique(Art))) %>%
  ggplot(aes(x = Art, y = DCA1)) + 
  geom_point(size=3) +   # Draw points
  geom_segment(aes(   x = Art, 
                   xend = Art, 
                      y = min(DCA1), 
                   yend = max(DCA1)), 
               linetype="dashed", 
               size=0.1) +   # Draw dashed lines
  labs(title="Phytoplankton vs DCA1", 
       caption="source: phytoplankton data extracted during max Chla (at 5m) during Feb to Apr",
       x="") +  
  coord_flip() +
  theme_classic()

PlanktonOrd_01
PlanktonOrd_02

ggsave ("Figures_rapp/PhytoTresholdDCA.png", plot = PlanktonOrd_01, width = 4, height = 4, dpi=500)
ggsave ("Figures_rapp/PhytoFebAprDCA.png", plot = PlanktonOrd_02, width = 4, height = 4, dpi=500)

```