Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bayes R2 for occupancy models #58

Open
JustinCally opened this issue Nov 11, 2021 · 2 comments
Open

bayes R2 for occupancy models #58

JustinCally opened this issue Nov 11, 2021 · 2 comments

Comments

@JustinCally
Copy link
Contributor

Hi Ken,

I'm not sure whether the use of a bayes R2 would be broadly suitable for occupancy models given the limitations with its application in logistic regression and often low values. However I had a crack at getting a bayes R2 for the state, detection submodels (and their combined value) using the residual method (https://avehtari.github.io/bayes_R2/bayes_R2.html). A bit of a messy function and example below.

# R2 example for logistic occupancy model
    library(ubms)
#> Loading required package: unmarked
#> Loading required package: lattice
#> 
#> Attaching package: 'ubms'
#> The following objects are masked from 'package:unmarked':
#> 
#>     fitList, projected
#> The following object is masked from 'package:base':
#> 
#>     gamma
    
    # R2 function
    bayes_R2_res_ubms <- function(fit, re.form = NULL, draws = draws) {
        
        # Get the observed occupancy at each site for each observation period
        y <- ubms::getY(fit)
        
        # Get the observed occupancy at each site
        y_mod <- matrix(apply(y, 1, FUN = function(x) max(x, na.rm = TRUE), simplify = TRUE), ncol = 1)
        
        # Get the linear predictor for the 'state'
        ypred <- ubms::posterior_linpred(fit, transform = TRUE, submodel = "state", re.form = re.form, draws = draws)
        yp <- ubms::posterior_linpred(fit, transform = TRUE, submodel = "det", re.form = re.form, draws = draws)
        
        # Assume perfect detection (no effect of detection submodel)
        yp1 <- yp
        yp1[!is.na(yp1)] <- 1
        
        # For each draw obtain the probability for each site that a detection will occur (0-1).
        # Probably a less messy way to do this
        ys <- list("state" = yp,
                   "detection" = yp1)
        
        ypred_mod <- list()
        r2 <- list()
        
        for(j in 1:length(ys)) {
            
            # detection * state
            ypred_prob <- ys[[j]]
            
            # draws x sites x obs
            ypred_cond <- array(data = NA, dim = c(draws, dim(ypred)[2], fit@response@max_obs))
            ypred_mod[[j]] <- ypred
            
            # loop over draws
            for(i in 1:draws) {
                # repeat the latent state by observation periods
                ypred_vec <- rep(ypred[i,], each = fit@response@max_obs)
                # det * state
                ypred_prob[i,] <- ypred_prob[i,]*ypred_vec
                # force into matrix with nsites X nobs
                ypred_cond[i,,] <- matrix(ypred_prob[i,], ncol = fit@response@max_obs, byrow = TRUE)
                # 1 minus the probability that all observations are zero = at least 1 detection
                ypred_mod[[j]][i,] <- 1-apply(1-ypred_cond[i,,], 1, function(x) {
                    prod(x, na.rm = TRUE)
                })
                
            }
            
            if (fit@response@z_dist == "binomial" && NCOL(y_mod) == 2) {
                trials <- rowSums(y_mod)
                y_mod <- y_mod[, 1]
                ypred_mod[[j]] <- ypred_mod[[j]] %*% diag(trials)
            }
            e <- -1 * sweep(ypred_mod[[j]], 2, y_mod)
            var_ypred <- apply(ypred_mod[[j]], 1, var)
            var_e <- apply(e, 1, var)
            r2[[j]] <- var_ypred / (var_ypred + var_e)
        }
        r2[[3]] <- r2[[1]] - r2[[2]]
        names(r2) <- c("both", "state", "det")
        return(r2)
    }
    
    # Data simulation
    
    set.seed(123)
    dat_occ <- data.frame(x1=rnorm(500))
    dat_p <- data.frame(x2=rnorm(500*5))
    y <- matrix(NA, 500, 5)
    z <- rep(NA, 500)
    b <- c(0.4, -0.5, 0.3, 0.5)
    re_fac <- factor(sample(letters[1:26], 500, replace=T))
    dat_occ$group <- re_fac
    re <- rnorm(26, 0, 1.2)
    re_idx <- as.numeric(re_fac)
    idx <- 1
    for (i in 1:500){
        z[i] <- rbinom(1,1, plogis(b[1] + b[2]*dat_occ$x1[i] + re[re_idx[i]]))
        for (j in 1:5){
            y[i,j] <- z[i]*rbinom(1,1,
                                  plogis(b[3] + b[4]*dat_p$x2[idx]))
            idx <- idx + 1
        }
    }
    
    # unmarked frame
    umf <- unmarkedFrameOccu(y=y, siteCovs=dat_occ, obsCovs=dat_p)
    
    # model
    options(mc.cores=3) #number of parallel cores to use
    fm <- stan_occu(~x2 ~x1 + (1|group), umf, chains=3)
    
    bayes_R2_res_ubms(fm, draws = 100)
#> $both
#>   [1] 0.2436708 0.1813600 0.1861829 0.1968885 0.2093023 0.2065249 0.2262449
#>   [8] 0.1605566 0.2356042 0.2089506 0.2313770 0.2569806 0.2476316 0.2080628
#>  [15] 0.1884270 0.2458283 0.2353730 0.2410739 0.2087004 0.2352926 0.2182499
#>  [22] 0.2195304 0.2197247 0.2299345 0.1642092 0.2512100 0.2537433 0.1866575
#>  [29] 0.2420921 0.1616609 0.1850477 0.2040931 0.2660747 0.1736265 0.2113426
#>  [36] 0.2345318 0.2077254 0.1940378 0.2166815 0.2828791 0.2590453 0.1380645
#>  [43] 0.2494774 0.2094812 0.1596296 0.2125061 0.2406926 0.2306142 0.2238410
#>  [50] 0.1424783 0.2422054 0.2399446 0.2433104 0.2083924 0.2163885 0.2143759
#>  [57] 0.2868863 0.2545129 0.2662583 0.2061908 0.2272771 0.1880253 0.2036795
#>  [64] 0.2259326 0.2396712 0.1560981 0.2029403 0.2575136 0.2447248 0.2600542
#>  [71] 0.1503401 0.1612022 0.2184497 0.2333725 0.2235703 0.2745752 0.1902804
#>  [78] 0.2021373 0.1704586 0.2658361 0.2470988 0.2298406 0.2281244 0.2116103
#>  [85] 0.2407505 0.2564699 0.2642025 0.2530184 0.2964650 0.1884810 0.1804507
#>  [92] 0.2371928 0.2304418 0.2121007 0.2515888 0.2200955 0.1956262 0.2275631
#>  [99] 0.2210517 0.2139585
#> 
#> $state
#>   [1] 0.17605019 0.10263418 0.10580724 0.13545364 0.14482887 0.13800919
#>   [7] 0.17355460 0.08628623 0.17066872 0.14583832 0.16249959 0.20375155
#>  [13] 0.18365678 0.13735974 0.11535767 0.17160635 0.16365889 0.18999839
#>  [19] 0.16342425 0.17257228 0.16156030 0.15115874 0.15180402 0.17367579
#>  [25] 0.09177282 0.18924085 0.20799994 0.13745090 0.18211136 0.09588723
#>  [31] 0.11718993 0.14892238 0.22422619 0.13182738 0.15381415 0.17530606
#>  [37] 0.14920495 0.12388629 0.15416240 0.24336614 0.20041741 0.08053604
#>  [43] 0.20827089 0.14264055 0.09731327 0.14357953 0.18287715 0.16933439
#>  [49] 0.15418602 0.07828137 0.17981013 0.17819720 0.19682582 0.15552595
#>  [55] 0.15226781 0.15547727 0.24004102 0.20505126 0.22142930 0.14618668
#>  [61] 0.15626259 0.12546194 0.14349652 0.16704033 0.18849389 0.08944449
#>  [67] 0.12046083 0.22172548 0.19268544 0.20941069 0.07355600 0.11754425
#>  [73] 0.15150645 0.17893636 0.16290501 0.22485436 0.12711429 0.15046677
#>  [79] 0.11693493 0.22099586 0.18770485 0.17481347 0.16442692 0.14398650
#>  [85] 0.17886823 0.20548550 0.20694618 0.20829883 0.24947301 0.11531789
#>  [91] 0.10101353 0.18807278 0.18736563 0.14776186 0.19111258 0.16002263
#>  [97] 0.11816607 0.17083967 0.16663188 0.15840369
#> 
#> $det
#>   [1] 0.06762062 0.07872586 0.08037562 0.06143486 0.06447340 0.06851572
#>   [7] 0.05269027 0.07427036 0.06493549 0.06311229 0.06887738 0.05322905
#>  [13] 0.06397480 0.07070302 0.07306933 0.07422198 0.07171414 0.05107554
#>  [19] 0.04527614 0.06272035 0.05668959 0.06837161 0.06792069 0.05625875
#>  [25] 0.07243638 0.06196911 0.04574332 0.04920663 0.05998069 0.06577362
#>  [31] 0.06785781 0.05517070 0.04184849 0.04179915 0.05752846 0.05922575
#>  [37] 0.05852047 0.07015146 0.06251909 0.03951300 0.05862791 0.05752848
#>  [43] 0.04120654 0.06684066 0.06231634 0.06892656 0.05781542 0.06127982
#>  [49] 0.06965496 0.06419698 0.06239527 0.06174742 0.04648460 0.05286642
#>  [55] 0.06412072 0.05889858 0.04684531 0.04946166 0.04482896 0.06000413
#>  [61] 0.07101450 0.06256336 0.06018296 0.05889223 0.05117732 0.06665366
#>  [67] 0.08247942 0.03578808 0.05203937 0.05064353 0.07678410 0.04365799
#>  [73] 0.06694323 0.05443615 0.06066532 0.04972085 0.06316610 0.05167058
#>  [79] 0.05352365 0.04484027 0.05939398 0.05502711 0.06369745 0.06762376
#>  [85] 0.06188222 0.05098439 0.05725632 0.04471958 0.04699198 0.07316309
#>  [91] 0.07943715 0.04912006 0.04307613 0.06433879 0.06047627 0.06007289
#>  [97] 0.07746012 0.05672347 0.05441978 0.05555479

Created on 2021-11-11 by the reprex package (v2.0.1)

@kenkellner
Copy link
Owner

Hi Justin,

Thanks, this is really cool! I have to admit I don't know that much about calculating a Bayes R2 for this type of model so I will have to look into it (thanks for including the link). It does seem like something that would be useful to add. Are you aware of any papers that calculate a Bayes r2 for occupancy models?

Ken

@JustinCally
Copy link
Contributor Author

Hi Ken,

I'm unaware of any paper's calculating Bayes R2 for occupancy models. Hence I'm a bit unsure whether the function written above is the preferred approach. In comparison, the method used in unmarked::modSel() that calculates R2 based on the likelihoods of the null model and fitted model (Nagelkerke 1991). I wonder whether it is possible to adapt this method in a Bayes context...

I suppose one good thing about the function above is that it provides the bayes R2 for both detection and state submodels, which may be beneficial in understanding the value of using an occupancy model for a given situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants