-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathurban_water_conservation_policies.Rnw
1871 lines (1692 loc) · 95.4 KB
/
urban_water_conservation_policies.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<<knitr_options, cache=F, echo=F, include=F, eval=T, message=F, warning=F, error=F, results='hide'>>=
library(knitr)
#
# Set for_submission TRUE to produce a LaTeX file for submission to
# Earth's Future, with figures not included and figure captions and tables at
# the end of the document and FALSE to produce a preprint with figures and
# tables integrated into the document
#
for_submission <- FALSE
do_cache <- FALSE
#
# Set all the following variables to TRUE to recalculate all computations.
# Setting these to FALSE re-uses previous computations in order to save
# time.
# * download_data_sets: Download fresh copies of the data sets (from the supporting information) from
# https://doi.org/10.6084/m9.figshare.5714944. These data sets contain the VWCI scores and all the covariates
# necessary to perform the regression analyses described in the paper.
# * recalc_pvi: Recalculate the PVI scores for cities and states. This requires you to obtain the raw
# state and county-level election returns for the 2008 and 2012 presidential elections
# (available from CQ Press, as cited in the paper). I downloaded the returns in groups, by state:
# * 2008_al_fl.csv and 2012_al_fl.csv for Alabama through Florida
# * 2008_ga_ky.csv and 2012_ga_ky.csv for Georgia through Kentucky
# * 2008_la_ne.csv and 2012_la_ne.csv for Louisiana through Nebraska
# * 2008_nv_nm.csv and 2012_nv_nm.csv for Nevada through New Mexico
# * 2008_ny_sd.csv and 2012_ny_sd.csv for New York through South Dakota
# * 2008_tn_wy.csv and 2012_tn_wy.csv for Tennessee through Wyoming
# These files should reside in the data/proprietary/ directory.
# * recalc_explanatory: Build a data frame with the following explanatory variables
# * population
# * population growth rate
# * surface water use as a fraction of public water supply
# * Democratic share of the two-party vote in the 2008 and 2012 presidential elections.
# * National Democratic vote share.
# * Partisan Voting Index (PVI)
# * Population density
# * Population density growth rate
# * MSA area
# * recalc_vwci: Assign FIPS codes to MSA-level VWCI scores (for geolocation).
# * recalc_city_coords: Recalculate the geographic location of the MSAs.
# * recalc_climate: Recalculate the average temperature, precipitation, and Köppen aridity index for cities
# and states.
# * recalc_msa_variables: Assemble a data-frame with MSA-level variables and covariates.
# * recalc_standardized_data: Transform raw MSA- and state-level variables into standardized forms, as described in the
# paper.
# * resample_models: Run the regression analysis for VWCI, Requirements, and Rebates using the baseline regression model
# described in the paper (hierarchical beta-binomial, with varying intercept).
# * resample_test_models: Run the alternate regresion models described in the "Robustness Checks" sections of the paper \
# and supporting information.
# * resample_pooled_test_models: Run the baseline and alternate regression models using the alternate normalization of
# MSA-level covariates (normalized relative to all MSAs, as opposed to representing the difference between the MSA and
# the state).
# * force_resampling: Force re-running all regression models that are selected by resample_models,
# resample_test_models, and resample_pooled_test_models. This is only necessary if do_cache is set to TRUE,
# which I do not recommend; if do_cache is set to FALSE, this has no effect and all selected models are
# automatically re-sampled.
#
download_data_sets <- TRUE
recalc_pvi <- FALSE
recalc_explanatory <- TRUE
recalc_vwci <- TRUE
recalc_city_coords <- TRUE
recalc_climate <- TRUE
recalc_msa_variables <- TRUE
recalc_standardized_data <- TRUE
resample_models <- TRUE
resample_test_models <- TRUE
resample_pooled_test_models <- TRUE
force_resampling <- TRUE
if (for_submission) {
figures_dir <- "figures/"
cache_dir <- "cache/"
output_dir <- "output/"
} else {
figures_dir <- "figures_preprint/"
cache_dir <- "cache_preprint/"
output_dir <- "output_preprint"
}
my_tex_chunk_hook <- function(x, options) {
ai = knitr:::output_asis(x, options)
size = if (options$size == 'normalsize') '' else sprintf('\\%s', options$size)
if (!ai) x = sprintf('%% jg_tex_chunk_hook\n%s\n%s\n', size, x)
if (options$split) {
name = fig_path('.tex', options, NULL)
if (!file.exists(dirname(name)))
dir.create(dirname(name))
cat(x, file = name)
sprintf('\\input{%s}', name)
} else x
}
my_tex_output_hook <- function(x, options) {
if (knitr:::output_asis(x, options)) {
x
} else .verb.hook(x)
}
my_tex_plot_hook <- function(x, options) {
knitr:::hook_plot_tex(x, options)
}
knit_hooks$set(chunk = my_tex_chunk_hook, output = my_tex_output_hook,
plot = my_tex_plot_hook)
opts_knit$set(progress = TRUE, verbose = TRUE,
header = '',
self.contained="FALSE"
)
opts_chunk$set(cache = FALSE, cache.lazy = FALSE,
echo = FALSE,
message = FALSE, warning = FALSE,
error = FALSE, # stop knitting on errors
out.width="0.8\\linewidth",
cache.path = cache_dir,
fig.path = figures_dir)
@
\documentclass[draft,linenumbers]{agujournal}
\usepackage[hyphens]{url}
\usepackage{amsmath}
% \draftfalse
\journalname{Earth's Future}
<<ms_options, cache=F, echo=F, include=F, eval=T, message=F, results='hide'>>=
message("Re-checking Chunk:\n", as.character(deparse(knit_hooks$get("chunk"))), "\n")
set.seed(447268152)
random_alpha <- TRUE # random effects on state-level intercept.
sigma_sigma_delta <- 2.5 # for partial-pooling at state-level.
mu_phi_vwci <- 50
sigma_phi_vwci <- 20
mu_phi_rr <- 15
sigma_phi_rr <- 10
pop_target_year <- 2014
bea_year <- 2014
@
<<setup, echo=F, include=F, eval=T, cache = F, results='hide'>>=
options(tidyverse.quiet = TRUE)
library(pacman)
p_load(tidyverse)
p_load(readxl)
p_load(forcats)
p_load(extrafont)
p_load_gh('slowkow/ggrepel')
p_load(gridExtra)
p_load(RColorBrewer)
p_load(viridis)
p_load(ggthemes)
p_load(rstan)
p_load_gh('jonathan-g/jgmcmc@jgmcmc')
p_load_gh('jonathan-g/jgally@jgally')
p_load(xtable)
options(xtable.caption.placement = "top")
thick_ci <- c(0.34 / 2., 1. - 0.34 / 2.)
thin_ci <- c(0.05 / 2., 1. - 0.05 / 2.)
figure_font <- choose_font('CM Sans')
if (figure_font == '') {
font_install('fontcm')
figure_font <- choose_font(c('CM Sans', 'Helvetica', 'Arial', 'sans'), quiet = FALSE)
}
loadfonts(device = 'pdf', quiet = TRUE)
if (Sys.info()['sysname'] == 'Windows') {
loadfonts(device = 'win', quiet = TRUE)
}
# theme_set(theme_bw(base_size = 15))
theme_set(
theme_bw(base_size = 15, base_family = figure_font) +
theme(plot.title = element_text(size = rel(1)),
axis.text.x = element_text(size = rel(1)),
panel.grid.major = element_line(size = 0.5, colour = "gray90"),
panel.grid.minor = element_line(size = 0.25, colour = "gray90")
)
)
@
%
<<set_path_vars, echo=F, eval=T, cache=FALSE, results='hide'>>=
script_dir <- 'scripts'
data_dir <- 'data'
if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)
if (!dir.exists(file.path(output_dir, 'tables'))) dir.create(file.path(output_dir, 'tables'), recursive = TRUE)
if (!dir.exists(file.path(output_dir, 'figures'))) dir.create(file.path(output_dir, 'figures'), recursive = TRUE)
@
<<download_data, echo=F, eval=download_data_sets, message=FALSE, warning=FALSE, cache=do_cache, cache.rebuild=c(download_data_sets), cache.extra=c(file.mtime(file.path(script_dir, 'download_data_sets.R')))>>=
source(file.path(script_dir, 'download_data_sets.R'))
download_data_from_figshare(data_dir)
@
<<calculate_pvi, echo=FALSE, eval=recalc_pvi, message=FALSE, warning=FALSE, cache=do_cache, cache.rebuild=c(!file.exists(file.path(data_dir, 'pvi_by_msa.csv'))), cache.extra=c(file.mtime(file.path(script_dir, 'calculate_pvi.R')))>>=
source(file.path(script_dir, 'calculate_pvi.R'))
calculate_pvi(data_dir, save = TRUE)
@
<<reprocess_explanatory_variables, echo=F, eval=recalc_explanatory, message=F, warning=F, error=F, results='hide', cache=do_cache, cache.rebuild=c(!file.exists(file.path(data_dir, 'VWCI_explanatory_variables.csv'), !file.exists(file.path(data_dir, 'pvi_by_state.csv')), !file.exists(file.path(data_dir, 'pvi_by_msa.csv')), !file.exists(file.path(data_dir, 'wudata_out_NWIS_2010.csv')), !file.exists(file.path(data_dir, 'cbsa_msa_2015.xls')), !file.exists(data_dir, 'cbsa_msa_csa_2015.xls'))), cache.extra=c(file.mtime(file.path(data_dir, 'PEP_2015_GCTPEPANNR.US24PR.csv')), file.mtime(file.path(data_dir, 'wudata_out_NWIS2016-10-29.csv')), file.mtime(file.path(data_dir, 'cbsa_msa_2015.xls')), file.mtime(file.path(data_dir, 'cbsa_msa_csa_2015.xls')), file.mtime(file.path(script_dir, 'build_explanatory_variables.R')))>>=
source(file.path(script_dir, 'build_explanatory_variables.R'))
build_explanatory_variables(data_dir, save = TRUE, pop_target_year = pop_target_year)
@
<<reprocess_city_coordinates, echo=F, eval=recalc_city_coords | recalc_vwci, message=F, warning=F, error=F, results='hide', cache=do_cache, cache.rebuild=!file.exists(file.path(data_dir, 'VWCI_city_coordinates.csv')), cache.extra=c(file.mtime(file.path(data_dir, 'VWCI_DATA_10_15_16.xlsx')), file.mtime(file.path(script_dir, 'extract_city_coordinates.R')))>>=
source(file.path(script_dir, 'extract_city_coordinates.R'))
extract_city_coordinates(recalc_coords = recalc_city_coords)
@
<<reprocess_aridity, echo=F, eval=recalc_climate, message=F, warning=F, error=F, results='hide', cache=do_cache, cache.rebuild=!all(file.exists(file.path(data_dir, 'air.mon.mean.v401.nc')), file.exists(file.path(data_dir, 'precip.mon.total.v401.nc')), file.exists(file.path(data_dir, 'cities_aridity.Rda'))), cache.extra=c(file.mtime(file.path(data_dir, 'air.mon.mean.v401.nc')), file.mtime(file.path(data_dir, 'precip.mon.mean.v401.nc')), file.mtime(file.path(data_dir, 'VWCI_city_coordinates.csv')), file.mtime(file.path(script_dir, 'udel_temp_precip.R')))>>=
source(file.path(script_dir, 'udel_temp_precip.R'))
process_city_aridity('data')
@
<<reprocess_data, echo=F, eval=recalc_msa_variables, message=F, warning=F, error=F, results='hide', cache=do_cache, cache.rebuild=!all(file.exists(file.path(data_dir, 'msa_predictors.Rds')), file.exists(file.path(data_dir, 'state_predictors.Rds')), file.exists(file.path(data_dir, 'filtered_data.Rds')), file.exists(file.path(data_dir, 'standardized_data.Rds'))), cache.extra=c(file.mtime(file.path(data_dir, 'cities_aridity.Rds')), file.mtime(file.path(data_dir, 'regional_price_parity.xls')), file.mtime(file.path(data_dir, 'real_personal_income.xls')), file.mtime(file.path(data_dir, 'VWCI_explanatory_variables.csv')), file.mtime(file.path(data_dir, 'state_2010_surface_water.csv')), file.mtime(file.path(data_dir, 'regional_price_parity_by_state.xls')), file.mtime(file.path(data_dir, 'real_personal_income_by_state.xls')), file.mtime(file.path(data_dir, 'pvi_by_state.xls')), file.mtime(file.path(data_dir, 'VWCI_DATA_10_15_16.xlsx')), file.mtime(file.path(script_dir, 'load_city_data.R')), file.mtime(file.path(script_dir, 'load_state_data.R')), file.mtime(file.path(script_dir, 'load_data.R')))>>=
source(file.path(script_dir, 'load_city_data.R'))
process_city_climate(bea_year = bea_year)
source(file.path(script_dir, 'load_state_data.R'))
process_state_data('data', '2014')
@
<<standardize_data, echo=F, eval=recalc_standardized_data || recalc_msa_variables, message=F, warning=F, error=F, results='hide', cache=do_cache, cache.rebuild=!all(file.exists(file.path(data_dir, 'msa_data.csv')), file.exists(file.path(data_dir, 'state_covariates.csv')), file.exists(file.path(data_dir, 'filtered_data.Rds')), file.exists(file.path(data_dir, 'filtered_pooled_data.Rds')), file.exists(file.path(data_dir, 'standardized_data.Rds')), file.exists(file.path(data_dir, 'standardized_pooled_data.Rds'))), cache.extra = c(file.mtime(file.path(script_dir, 'load_data.R')))>>=
source(file.path(script_dir, 'load_data.R'))
process_data(pooled = TRUE)
process_data(pooled = FALSE)
@
<<get_all_cities, echo=F, eval=T, message=F, warning=F, error=F, results='hide', cache = do_cache, cache.rebuild=file.exists(file.path(data_dir, 'VWCI_city_coordinates.csv')), cache.extra=c(file.mtime(file.path(data_dir, 'VWCI_city_coordinates')))>>=
vwci_all <- read_csv(file.path(data_dir, 'VWCI_city_coordinates.csv'))
n_all_cities <- nrow(vwci_all)
n_all_states <- vwci_all$state %>% unique %>% length
rm(vwci_all)
@
<<source_fit_model, echo=F, eval=T, message=F, warning=F, error=F, results='hide', cache = FALSE, cache.extra=c(file.mtime(file.path(script_dir, 'fit_model.R')), file.mtime(file.path(data_dir, 'standardized_data.Rds')))>>=
source(file.path(script_dir, 'fit_model.R'))
@
<<fit_models, echo=F, eval=resample_models, message=F, warning=F, error=F, results='hide', cache = do_cache && ! force_resampling, cache.rebuild=!file.exists('data/model_fits.Rds'), cache.extra=c(random_alpha, std_data, vars_1, file.mtime(file.path(script_dir, 'water_conserve_ml_vary_intercept_beta_alpha.stan')))>>=
process_models(std_data, vars = model_vars,
data.dir = 'data',
abs_rank = TRUE,
var_sel = c("vars_1"),
beta = c(TRUE), multilevel = c(TRUE),
random_alpha = c(TRUE),
remove_desal = FALSE,
mu_phi_vwci = mu_phi_vwci, sigma_phi_vwci = sigma_phi_vwci,
mu_phi_rr = mu_phi_rr, sigma_phi_rr = sigma_phi_rr,
sigma_sigma_delta = sigma_sigma_delta,
dependent_vars = c("VWCI", "REQ", "REB"),
filename = file.path(data_dir, "model_fits.Rds"),
seed = 2105282946, comment = "main")
@
<<test_models, echo=F, eval=resample_test_models, message=F, warning=F, error=F, results='hide', cache = do_cache && ! force_resampling, cache.rebuild=!file.exists('data/model_fits.Rds')>>=
process_models(std_data, vars = model_vars,
data.dir = 'data',
abs_rank = TRUE,
var_sel = c("vars_1", "vars_2", "vars_3", "vars_4", "vars_5",
"vars_6", "vars_7", "vars_8", "vars_9", "vars_10",
"vars_11"
),
beta = c(TRUE, FALSE), multilevel = c(TRUE, FALSE),
random_alpha = c(TRUE),
remove_desal = FALSE,
mu_phi_vwci = mu_phi_vwci, sigma_phi_vwci = sigma_phi_vwci,
mu_phi_rr = mu_phi_rr, sigma_phi_rr = sigma_phi_rr,
sigma_sigma_delta = sigma_sigma_delta,
dependent_vars = "VWCI",
filename = file.path(data_dir, "test_model_fits.Rds"),
seed = 1796870325, comment = "test")
@
<<test_pooled_models, echo=F, eval=resample_pooled_test_models, message=F, warning=F, error=F, results='hide', cache = do_cache && ! force_resampling, cache.rebuild=!file.exists('data/model_fits.Rds')>>=
process_models(std_pooled_data, vars = model_vars,
data.dir = 'data',
abs_rank = TRUE,
var_sel = c("vars_1", "vars_2", "vars_3", "vars_4", "vars_5",
"vars_6", "vars_7", "vars_8", "vars_9", "vars_10",
"vars_11"
),
beta = c(TRUE, FALSE), multilevel = c(TRUE, FALSE),
random_alpha = c(TRUE),
remove_desal = FALSE,
mu_phi_vwci = mu_phi_vwci, sigma_phi_vwci = sigma_phi_vwci,
mu_phi_rr = mu_phi_rr, sigma_phi_rr = sigma_phi_rr,
sigma_sigma_delta = sigma_sigma_delta,
dependent_vars = "VWCI",
filename = file.path(data_dir, "test_model_fits_pooled.Rds"),
seed = 485695703, comment = "pooled")
@
<<load_model_fits, echo=F, eval=T, message=F, warning=F, error=F, results='hide', cache=do_cache, cache.extra=c(file.mtime(file.path(data_dir, 'model_fits.Rds')), file.mtime(file.path(data_dir, 'standardized_data.Rds')), file.mtime(file.path(data_dir, 'filtered_data.Rds')))>>=
set.seed(587365757) # Keep things reproducible when I sometimes run the preceeding chunks and sometimes don't.
basic_data <- read_rds(file.path(data_dir, 'filtered_data.Rds'))
scaled_data <- read_rds(file.path(data_dir, 'standardized_data.Rds'))
model_fits <- read_rds(file.path(data_dir, 'model_fits.Rds'))
msa_data <- scaled_data$msa_data
state_data <- scaled_data$state_data
n_cities <- nrow(msa_data)
n_states <- msa_data$state %>% unique %>% length
@
%
% set_top_vwci_count
%
<<set_top_vwci_count, echo=F, eval=T, output="hide", cache=do_cache>>=
n_top <- list(n = 20, name = 'twenty')
@
<<captions, echo=F, eval=T, message=F, warning=F, error=F, results="hide">>=
if (for_submission) {
fig_body_results = "hide"
tab_body_results = "hide"
fig_end_matter_results = "asis"
tab_end_matter_results = "asis"
} else {
fig_body_results = "asis"
tab_body_results = "asis"
fig_end_matter_results = "hide"
tab_end_matter_results = "hide"
}
figure_captions <- c(
vwci_map = "Map of cities with VWCI scores.",
vwci_histogram = "Distribution of VWCI, requirements, and rebates.",
vwci_cat_plot = str_c("Scaled regression coefficients for VWCI, requirements, and rebates: $\\gamma$ refer to state-level regression coefficients and $\\beta$ to MSA-level ones.\n%Coefficients are scaled so the value on the horizontal axis represents the approximate change in the probability $p_i$ of adopting an action, corresponding to a two-standard-deviation change in the predictor when $p_i$ is around 0.5.\nFor a scaled coefficient of 0.1, a two-standard-deviation change in the predictor corresponds to VWCI changing by about ", round(n_actions * 0.10), " for a city with a VWCI of around ", round(n_actions / 2), ", the number of requirements changing by about ", round(n_req * 0.10), " for a city with around ", round(n_req / 2), " requirements, and the number of rebates changing by about ", round(n_reb * 0.10), " for a city with around ", round(n_reb / 2), " rebates. The points represent the median of the posterior, the thick lines the ", round(100 * (thick_ci[2] - thick_ci[1])), "\\% highest-density interval (HDI), and the thin lines the ", round(100 * (thin_ci[2] - thin_ci[1])), "\\% HDI. Coefficients are grouped by state vs.\ city level and then ordered within each group by absolute value of the median for the VWCI analysis."),
vwci_residuals = str_c("Predicted versus actual VWCI. Cities with the ten largest residuals are labeled. Cities in California, Florida, and Texas are indicated by color (these states contain ", (basic_data$msa_data %>% mutate(CaFlTx = state %in% c("CA", "FL", "TX")) %>% group_by(CaFlTx) %>% summarize(count = n()) %>% ungroup() %>% mutate(pct = round(100 * count / sum(count))) %>% dplyr::filter(CaFlTx) %>% dplyr::select(pct) %>% simplify() %>% unname()), "\\% of the MSAs in our data set)")
)
table_captions <- c(
top_vwci = str_c("Cities with the ", n_top$name, " highest VWCI scores. Req. = requirements, Reb. = rebates. A complete list of all ", n_cities, " cities appears in Table~S1."),
vwci_top_residuals = "Cities with the ten largest residuals from VWCI regression."
)
@
\begin{document}
\title{Urban Water Conservation Policies in the United States}
\authors{Jonathan M. Gilligan\affil{1,2,3}, Christopher A. Wold\affil{3}, Scott C. Worland\affil{2}, John J. Nay\affil{3,4,5}, David J. Hess\affil{3,6}, George M. Hornberger\affil{1,2,3}}
\affiliation{1}{Department of Earth \& Environmental Sciences, Vanderbilt University, Nashville, Tennessee, USA}
\affiliation{2}{Department of Civil \& Environmental Engineering, Vanderbilt University, Nashville, Tennessee, USA}
\affiliation{3}{Vanderbilt Institute for Energy and Environment, Vanderbilt University, Nashville, Tennessee, USA}
\affiliation{4}{Information Law Institute, New York University, New York, New York, USA}
\affiliation{5}{Berkman Klein Center, Harvard University, Cambridge, Massachusetts, USA}
\affiliation{6}{Department of Sociology, Vanderbilt University, Nashville, Tennessee, USA}
% \affiliation{6}{U.S. Geological Survey, Nashville, Tennessee, USA}
\correspondingauthor{Jonathan M. Gilligan}{[email protected]}
%% Keypoints, final entry on title page.
% Example:
% \begin{keypoints}
% \item List up to three key points (at least one is required)
% \item Key Points summarize the main points and conclusions of the article
% \item Each must be 100 characters or less with no special characters or punctuation
% \end{keypoints}
% List up to three key points (at least one is required)
% Key Points summarize the main points and conclusions of the article
% Each must be 100 characters or less with no special characters or punctuation
\begin{keypoints}
\item Analysis of water conservation policies of 195 cities in 45 states
\item Water conservation policies correlate with both environmental and social variables
\item Correlations with partisan voting patterns explain
much of the variation
in policy adoption.
\end{keypoints}
%% ------------------------------------------------------------------------ %%
%
% ABSTRACT
%
% A good abstract will begin with a short description of the problem
% being addressed, briefly describe the new data or analyses, then
% briefly states the main conclusion(s) and how they are supported and
% uncertainties.
%% ------------------------------------------------------------------------ %%
%% \begin{abstract} starts the second page
\begin{abstract}
Urban water supply systems in the United States are increasingly stressed as
economic and population growth confront limited water resources.
Demand management, through conservation and improved efficiency, has long been
promoted as a practical alternative to building Promethean energy-intensive
water-supply infrastructure. Some cities are making great progress at managing
their demand, but study of conservation policies has been limited and often
regionally focused. We present a hierarchical Bayesian analysis of a new measure
of urban water conservation policy, the Vanderbilt Water Conservation Index (VWCI),
for \Sexpr{n_cities}~cities in \Sexpr{n_states}~states in the contiguous United
States.
This study does not attempt to establish causal relationships, but does
observe that cities in states with arid climates
tend to adopt more conservation measures.
Within a state, cities with
more Democratic-leaning voting preferences and
large and rapidly growing populations
tend to adopt more conservation measures.
Economic factors and climatic differences between cities do not correlate with
the number of measures adopted, but they do correlate with
the character of the measures, with arid cities favoring mandatory conservation
actions and cities in states with lower real personal income favoring rebates
for voluntary actions.
Understanding relationships between environmental and
societal factors and cities' support for water conservation measures can help
planners and policy-makers identify obstacles and opportunities to increase the
role of conservation and efficiency in making urban water supply systems
sustainable.
\end{abstract}
%% ------------------------------------------------------------------------ %%
%
% TEXT
%
%% ------------------------------------------------------------------------ %%
\section{Introduction}
Cities face increasing challenges to their water supply because of complex
interactions among drought, infrastructure, population growth, land-use changes,
and other natural and human factors.
The prospect of climatic change adds to the difficulty of planning robust and
sustainable water supply systems, on account both of increasing uncertainty
about future supply and demand for water, and also of predicted reductions in
water availability in some regions, such as the Southwestern United States
\citep{gcrp:natl.assessment.3:2014}.
For many years, advocates of soft approaches to managing water resources have
stressed the importance of improving the efficiency with which society obtains
and consumes
the water services it requires \citep{gleick:soft.water.paths:2002}.
Some cities and their associated water-supply systems respond to these
challenges by pursuing grand energy-intensive infrastructure projects to draw
water from distant or difficult sources. However,
many have also shown increasing
interest in the soft path, which involves
managing demand through efficiency and conservation
measures, and a number of cities have made impressive progress
\citep{fleck:fighting:2016}.
A pressing challenge is to identify the characteristics of successful
transitions toward sustainable water use and the necessary conditions for those
transitions to spread more widely.
Studies have investigated urban water conservation policies and several water
conservation indices are available, but these studies either lack comprehensive coverage
of water conservation policies or are geographically limited
\citep{hess:vwci:2017,sauri:conservation:2013,maggioni:conservation:2014}.
Recent research finds that individual perception of water scarcity and
preference for policy action to address scarcity depend not only on the actual
degree of water scarcity, but also on the person's ideological worldview
\citep{switzer:green.lenses:2016}, but it is not clear how these individual
preferences translate into policy action.
The Vanderbilt Water Conservation Index (VWCI) is an integer score representing
the number of measures that a city has taken to reduce its water demand, out of
a list of \Sexpr{n_actions} possible policy actions
\citep{hornberger:hydrological.transitions:2015,hess:drought:2016,hess:vwci:2017}.
This list includes \Sexpr{n_req} requirements, such as restrictions on
lawn-watering or mandatory use of water-efficient plumbing in new construction
and renovations; and \Sexpr{n_reb} rebates offered for voluntary actions, such
as purchasing water-efficient appliances.
Previously, we assessed this index and performed preliminary quantitative and
qualitative analyses on a subset of the central cities of the 22 largest
metropolitan statistical areas (MSAs) in the extended Southwestern United States
\citep{hess:drought:2016}.
That preliminary analysis found that propensity to
adopt water conservation measures depended both on characteristics
of the physical environment (precipitation) and on socio-economic and political
characteristics of the MSA (partisan voting and cost of living).
We represented the partisan political leanings of states and MSAs by the
Cook Partisan Voting Index (PVI) \citep{cook:pvi:2013}. The index measures
the difference between the percentage of the two-party vote received by
Democratic presidential candidates in a city or MSA and the percentage received
in the national election. Positive or negative values measure the
city's or state's preference for the Democratic (more progressive)
or Republican (more conservative) candidates,
respectively, relative to the national average.
Previous studies have indicated that a more Democratic-leaning PVI score is associated with higher environmental
policy adoption, including for renewable energy \citep{chupp:environ.voting:2011}
and water-conservation policies \citep{hess:drought:2016}, and conversely that a lower score is associated with
higher greenhouse gas emissions from power plants \citep{grant:environ.accountability:2017}.
Municipal water conservation policies are chosen by city officials, so it
may seem curious to measure partisan leanings with respect to presidential
elections. However, presidential votes have the advantage of providing a uniform
scale because voters across the country choose between the same candidates,
whereas the differences between Democratic and Republican candidates for
city and state positions vary considerably from one city or state to another,
with narrow ideological differences in some elections and wide differences in
others. Assessing partisan leanings in state and local
elections becomes more complicated and difficult when we consider that
the vast majority of American cities hold non-partisan city-council elections
\citep{svara:city.councils:2003} and that more than one third of state-legislature
races are uncontested by one of the major parties
\citep{klarner:state.elections:2015,ap:uncontested:2006}.
Moreover, detailed vote
counts are not readily available for municipal- and state-level elections as
they are for presidential elections. For all these reasons, presidential
voting patterns are a useful measure for gauging regional variations in partisan
leanings.
Finally, we note that the rate of split-ticket voting has declined rapidly
and consistently since the late 1980s \citep{fiorina:renationalization:2016}
and that even voters who self-identify as independent are found to align
consistently with one party or the other \citep{hawkins:motivated:2012},
so we believe that
presidential votes are good indicators for comparing the partisan leanings
that voters bring to state and local elections across the United States.
% Two important economic indicators are
% Regional Price Parity (RPP) and Real Personal Income (RPI),
% which represent, respectively,
% the cost of living of a city or state, relative to the national average
% and the average personal income in the city or state, adjusted for cost of
% living and inflation \citep{bea:rpp.methodology:2016}.
% Our previous analysis examined RPP, but here we chose to examine RPI because
% we expected that the ratio of income to cost of living would be more relevant
% than cost of living alone.
%
Our previous analysis found that the measure of Democratic-leaning voting preference (PVI)
for an MSA had, by a large margin,
the greatest predictive power for adopting water conservation measures
\citep{hess:drought:2016}.
% , with RPP having the second-greatest predictive power.
The importance of this measure
was supported by qualitative analysis, which suggested that the political
association of water conservation with general ``environmentalist'' politics
created political differences in the ways that cities respond to the physical
fact of water scarcity.
We subsequently expanded our database to the central cities in the
\Sexpr{n_all_cities}~most populous MSAs in
the United States \citep{hess:vwci:2017},
which span \Sexpr{n_all_states} states and comprise more than half of the 382~MSAs
in the U.S.
\citep{census:population:2015}. Here, we present the first quantitative
analysis of the larger database, analyzing relationships between environmental
and societal characteristics of \Sexpr{n_cities}~cities across the contiguous
United States and those cities' propensity to adopt water conservation policies.
Our goal is to assess differences across MSAs that are correlated with higher or
lower levels of water-conservation policy adoption. Our analysis also
considers state-level
variables because MSA policy adoption can be strongly influenced by state-government actions,
such as the California state government's strong policy support for water conservation).
We identify which variables correlate most strongly with variations in water conservation policy
regimes and whether these
effects manifest differently when we consider specific aspects of
water-conservation policy, such as requirements and rebates.
%We find that differences in climate, political leanings of voters, and
%demographics are reflected in the number and kind of water conservation
%measures adopted by U.S. cities. Such insights
This may help water managers and policy makers to anticipate how
receptive a given city might be to adopting different conservation measures.
\section{Data and Methods}
\label{sec:data.methods}
\subsection{Vanderbilt Water Conservation Index}
%
%
%
<<vwci_map, eval = T, include=!for_submission, fig.cap=figure_captions['vwci_map'], fig.height=5, fig.width=10 * 6.0 / 6.5, out.width="6.5in", fig.pos="htp", out.extra="angle=0", cache.extra=c(basic_data), results = fig_body_results>>=
## prepare data
vwci <- basic_data$msa_data %>% dplyr::select(lat, lon, vwci)
state.poly <- map_data("state")
p1 <- ggplot() +
geom_polygon(data=state.poly, aes(x=long, y=lat, group=group), fill="grey95", color="gray80") +
geom_point(data=vwci, aes(x=lon, y=lat, fill=vwci), shape=21, size = 1.5, alpha=0.8) +
scale_size_continuous(range=c(0.1,7), breaks=c(5,10,15,20,25,40,50), name = "VWCI") +
scale_fill_viridis(option="viridis", name = 'VWCI') +
scale_x_continuous(limits=c(-125,-60), breaks=seq(-120,-60,25), labels=c(paste(c(120,95,70),"°W", sep="")),name="") +
scale_y_continuous(limits=c(25,50),breaks=seq(30,45,5), labels=c(paste(seq(30,45,5),"°N", sep="")),name="")+
geom_rangeframe(data=state.poly, aes(x=long, y=lat)) +
coord_fixed(1.3) +
theme_tufte(base_size=15) +
theme(axis.text.y=element_text(angle = 90, hjust=0.5),legend.position = c(0.83, 0.33))
ggsave(file.path(output_dir, 'figures', 'fig_1.pdf'), p1, 'pdf', height=5, width = 10 * 6.0 / 6.5)
print(p1)
@
%
% vwci_histogram figure
%
<<vwci_histogram, echo=F, include=!for_submission, fig.height=6, fig.width=5, fig.pos='tb', fig.align="center", fig.cap=figure_captions['vwci_histogram'], cache=do_cache, cache.extra = c(theme_get()), results = fig_body_results>>=
std_data$msa_data %>%
dplyr::select(VWCI = vwci, Rebates = rebtotal, Requirements = reqtotal) %>%
gather(key = measure, value = value) %>%
mutate(measure = ordered(measure, levels = c('VWCI', 'Requirements', 'Rebates'),
labels = c('(a) VWCI', '(b) Requirements', '(c) Rebates'))) %>%
ggplot(aes(x = value)) +
geom_histogram(binwidth = 1) +
scale_x_continuous(expand = c(0,0.25), limits=c(NA,55)) +
labs(x = "# policies adopted", y = "# cities") +
facet_wrap(~measure, ncol = 1, scales = 'free_y') +
theme(strip.background = element_blank()) -> p2
ggsave(file.path(output_dir, "figures", "fig_2.pdf"), p2, "pdf", height=6, width=5)
print(p2)
@
%The VWCI data set provides detailed information about both the number and
%nature of water conservation policies in \Sexpr{n_all_cities }~cities in
%\Sexpr{n_all_states}~states \citep{hess:drought:2016,hess.vwci:2017}.
The VWCI data set reports both the number and
nature of urban water conservation policies, which allows hierarchical analysis
to examine the relationships between water conservation scores and
hydroclimatological and societal characteristics at both state and MSA levels.
Our complete database includes Anchorage, AK, and Honolulu, HI, but for this
analysis we chose to include only cities within the contiguous United States
(\Sexpr{n_cities}~cities in \Sexpr{n_states}~states,
Figure~\ref{fig:vwci_map})
because both
climatic and socio-political characteristics of Alaska and Hawaii may be very
different from those of the contiguous states, and also because state-level
variables that average over the enormous area of Alaska and across the multiple
Hawaiian islands may render them less suitable for our hierarchical treatment.
VWCI scores range from~\Sexpr{min(std_data$msa_data$vwci)}
to~\Sexpr{max(std_data$msa_data$vwci)}
with a mean of \Sexpr{round(mean(std_data$msa_data$vwci),1)}
and a median of \Sexpr{round(median(std_data$msa_data$vwci), 1)}
(Figure~\ref{fig:vwci_histogram}a; Table~S1).
The set of \Sexpr{n_actions} measures for each city makes it possible to construct different variables.
In this study, we focus on three variables: the composite VWCI of all \Sexpr{n_actions} policies,
the subset of \Sexpr{n_req} requirements, and the subset of \Sexpr{n_reb} rebates.
The requirement scores range
from~\Sexpr{min(std_data$msa_data$reqtotal)}
to~\Sexpr{max(std_data$msa_data$reqtotal)}
with a mean of \Sexpr{round(mean(std_data$msa_data$reqtotal),1)}
and a median of \Sexpr{round(median(std_data$msa_data$reqtotal), 1)}
(Figure~\ref{fig:vwci_histogram}b)
and the rebate scores range
from~\Sexpr{min(std_data$msa_data$rebtotal)}
to~\Sexpr{max(std_data$msa_data$rebtotal)}
with a mean of \Sexpr{round(mean(std_data$msa_data$rebtotal),1)}
and a median of \Sexpr{round(median(std_data$msa_data$rebtotal), 1)}
(Figure~\ref{fig:vwci_histogram}c).
%
% top_vwci table
%
<<top_vwci, echo=F, include=!for_submission, cache.extra=c(model_fits), results = tab_body_results>>=
tab_placement = ifelse(for_submission, "htbp", "tbp")
include.aridity.in.table = FALSE
if(include.aridity.in.table) {
table_data <- std_data$msa_data %>%
arrange(desc(abs(vwci))) %>%
mutate(city = str_c(city, as.character(state), sep = ', ')) %>%
dplyr::select(city, vwci, reqtotal, rebtotal, aridity) %>%
mutate(ratio = rebtotal / reqtotal) %>%
top_n(n_top$n, vwci) %>%
mutate(rank = rank(-vwci, ties.method = 'min')) %>%
dplyr::select(rank, city, vwci, reqtotal, rebtotal, ratio, aridity) %>%
setNames(c('Rank', 'City', 'VWCI', "Req.", "Reb.", "Reb./Req.", "$\\Delta \\text{aridity}$"))
# if (nrow(table) != top_n$n) stop("Error: table of top ", top_n$n, " cities has ", nrow(table), " rows.")
write_csv(table_data, path=file.path(output_dir, "tables", "Table_1.csv"))
table_data %>%
xtable(caption = table_captions['top_vwci'],
label="tab:top_vwci", align="lrlrrrrr",
digits=2,
display=c('s', 'd', 's','d','d','d', 'f', 'f')) %>%
print("latex", floating=T, table.placement=tab_placement,
include.rownames=F, include.colnames=T,
sanitize.text.function = function(x) x,
sanitize.colnames.function=function(x) paste("\\multicolumn{1}{c}{", x, "}"),
math.style.negative=F)
} else {
table_data <- std_data$msa_data %>%
arrange(desc(abs(vwci))) %>%
mutate(city = str_c(city, as.character(state), sep = ', ')) %>%
dplyr::select(city, vwci, reqtotal, rebtotal) %>%
mutate(ratio = reqtotal/rebtotal) %>%
top_n(n_top$n, vwci) %>%
mutate(rank = rank(-vwci, ties.method = 'min')) %>%
dplyr::select(rank, city, vwci, reqtotal, rebtotal, ratio) %>%
setNames(c('Rank', 'City', 'VWCI', "Req.", "Reb.", '$\\text{Req.}/\\text{Reb.}$'))
# if (nrow(table) != n_top$n) stop("Mismatch: table of top ", n_top$n, " has ", nrow(table), " rows.")
table_data %>% mutate_at("$\\text{Req.}/\\text{Reb.}$", funs(round(.,2))) %>%
write_csv(path=file.path(output_dir, "tables", "Table_1.csv"))
table_data %>%
xtable(caption = table_captions['top_vwci'],
label="tab:top_vwci", align="lrlrrrr",
digits=2,
display=c('s', 'd', 's', 'd', 'd', 'd', 'f')) %>%
print("latex", floating=T, table.placement=tab_placement,
include.rownames=F, include.colnames=T,
sanitize.text.function = function(x) x,
sanitize.colnames.function=function(x) paste("\\multicolumn{1}{c}{", x, "}"),
math.style.negative=F)
}
@
The \Sexpr{n_top$n}~cities with the greatest VWCI scores are in states in the
Western U.S., with the exception of two cities in Florida and one in New York
(Tables~\ref{tab:top_vwci}, S1). Cities with similar total VWCI scores, such as
New York and Salt Lake City vs.\ Tampa and Vallejo, El Paso vs.\ Miami, and
Riverside vs.\ Fresno, may have very different relative contributions from
requirements and rebates.
Our previous qualitative research
indicated that there might be a preference for rebates over requirements in more
politically conservative cities \citep{hess:drought:2016,brown:politics:2016}.
Consequently, in addition to the VWCI score we also analyzed the number of
rebates and the number of requirements in a city's portfolio of conservation
policies.
The relative contributions of rebates and requirements to VWCI varied
considerably (Tables~\ref{tab:top_vwci}, S1)
\subsection{Explanatory Variables}
We selected explanatory variables with the goal of understanding the
relative contribution of political-economic and hydroclimatological factors
to policy adoption.
Datasets containing these covariates and the
VWCI data are available from
\citet{gilligan:vwci.data:2018}.
We selected one political variable (the Cook Partisan Voting Index, PVI) for reasons
described above and one economic variable (real personal income, RPI).
We calculated PVI using state- and county-level presidential votes averaged over the
2008 and 2012 elections \citep{cook:pvi:2013,cq:elections:2016}
(Tables~S1--S2 and Datasets~S1--S4).
Positive PVI indicates that the Democratic share of the two-party vote was greater
than the national average, and negative PVI indicates a greater Republican vote share.
Income is generally correlated with education and with a range of other variables that
are associated with differences in policy adoption, and we reasoned that in poorer
regions there may be a preference for rebates and other optional policies.
We chose real personal income (RPI), which represents the average personal income in the city
or state, adjusted for inflation and the regional cost of living
\citep{bea:rpp.methodology:2016}.
We obtained RPI for MSAs and states in 2014 from the
\citet{bea:rpi:2016}.
For the hydroclimatological variables, we
obtained values for the MSA-level monthly average temperature ($T$) and monthly
total precipitation ($P$) for the period 1900--2014 from the University of Delaware's
gridded climate reanalysis
\citep{matsuura:gridded.temp:2015,matsuura:gridded.precip:2015}
and for state-level from the National Climatic Data Center's divisional temperature
and precipitation records for the Continental U.S.
\citep{vose:nclimdiv:2014}.
We calculated the mean annual temperature and precipitation over a
30-year interval from 1985--2014.
We obtained the fraction of the public water supply taken from surface water sources
(henceforth, surface-water fraction) from the U.S. Geological
Survey's water use report for 2010 \citep{maupin:water.use:2014}.
In addition to the two political-economic variables and the two hydroclimatological variables,
we included measures for total population and population growth of the MSA. Our previous research had
indicated that larger cities have more capacity to adopt water-conservation policies and that more
rapidly growing cities are more concerned with water supply. We obtained populations for MSAs and states
in 2010 and 2014 from the \citet{census:population:2015} and used them to calculate average annual rates
of population growth.
We began with the covariates listed above, but adopted a modified set based on
preliminary results: The regression coefficients for $T$ and $P$ showed
collinearity.
This, together with a desire for parsimony, led us to replace them with the
K\"oppen aridity index: $P / (T + 33)$, with $P$ in millimeters and $T$ in
Celsius.
The K\"oppen index is
derived from mean annual temperature and precipitation
and correlates reasonably well with other aridity indices
that incorporate evapotranspiration
\citep{quan:aridity:2013}.
Larger values of this index correspond to wetter conditions
and smaller values to drier conditions.
The distribution of MSA population in~\Sexpr{pop_target_year} was skewed, with
a few large cities producing an asymmetric fat tail (Figure~S1),
so we deemed the natural logarithm of population
a more suitable predictor
for a regression analysis \citep[pp.~59--61]{gelman:arm:2007}.
\subsection{Regression Analysis}
We applied Bayesian hierarchical varying-intercept
logistic regression to each of the
three conservation scores (VWCI, number of requirements, and number of rebates)
with the following explanatory variables:
PVI (Democratic Party preference), RPI (real per-capita income),
the K\"oppen aridity index, surface water fraction,
metropolitan population, and population growth rate.
The hierarchical structure of our regression model, which nests cities and MSAs
within states, reflects the fact that water resources extend beyond MSA
boundaries and that state-level policies affect local actions, both by
constraining local ordinances and regulations, and by encouraging or requiring
counties and cities to adopt various types of conservation measures.
We also considered a hierarchical varying-slope model, but with an average
of only 4.3 MSAs per state there were too few degrees of freedom to adequately
constrain 6 additional regression coefficients for each state.
The regression analysis follows standard textbook treatments
\citep{gelman:arm:2007,gelman:bda:2014}.
Each city's water-conservation score (VWCI, requirements, or rebates) is
an integer count out of a maximum number of possible actions.
We model this as a quasi-binomial process, in which each policy action
for city $i$ is adopted independently, with
the $m$\textsuperscript{th} action, $a_m$, having probability
$p_{im}$.
In a pure binomial process, each policy in city $i$ would have the same
probability $p_i$ of being adopted.
The variance of a pure binomial distribution is uniquely determined
by its mean and the variance of the VWCI data was
greater than a pure binomial
distribution could account for,
so our model uses a beta-binomial process, which allows for greater dispersion
of scores by drawing the probabilities $p_{im}$
for the different actions in city $i$ from a beta distribution with mean $p_i$
\citep[pp.~437--38]{gelman:bda:2014}.
We model the cities' mean probabilities $p_i$
with a hierarchical varying-intercept logistic
regression in which MSAs are nested within states and
water-conservation policy measures
are associated with both
state-level and MSA-level covariates.
For city $i$, located in state $j$,
\begin{linenomath*}
\begin{align}
\text{Score}_i \sim{}& \text{beta-binomial}(N_{\text{action}}, \phi p_i, \phi (1 - p_i)), \\
\intertext{where}
p_i ={}& \text{logit}^{-1}(y_i),\\
y_i ={}& \alpha_{j} + \sum_{k \in \text{MSA variables}} \beta_k x_{ik}, \label{eq:y}\\
\alpha_{j} ={}&
\alpha_0 + \sum_{k \in \text{state variables}} \gamma_k w_{jk} + \delta_{j},
\end{align}
\end{linenomath*}
$N_{\text{action}}$ is the number of possible conservation actions
(\Sexpr{n_actions} for VWCI, \Sexpr{n_req} for requirements, and
\Sexpr{n_reb} for rebates),
$\beta$ is a vector of MSA-level regression coefficients,
$x$ is a matrix of MSA-level covariates,
% ($x_{i,j}$ is the $j$th covariate for city $i$),
% $\text{state}_i$ represents the state of city $i$,
and $\alpha_j$ is the intercept of the MSA-level regression
for all cities in state $j$.
%and $\epsilon_i$ represents random noise at the MSA-level, which we do not explicitly model.
We do not include a noise term in Equation~\ref{eq:y} because random variations
related to the probabilities $p_i$ are incorporated in the beta-binomial
process \citep[p.~321]{gelman:arm:2007}.
The inverse logit function maps the real numbers onto the interval $(0,1)$,
thus transforming $y_i$ to a probability.
The overdispersion of conservation scores is parameterized by $\phi$:
as $\phi \rightarrow \infty$, the distribution of conservation scores approaches
a binomial distribution, and the smaller $\phi$ is, the greater the
overdispersion.
In modeling $\alpha_j$,
$\alpha_0$ is the mean intercept across all states,
$\gamma$ is a vector of state-level regression coefficients,
$w$ is a matrix of state-level covariates,
% ($w_{\text{state}_i,k}$ is the $k$th covariate for the state of the $i$th MSA),
and
$\delta$ is a random noise term that represents state-level variations not
accounted for by the regression model.
%$\sigma$ is a scale parameter for $\delta$.
We constrain $\delta$ to sum to zero in order to keep the model identifiable
\citep[Ch.~23]{stan:manual:2015}.
We standardized the independent variables $x$ and $w$ to put them on a common
scale. This serves both
to facilitate comparing the relative importance of variables whose natural scales are
vastly different
\citep[pp.~55--57]{gelman:prior:2008,gelman:arm:2007},
and also to improve computational performance \citep{stan:manual:2015}.
We rescaled the state-level covariates to have means of zero and
standard deviations of 0.5, as recommended by \citet{gelman:prior:2008}
and \citet[pp.~55--57]{gelman:arm:2007}:
$w_{\text{scaled}} = (w - \mu_w) / (2 \sigma_w)$,
where $w$ is a state-level covariate,
$\mu_w$ is the mean over all states,
and $\sigma_w$ is the standard deviation over all states.
Where the same covariate appeared at both the state and
MSA level,
we addressed multicollinearity between state-level and MSA-level
variables by rescaling the MSA-level covariate
to the same scale as its state-level counterpart and
subtracting the state-level value,
so that MSA-level covariate represented the difference between the MSA and
the state-average:
$x_{\text{scaled}} = (x - w) / (2 \sigma_w)$, where $x$ is an MSA-level
covariate, $w$ is the state-level covariate for the MSA's state, and $\sigma_w$
is the standard deviation of $w$ over all states.
Covariates that only appeared at the MSA level were rescaled so
the mean over all \Sexpr{n_cities}~MSAs was zero and the standard deviation~0.5:
$x_{\text{scaled}} = (x - \mu_x) / (2 \sigma_x)$, as above.
We recognize that there are various ways that the multilevel model could be
configured.
Our choice of models was based on the theoretical objective of highlighting MSA-level
difference in water-conservation policies. This choice involved three decision criteria.
First, we selected the four main variables (Democratic-leaning, income, aridity, and
surface water) on theoretical grounds described above with our goal of understanding the
relative strength of political and economic factors and hydrogeological factors. The
population variables were included as ``controls.'' This choice of variables allowed us to
have a balanced approach to sociohydrological factors and to show that policy adoption is
not a simple outcome of hydrological factors. Second, we selected the state-government
level as the higher level for the model because water conservation policy is frequently
driven by state-government organizations and policies. Third, any of the four variables
could be included only at the MSA level rather than at both the MSA and state levels, but
doing so would tend to bias the model toward an overestimation of MSA-level differences.
By using a model that includes these four variables at both the MSA and state levels, and
by normalizing the MSA-level variables to state averages, we developed a conservative
approach to showing the importance of MSA-level differences.
Thus, when we demonstrate MSA-level differences, we do so with a model that would tend
to underestimate those differences.
Because there are no previous quantitative comprehensive analyses of urban
water conservation policies, we represent our ignorance at the outset by
choosing weakly-informative priors, so the regression results are almost
entirely determined by the data, and are only weakly constrained by the priors.
We follow \citet{gelman:prior:2008}'s analysis
of weakly-informative priors for logistic regression
by choosing Cauchy priors with a scale of 2.5 for the
parameters corresponding to the standardized variables ($\alpha_0$, $\beta$,
and $\gamma$).
We represent $\delta$ as normally distributed
with a scale defined by the hyperparameter $\sigma$ with a positive half-Cauchy
hyperprior \citep{gelman:prior:2008}.
For $\phi$, we parameterized the Cauchy priors
\emph{ad-hoc}, based on the data, and setting the scales by trial and error
to make the prior distribution wide enough that it did not noticeably constrain
the posterior.
\begin{linenomath*}
\begin{align}
\alpha_0, \beta, \gamma \sim{} & \text{Cauchy}(0,2.5),\\
\delta \sim{}& \text{normal}(0,\sigma),\\
\sigma \sim{}& \text{positive half-Cauchy}(0,\Sexpr{sigma_sigma_delta}),\\
\intertext{and}
\phi \sim{}& \begin{cases}
\text{Cauchy}(\Sexpr{mu_phi_vwci},\Sexpr{sigma_phi_vwci}) & \text{for VWCI}. \\
\text{Cauchy}(\Sexpr{mu_phi_rr},\Sexpr{sigma_phi_rr}) & \text{for requirements and rebates}. \\
\end{cases}
\end{align}
\end{linenomath*}
We implemented the statistical model in the Stan probabilistic programming
language \citep{carpenter:stan:2016}, which generates a Hamiltonian Monte Carlo
sampler.
We used R to prepare the data and call the sampler \citep{r.manual:2016}.
We
sampled four Markov chains for 1000 iterations each, after 1000 warm-up iterations,
which yielded a total of 4000 samples.
% where each sample is a vector containing a value for each of the model paraameters.
% These samples approximate random draws from the joint posterior probability
% distribution for the parameters, given the priors and the data,
% so the statistics of the samples approximate those of the
% posterior distribution.
The means and medians of the posterior distributions for
regression coefficients $\beta$ and $\gamma$, were equal within $\pm 0.01$
(Tables~S15--S17).
Code to reproduce the analysis is included in the supporting information.
To facilitate interpretation, we
follow the recommendations of \citet[pp.~81--82]{gelman:arm:2007} by reporting
rescaled regression coefficients:
\iffalse
$\beta' = N_{\text{action}} \beta / 4$ and $\gamma' = N_{\text{action}} \gamma / 4$,
which represent the approximate change in VWCI (or requirements or rebates)
\else
$\beta' = \beta / 4$ and $\gamma' = \gamma / 4$, which represent
the approximate change in the probability $p$
\fi
corresponding to
a two-standard-deviation change of the covariate near the
midpoint of the logistic function (where $p = 0.5$).
The corresponding change in VWCI, requirements, or rebates
of
$\beta' N_{\text{action}}$ or $\gamma' N_{\text{action}}$.
For example, $N_{\text{VWCI}} = \Sexpr{n_actions}$,
so if $\beta'_k = 0.1$ for some covariate $x_k$,
then for a city
with VWCI of \Sexpr{floor(n_actions/2)} ($p \approx 0.5$),
a two-standard-deviation change in $x_k$ would correspond
to a change in VWCI of roughly \Sexpr{round(0.1 * n_actions)}.
We tested VWCI for overdispersion by comparing binomial and beta-binomial models
using leave-one-out cross-validation (LOO)
(Tables~S3--S5),
the Widely Applicable Information Criterion (WAIC)
(Tables~S6--S8),
and also by visual inspection of the distribution of posterior predictions of
the mean, standard deviation, maximum, and minimum of VWCI over the cities in
our data set \citep{gelman:bda:2014,gelman:predictive:2014,vehtari:loo:2016}.
All three methods favored the overdispersed beta-binomial model.
These tests also strongly favored a
hierarchical
model over a single-level regression on MSA-level covariates.
\section{Results}
With respect to the political-economic variables, regressions