-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathindex.Rmd
381 lines (238 loc) · 11.7 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
---
title: "Exact Statistics and Semi-Parametric Tests for Small Network Data"
author:
- \textbf{George G. Vega Yon, MS}$^\star$
- Andrew Slaughter, PhD$^\dagger$
- Kayla de la Haye, PhD$^\star$
date: "IC$^2$S$^2$ 2019, Amsterdam \\linebreak[4]July 19, 2019"
institute: "\\begin{minipage}[c]{.3\\linewidth}\\centering $^\\star$University of Southern California\\linebreak[4]Department of Preventive Medicine\\end{minipage}\\begin{minipage}[c]{.3\\linewidth}\\centering $^\\dagger$U.S. Army Research Institute for the Behavioral and Social Sciences\\end{minipage}"
output:
beamer_presentation:
slide_level: 2 # revealjs::revealjs_presentation
highlight: espresso
latex_engine: xelatex
includes:
in_header: notation-def.tex
aspectratio: 169
handout: false
fontsize: 10pt
bibliography: bibliography.bib
colortheme: beaver
---
```{r setup, include=FALSE}
knitr::knit_hooks$set(smallsize = function(before, options, envir) {
if (before) {
"\\footnotesize\n\n"
} else {
"\n\\normalsize\n\n"
}
})
knitr::opts_chunk$set(echo = TRUE, smallsize=TRUE)
```
## Acknowledgements
\scriptsize
\begincols
\begincol{.2\linewidth}
\includegraphics[width=.7\linewidth]{fig/ARO_logo.png}
\endcol
\begincol{.79\linewidth}
This material is based upon work support by, or in part by, the U.S. Army Research
Laboratory and the U.S. Army Research Office under grant number W911NF-15-1-0577
\endcol
\endcols
\begincols
\begincol{.79\linewidth}
Computation for the work described in this paper was supported by the University
of Southern California’s Center for High-Performance Computing (hpc.usc.edu).
\endcol
\begincol{.2\linewidth}
\includegraphics[width = .7\linewidth]{fig/usc.pdf}
\endcol
\endcols
The views expressed in this presentation are those of the authors, and do not represent the official policy or positions of the Department of the Army, the DoD, or the U.S. Government.
\begin{figure}
\centering
\includegraphics[width = .85\linewidth]{fig/muriteams.png}
\end{figure}
## Context: Social abilities and team performance
Two research questions\pause
\begin{centering}
\large How do \uscred{social abilities} impact \uscred{network structure}? \pause
\color<4->{gray}{How does \only<3>{\uscred{collective intelligence}}\only<4->{collective intelligence} affect team (network) \only<3>{\uscred{performance}}\only<4->{performance}?}\pause
\normalsize
\end{centering}
---
To answer this question, we have the following experimental data: \pause
- 42 mixed-gender teams,\pause
- Which completed 1 hour of group tasks (MIT's IQ test for teams)\pause
- Individual survey capturing information regarding socio-demographics **and**:\pause
- \uscred{Social Intelligence}: Social Perception (measured by RME), Social Accomodation, Social Gregariousness, and Social Awareness \pause
- \uscred{Social Networks}: Advice Seeking, Leadership, Influence (among others).
## Context (cont'd)
\begin{figure}
\centering
\includegraphics[width = .6\linewidth]{fig/plot-graph-4-1.pdf}
\end{figure}
We can do a lot of simple statistics: density, \% of \textit{\color{gray}[blank]}, etc. but...\pause{} \uscred{\large how can we go beyond that?}
## Exponential random graph models
\def\fig1width{.35\linewidth}
\begin{figure}
\centering
\begin{tabular}{m{.2\linewidth}<\centering m{.4\linewidth}<\raggedright}
\toprule Representation & Description \\ \midrule
\includegraphics[width=\fig1width]{terms/mutual.pdf} & Mutual Ties (Reciprocity)\linebreak[4]$\sum_{i\neq j}y_{ij}y_{ji}$ \\
\includegraphics[width=\fig1width]{terms/ttriad.pdf} & Transitive Triad (Balance)\linebreak[4]$\sum_{i\neq j\neq k}y_{ij}y_{jk}y_{ik}$ \\
\includegraphics[width=\fig1width]{terms/homophily.pdf} & Homophily\linebreak[4]$\sum_{i\neq j}y_{ij}\mathbf{1}\left(x_i=x_j\right)$ \\
\includegraphics[width=\fig1width]{terms/nodeicov.pdf} & Covariate Effect for Incoming Ties\linebreak[4]$\sum_{i\neq j}y_{ij}x_j$ \\
\includegraphics[width=\fig1width]{terms/fourcycle.pdf} & Four Cycle\linebreak[4]$\sum_{i\neq j \neq k \neq l}y_{ij}y_{jk}y_{kl}y_{li}$ \\
\bottomrule
\end{tabular}
\end{figure}
ERGMs can do the job.
## Exponential random graph models (a 1 slide crah course)
\begin{figure}
\centering
\includegraphics[width = .8\linewidth]{parts-of-ergm.pdf}
\end{figure}
------
\centering
There is one problem with this model ... \linebreak[4]
\includegraphics[width = .5\linewidth]{fig/parts-of-ergm.pdf}\pause \linebreak[4]
\large because of \color[HTML]{af0000}$\mathcal{Y}$\color{black},
the \color[HTML]{5726e7} \textbf{normalizing constant}\color{black}{} is \linebreak[4] a summation of $2^{n(n-1)}$ terms \includegraphics[width=.05\linewidth]{fig/scared.pdf}!\normalsize\pause
## ERGMs for small networks
* Calculating the likelihood function for a directed graph means (at some point)
enumerating \uscred{$2^{n(n-1)}$ terms}.
$$
\Prcond{\Graph = \graph}{\params, \Indepvar} = \frac{%
\exp{\transpose{\params}\sufstats{\graph, \Indepvar}}%
}{
\color{USCCardinal}\sum_{\graph'\in\GRAPH} \exp{\transpose{\params}\sufstats{\graph', \Indepvar}}\color{black}
}
$$
\pause
* So, if $n = 6$, then we have approx 1,000,000,000 terms.\pause
* This has lead the field to aim for (very neat) simulation based methods\pause
* But, if our small networks have (at most) 6 nodes... \pause
\large \uscred{We can go back to the \textbf{good-old-fashion} MLE}\normalsize \raggedleft
---
Keeping $n\leq 6$ we can\pause
- Compute the likelihood function exactly, and hence use ``simple'' optimization to get MLEs.\pause
- Obtain more \uscred{accurate} estimates \uscred{faster}\pause{} (in most cases). \pause
- Since (usually) small networks come in many...\pause obtain pooled estimates. Which helps with \uscred{power} \textit{and} \uscred{degeneracy})\pause
- And more:
- All MLE goodies, e.g., LRT
- Enhanced simulation methods: resampling, cross-validation
- Trivially extend ERGM: mixed-effects models, dependency structures across net\pause
- etc.\pause{}
This and more has been implemented in the `ergmito` (\includegraphics[width=.1\linewidth]{fig/lifecycle-experimental-orange.pdf}) R package (available at https://github.com/muriteams/ergmito)
---
Sidetrack...
\begin{minipage}[c]{1\linewidth}
\large \textbf{ito, ita}: From the latin -\textit{\=ittus}. suffix in Spanish used to denote small or affection. e.g.:
\hspace{.5cm} \textit{¡Qué lindo ese perr\textcolor{USCCardinal}{\textbf{ito}}!} / \textit{What a beautiful little dog!}
\hspace{.5cm} \textit{¿Me darías una tac\textcolor{USCCardinal}{\textbf{ita}} de azúcar?} / \textit{Would you give me a small cup of sugar?}
\normalsize
\end{minipage}\pause
\alert{Special thanks to George Barnett who proposed the name during the 2018 NASN!}
## How many networks?
- Thinking about power and unbiasedness, we did a simulation study
- Simulated 20,000 samples of networks using the following steps:
1. Draw parameters for the model based on the terms _edges_ and _ttriad_ (transitive triples) from a uniform(-2, 2).
2. Draw group sizes by randomly selecting the number of networks of size 4 and size 5. Each sample has any of $\{10, 20, ..., 50\}$ networks.
3. Using 1. and 2., simulate networks using an ERGM model
- We fitted the models using both MC-MLE (block-diagonal ergm) and MLE (ergmito).
- We looked (are looking) at empirical bias (sanity check), power and elapsed time.
## Simulation study: Bias
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/bias-02-various-sizes-4-5-ttriad.png}
\end{figure}
## Simulation study: Bias (contd')
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/bias-absdiff-02-various-sizes-4-5-ttriad.png}
\end{figure}
## Simulation study: Power
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/power-02-various-sizes-4-5-ttriad.png}
\end{figure}
## Simulation study: Elapsed time (contd')
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/bias-elapsed-02-various-sizes-4-5-ttriad.png}
\end{figure} \pause{}
What about a real data set?
## Preliminary results
From our sample of 42 small networks:
```{r muri-pre, cache=TRUE, echo=FALSE, results='asis'}
ans <- readRDS("model/ergmito_forward_selection_part4.rds")
library(ergmito)
names(ans) <- gsub("ans_", "", names(ans))
texreg::texreg(
ans, booktabs = TRUE, fontsize = "tiny",
use.packages = FALSE, caption = "Selected models for each one of the studied networks. Results presented here correspond to a forward selection process."
)
```
----
```{r muri-gof, cahe=TRUE, echo=FALSE, fig.align='center', out.width=".8\\linewidth"}
library(ergmito)
load("model/ergmito_forward_selection_part4_gof.rda")
plot(gof_advice, main = "GOF Advice seeking network")
```
## Context: Social abilities and team performance
Two research questions\pause
\begin{centering}
\large \sout{How do \uscred{social abilities} impact \uscred{network structure}?}
How does \uscred{collective intelligence} affect team (network) \uscred{performance}?
\normalsize
\end{centering}
## Networks and team performance
Suppose we have the following:
- Data on structure, nodes, and an outcome: $\left(\graph, \indepvar, \depvar\right)$\pause
- In general, we are interested on assessing the following: $\graph\perp\depvar$ \pause
- Three scenarios:\pause
a. Direct association
b. Known (hypothesized) mediated association (a knwon confounder)
c. Uknown (hypothesized) mediated association (an unknown confounder)\pause{}
Non-parametrically, a. is "trivial"\pause{}, c. is really hard,\pause{} b. can be approached by:
a. Conditional permutation tests,\pause{}
b. __Simulation based methods__
---
\footnotesize
\def\svgwidth{.8\linewidth}
\begin{figure}
\centering
\input{fig/struct-test.pdf_tex}
\end{figure}\pause{}
\normalsize
We are still working (thinking) about this...
## Discussion
* ERGMItos... This is not new.\pause{} What's new is the set of tools to apply it\pause
* Taking this approach we can improve our estimates (power) and help with
degeneracy\pause
* The tool is working\pause{} (according to the simulation study...)\pause
* Need to conduct more simulations using _nodal_ attributes and larger samples of networks.\pause
* What about goodness-of-fit? Still need to better think about it
## Discussion (contd')
* The simplicity of the estimation procedure allows us to think of:\pause
- Separable Temporal ERGMitos, a.k.a. TERGMitos \pause
- Mixture models and Bayesian inference (if you are into that kind of stuff)\pause
- More flexible formulas (e.g. interactions between terms and graph-level attributes) \pause{}
- Better odds ratios (not simply exponentiating the coefficients)\pause
- Simulation based methods (small size $\implies$ sampling from in-memory data, and exact tests)\pause
- Cross-validation/model selection in ERGMs\pause{}
- Fit models to large samples of small networks (10s of 1,000 vertices)\pause{}
* Still thinking about how to test for association between network structure and group outcome
## Thanks!
\begin{centering}
We thank members of our MURI research team, USC's Center for Applied Network Analysis, Garry Robins, Carter Butts, Johan Koskinen, Noshir Contractor, and attendees of the NASN 2018 conference for their comments.
\includegraphics[width=.2\linewidth]{logo.png}
\def\and2{\hskip 1em}
\large \textbf{George G. Vega Yon, MS} \and2 Andrew Slaughter, PhD \and2 Kayla de la Haye, PhD
\href{mailto:[email protected]}{[email protected]}
\href{https://ggvy.cl}{https://ggvy.cl}
\includegraphics[width=.02\linewidth]{github.png}\href{https://github.com/gvegayon}{gvegayon}
\includegraphics[width=.02\linewidth]{twitter.png}\href{https://twitter.com/gvegayon}{gvegayon}
\end{centering}