@@ -1,8 +1,6 @@
 ---
 title: "textNet: Directed, Multiplex, Multimodal Event Network Extraction from Textual Data"
-authors:
-  - name: Elise Zufall
-  - name: Tyler Scott
+author: Elise Zufall and Tyler Scott
 date: 7 November 2024
 bibliography: paper.bib
 output: pdf_document
@@ -74,7 +72,7 @@ The following example uses parsed text from the Gravelly Ford Water District Gro
 ### Extract Networks
 First, we read in the pre-processed data and call textnet_extract() to produce the network object:

-```{r readpreprocessed}
+```{r readpreprocessed, message=F, warning=F}
 library(textNet)
 old_new_parsed <- textNet::old_new_parsed

@@ -193,7 +191,7 @@ A tool that generates an igraph or network object from the textnet_extract outpu
 ```
 The *ggraph* package [@pedersen_ggraph_2024] has been used to create the two network visualizations seen here, using a weighted version of the igraphs constructed below. We set collapse_edges = T to convert the multiplex graph into its weighted equivalent.

-```{r plot}
+```{r plot, message=F, warning=F}
 library(ggraph)
 old_extract_plot <- export_to_network(old_extract_clean, "igraph", keep_isolates = F,
 collapse_edges = T, self_loops = T)[[1]]

@@ -370,7 +368,7 @@ The 2x2 table below summarizes the rate at which each entity type is found in bo

 We can also investigate differences in network statistics between the two plans. For instance, the distribution of degree does not change much between plan versions. The distribution of betweenness, likewise, is relatively stable except for person nodes, which are the least common nodes in the graph.

-```{r step8b4}
+```{r step8b4, warning=F, message=F}
 library(gridExtra)
 library(ggplot2)
 b1 <- ggplot(old_node_df, aes(x = entity_type, y = deg)) + geom_boxplot() +

@@ -436,7 +434,7 @@ This is a wrapper for pdftools, which has the option of using pdf_text or OCR. W
 pdfs <- c("vignettes/old.pdf",
 "vignettes/new.pdf")

- old_new_text <- textNet::pdf_clean(pdfs, keep_pages=T, ocr=F, maxchar=10000,
+ old_new_text <- textNet::pdf_clean(pdfs, ocr=F, maxchar=10000,
 export_paths=NULL, return_to_memory=T, suppressWarn = F,
 auto_headfoot_remove = T)
 names(old_new_text) <- c("old","new")
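This commit adds `message=F, warning=F` to individual chunk headers, presumably to keep package-loading messages and warnings out of the rendered PDF. A document-wide alternative, offered here as a suggestion rather than what the authors did, is to set the same defaults once with knitr's global chunk options, typically in a setup chunk near the top of paper.md:

```r
# Set knitr's chunk defaults once instead of repeating
# message=F, warning=F in every chunk header.
# In an R Markdown file this line would sit inside an initial
# {r setup, include=FALSE} chunk.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# A single chunk can still override the default, e.g. {r plot, warning=TRUE}.
# Inspect the current defaults:
knitr::opts_chunk$get(c("message", "warning"))
```

The sketch spells out `FALSE` rather than `F`: `F` is an ordinary variable that can be reassigned, whereas `FALSE` is a reserved word.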
|
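The context line for the `plot` chunk says that `collapse_edges = T` converts the multiplex graph into its weighted equivalent. The snippet below illustrates that idea with plain igraph, not with textNet's own code: parallel edges between the same ordered pair of nodes are merged into one edge whose `weight` counts how many edges were merged. The toy edge list is made up, and we assume, without having checked textNet's source, that `export_to_network()` aggregates in a comparable way.

```r
library(igraph)

# Hypothetical multiplex edge list: two parallel A -> B edges and one B -> C edge.
edges <- data.frame(from = c("A", "A", "B"),
                    to   = c("B", "B", "C"))
g <- graph_from_data_frame(edges, directed = TRUE)

# Give every original edge a weight of 1 so that summing counts parallel edges.
E(g)$weight <- 1

# Merge parallel edges, summing their weights. Self-loops are kept,
# which we assume is what self_loops = T requests in the original call.
g_weighted <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE,
                       edge.attr.comb = list(weight = "sum"))

ecount(g)           # 3 edges in the multiplex graph
ecount(g_weighted)  # 2 edges after collapsing
E(g_weighted)$weight
```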