-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathREADME.Rmd
270 lines (181 loc) · 7.61 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
---
title: "README"
author: "Shawn T O'Neil"
date: "5/1/2022"
output:
html_document:
keep_md: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.path = "README_figs/README-")
```
## `rstackdeque`: Persistent stacks, deques, and queues for R
R journal publication: [https://journal.r-project.org/archive/2015-1/oneil.pdf](https://journal.r-project.org/archive/2015-1/oneil.pdf)
Have you ever wanted to use a stack or a queue for R, but just pulled your hair out
trying to use lists or vectors instead? Enter rstackdeque.
The most important feature of the stacks and queues (and double-ended-queues, or
"deques") in this package are not only are they reasonably fast (amortized or
worst-case O(1) depending on which you're using), inserting and removal are
implemented as "side-effect-free" functions operating similar to other R
structures.
This is largely possible due to the fantastic work of
Chris Okasaki: see [Purely Functional Data Structures](http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504)
and [Simple and Efficient Purely Functional Queues and Deques](http://www.westpoint.edu/eecs/SiteAssets/SitePages/Faculty%20Publication%20Documents/Okasaki/jfp95queue.pdf).
*Psst, you can use these to efficiently build a data frame or list in a loop:*
```{r warning=FALSE,message=FALSE}
library(rstackdeque)
library(dplyr)
stack <- rstack()
for(i in 1:5) {
stack <- stack %>% insert_top(data.frame(i = i, logi = log(i)))
}
print(as.data.frame(stack))
```
*See more good stuff under Common Functionality below.*
## Install
You can install this package via CRAN:
```{r eval=FALSE}
install.packages("rstackdeque")
```
Then load it up and check the help and examples:
```{r eval=FALSE}
require(rstackdeque)
help(package = "rstackdeque")
help(rstack)
help(rdeque)
help(rpqueue)
```
## Quick Start
### Stacks
An `rstack()` is a stack-like structure, that supports adding elements to the "top":
```{r}
library(rstackdeque)
library(dplyr)
stack <- rstack() %>%
insert_top("A") %>%
insert_top("B") %>%
insert_top("C") %>%
insert_top("D")
print(stack)
```
They also allow "peeking at" (returning a copy of) the top element. This does not change the contents of the stack.
```{r}
top_element <- peek_top(stack)
print(top_element)
```
Lastly, they support dropping the top element:
```{r}
stack <- without_top(stack)
print(stack)
```
### Deques (double-ended queues)
Deques, or double-ended queues, support adding and removing elements to either the "front" or "back" of the queue.
```{r}
deque <- rdeque() %>%
insert_front("A") %>%
insert_front("B") %>%
insert_front("C") %>%
insert_back("X") %>%
insert_back("Y") %>%
insert_back("Z")
print(deque)
deque <- without_front(deque)
deque <- without_back(deque)
print(deque)
```
The `peek_front()` and `peek_back()` functions return the first and last elements:
```{r}
first_el <- peek_front(deque)
last_el <- peek_back(deque)
print(first_el)
print(last_el)
```
While insertions and removals from `rstack`s are always fast, certain patterns of repeated insertions and removals from an `rdeque` will cause periodic 're-balancing' operations where a full copy of the deque is made with more efficient internal organization. Thus, although `rdeque`s provide more functionality than `rstack`s, `rstack`s are preferred unless deque operations are required.
### Queues
Queues support a subset of functionality of dequeus - queues only allow inserting elements at the back of the queue, and removing elements from the front of the queue. Peeking is also only supported at the front of the queue.
```{r}
queue <- rpqueue() %>%
insert_back("A") %>%
insert_back("B") %>%
insert_back("C")
print(queue)
queue <- without_front(queue)
print(queue)
first_el <- peek_front(queue)
print(first_el)
```
Queues provide a subset of functionality of deques, but this implementation comes with an efficiency improvement: while `rdeques` may experience
periodic re-balancing described above, `rpqueus` do not, such that insertions and removals always take approximately the same time. The complexity of supporting this for queues, however, makes the implementation slower in practice that `rstack`s. These queues are thus a good choice when queue functionality is needed in real-time applications (e.g. user interfaces).
### Common Functionality
All three types support `empty()` (`TRUE` or `FALSE`), `length()`, and `head()` (where the head is the front of deques and queues,
and the top of stacks), and `rstacks` also support `rev()` (but this operations makes a full copy).
We demonstrate features below for `rstack`s, but they also apply to `rdeque`s
and `rpqueue`s (discussed below).
#### Element types
Almost any datatype can be stored - vectors, lists, data.frames, models, etc.
```{r}
stuff <- rstack() %>%
insert_top(mtcars) %>%
insert_top(t.test(rnorm(25), rnorm(25)))
print(stuff)
```
#### Conversion to or from lists or vectors
```{r}
letters_stack <- as.rstack(letters)
print(letters_stack)
```
... and converted to a list. If the list is simple, the R-base `unlist()`
function is handy:
```{r}
letters_again <- as.list(letters) %>% unlist()
print(letters_again)
```
#### Conversion to data.frames
If the elements are lists that have the same element names in the same order, or dataframes with the
same column names in the same order, they can be converted to a `data.frame`. If elements are data frames, the rows
will be concatenated.
```{r}
people_stack <- rstack() %>%
insert_top(list(name = "Joe", age = 26)) %>%
insert_top(list(name = "Kim", age = 30))
people_df <- as.data.frame(people_stack)
print(people_df)
```
#### Usage with loops
As a result, these structures pair nicely with loops. Here's a loop
that computes a [Collatz sequence](https://en.wikipedia.org/wiki/Collatz_conjecture) from a given starting number.
```{r}
value <- 17
step <- 1
collatz_stack <- rstack()
while(value != 1) {
if(value %% 2 == 0) {
value <- value / 2
} else {
value <- value * 3 + 1
}
collatz_stack <- collatz_stack %>% insert_top(list(value = value, step = step))
step <- step + 1
}
collatz_df <- as.data.frame(collatz_stack) %>% arrange(step)
print(collatz_df)
```
Note: Although the structures in `rstackdeque` are generally designed to be fast,
for performance-critical applications where the size of the result is known
a-priori, using a [pre-allocation strategy](https://www.r-bloggers.com/2018/08/growing-objects-and-loop-memory-pre-allocation/) will be faster.
Computing the Collatz sequence as shown above is an example where pre-allocation is difficult, and stacks, queues, and deques (double-ended queues) are important components of many algorithms. See Timothy Barry's [Collections in R: Review and Proposal](https://journal.r-project.org/archive/2018/RJ-2018-037/RJ-2018-037.pdf) (R Journal, 10(1), 2018) for a comparison of different R implementations of these and similar data structures.
#### Persistence
All of the structures described are *persistent*, meaning that additions and
removals don't alter the original; practically, a modified copy is returned.
(A full copy is not made however, the structures handle efficient organization
of the data in memory.)
```{r}
stack <- as.rstack(c("A", "B", "C"))
stack2 <- insert_top(stack, "G")
stack3 <- without_top(stack)
print(stack)
print(stack2)
print(stack3)
```
This property makes functions that use the structures "side-effect-free" like most R types, supporting pure functions and parallelization.