-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy path001-infrastructure.Rmd
245 lines (174 loc) · 8.88 KB
/
001-infrastructure.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
# Infrastructure
We have 2 servers in the lab, `lightfoot` and `snowmane` (named after horses from Lord of the Rings).
The main "production" server is `lightfoot` where all of our data and compute resources exist.
## Components
We use [Docker](https://www.docker.com/) in our lab.
This allows us to install various components without affecting the underlying server.
A brief history of how we settled on docker and why is in this blog post on [From VMs to LXC Containers to Docker Containers](https://chendaniely.github.io/sdal/2017/07/07/vm_lxc_docker/).
Figure \@ref(fig:docker-lightfoot) depicts what containers we have on `lightfoot`.
You can think of each 'container' as an 'application' just like one you are running on your laptop.
But the behavior of each 'container' is more like a separate server you connect to.
```{r docker-lightfoot, echo=FALSE, fig.cap='The Docker infrastructure used in SDAL. The containers on the top row are the parts of the system lab members will be connecting to and working on.', fig.align='center', out.width='100%'}
knitr::include_graphics(path = "./figs/sdal_docker_lightfoot.png")
```
The primary containers you will be using are the RStudio, Adminer, and Django/Wagtail containers.
They all exist on `lightfoot` and can all be reached in a browser with [`https://analytics.bi.vt.edu`](https://analytics.bi.vt.edu) and an appropriate suffix (e.g., [`/rstudio`](https://analytics.bi.vt.edu/rstudio), [`/db`](https://analytics.bi.vt.edu/db)).
## Accessing Servers
Your main point of contact will be using RStudio on `lightfoot`.
There's a few things that can be setup so you don't have to type your password all the time.
This involves creating "SSH keys".
Aside from creating keys, below is a set of links you'll probably be using all the time:
### RStudio
**Best to open/run rstudio in a private/incognito browser window**
You will definitely need to do this if you try to go to your rstudio container and are only presented with an empty/blank/white screen.
- Safari: https://support.apple.com/guide/safari/browse-privately-ibrw1069/mac
- Chrome: https://support.google.com/chrome/answer/95464?hl=en&co=GENIE.Platform%3DDesktop
- Firefox: https://support.mozilla.org/en-US/kb/private-browsing-use-firefox-without-history
**RStudio**
- Generic RStudio container: https://analytics.bi.vt.edu/rstudio
- This is the generic RStudio container
- For people doing heavy computational work, you will be assigned a separate container. It is better to use that one instead.
- Your own container: https://analytics.bi.vt.edu/YOUR_PID/rstudio
- It's suggested you use the container assigned to yourself, since your work and crashed code is isolated from everyone else
### Database
We use [PostgreSQL](https://www.postgresql.org/) for our database.
There is a way to connect directly to the database container,
but the easiest way is to use the [Adminer Database management](https://www.adminer.org/en/) server.
Connect to adminer (i.e., the database web interface) by going to: https://analytics.bi.vt.edu/db.
You should see the login screen:
```{r adminer-login, echo=FALSE, fig.cap='Login screen for Adminer', fig.align='center', out.width='50%'}
knitr::include_graphics(path = "./figs/adminer/adminer_login.png")
```
Make sure:
1. System: `PostgreSQL`
2. Server `postgresql` (note all lower-case)
3. Username: your username (i.e., vt PID)
4. Password: Your BI LDAP Password
5. Database: You can leave this blank, or you can put in the name of the database you want to connect to if you already know.
You have to option to connect and look for a database after you connect.
So this field is optional.
### Python
Currently Python (Anaconda) in installed on an individual basis directly on Lightfoot.
Since Anaconda installs things into your `home` directory, you have access to Python on all the containers.
You can open `python` or `ipython` directly by running the command in the terminal.
#### Jupyter Notebook
If you want to use the jupyter notebook,
you need to run the command on the server,
and then connect to it from the browser on your local machine
1. Run the notebook on the server
1. `ssh` into `lightfoot` (e.g., `ssh [email protected]`)
2. `jupyter notebook --no-browser`
3. Note the url of the notebook server (along with the token):
- e.g., `http://localhost:8888/?token=asdfasdfasdfasdfasdf`
- Here also pay very close attention to the port number, this example shows `8888`
- Older versions of `jupyter` will not have a token
- You may want to think about performing a full update
- `conda update conda && conda update --all`
2. Connect your laptop to the server
1. In another terminal window make the connection:
- Pay attention to the port number from the previous step (e.g., `8888`)
- Make the connection `ssh -L 8888:localhost:8888 [email protected]`
3. Copy/paste the url of the notebook server (from step 1) into your browser
## Project Template
Why project templates: https://chendaniely.github.io/sdal/2017/05/30/project_templates/
Other resources:
- https://github.com/ropensci/rrrpkg
- https://github.com/benmarwick/rrtools
- A template for research projects structured as R packages: https://github.com/Pakillo/template
```
project/ # the project/code repository
|
|- data/ # raw and primary data, are not changed once created
| |
| +- *project_data/ # subfolder that links to an encrypted data storage container
| | | note that nothing here is in the git repository
| | | it's just a shortcut (i.e., link) to a different folder
| | |- original/ # raw data, will not be altered
| | |- working/ # intermediate datasets from src code
| | +- final/ # final datasets used for analysis/models/plots
| |
| +- more_data/ # some projects will need multiple links
|
|- src/ # any programmatic code
| |- analysis1/ # user1 assigned to the project
| +- analysis2/ # user2 assigned to the project
|
|- R/ # functions
|
|- tests/ # unit tests
|
|- output # all output and results from workflows and analyses
| |- figures/ # graphs, likely designated for manuscript figures
| |- pictures/ # diagrams, images, and other non-graph graphics
| +- analysis/ # generated reports for (e.g. rmarkdown output)
|
|- README.md # the top level description of content
|
|- .gitignore # git ignore file
|- project.Rproj # RStudio project
|
|- DESCRIPTION # Description file to repo into R package, if applicable
+- Makefile # Makefile, if applicable
```
Notes:
- you can find our project template `.gitignore` file here: https://github.com/bi-sdal/project_template/blob/master/gitignore
## Join a project repository (GitLab)
All of our code exist on a GitLab server (https://devlab.vbi.vt.edu)
### Using Terminal
You can use the `SSH` or `HTTPS` link (selected blue text) to run `git clone` to pull down the repository code.
In general this should be cloned into the `git` folder in your home folder.
```bash
cd ~
git clone <REPO URL HERE>
```
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/010-gitlab.png")
```
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/020-get_repo_url.png")
```
### Using RStudio
The steps in RStudio actually do the same thing as the steps from the terminal.
**Find the repository you plan to work on.**
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/010-gitlab.png")
```
<hr>
**Get the `SSH` url.**
You can also use the `HTTPS` url if you have not setup your SSH keys.
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/020-get_repo_url.png")
```
<hr>
**File > New Project**
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/030-new_project.png")
```
<hr>
**Version Control**
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/040-version_control.png")
```
<hr>
**Git**
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/050-git.png")
```
<hr>
**Enter repository URL**
Paste in the repository URL from earlier.
The "Project data name" should auto fill itself with the name of the repository (selected in blue text).
It's highly advised to put the project into a folder designated for repositories (e.g., git, Repositories).
You can make sure the project get's `clone`d to the correct place in the "Create project as a subdirectory of" section of the page.
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/060-paste_screen.png")
```
<hr>
**The loaded project**
RStudio will `git clone` the repository to the designated folder and open the project.
The top right corner will tell you which project you are in.
Make sure you check this every time you start working.
```{r, echo=FALSE}
knitr::include_graphics(path = "./figs/get_repo/070-check_project.png")
```
## Start a project repository (GitLab)