This repository demonstrates some good examples for sharing interactive, reproducible data analysis code across Python and R. By leveraging devcontainer for consistent development environments and Quarto for technical publishing, this setup ensures a smooth experience for researchers to share their code and ideas.
Contributors are encouraged to submit code examples using Quarto in r/samples
or python/samples
folders.
Our goal is not to provide a comprehensive guide to showcase the possibilities of using Quarto for technical publishing.
But by adding yaml metadata to your codesnippets, it allows publishing the content to Github Pages easily and
others can view the code / results directly in the browser without running the code.
- Workflow
- DevContainer
- Quarto Examples Using Jupyter Notebook
- Quarto Examples Using
.qmd
for R - Key Quarto commands
- Contribute your example
It is recommanded to follow the steps below to create a reproducible example:
%%{init: {'themeVariables': { 'fontSize': '10px', 'nodeSpacing': 1, 'rankSpacing': 1 }}}%%
graph TD
A[Start] --> B{Dev Container?}
B -- Yes --> C[Follow step 1 for how to use devcontainer]
B -- No --> D[Clone the repo locally and<br>
setup python or R environment on your own]
C --> E[Create a new folder under
r/samples or python/sample]
D --> E
E --> F[Follow step 2 or 3
to add your jupyter notebook or qmd file]
F --> G[Follow step 4 to test your example locally]
G --> H[Follow step 5 to Commit your changes and create a pull request]
The devcontainer
feature in VS Code allows you to define the development environment using a Docker-based configuration.
This ensures that all participant in a project has a consistent setup, eliminating the "works only on my machine" problem.
By using devcontainer.json
, you can specify the required dependencies for Python and R, configure the environment,
and set up additional tools like Jupyter and Quarto for sharing your work.
How to:
In this repo, we have a template "Python and R Dev Container" that you can use to create a consistent development environment. Go to github codespaces and create a new codespace using this template as shown below: (This may take 5-10 minutes, and please be aware of codespace costs )
Use the JupyterLab or Rstudio Server instance running in the codespace to edit your Jupyter notebook or Quarto document.
The default devcontainer configuration has JupyterLab running on port 8888 and Rstudio Server running on port 8787. You can access them directly on the browser:
Stopping or deleting a GitHub Codespace are effective ways to control costs!
Stopping a Codespace pauses the environment and stops billing temporarily, allowing you to save the state and resume work later without accruing charges. Deleting a Codespace will permanently remove the environment, you should make sure to do this when you’ve finished the work and don’t need the environment or data anymore.
If you choose to clone the repo locally, you can also setup your local container to use the same environment, follow this tutorial to set it up.
If your project requires additional dependencies, you can create a customized your_devcontainer.json
file.
For more information, check the VS Code documentation.
There are many features available in the devcontainer that you can leverage, see this list.
Quarto is an open-source tool for technical publishing that enables you to create interactive, data-driven documents. It is built on top of the R Markdown format and supports Jupyter notebooks as part of its ecosystem. This section provides examples of using Quarto with Jupyter notebooks for Python-based data analysis. The integration allows you to create rich, interactive documents that blend code, analysis, and markdown. for more information, go to Quarto.org.
How to:
In this repo, we have a sample Jupyter notebook sir_model_simulation.ipynb that demonstrates how you can insert a yaml at the top of a markdown cell to specify metadata for quarto publishing. If you are using codespace, a jupyterlab instance is running on port 8888, and you can edit the notebook in the browser or add your own:
It is a good practice to add required dependencies in the requirements.txt file
so that your example will be easily reproducible by others. You can also choose to use the magic command !
or %
inside your notebook to install dependent packages.
!pip install numpy~=2.1
%pip install matplotlib>=3.9
In this section, we explore how to use Quarto's .qmd
format to create reproducible reports in R.
Quarto simplifies the process of creating technical documents that include R code, plots, and analysis.
The .qmd
files serve as an efficient way to write and maintain both the narrative and the underlying code used in your analysis.
It is very similar to R Markdown but offers additional features and flexibility for publishing.
How to:
In this repo, we have a sample Quarto document simple_plot.qmd that shows how to add metadata and use R code chunks.
If you are using codespace, an Rstudio Server instance is running on port 8787 (username: rstudio, password: rstudio)
and you can edit the .qmd
file directly in the browser or add your own:
You should handle install.packages
and library
commands to install and load the required R packages for reproducibility, or provide
a install_packages.R file so that others can reproduce your code easily.
Once you have created a new folder and added your Jupyter notebook or .qmd
file,
you can test it using the quarto render
command.
You can open a terminal window in your local machine or in the codespace and run the following commands to build the quarto document locally:
quarto render your_example.qmd
quarto render your_example.ipynb --to html
Quarto allows you to customize the rendering of documents using both the quarto render
command and the YAML header within a file. Here's a brief overview:
-
Output Format: You can specify the format (HTML, PDF, DOCX, etc.) by adding the
--to
flag. For example:quarto render file.qmd --to pdf
This converts your
.qmd
or.ipynb
file to PDF. -
Output Directory: Control where the output file is saved using
--output-dir
:quarto render file.qmd --output-dir output/
-
Document Metadata: Set properties like title, author, and date in the YAML header:
--- title: "My Report" author: "Jane Doe" ---
-
Table of Contents (TOC): Enable and position the table of contents:
--- toc: true toc-location: left ---
-
Code Execution: Control whether code is executed or shown in the final document:
--- execute: echo: true ---
These options give you flexibility to control how your documents are rendered, either through the command line or the YAML configuration in your file. See list of all available options here.
You are encouraged to check in your example to this repo! Please create your own folder under r/samples
or python/samples
and add your jupyter notebook or .qmd (quarto markdown) file, you do not need to include the html files as
all examples checked in to this repo
will be published to Github Pages. automatically when merged to main branch.
Below is an example of how to check in a new example to the repo:
git checkout -b my_branch --track origin/main
Open a terminal window and navigate to the folder that you created:
cd r/samples/my_example
# or
cd python/samples/my_example
git add r/samples/my_example/example.qmd
# if you use python, you can add a jupyter notebook and requirements.txt file
git add python/samples/my_example/requirements.txt
git add python/samples/my_example/example.ipynb
git commit -m "your_commit_message"
git push origin my_branch
note that when codespace is created, it generates a temporary authentication token for GitHub CLI and Git commands, which provides access to the repository for the duration of the Codespace session, it may refresh periodically so you may get access denied error when pushing changes, you can use the "Export changes to a new branch" from UI instead:
Be aware that any files you added /edited will appear in this branch but if you have created a commit in step (3), it will be shown in the history which you can easily retrieve later.
Go to the pull request page and click "create a new pull request" from your branch to the main branch. You should use the pull reqeust template provided in pull_request_template.md. Ask a peer to review your code and merge it to the main branch.