2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
[![CircleCI](https://circleci.com/gh/projectglow/glow.svg?style=svg&circle-token=7511f70b2c810a18e88b5c537b0410e82db8617d)](https://circleci.com/gh/projectglow/glow)
[![Documentation
Status](https://readthedocs.org/projects/glow/badge/?version=latest)](https://glow.readthedocs.io/en/latest/?badge=latest)

# Building and Testing
This project is built using sbt: https://www.scala-sbt.org/1.0/docs/Setup.html
Expand Down
15 changes: 15 additions & 0 deletions docs/source/additional-resources.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
Additional Resources
====================

Blog posts
----------

- `Scaling Genomic Workflows with Spark SQL BGEN and VCF Readers
<https://databricks.com/blog/2019/06/26/scaling-genomic-workflows-with-spark-sql-bgen-and-vcf-readers.html>`_
- `Parallelizing SAIGE Across Hundreds of Cores <https://databricks.com/blog/2019/10/02/parallelizing-saige-across-hundreds-of-cores.html>`_

  Parallelize SAIGE using Glow and the Pipe Transformer

- `Accurately Building Genomic Cohorts at Scale with Delta Lake and Spark SQL <https://databricks.com/blog/2019/06/19/accurately-building-genomic-cohorts-at-scale-with-delta-lake-and-spark-sql.html>`_

  Joint genotyping with Glow and Databricks
4 changes: 4 additions & 0 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,10 @@
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

html_logo = '../../static/glow_logo_horiz_color_dark_bg.png'

html_favicon = '../../static/favicon.ico'

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
Expand Down
12 changes: 6 additions & 6 deletions docs/source/etl/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,9 @@ Glow offers functionalities to perform genomic variant data ETL, manipulation, a
.. toctree::
:maxdepth: 2

variant-data.rst
vcf2delta.rst
variant-qc.rst
sample-qc.rst
lift-over.rst
utility-functions.rst
variant-data
vcf2delta
variant-qc
sample-qc
lift-over
utility-functions
58 changes: 58 additions & 0 deletions docs/source/getting-started.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
Getting Started
===============

Running Locally
---------------

Glow requires Apache Spark 2.4.2 or above. If you don't have a local Apache Spark installation,
you can install it from PyPI:

.. code-block:: sh

pip install pyspark==2.4.2

or `download a specific distribution <https://spark.apache.org/downloads.html>`_.

Install the Python frontend from PyPI:

.. code-block:: sh

pip install glow.py

and then start the `Spark shell <http://spark.apache.org/docs/latest/rdd-programming-guide.html#using-the-shell>`_
with the Glow maven package:

.. code-block:: sh

./bin/pyspark --packages io.projectglow:glow_2.11:0.1.0

To start a Jupyter notebook instead of a shell:

.. code-block:: sh

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspark --packages io.projectglow:glow_2.11:0.1.0

And now your notebook is glowing! To access the Glow functions, you need to register them with the
Spark session.

.. code-block:: python

import glow
glow.register(spark)
df = spark.read.format('vcf').load('my_first.vcf')
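
Once registered, Glow's functions are available through Spark SQL, and the loaded DataFrame can be
queried like any other. A minimal follow-up sketch (the column names below are assumptions based on
Glow's VCF schema; verify with ``printSchema`` on your own DataFrame):

.. code-block:: python

    # Inspect the schema produced by the VCF reader
    df.printSchema()

    # Query the variants with plain Spark SQL (contigName, start, and
    # referenceAllele are assumed column names from Glow's VCF schema)
    df.createOrReplaceTempView('variants')
    spark.sql('SELECT contigName, start, referenceAllele FROM variants LIMIT 5').show()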

Running in the cloud
--------------------

The easiest way to use Glow in the cloud is with the `Databricks Runtime for Genomics
<https://docs.databricks.com/runtime/genomicsruntime.html>`_. However, it works with any cloud
provider or Spark distribution. You need to install the maven package
``io.projectglow:glow_2.11:${version}`` and optionally the Python frontend ``glow.py``.
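
For example, on a generic Spark distribution you can attach the package when submitting a job. A
sketch, assuming a hypothetical application script ``my_gwas.py`` and the 0.1.0 release:

.. code-block:: sh

    # Resolve the Glow artifact from Maven Central at launch time
    spark-submit --packages io.projectglow:glow_2.11:0.1.0 my_gwas.py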

Example notebook
----------------

This notebook demonstrates some of the key functionality of Glow, like reading in a genomic dataset,
saving it in `Delta Lake <https://delta.io>`_ format, and performing a genome-wide association study.

.. notebook:: _static/notebooks/tertiary/gwas.html
8 changes: 7 additions & 1 deletion docs/source/index.rst
Original file line number Diff line number Diff line change
@@ -1,11 +1,17 @@
Glow
====

Glow is an open-source genomic data analysis tool using `Apache Spark <https://spark.apache.org>`__.
Glow is an open-source toolkit for working with genomic data at biobank scale and beyond. The
toolkit is natively built on Apache Spark, the leading unified engine for big data processing and
machine learning, bringing the scale of the cloud to genomics workflows.

.. toctree::
:maxdepth: 2

introduction
getting-started
etl/index
tertiary/index
additional-resources

.. modules
23 changes: 23 additions & 0 deletions docs/source/introduction.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
Introduction to Glow
====================

Glow aims to simplify genomic workflows at scale. The best way to accomplish this goal is to take a system that
has already been proven to work and adapt it to fit into the genomics ecosystem.

Apache Spark, and in particular `Spark SQL <https://spark.apache.org/sql/>`_, its module for working with
structured data, is used across industries on datasets at petabyte scale and beyond. Glow smooths
the rough edges so that you can be productive immediately.

Glow features:

- Genomic datasources: Read datasets in common file formats such as VCF and BGEN into Spark SQL DataFrames.
- Genomic functions: Common operations such as computing quality control statistics, running regression
  tests, and performing simple transformations are provided as Spark SQL functions that can be
  called from Python, SQL, Scala, or R (see the sketch after this list).
- Data preparation building blocks: Glow includes transformations like variant normalization and
  lift over to help produce analysis-ready datasets.
- Integration with existing tools: With Spark SQL, you can write user-defined functions (UDFs) in
  Python, R, or Scala. Glow also makes it easy to run DataFrames through command-line tools.
- Integration with other data types: Genomic data can generate additional insights when joined with
  datasets such as electronic health records, real-world evidence, and medical images. Since Glow
  returns native Spark SQL DataFrames, it's simple to join multiple datasets together.
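
As a sketch of the genomic functions bullet above: the function ``hardy_weinberg`` and the
``genotypes`` column are assumptions based on Glow's variant QC functionality, so treat the exact
names as illustrative and consult the variant QC docs for your version.

.. code-block:: python

    from pyspark.sql.functions import expr

    import glow
    glow.register(spark)

    # Load a VCF (hypothetical path) and compute per-variant
    # Hardy-Weinberg equilibrium statistics (assumed function/column names)
    df = spark.read.format('vcf').load('genotypes.vcf')
    df.select('contigName', 'start', expr('hardy_weinberg(genotypes)')).show()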
Binary file added static/favicon.ico
Binary file not shown.
Binary file added static/glow_logo_horiz_color.png
Binary file added static/glow_logo_horiz_color_dark_bg.png