`Sphinx`_ is what is called a documentation generator. This means that it takes a bunch of source files in plain text, and generates a bunch of other awesome things, mainly HTML. For our use case you can think of it as a program that takes in plain text files in `reStructuredText`_ format, and outputs HTML.
reST -> Sphinx -> HTML
So as a user of Sphinx, your main job will be writing these text files. This means that you should be minimally familiar with `reStructuredText`_ as a language. It's similar to Markdown in a lot of ways, if you are already familiar with Markdown.
The first step is installing `Sphinx`_. Sphinx is a Python project, so it can be installed like any other Python package. Many operating systems ship with Python pre-installed; if yours does not, install Python first. Then run:
pip install Sphinx
Note
Advanced users can install this in a virtualenv if they wish.
Older instructions use ``sudo easy_install Sphinx``, but ``easy_install`` is deprecated, so ``pip`` is the safer choice.
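To confirm that the install worked, you can check whether Python can find the package. This is a minimal sketch; the helper name ``is_installed`` is made up for this example, and it relies on Sphinx installing an importable package named ``sphinx`` (lowercase).

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be imported."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True once Sphinx is available to this Python interpreter.
    print("Sphinx installed:", is_installed("sphinx"))
```

If this prints ``False`` inside a virtualenv, the package was probably installed for a different interpreter.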
Now you are ready to start creating documentation.
Create a directory called ``crawler``. Inside that directory, create a ``docs`` directory and move into it:

    mkdir crawler
    cd crawler
    mkdir docs
    cd docs

Then you can create the Sphinx project skeleton in this directory:
sphinx-quickstart
Accept all the defaults, call the project ``Crawler``, and give it a ``1.0`` version.
Your file system should now look similar to this:

    crawler/
        docs/
            conf.py
            index.rst
            Makefile

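If you want to double-check the skeleton, a short script can report which of the expected files are missing. This is only a sketch: the helper name ``missing_files`` is hypothetical, and the file list is taken from the layout shown here.

```python
from pathlib import Path

# Files the quickstart skeleton is expected to contain (from the layout above).
EXPECTED_FILES = ("conf.py", "index.rst", "Makefile")


def missing_files(docs_dir):
    """Return the expected skeleton files that are absent from docs_dir."""
    docs_path = Path(docs_dir)
    return [name for name in EXPECTED_FILES if not (docs_path / name).is_file()]


if __name__ == "__main__":
    # Run from inside crawler/docs; an empty list means nothing is missing.
    print(missing_files("."))
```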
We have a top-level ``docs`` directory in the main project directory. Inside of this is:

- ``index.rst``: This is the index file for the documentation, or what lives at ``/``. It normally contains a Table of Contents that will link to all other pages of the documentation.
- ``conf.py``: Allows for customization of Sphinx. You won't need to use this too much yet, but it's good to be familiar with this file.
- ``Makefile``: This ships with Sphinx and is the main interface for local development. It shouldn't be changed.
- ``_build``: The directory that your output files go into.
- ``_static``: The directory to include all your static files, like images.
- ``_templates``: Allows you to override Sphinx templates to customize look and feel.

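Because ``conf.py`` is an ordinary Python file, it helps to see roughly what it holds. The following is a trimmed-down sketch, not the exact file that ``sphinx-quickstart`` writes: the project name and version come from the answers given earlier, and the ``alabaster`` theme is an assumption that depends on your Sphinx version.

```python
# conf.py: a minimal sketch; the generated file is longer and more commented.

# Project information shown in the rendered docs.
project = "Crawler"
version = "1.0"  # short X.Y version
release = "1.0"  # full version string

# Sphinx extensions to load; empty for a plain project.
extensions = []

# Directories for template overrides and static files, relative to this file.
templates_path = ["_templates"]
html_static_path = ["_static"]

# Paths Sphinx should skip when looking for source files.
exclude_patterns = ["_build"]

# Theme used for HTML output (assumed here; your default may differ).
html_theme = "alabaster"
```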
Let's build our docs into HTML to see how it works. Simply run:

    # Inside top-level docs/ directory.
    make html

This should run Sphinx in your shell, and output HTML. At the end, it should say something about the documents being ready in ``_build/html``.
You can now open them in your browser. On macOS, type the following (on Linux, ``xdg-open`` does the same job):
open _build/html/index.html
This should display a rendered HTML page that says Welcome to Crawler’s documentation! at the top.
``make html`` is the main way you will build HTML documentation locally. It is simply a wrapper around a more complex call to Sphinx.
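To give a rough idea of the underlying call, the sketch below builds an equivalent ``sphinx-build`` command line. The helper name ``sphinx_build_command`` is made up for this example, and the generated Makefile passes additional options that vary between Sphinx versions, so treat this as an approximation.

```python
def sphinx_build_command(builder="html", source_dir=".", build_dir="_build"):
    """Return the argument list for a direct sphinx-build call."""
    # sphinx-build takes: -b <builder> <source directory> <output directory>
    output_dir = f"{build_dir}/{builder}"
    return ["sphinx-build", "-b", builder, source_dir, output_dir]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(" ".join(sphinx_build_command()))
```

Run from the ``docs`` directory, this prints ``sphinx-build -b html . _build/html``.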
Now that we have our basic skeleton, let's document the project. As you might have guessed from the name, we'll be documenting a basic web crawler.
For this project, we'll have the following pages:
- Index Page
- Support
- Installation
- Cookbook/Examples
- Command Line Options
- Changelog
Let's start with the Support page.
It's always important that users can ask questions when they get stuck. There are many ways to handle this, but normal approaches are to have an IRC channel and mailing list.
Go ahead and put this markup in your ``support.rst``:

    =======
    Support
    =======

    The easiest way to get help with the project is to join the ``#crawler``
    channel on Freenode_. We hang out there and you can get real-time help with
    your projects. The other good way is to open an issue on Github_.

    The mailing list at https://groups.google.com/forum/#!forum/crawler is also available for support.

    .. _Freenode: irc://freenode.net
    .. _Github: https://github.com/example/crawler/issues

.. index:: pair: Syntax; Hyperlink
The main new markup here is the link syntax. The link text is set by putting a ``_`` after some text. Backticks are used to group text, allowing you to include multiple words in your link text. You should use backticks even when the link text is only one word. This keeps the syntax consistent. The link target is defined at the bottom of the section with ``.. _<link text>: <target>``.
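To make the pairing between link text and link target concrete, here is a small Python sketch that finds both in a piece of markup. It is an illustration only and not how Sphinx or docutils parse links: the regular expressions and the helper names ``link_targets`` and ``link_references`` are assumptions made for this example, and they ignore details such as case-insensitive name matching.

```python
import re

# Matches a target definition such as ".. _Freenode: irc://freenode.net".
TARGET_RE = re.compile(r"^\.\. _([^:]+): (\S+)$", re.MULTILINE)
# Matches "`link text`_" (first group) or a bare "Word_" (second group).
REFERENCE_RE = re.compile(r"`([^`]+)`_|(?<![\w`])([A-Za-z0-9]+)_(?!\w)")

SAMPLE = """\
Join the ``#crawler`` channel on Freenode_ or open an issue on `Github`_.

.. _Freenode: irc://freenode.net
.. _Github: https://github.com/example/crawler/issues
"""


def link_targets(text):
    """Map each defined link name to its target URL."""
    return {name: url for name, url in TARGET_RE.findall(text)}


def link_references(text):
    """List the link names referenced in the text, in order."""
    return [quoted or bare for quoted, bare in REFERENCE_RE.findall(text)]


if __name__ == "__main__":
    targets = link_targets(SAMPLE)
    for name in link_references(SAMPLE):
        print(name, "->", targets.get(name, "<undefined target>"))
```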
Installation documentation is really important. Anyone who is coming to the project will need to install it. For our example, we are installing a basic Python script, so it will be pretty easy.
Include the following in your ``install.rst``:

    ============
    Installation
    ============

    At the command line::

        easy_install crawler

    Or, if you have virtualenvwrapper installed:

    .. code-block:: bash

        mkvirtualenv crawler
        pip install crawler

.. index:: pair: Syntax; Code Example
This snippet introduces a couple of simple concepts. The syntax for displaying code is ``::``. When it is used at the end of a sentence, Sphinx is smart and displays one ``:`` in the output, and knows there is a code example in the following indented block.

Sphinx, like Python, uses meaningful whitespace. Blocks of content are structured based on the indentation level they are on. You can see this concept with our ``code-block`` directive above.
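The handling of ``::`` at the end of a paragraph follows three simple cases in reStructuredText, which can be sketched in Python as follows. The function name ``render_paragraph_marker`` is hypothetical, and this is a simplified illustration of the rule rather than the parser's actual code.

```python
def render_paragraph_marker(paragraph):
    """Return the paragraph text as it appears once a trailing '::' is handled."""
    stripped = paragraph.rstrip()
    if not stripped.endswith("::"):
        return stripped  # no literal block marker; text is unchanged
    body = stripped[:-2]
    if body == "":
        return ""  # "::" on its own: the paragraph is dropped entirely
    if body[-1].isspace():
        return body.rstrip()  # "text ::": the marker is removed
    return body + ":"  # "text::": one colon stays in the output


if __name__ == "__main__":
    print(render_paragraph_marker("At the command line::"))
```

The first case is the one used in ``install.rst``: ``At the command line::`` is displayed as ``At the command line:``.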
.. index:: pair: Syntax; TOC Tree
Now would be a good time to introduce the ``toctree``. One of the main concepts in Sphinx is that it allows multiple pages to be combined into a cohesive hierarchy. The ``toctree`` directive is a fundamental part of this structure. A simple ``toctree`` directive looks like this:

    .. toctree::
       :maxdepth: 2

       install
       support

This will then output a Table of Contents in the page where this occurs. It will output the top-level headers of the pages as listed. This also tells Sphinx that the other pages are sub-pages of the current page.
You should go ahead and include the above ``toctree`` directive in your ``index.rst`` file.
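The whitespace in a ``toctree`` matters: the option and the page names are indented under the directive, with a blank line between them. As a sketch, a small Python helper (the name ``render_toctree`` is made up here) can generate the directive text from a list of page names, which makes that layout explicit.

```python
def render_toctree(pages, maxdepth=2):
    """Return toctree directive text listing the given page names."""
    lines = [".. toctree::", f"   :maxdepth: {maxdepth}", ""]
    # Each page is listed by file name without the .rst extension.
    lines.extend(f"   {page}" for page in pages)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_toctree(["install", "support"]))
```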
Now that you have a few pages of content, go ahead and build your docs again:
make html
If you open up your ``index.html``, you should see the basic structure of your docs from the included ``toctree`` directive.
The beauty of Sphinx is that it can output in multiple formats, not just HTML. All of those formats are generated from the same reST source files, so you only have to change things in one place. For example, you can generate a manpage for your docs:
make man
This will place a manpage in ``_build/man``.
You can then view it with:
man _build/man/crawler.1
Some people prefer one large HTML document, instead of having to look through multiple pages. This is another area where Sphinx shines. You can write your documentation in multiple files to make editing and updating easier. Then if you want to distribute a single page HTML version:
make singlehtml
This will combine all of your HTML pages into a single page. Check it out by opening it in your browser:
open _build/singlehtml/index.html
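Each ``make`` target used so far writes into its own subdirectory of ``_build``. The sketch below collects the output locations mentioned in this tutorial; the helper name ``output_path`` is hypothetical, and the file names are the ones shown in the commands here.

```python
from pathlib import PurePosixPath

BUILD_DIR = PurePosixPath("_build")

# Main output file for each make target used in this tutorial.
ENTRY_POINTS = {
    "html": "index.html",
    "man": "crawler.1",
    "singlehtml": "index.html",
}


def output_path(builder):
    """Return where the main output file of the given make target ends up."""
    return str(BUILD_DIR / builder / ENTRY_POINTS[builder])


if __name__ == "__main__":
    for builder in ENTRY_POINTS:
        print(f"make {builder} -> {output_path(builder)}")
```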
You'll notice that it included the documents in the order that your :index:`TOC Tree` was defined.
Let's go ahead and add a cookbook to our documentation. Users will often come to your project to solve the same problems. Including a Cookbook or Examples section will be a great resource for this content.
In your ``cookbook.rst``, add the following:

    ========
    Cookbook
    ========

    Crawl a web page
    ----------------

    The simplest way to use our program is with no arguments.
    Simply run::

        crawler <url>

    to crawl a webpage.

    Crawl a page slowly
    -------------------

    To add a delay to your crawler,
    use :option:`-d`::

        crawler -d 10 <url>

    This will wait 10 seconds between page fetches.

    Crawl only your blog
    --------------------

    You will want to use the :option:`-i` flag,
    which will ignore URLs matching the passed regex::

        crawler -i "^(?!blog/)" <url>

    This will ignore every URL that does not start with ``blog/``,
    so only your blog pages are crawled.

    Only crawl certain pages
    ------------------------

    You will want to use the :option:`-i` flag,
    which will ignore URLs matching the passed regex::

        crawler -i "pdf$" <url>

    This will ignore URLs that end in PDF.

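Since the crawler is a Python script, the effect of an ignore pattern can be tried out with Python's ``re`` module before writing it into the cookbook. The sketch below assumes the crawler applies the pattern with ``re.search`` to relative URLs, which this tutorial does not specify; the function name ``urls_to_crawl`` and the sample URLs are made up for illustration.

```python
import re


def urls_to_crawl(urls, ignore_pattern):
    """Return the URLs that do not match the ignore pattern."""
    ignore = re.compile(ignore_pattern)
    return [url for url in urls if not ignore.search(url)]


URLS = ["index.html", "blog/first-post", "files/report.pdf"]

# "pdf$" ignores anything ending in "pdf".
print(urls_to_crawl(URLS, "pdf$"))        # ['index.html', 'blog/first-post']

# A negative lookahead ignores everything that does NOT start with "blog/".
print(urls_to_crawl(URLS, "^(?!blog/)"))  # ['blog/first-post']
```

Testing patterns this way is a cheap check that a cookbook entry does what its description says.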