Closed
49 changes: 29 additions & 20 deletions docs/README.md
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.

## Prerequisites
## Building Documentation
There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
Review comment (Member):
-> "similar to the main documentation site at ..."
Start a new sentence like "with all APIs documented. Partial ..."
I think this could be clarified: "Partial documentation builds, for a specific language or API, are also possible"

Review comment (Member):
Add a newline after the section heading, like the others.

to the main documentation site at https://spark.apache.org/documentation.html. Partial documentation build is for
a specific language or API, are also possible.

### Prerequisites

The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
Python, R and SQL.

You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
For complete documentation all tools below must be installed **including Optionals**.

You need to have the JDK, Scala, [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
[Python](https://docs.python.org/3.8/using/unix.html#getting-and-installing-the-latest-version-of-python)
installed. Make sure the `bundle` command is available, if not install the Gem containing it:

```sh
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph

### R API Documentation (Optional)

If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
and install these libraries:
If you'd like to generate R API documentation, you'll need to install these packages and libraries:

```sh
$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
Review comment (Member):
Hm, I think we had better make it independent of the OS.

Review comment (Contributor Author):
Do you have any suggestions? I can only suggest either adding a Dockerfile similar to this one to build and test the changes, or omitting these installs since they are Linux-specific. In the latter case the instructions are again incomplete, and one needs to figure out what to install every time.

libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
```

```sh
$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Review comment (Member):
Hm, rmarkdown depends on markdown IIRC. rmarkdown falls back to markdown. Was this required in your env?

Review comment (Member):
cc @huaxingao FYI, who faced a similar problem before, IIRC.

Review comment (Contributor Author):
I think I finally understand what's going on.
I'm using this Docker image for the build. I tested with and without the markdown package: the build fails without it, and I couldn't understand how it succeeds in the "Build and test" CI phase. Apparently markdown (like the other packages I am adding here) is already installed in the base image, which comes from @dongjoon-hyun's Docker image (BTW, where is the source of this Dockerfile kept?). So some packages are "reinstalled" during "Build and test" and some are not, hence the confusion.
Additionally, I tested building without rmarkdown and it succeeds.

Review comment (Contributor):
I had the same problem: I tested with and without the markdown package, and the build failed without it.

Review comment (Contributor Author):
I re-tested this a number of times in Docker containers, and the build always fails if the package is not installed. So yes, in short, markdown is a required package.

$ sudo Rscript -e 'devtools::install_version("roxygen2", version = "7.1.2", repos="https://cloud.r-project.org/")'
$ sudo Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"
$ sudo Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"
@@ -89,17 +100,7 @@ you have checked out or downloaded.
In this directory you will find text files formatted using Markdown, with an ".md" suffix. You can
read those text files directly if you want. Start with `index.md`.

Execute `SKIP_API=1 bundle exec jekyll build` from the `docs/` directory to compile the site. Compiling the site with
Jekyll will create a directory called `_site` containing `index.html` as well as the rest of the
compiled files.

```sh
$ cd docs
# Skip generating API docs (which takes a while)
$ SKIP_API=1 bundle exec jekyll build
```

You can also generate the default Jekyll build with API Docs as follows:
You can generate the complete website from the `docs/` directory as follows:

```sh
$ bundle exec jekyll build
@@ -111,7 +112,7 @@ $ bundle exec jekyll serve --watch
$ PRODUCTION=1 bundle exec jekyll build
```

## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
Review comment (Member):
APIs are "Scala", "Java", "Python", "R". roxygen2, mkdocs, sphinx are not APIs

Review comment (Member):
Also, I'm confused, weren't the sections above already about generating individual API docs?


You can build just the Spark scaladoc and javadoc by running `./build/sbt unidoc` from the `$SPARK_HOME` directory.

@@ -129,6 +130,14 @@ The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-d
using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs
using [MkDocs](https://www.mkdocs.org/).

NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1
bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, see below example.
Review comment (Member):
"see the example below"

In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
to skip a single step of the corresponding language. `SKIP_SCALADOC` indicates skipping both the Scala and Java docs.

For example:

```sh
$ cd docs
# Skip generating API docs (which takes a while)
$ SKIP_API=1 bundle exec jekyll build
```
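
The per-language switches mentioned in the NOTE can also be combined to skip more than one API at a time. The sketch below is not part of the PR: `skip_vars` is a hypothetical helper that maps language names to the `SKIP_*` variables named in the README text, on the assumption that the Jekyll plugin reads them from the environment the same way it reads `SKIP_API`.

```sh
#!/bin/sh
# Hypothetical helper (not part of Spark): print one SKIP_* assignment per
# language whose API docs should be skipped. The variable names are the ones
# listed in the README; "java" maps to SKIP_SCALADOC because that flag skips
# both the Scala and Java docs.
skip_vars() {
  for lang in "$@"; do
    case "$lang" in
      scala|java) echo "SKIP_SCALADOC=1" ;;
      python)     echo "SKIP_PYTHONDOC=1" ;;
      r)          echo "SKIP_RDOC=1" ;;
      sql)        echo "SKIP_SQLDOC=1" ;;
      *)          echo "unknown language: $lang" >&2; return 1 ;;
    esac
  done
}

# Example (run from docs/): build everything except the R and SQL API docs.
#   env $(skip_vars r sql) bundle exec jekyll build
```

The equivalent one-off form, without the helper, is `SKIP_RDOC=1 SKIP_SQLDOC=1 bundle exec jekyll build`.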