[SPARK-38210][DOCS] Improve documentation generation README #35516
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown) | |
| documentation yourself. Why build it yourself? So that you have the docs that correspond to | ||
| whichever version of Spark you currently have checked out of revision control. | ||
|
|
||
| ## Prerequisites | ||
| ## Building Documentation | ||
| There are two ways to build Spark documentation, complete and partial. Complete will build a site similar | ||
| to the main documentation site at https://spark.apache.org/documentation.html. Partial documentation build is for | ||
| a specific language or API, are also possible. | ||
|
|
||
| ### Prerequisites | ||
|
|
||
| The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java, | ||
| Python, R and SQL. | ||
|
|
||
| You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and | ||
| [Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python) | ||
| For complete documentation all tools below must be installed **including Optionals**. | ||
|
|
||
| You need to have the JDK, Scala, [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and | ||
| [Python](https://docs.python.org/3.8/using/unix.html#getting-and-installing-the-latest-version-of-python) | ||
| installed. Make sure the `bundle` command is available, if not install the Gem containing it: | ||
|
|
||
| ```sh | ||
|
|
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph | |
|
|
||
| ### R API Documentation (Optional) | ||
|
|
||
| If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html) | ||
| and install these libraries: | ||
| If you'd like to generate R API documentation, you'll need to install these packages and libraries: | ||
|
|
||
| ```sh | ||
| $ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \ | ||
|
Member
Hm, I think we should better make it independent from the OS
Contributor
Author
Do you have any suggestions? I can only suggest adding a Dockerfile similar to this one to build and test the changes, or omitting these installs since they are Linux-specific. In the latter case the instructions are incomplete again, and one has to figure out what to install every time.
||
| libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev | ||
| ``` | ||
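One OS-neutral complement to the package list above, in the spirit of the review thread, is to check that the required commands are on `PATH` rather than prescribe how to install them. The sketch below is illustrative only: `check_tools` is a hypothetical helper that is not part of this PR, and the executable names in the usage note are assumptions about how the tools named in this README are usually exposed.

```sh
# Hypothetical helper (not part of the PR): report which commands are missing.
# Uses only POSIX `command -v`, so it does not depend on a package manager.
check_tools() {
  missing=""
  for tool in "$@"; do
    # `command -v` succeeds only if the name resolves to something runnable
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  echo "all tools found"
}
```

A call such as `check_tools ruby bundle python3 Rscript pandoc` (names assumed) would then fail early with a list of what to install, whatever the platform.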
|
|
||
| ```sh | ||
| $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")' | ||
| $ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")' | ||
|
Member
Hm,
Member
cc @huaxingao FYI who faced a similar problem before IIRC ..
Contributor
Author
I think I finally managed to understand what's going on...
Contributor
I had the same problem: I tested with and without the markdown package, and it failed without it.
Contributor
Author
I re-tested that a number of times in Docker containers and it always fails if the package is not installed. So, yes, in short
||
| $ sudo Rscript -e 'devtools::install_version("roxygen2", version = "7.1.2", repos="https://cloud.r-project.org/")' | ||
| $ sudo Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')" | ||
| $ sudo Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')" | ||
|
|
@@ -89,17 +100,7 @@ you have checked out or downloaded. | |
| In this directory you will find text files formatted using Markdown, with an ".md" suffix. You can | ||
| read those text files directly if you want. Start with `index.md`. | ||
|
|
||
| Execute `SKIP_API=1 bundle exec jekyll build` from the `docs/` directory to compile the site. Compiling the site with | ||
| Jekyll will create a directory called `_site` containing `index.html` as well as the rest of the | ||
| compiled files. | ||
|
|
||
| ```sh | ||
| $ cd docs | ||
| # Skip generating API docs (which takes a while) | ||
| $ SKIP_API=1 bundle exec jekyll build | ||
| ``` | ||
|
|
||
| You can also generate the default Jekyll build with API Docs as follows: | ||
| You can generate the complete website from the `docs/` directory as follows: | ||
|
|
||
| ```sh | ||
| $ bundle exec jekyll build | ||
|
|
@@ -111,7 +112,7 @@ $ bundle exec jekyll serve --watch | |
| $ PRODUCTION=1 bundle exec jekyll build | ||
| ``` | ||
|
|
||
| ## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs) | ||
| ## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs) | ||
|
Member
APIs are "Scala", "Java", "Python", "R". roxygen2, mkdocs, sphinx are not APIs
Member
Also, I'm confused, weren't the sections above already about generating individual API docs?
||
|
|
||
| You can build just the Spark scaladoc and javadoc by running `./build/sbt unidoc` from the `$SPARK_HOME` directory. | ||
|
|
||
|
|
@@ -129,6 +130,14 @@ The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-d | |
| using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs | ||
| using [MkDocs](https://www.mkdocs.org/). | ||
|
|
||
| NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1 | ||
| bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used | ||
| NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, see below example. | ||
|
Member
"see the example below"
||
| In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used | ||
| to skip a single step of the corresponding language. `SKIP_SCALADOC` indicates skipping both the Scala and Java docs. | ||
|
|
||
| For example: | ||
|
|
||
| ```sh | ||
| $ cd docs | ||
| # Skip generating API docs (which takes a while) | ||
| $ SKIP_API=1 bundle exec jekyll build | ||
| ``` | ||
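The per-language flags described in the hunk above can be combined in the same way as `SKIP_API=1`. As a minimal sketch, assuming a hypothetical helper named `skip_docs_cmd` (not part of the PR), the mapping from language to flag could be written as follows; it only prints the command, so nothing is built until you run the result.

```sh
# Hypothetical helper (not part of the PR): print the jekyll build command
# that skips the API docs for the given languages.
skip_docs_cmd() {
  cmd=""
  for lang in "$@"; do
    case "$lang" in
      scala|java) var=SKIP_SCALADOC ;;  # SKIP_SCALADOC skips both Scala and Java docs
      python)     var=SKIP_PYTHONDOC ;;
      r)          var=SKIP_RDOC ;;
      sql)        var=SKIP_SQLDOC ;;
      *) echo "unknown language: $lang" >&2; return 1 ;;
    esac
    cmd="$cmd$var=1 "
  done
  echo "${cmd}bundle exec jekyll build"
}

skip_docs_cmd r sql
# prints: SKIP_RDOC=1 SKIP_SQLDOC=1 bundle exec jekyll build
```

Running the printed line from the `docs/` directory would build the site without the R and SQL API docs while still generating the others.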
-> "similar to the main documentation site at ..."
Start a new sentence like "with all APIs documented. Partial ..."
I think this could be clarified: "Partial documentation builds, for a specific language or API, are also possible"
Newline after section heading, like others