[SPARK-38210][DOCS] Improve documentation generation README#35516
[SPARK-38210][DOCS] Improve documentation generation README#35516khalidmammadov wants to merge 2 commits intoapache:masterfrom
Conversation
|
I have also made this Dockerfile to make the process even easier. Would that be valuable to add to the repo? |
|
|
||
| ```sh | ||
| $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")' | ||
| $ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")' |
There was a problem hiding this comment.
Hm, rmarkdown depends on markdown IIRC. rmarkdown falls back to markdown. Was this required in your env?
There was a problem hiding this comment.
cc @huaxingao FYI who faced a similar problem before IIRC ..
There was a problem hiding this comment.
I think I finally managed to understand what's going on...
So, I'm using this Docker for the build. And tested with and without markdown package and it fails without and I couldn't understand how it succeeds in the build and test CI phase. So, apparently it's installed on the base image (and others I am adding here) from @dongjoon-hyun 's Docker image (BTW, where is the source of this Dockerfile kept?). So, some packages are "reinstalled" during Build and test and some not hence the confusion.
Additionally, I tested building "without" rmarkdown and it succeeds.
There was a problem hiding this comment.
I had the same problem: I tested with and without markdown package and it failed without.
There was a problem hiding this comment.
I re-tested that number of times in docker containers and it always fails if package is not installed. So, yes, in short markdown is required package.
docs/README.md
Outdated
| whichever version of Spark you currently have checked out of revision control. | ||
|
|
||
| ## Prerequisites | ||
| ## Building documentation |
| If you'd like to generate R API documentation, you'll need to install these packages and libraries: | ||
|
|
||
| ```sh | ||
| $ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \ |
There was a problem hiding this comment.
Hm, I think we should better make it independent from the OS
There was a problem hiding this comment.
Do you have any suggestion? I can only suggest adding a Dockerfile similar to this one to build and test the changes or omit these installs as they are for linux? In the last case it makes again not complete and one needs to figure it out what to install every time.
|
|
||
| ## Prerequisites | ||
| ## Building documentation | ||
| There are two ways to build Spark documentation, complete and partial. Complete will build a site similar |
There was a problem hiding this comment.
-> "similar to the main documentation site at ..."
Start a new sentence like "with all APIs documented. Partial ..."
I think this could be clarified: "Partial documentation builds, for a specific language or API, are also possible"
There was a problem hiding this comment.
Newline after section heading, like others
docs/README.md
Outdated
|
|
||
| You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and | ||
| [Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python) | ||
| For complete documentation all below tools must be installed **including Optionals**. |
docs/README.md
Outdated
| [Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python) | ||
| For complete documentation all below tools must be installed **including Optionals**. | ||
|
|
||
| You need to have JDK, Scala, [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and |
docs/README.md
Outdated
| ``` | ||
|
|
||
| ## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs) | ||
| You can optionally skip API build (for partial build) as it takes time |
There was a problem hiding this comment.
This needs a rewrite: "To create a partial build without API docs (which can take a long time), use SKIP_API=1:"
But then I thought partial builds were just the API docs? this addition is confusing
|
Can one of the admins verify this patch? |
srowen
left a comment
There was a problem hiding this comment.
I find a lot of this change a bit confusing, not sure it is helping docs
|
|
||
| ## Prerequisites | ||
| ## Building documentation | ||
| There are two ways to build Spark documentation, complete and partial. Complete will build a site similar |
There was a problem hiding this comment.
Newline after section heading, like others
| ``` | ||
|
|
||
| ## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs) | ||
| ## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs) |
There was a problem hiding this comment.
APIs are "Scala", "Java", "Python", "R". roxygen2, mkdocs, sphinx are not APIs
There was a problem hiding this comment.
Also, I'm confused, weren't the sections above already about generating individual API docs?
|
|
||
| NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1 | ||
| bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used | ||
| NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, see below example. |
What changes were proposed in this pull request?
Current instructions in README file is not complete and not sufficient to complete site build for testing and validation.
After number of trial and errors I have managed to build it. In the process I had to install number of additional packages.
This PR purposes improvements to the documentation to avoid spending similar efforts for contributors.
Why are the changes needed?
Improve Spark documentation generation procedure
Does this PR introduce any user-facing change?
No
How was this patch tested?
I have started a docker container:
docker run --name spark_doc_build_new -p 4000:4000 -it spark_doc_build_imageand installed everything as per below
and checked via jekyll serve from host
bundle exec jekyll serve --host 0.0.0.0