[SPARK-38210][DOCS] Improve documentation generation README #35516

Closed
khalidmammadov wants to merge 2 commits into apache:master from khalidmammadov:fix_doc_generation_md

@khalidmammadov
Contributor

What changes were proposed in this pull request?

The current instructions in the README file are incomplete and insufficient to build the site for testing and validation.
After a number of trials and errors I managed to build it, but in the process I had to install a number of additional packages.
This PR proposes improvements to the documentation so that other contributors can avoid spending similar effort.

Why are the changes needed?

To improve the Spark documentation generation procedure.

Does this PR introduce any user-facing change?

No

How was this patch tested?

I started a Docker container:
docker run --name spark_doc_build_new -p 4000:4000 -it spark_doc_build_image
and installed everything as listed below:

apt-get update
apt-get -y install  git nano
apt-get -y install  curl
apt-get -y install  ruby-full
apt-get -y install  python3 pip
apt-get -y install  gnupg

echo "deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/" >> /etc/apt/sources.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-key '95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7'
apt-get update

apt-get -y install  r-base
apt-get -y install  pandoc libxml2-dev
apt-get -y install  libcurl4-openssl-dev
apt-get -y install  libssl-dev
apt-get -y install  libfontconfig1-dev libharfbuzz-dev   libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev

Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Rscript -e 'devtools::install_version("roxygen2", version = "7.1.2", repos="https://cloud.r-project.org/")'
Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"
Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"

echo 'deb http://security.debian.org/debian-security stretch/updates main' >> /etc/apt/sources.list
apt-get update
apt-get -y install  openjdk-8-jdk
apt-get -y install  scala

git clone https://github.com/apache/spark.git
cd spark/docs

gem install bundler
bundle install
bundle exec jekyll build

and checked the result from the host via jekyll serve:
bundle exec jekyll serve --host 0.0.0.0
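
A minimal sketch of a pre-flight check for the setup above. It assumes the installed packages expose the command names listed in the script (that mapping from package to command name is an assumption), and `missing_tools` is a hypothetical helper, not part of the Spark repo:

```sh
#!/bin/sh
# Print every given command name that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed command names for the tools installed above.
missing=$(missing_tools git ruby python3 Rscript pandoc java scala bundle)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi
```

Running this before `bundle exec jekyll build` only reports what is absent; it does not install anything.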

@github-actions bot added the DOCS label on Feb 14, 2022
@khalidmammadov
Contributor Author

I have also made this Dockerfile to make the process even easier. Would that be valuable to add to the repo?
https://github.com/khalidmammadov/spark/pull/1/files


```sh
$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
```
Member

Hm, rmarkdown depends on markdown IIRC. rmarkdown falls back to markdown. Was this required in your env?

Member

cc @huaxingao FYI, who faced a similar problem before IIRC.

Contributor Author

I think I finally understand what's going on.
I'm using this Docker image for the build. I tested with and without the markdown package: the build fails without it, and I couldn't understand how it succeeds in the "Build and test" CI phase. Apparently markdown (like the other packages I am adding here) is already installed in the base image, @dongjoon-hyun's Docker image (BTW, where is the source of this Dockerfile kept?). So some packages are "reinstalled" during Build and test and some are not, hence the confusion.
Additionally, I tested building without rmarkdown and it succeeds.

Contributor

I had the same problem: I tested with and without markdown package and it failed without.

Contributor Author

I re-tested that a number of times in Docker containers, and it always fails if the package is not installed. So yes, in short, markdown is a required package.
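
One way to check for this up front, as a sketch: the helper below (hypothetical name `r_check_expr`) only builds an R expression that uses base R's `requireNamespace` to test whether each named package can be loaded. The package names come from this thread; nothing here is part of Spark's build scripts.

```sh
# Build an R expression that exits with status 1 if any named package is missing.
r_check_expr() {
  list=""
  for p in "$@"; do
    # Append each name as a quoted, comma-separated R string.
    list="${list:+$list, }\"$p\""
  done
  printf 'pkgs <- c(%s); missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]; if (length(missing) > 0) { message("missing: ", paste(missing, collapse = ", ")); quit(status = 1) }\n' "$list"
}

# Usage (requires R to be installed):
#   Rscript -e "$(r_check_expr markdown rmarkdown knitr)"
```

Keeping the expression builder separate from the `Rscript` call makes it easy to inspect exactly what would be evaluated in a given image.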

docs/README.md Outdated
whichever version of Spark you currently have checked out of revision control.

## Prerequisites
## Building documentation
Member

d -> D

If you'd like to generate R API documentation, you'll need to install these packages and libraries:

```sh
$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
```
Member

Hm, I think we'd better make it independent of the OS.

Contributor Author

Do you have any suggestions? I can only suggest either adding a Dockerfile similar to this one to build and test the changes, or omitting these installs since they are Linux-specific. In the latter case the instructions are again incomplete, and one has to figure out what to install every time.


## Prerequisites
## Building documentation
There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
Member

-> "similar to the main documentation site at ..."
Start a new sentence like "with all APIs documented. Partial ..."
I think this could be clarified: "Partial documentation builds, for a specific language or API, are also possible"

Member

Newline after section heading, like others

docs/README.md Outdated

You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
For complete documentation all below tools must be installed **including Optionals**.
Member

below tools -> tools below

docs/README.md Outdated
[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
For complete documentation all below tools must be installed **including Optionals**.

You need to have JDK, Scala, [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
Member

JDK -> the JDK

docs/README.md Outdated
```

## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
You can optionally skip API build (for partial build) as it takes time
Member

This needs a rewrite: "To create a partial build without API docs (which can take a long time), use SKIP_API=1:"
But then I thought partial builds were just the API docs? This addition is confusing.

@AmplabJenkins

Can one of the admins verify this patch?

Member

@srowen left a comment

I find a lot of this change a bit confusing; I'm not sure it is helping the docs.


```

## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
Member

APIs are "Scala", "Java", "Python", "R". roxygen2, mkdocs, sphinx are not APIs

Member

Also, I'm confused, weren't the sections above already about generating individual API docs?


NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1
bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, see below example.
Member

"see the example below"
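
For illustration, the skip variables named in the NOTE above can be combined on one command line. The sketch below only prints the resulting command rather than running it; `skip_build_cmd` and its option names (`api`, `scala`, `python`, `r`, `sql`) are hypothetical and exist only for this example.

```sh
# Print (without running) the jekyll build command for the given skip options.
skip_build_cmd() {
  prefix=""
  for name in "$@"; do
    case "$name" in
      api)    prefix="${prefix}SKIP_API=1 " ;;
      scala)  prefix="${prefix}SKIP_SCALADOC=1 " ;;
      python) prefix="${prefix}SKIP_PYTHONDOC=1 " ;;
      r)      prefix="${prefix}SKIP_RDOC=1 " ;;
      sql)    prefix="${prefix}SKIP_SQLDOC=1 " ;;
      *)      echo "unknown option: $name" >&2; return 1 ;;
    esac
  done
  echo "${prefix}bundle exec jekyll build"
}

skip_build_cmd api     # prints: SKIP_API=1 bundle exec jekyll build
skip_build_cmd r sql   # prints: SKIP_RDOC=1 SKIP_SQLDOC=1 bundle exec jekyll build
```

Printing the command first makes it easy to confirm which API docs would be skipped before starting a long build.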

@srowen closed this on Mar 24, 2022

5 participants