Add CI with link checker. #3584

Merged
kolchfa-aws merged 14 commits into opensearch-project:main from dblock:add-ci
Apr 4, 2023

Conversation

@dblock
Member

@dblock dblock commented Mar 27, 2023

Description

Builds Jekyll site similarly to project-website, runs link checker, fails if a link is broken.

The link checker also got some improvements: it now makes HEAD requests first, which is a lot faster.
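
For readers unfamiliar with the HEAD-first approach, here is a minimal sketch of the idea, combined with the "Retry on a 405 with a GET" step listed in the commits. It is not the plugin's actual code: `request_status`, `link_ok?`, the `requester` parameter, and the 5-second timeouts are all hypothetical names and values chosen for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the PR): send one request with the given
# verb (:head or :get) and return the numeric HTTP status code.
def request_status(uri, verb)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                  open_timeout: 5, read_timeout: 5) do |http|
    klass = verb == :head ? Net::HTTP::Head : Net::HTTP::Get
    http.request(klass.new(uri)).code.to_i
  end
end

# Returns true when the link looks reachable. A HEAD request is tried first
# because it transfers no body; GET is used only when the server answers
# HEAD with 405 (Method Not Allowed). Redirects (3xx) count as reachable
# in this sketch. `requester` is injectable so the logic can run offline.
def link_ok?(url, requester: method(:request_status))
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  status = requester.call(uri, :head)
  status = requester.call(uri, :get) if status == 405
  status.between?(200, 399)
rescue URI::InvalidURIError, SocketError, SystemCallError, Timeout::Error
  # Malformed URLs and network failures are treated as broken links.
  false
end
```

Calling `link_ok?('https://opensearch.org/docs/latest/')` would issue a real request; passing a lambda as `requester:` keeps the check deterministic in tests.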

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dblock dblock requested a review from a team as a code owner March 27, 2023 16:04
@dblock dblock marked this pull request as draft March 27, 2023 16:04
@dblock dblock force-pushed the add-ci branch 5 times, most recently from 16df190 to 5f157a0 Compare March 27, 2023 19:19
@dblock dblock force-pushed the add-ci branch 2 times, most recently from d78f2d7 to 8bd3e06 Compare March 27, 2023 19:52
@dblock dblock marked this pull request as ready for review March 27, 2023 19:58
Collaborator

@kolchfa-aws kolchfa-aws left a comment

Thank you, dB! This will partly close #681. The only other improvement would be to add a broken image check.

def self.verify(_site)
return unless @check_links

@base_url_matcher = %r{^#{@site.config["url"]}#{@site.baseurl}(/.*)$}.freeze
Collaborator

Could we add a check for the case where a writer forgets a slash after the base URL? For example, {{site.url}}{{site.baseurl}}opensearch/supported-field-types/range/.

Member Author

It's a bit tricky, let's defer this to a future PR.
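
As a rough illustration of what such a check could look like (not part of this PR), the sketch below flags rendered links where the base URL runs straight into the path. The function name `classify_href` and the default `site_url` and `base_url` values are hypothetical stand-ins for `site.config["url"]` and `site.baseurl`.

```ruby
# Classify an href relative to the site's base URL:
#   :internal      - base URL followed by "/", "?", "#", or nothing
#   :missing_slash - base URL immediately followed by other characters
#   :other         - anything else (external or relative links)
# The default site_url and base_url are assumed values for illustration.
def classify_href(href, site_url: 'https://opensearch.org', base_url: '/docs/latest')
  prefix = "#{site_url}#{base_url}"
  return :other unless href.start_with?(prefix)

  rest = href[prefix.length..-1]
  rest.empty? || rest.start_with?('/', '?', '#') ? :internal : :missing_slash
end
```

With these assumed values, `{{site.url}}{{site.baseurl}}opensearch/supported-field-types/range/` renders to `https://opensearch.org/docs/latestopensearch/supported-field-types/range/` and is classified as `:missing_slash`. One caveat of this heuristic: a legitimate sibling path that merely shares the prefix (for example, a hypothetical `/docs/latest-archive/`) would be flagged as well.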

@dblock
Member Author

dblock commented Mar 29, 2023

In opensearch-project/project-website#1470 we changed the link checker to run on push to main only (not in PRs) and autocut a ticket. Would you prefer that to the implementation that checks on every PR, @kolchfa-aws?

@kolchfa-aws
Collaborator

@dblock I think the best way would be to run an internal link checker on every PR, and run an all link checker (primarily for the purpose of identifying external links) on a cron schedule once a week and create an issue if there are failures. Alternatively, we can run an all link checker on every PR. Also, we are running an internal link checker locally every time we build, so could you please make a change to the build.sh file to include an internal link checker env variable?

JEKYLL_LINK_CHECKER=internal bundle exec jekyll serve --host localhost --port 4000 --incremental --livereload --open-url --trace

Thanks!
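
To make the role of the environment variable concrete, here is a minimal sketch of how a plugin could turn `JEKYLL_LINK_CHECKER` into a checking mode. Only the value `internal` appears in this conversation; the `external` and `all` values, the function name `link_checker_mode`, and the returned hash shape are assumptions for illustration.

```ruby
# Map the JEKYLL_LINK_CHECKER value to the kinds of links to check.
# Only "internal" is confirmed by the discussion; "external" and "all"
# are assumed values. An unset or unknown value disables checking.
def link_checker_mode(env = ENV)
  case env.fetch('JEKYLL_LINK_CHECKER', '').strip.downcase
  when 'internal' then { internal: true,  external: false }
  when 'external' then { internal: false, external: true }
  when 'all'      then { internal: true,  external: true }
  else                 { internal: false, external: false }
  end
end
```

Taking the environment as a parameter keeps the function testable: `link_checker_mode('JEKYLL_LINK_CHECKER' => 'internal')` can be called with a plain hash instead of the real `ENV`.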

@dblock
Member Author

dblock commented Apr 3, 2023

@kolchfa-aws Updated!

This was referenced Apr 3, 2023
@kolchfa-aws
Collaborator

@dblock Thanks so much! Is there a way to run the local build in quiet mode so it only displays errors and not info? If not, that's fine because error messages come last. It would be nice to exit without an exception, but if it's not possible, it's fine as well.

Contributor

@cwillum cwillum left a comment

LGTM. Just nitpicking comments.

It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.

Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://api.semanticscholar.org/CorpusID:927435).
Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9).
Contributor

extra space after "detection"?


##
# Defines the priority of the plugin
# The hooks are registered with a very low priority to make sure they runs after any content modifying hook
Contributor

"... make sure they run after ..." verb agreement
Not sure if it's necessary to make proper sentences above here and add periods. Just notes, I suppose.

Member Author

I won't change them all just to match the code in project-website repo for now.

Adding `stdout{}` to the `output{}` section of your `pipeline.conf` file prints the query results to the console.

To reindex the data into an OpenSearch domain, add the destination domain configuration in the `output{}` section like shown [here](https://opensearch.org/docs/latest/clients/logstash/ship-to-opensearch/#opensearch-output-plugin).
To reindex the data into an OpenSearch domain, add the destination domain configuration in the `output{}` section like shown [here](https://opensearch.org/docs/latest/tools/logstash/index/).
Contributor

"... section, as shown in the Logstash documentation."

Member Author

Going to skip changing this; it's existing content on the website and I'm only fixing links.

@dtaivpp
Contributor

dtaivpp commented Apr 3, 2023

@kolchfa-aws would it be helpful if it ran locally every time before you tried to push? I've thought about adding local commit hooks that could run things like this automatically when you push.

@kolchfa-aws
Collaborator

@dtaivpp We're running a build script locally when we build the site that checks internal links (build.sh). I think it accomplishes the same thing.

@kolchfa-aws
Collaborator

@dblock Could we run fatally on a pull request? We would like the PR to fail if it has broken links.

@dblock
Member Author

dblock commented Apr 3, 2023

> @dblock Could we run fatally on a pull request? We would like the PR to fail if it has broken links.

We could, but then it would be different from opensearch-project/project-website#1470, which had the exact opposite opinion via @krisfreedain, and different from what you suggested in #3584 (comment) :) What would you like to do?

The argument against breaking on PR is that it takes several minutes to run the link checker, and most broken links will be outside of the scope of the PR being made IMO (some existing website, partner, etc., disappearing). With this change as is you'll get a GitHub issue on every push that some link is broken, so it can be dealt with separately.

@dblock
Member Author

dblock commented Apr 3, 2023

> @dblock Thanks so much! Is there a way to run the local build in quiet mode so it only displays errors and not info? If not, that's fine because error messages come last. It would be nice to exit without an exception, but if it's not possible, it's fine as well.

Not easily. Let's consider this a future improvement.

@kolchfa-aws
Collaborator

kolchfa-aws commented Apr 3, 2023

@dblock After talking it over with the team, we'd like to try the following if possible:

  • On every PR, check internal links and don't allow merging until the internal links are fixed.
  • On a cron schedule, once a week, check all links and create an issue with the links that are broken.

I realize that this is different from what the project site is doing, but I think it's ok. Our main goal is to enforce internal link fixing, where we see the most problems. If the run takes several minutes, it's ok as well. And if we don't allow merging without link fixing, then theoretically the only broken links in a given PR should come only from the changes in that PR. If we do that, can we then rerun the link checker automatically when there is a push to the same PR that fixes the links?

Also, if we could annotate the PR with the list of broken links and the file names where the broken links exist, that would be ideal. Thank you!

@krisfreedain
Member

krisfreedain commented Apr 4, 2023

> > @dblock Could we run fatally on a pull request? We would like the PR to fail if it has broken links.
>
> We could, but then it would be different from opensearch-project/project-website#1470, which had the exact opposite opinion via @krisfreedain, and different from what you suggested in #3584 (comment) :) What would you like to do?
>
> The argument against breaking on PR is that it takes several minutes to run the link checker, and most broken links will be outside of the scope of the PR being made IMO (some existing website, partner, etc., disappearing). With this change as is you'll get a GitHub issue on every push that some link is broken, so it can be dealt with separately.

Yeah, on the project-website I've filed an issue to have it removed from the build process completely (opensearch-project/project-website#1501) and just be something similar to @kolchfa-aws's feedback of "On a cron schedule, once a week, check all links and create an issue with the links that are broken."

@kolchfa-aws
Collaborator

Thank you, @dblock!

@kolchfa-aws kolchfa-aws merged commit 680c821 into opensearch-project:main Apr 4, 2023
@kolchfa-aws kolchfa-aws added backport 1.3 PR: Backport label for v1.3.x backport 2.0 PR: Backport label for v2.0.x backport 2.1 PR: Backport label for 2.1 backport 2.2 PR: Backport label for 2.2 backport 2.3 PR: Backport label for 2.3 backport 2.4 PR: Backport label for 2.4 backport 2.5 PR: Backport label for 2.5 backport 2.6 PR: Backport label for 2.6 labels Apr 4, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.0 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.0 2.0
# Navigate to the new working tree
pushd ../.worktrees/backport-2.0
# Create a new branch
git switch --create backport/backport-3584-to-2.0
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.0
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.0

Then, create a pull request where the base branch is 2.0 and the compare/head branch is backport/backport-3584-to-2.0.

@opensearch-trigger-bot
Contributor

The backport to 1.3 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-1.3 1.3
# Navigate to the new working tree
pushd ../.worktrees/backport-1.3
# Create a new branch
git switch --create backport/backport-3584-to-1.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-1.3
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-1.3

Then, create a pull request where the base branch is 1.3 and the compare/head branch is backport/backport-3584-to-1.3.

@opensearch-trigger-bot
Contributor

The backport to 2.1 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.1 2.1
# Navigate to the new working tree
pushd ../.worktrees/backport-2.1
# Create a new branch
git switch --create backport/backport-3584-to-2.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.1
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.1

Then, create a pull request where the base branch is 2.1 and the compare/head branch is backport/backport-3584-to-2.1.

@opensearch-trigger-bot
Contributor

The backport to 2.2 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.2 2.2
# Navigate to the new working tree
pushd ../.worktrees/backport-2.2
# Create a new branch
git switch --create backport/backport-3584-to-2.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.2
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.2

Then, create a pull request where the base branch is 2.2 and the compare/head branch is backport/backport-3584-to-2.2.

@opensearch-trigger-bot
Contributor

The backport to 2.3 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.3 2.3
# Navigate to the new working tree
pushd ../.worktrees/backport-2.3
# Create a new branch
git switch --create backport/backport-3584-to-2.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.3
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.3

Then, create a pull request where the base branch is 2.3 and the compare/head branch is backport/backport-3584-to-2.3.

@opensearch-trigger-bot
Contributor

The backport to 2.5 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.5 2.5
# Navigate to the new working tree
pushd ../.worktrees/backport-2.5
# Create a new branch
git switch --create backport/backport-3584-to-2.5
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.5
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.5

Then, create a pull request where the base branch is 2.5 and the compare/head branch is backport/backport-3584-to-2.5.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 4, 2023
* Add CI with link checker.

Signed-off-by: dblock <dblock@amazon.com>

* Capture URI::InvalidURIError.

Signed-off-by: dblock <dblock@amazon.com>

* Use HEAD and catch URI errors.

Signed-off-by: dblock <dblock@amazon.com>

* Retry on a 405 with a GET.

Signed-off-by: dblock <dblock@amazon.com>

* Replaced external link checker with ruby-link-checker.

Signed-off-by: dblock <dblock@amazon.com>

* Don't exit with an exception.

Signed-off-by: dblock <dblock@amazon.com>

* Run internal link checker on build/ci.

Signed-off-by: dblock <dblock@amazon.com>

* Added broken links issue template.

Signed-off-by: dblock <dblock@amazon.com>

* Added host exclusions that 404 or fail on bots.

Signed-off-by: dblock <dblock@amazon.com>

* Raise anyway because Jekyll does it for us.

Signed-off-by: dblock <dblock@amazon.com>

* Fix broken links.

Signed-off-by: dblock <dblock@amazon.com>

* Only run link checker on main.

Signed-off-by: dblock <dblock@amazon.com>

* Re-add check-links.sh.

Signed-off-by: dblock <dblock@amazon.com>

* Run once a day on cron.

Signed-off-by: dblock <dblock@amazon.com>

---------

Signed-off-by: dblock <dblock@amazon.com>
(cherry picked from commit 680c821)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot
Contributor

The backport to 2.4 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.4 2.4
# Navigate to the new working tree
pushd ../.worktrees/backport-2.4
# Create a new branch
git switch --create backport/backport-3584-to-2.4
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.4
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.4

Then, create a pull request where the base branch is 2.4 and the compare/head branch is backport/backport-3584-to-2.4.

@dblock dblock deleted the add-ci branch April 4, 2023 20:01
@dblock
Member Author

dblock commented Apr 4, 2023

@kolchfa-aws do you need me to backport some of these?

Naarcha-AWS pushed a commit that referenced this pull request Apr 4, 2023
@kolchfa-aws
Collaborator

@dblock I plan to do it later today, but sure, if you want it done sooner 😄

@dblock
Member Author

dblock commented Apr 5, 2023

@kolchfa-aws
Collaborator

@dblock I already saw. Nice!

vagimeli pushed a commit that referenced this pull request Apr 25, 2023
vagimeli added a commit that referenced this pull request Apr 25, 2023
vagimeli pushed a commit that referenced this pull request May 4, 2023
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
Labels

backport 1.3 PR: Backport label for v1.3.x backport 2.0 PR: Backport label for v2.0.x backport 2.1 PR: Backport label for 2.1 backport 2.2 PR: Backport label for 2.2 backport 2.3 PR: Backport label for 2.3 backport 2.4 PR: Backport label for 2.4 backport 2.5 PR: Backport label for 2.5 backport 2.6 PR: Backport label for 2.6

5 participants