Add CI with link checker. #3584
kolchfa-aws
left a comment
Thank you, dB! This will partly close #681. The only other improvement would be to add a broken image check.
def self.verify(_site)
  return unless @check_links

  @base_url_matcher = %r{^#{@site.config["url"]}#{@site.baseurl}(/.*)$}.freeze
Could we add a check for the case where a writer forgets a slash after the base URL? For example, {{site.url}}{{site.baseurl}}opensearch/supported-field-types/range/.
It's a bit tricky, let's defer this to a future PR.
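A minimal sketch of the missing-slash case raised above, not the plugin's actual code. `SITE_URL` and `BASE_URL` are hypothetical stand-ins for `site.config["url"]` and `site.baseurl`, and `classify` is an illustrative helper. The second pattern flags URLs that start with the base URL but continue without a `/`, which the matcher under review would silently treat as external.

```ruby
# Hypothetical values standing in for site.config["url"] and site.baseurl.
SITE_URL = "https://opensearch.org"
BASE_URL = "/docs/latest"

# Escape the prefix so that "." in the host is matched literally.
prefix = Regexp.escape("#{SITE_URL}#{BASE_URL}")

# Same shape as the matcher under review: base URL followed by "/...".
INTERNAL_LINK = %r{^#{prefix}(/.*)$}.freeze
# Illustrative extra check: base URL followed by anything other than "/".
MISSING_SLASH = %r{^#{prefix}[^/]}.freeze

# Returns [kind, path], where path is set only for internal links.
def classify(url)
  if (match = INTERNAL_LINK.match(url))
    [:internal, match[1]]
  elsif MISSING_SLASH.match?(url)
    [:missing_slash, nil]
  else
    [:external, nil]
  end
end
```

Part of what makes this tricky is that the extra pattern would also flag a legitimate sibling path that merely shares the prefix, so a real check would probably need an allowlist.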
|
In opensearch-project/project-website#1470 we changed the link checker to run on push to main only (not in PRs) and autocut a ticket. Would you prefer that to the implementation that checks on every PR, @kolchfa-aws? |
|
@dblock I think the best way would be to run an internal link checker on every PR, and run an all link checker (primarily for the purpose of identifying external links) on a cron schedule once a week and create an issue if there are failures. Alternatively, we can run an all link checker on every PR. Also, we are running an internal link checker locally every time we build, so could you please make a change to the build.sh file to include an internal link checker env variable? |
|
@kolchfa-aws Updated!
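As a rough illustration of the split proposed above (internal links on every build, all links on a schedule), a single environment variable could select the mode. This is a sketch under assumptions: `LINK_CHECK_MODE` is a hypothetical variable name, and the helpers below are illustrative rather than taken from this PR.

```ruby
# LINK_CHECK_MODE is a hypothetical variable name used for illustration only.
# Accepts any object that responds to #fetch(key, default), such as ENV or a Hash.
def link_check_mode(env = ENV)
  case env.fetch("LINK_CHECK_MODE", "none").downcase
  when "internal" then :internal
  when "external" then :external
  when "all"      then :all
  else :none
  end
end

# True when links of the given kind (:internal or :external) should be checked.
def check?(kind, mode)
  mode == :all || mode == kind
end
```

A local build script could then export the variable with the value `internal`, while a scheduled job could use `all`.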
|
|
@dblock Thanks so much! Is there a way to run the local build in quiet mode so it only displays errors and not info? If not, that's fine because error messages come last. It would be nice to exit without an exception, but if it's not possible, it's fine as well. |
cwillum
left a comment
LGTM. Just nitpicking comments.
It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.

Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://api.semanticscholar.org/CorpusID:927435).
Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9).
extra space after "detection"?
##
# Defines the priority of the plugin
# The hooks are registered with a very low priority to make sure they runs after any content modifying hook
"... make sure they run after ..." verb agreement
Not sure if it's necessary to make proper sentences above here and add periods. Just notes, I suppose.
I won't change them for now, so that they keep matching the code in the project-website repo.
Adding `stdout{}` to the `output{}` section of your `pipeline.conf` file prints the query results to the console.

To reindex the data into an OpenSearch domain, add the destination domain configuration in the `output{}` section like shown [here](https://opensearch.org/docs/latest/clients/logstash/ship-to-opensearch/#opensearch-output-plugin).
To reindex the data into an OpenSearch domain, add the destination domain configuration in the `output{}` section like shown [here](https://opensearch.org/docs/latest/tools/logstash/index/).
"... section, as shown in the Logstash documentation."
Going to skip changing this; it's existing content on the website and I'm only fixing links.
|
@kolchfa-aws would it be helpful if it ran locally every time before you tried to push? I've thought about adding local commit hooks that could run things like this automatically when you push. |
|
@dtaivpp We're running a build script locally when we build the site that checks internal links (build.sh). I think it accomplishes the same thing. |
|
@dblock Could we run fatally on a pull request? We would like the PR to fail if it has broken links. |
We could, but then it would be different from opensearch-project/project-website#1470, which had the exact opposite opinion via @krisfreedain, and different from what you suggested in #3584 (comment) :) What would you like to do? The argument against breaking on PR is that it takes several minutes to run the link checker, and most broken links will be outside of the scope of the PR being made IMO (some existing website, partner, etc., disappearing). With this change as is you'll get a GitHub issue on every push that some link is broken, so it can be dealt with separately. |
Not easily. Let's consider this as future improvements. |
|
@dblock After talking it over with the team, we'd like to try the following if possible:
I realize that this is different from what the project site is doing, but I think it's ok. Our main goal is to enforce internal link fixing, where we see the most problems. If the run takes several minutes, it's ok as well. And if we don't allow merging without link fixing, then theoretically the only broken links in a given PR should come only from the changes in that PR. If we do that, can we then rerun the link checker automatically when there is a push to the same PR that fixes the links? Also, if we could annotate the PR with the list of broken links and the file names where the broken links exist, that would be ideal. Thank you! |
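To make the annotation request above concrete, here is a small sketch of how broken links might be grouped by file into a comment or issue body. It is illustrative only: `BrokenLink` and `broken_links_report` are hypothetical names, and nothing here reflects how the PR actually reports failures.

```ruby
# Hypothetical record of one failed link: source file, target URL, failure reason.
BrokenLink = Struct.new(:file, :url, :reason)

# Builds a Markdown list of broken links grouped by the file that contains them.
def broken_links_report(broken)
  return "No broken links found." if broken.empty?

  lines = ["Found #{broken.size} broken link(s):"]
  broken.group_by(&:file).sort_by { |file, _links| file }.each do |file, links|
    lines << "- #{file}"
    links.each { |link| lines << "  - #{link.url} (#{link.reason})" }
  end
  lines.join("\n")
end
```

Grouping by file keeps the output short when one page has many failures, which matches the goal of pointing a writer at the files they need to fix.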
yeah - on the project-website, I've filed an issue to have it removed from the build process completely (opensearch-project/project-website#1501) and just be something similar to @kolchfa-aws's feedback of "On a cron schedule, once a week, check all links and create an issue with the links that are broken." |
|
Thank you, @dblock! |
|
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.0 2.0
# Navigate to the new working tree
pushd ../.worktrees/backport-2.0
# Create a new branch
git switch --create backport/backport-3584-to-2.0
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.0
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.0
Then, create a pull request where the |
|
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-1.3 1.3
# Navigate to the new working tree
pushd ../.worktrees/backport-1.3
# Create a new branch
git switch --create backport/backport-3584-to-1.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-1.3
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-1.3
Then, create a pull request where the |
|
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.1 2.1
# Navigate to the new working tree
pushd ../.worktrees/backport-2.1
# Create a new branch
git switch --create backport/backport-3584-to-2.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.1
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.1
Then, create a pull request where the |
|
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.2 2.2
# Navigate to the new working tree
pushd ../.worktrees/backport-2.2
# Create a new branch
git switch --create backport/backport-3584-to-2.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.2
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.2
Then, create a pull request where the |
|
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.3 2.3
# Navigate to the new working tree
pushd ../.worktrees/backport-2.3
# Create a new branch
git switch --create backport/backport-3584-to-2.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.3
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.3
Then, create a pull request where the |
|
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.5 2.5
# Navigate to the new working tree
pushd ../.worktrees/backport-2.5
# Create a new branch
git switch --create backport/backport-3584-to-2.5
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.5
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.5
Then, create a pull request where the |
* Add CI with link checker. Signed-off-by: dblock <dblock@amazon.com>
* Capture URI::InvalidURIError. Signed-off-by: dblock <dblock@amazon.com>
* Use HEAD and catch URI errors. Signed-off-by: dblock <dblock@amazon.com>
* Retry on a 405 with a GET. Signed-off-by: dblock <dblock@amazon.com>
* Replaced external link checker with ruby-link-checker. Signed-off-by: dblock <dblock@amazon.com>
* Don't exit with an exception. Signed-off-by: dblock <dblock@amazon.com>
* Run internal link checker on build/ci. Signed-off-by: dblock <dblock@amazon.com>
* Added broken links issue template. Signed-off-by: dblock <dblock@amazon.com>
* Added host exclusions that 404 or fail on bots. Signed-off-by: dblock <dblock@amazon.com>
* Raise anyway because Jekyll does it for us. Signed-off-by: dblock <dblock@amazon.com>
* Fix broken links. Signed-off-by: dblock <dblock@amazon.com>
* Only run link checker on main. Signed-off-by: dblock <dblock@amazon.com>
* Re-add check-links.sh. Signed-off-by: dblock <dblock@amazon.com>
* Run once a day on cron. Signed-off-by: dblock <dblock@amazon.com>
---------
Signed-off-by: dblock <dblock@amazon.com>
(cherry picked from commit 680c821)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.4 2.4
# Navigate to the new working tree
pushd ../.worktrees/backport-2.4
# Create a new branch
git switch --create backport/backport-3584-to-2.4
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 680c821937d1a9b9b2330b974ad6c62ce9a3d169
# Push it to GitHub
git push --set-upstream origin backport/backport-3584-to-2.4
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.4
Then, create a pull request where the |
|
@kolchfa-aws do you need me to backport some of these? |
* Add CI with link checker.
* Capture URI::InvalidURIError.
* Use HEAD and catch URI errors.
* Retry on a 405 with a GET.
* Replaced external link checker with ruby-link-checker.
* Don't exit with an exception.
* Run internal link checker on build/ci.
* Added broken links issue template.
* Added host exclusions that 404 or fail on bots.
* Raise anyway because Jekyll does it for us.
* Fix broken links.
* Only run link checker on main.
* Re-add check-links.sh.
* Run once a day on cron.
---------
(cherry picked from commit 680c821)
Signed-off-by: dblock <dblock@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
@dblock I plan to do it later today but sure if you want it done sooner😄 |
|
Link checker ran successfully in https://github.com/opensearch-project/documentation-website/actions/runs/4618022749 fyi. |
|
@dblock I already saw. Nice! |
* Add CI with link checker. Signed-off-by: dblock <dblock@amazon.com>
* Capture URI::InvalidURIError. Signed-off-by: dblock <dblock@amazon.com>
* Use HEAD and catch URI errors. Signed-off-by: dblock <dblock@amazon.com>
* Retry on a 405 with a GET. Signed-off-by: dblock <dblock@amazon.com>
* Replaced external link checker with ruby-link-checker. Signed-off-by: dblock <dblock@amazon.com>
* Don't exit with an exception. Signed-off-by: dblock <dblock@amazon.com>
* Run internal link checker on build/ci. Signed-off-by: dblock <dblock@amazon.com>
* Added broken links issue template. Signed-off-by: dblock <dblock@amazon.com>
* Added host exclusions that 404 or fail on bots. Signed-off-by: dblock <dblock@amazon.com>
* Raise anyway because Jekyll does it for us. Signed-off-by: dblock <dblock@amazon.com>
* Fix broken links. Signed-off-by: dblock <dblock@amazon.com>
* Only run link checker on main. Signed-off-by: dblock <dblock@amazon.com>
* Re-add check-links.sh. Signed-off-by: dblock <dblock@amazon.com>
* Run once a day on cron. Signed-off-by: dblock <dblock@amazon.com>
---------
Signed-off-by: dblock <dblock@amazon.com>
Description
Builds the Jekyll site similarly to project-website, runs the link checker, and fails if a link is broken.
The link checker also received some improvements: it now makes HEAD requests first, which is a lot faster.
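The HEAD-first approach, together with the "Retry on a 405 with a GET" and "Capture URI::InvalidURIError" commits, can be sketched roughly as follows. This is a simplified illustration, not the code in this PR (which the commit list says was later replaced with ruby-link-checker). `link_ok?`, `DEFAULT_FETCH`, and the injectable `fetch:` parameter are assumptions made so that the logic can be exercised without network access.

```ruby
require "net/http"
require "uri"

# Default transport (illustrative): performs a real request and returns the
# integer HTTP status code.
DEFAULT_FETCH = lambda do |method, uri|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    request = method == :head ? Net::HTTP::Head.new(uri) : Net::HTTP::Get.new(uri)
    http.request(request).code.to_i
  end
end

# Returns true when the URL answers with a 2xx or 3xx status.
# Tries a cheap HEAD first and falls back to GET only when the server
# rejects HEAD with 405 Method Not Allowed.
def link_ok?(url, fetch: DEFAULT_FETCH)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  status = fetch.call(:head, uri)
  status = fetch.call(:get, uri) if status == 405
  status.between?(200, 399)
rescue URI::InvalidURIError
  # Malformed URLs count as broken rather than crashing the whole run.
  false
end
```

HEAD avoids downloading the response body, which is where the speedup comes from. The sketch deliberately omits redirect following, timeouts, and rescuing network errors, which a real checker would also need.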
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.