7903 solr config hack #8130

Merged
merged 20 commits into from
Nov 3, 2021
Changes from 17 commits
9dfda2b
refactor(solr): remove external field config file #7903
poikilotherm Oct 4, 2021
d74e463
feat(solr): add new schema config script update-field.sh #7903
poikilotherm Oct 4, 2021
e92eff4
docs(metadata): refactor metadata customization with new solr script …
poikilotherm Oct 4, 2021
e6a6bf9
refactor(solr): remove references to updateSchemaMDB.sh #7903
poikilotherm Oct 4, 2021
70f60b1
refactor(solr): include new update-fields.sh instead of updateSchemaM…
poikilotherm Oct 4, 2021
1288bbd
docs(solr): add release note for update-fields.sh #7903
poikilotherm Oct 4, 2021
c24f034
ci(solr): start adding shellspec tests for update-fields.sh #7903
poikilotherm Oct 4, 2021
63f3f80
ci(solr): finish shellspec tests for update-fields.sh #7903
poikilotherm Oct 5, 2021
97d1a55
feat(solr): add some more validation and make update-fields.sh less n…
poikilotherm Oct 5, 2021
7cb28a9
ci: add shellspec github workflow #7903
poikilotherm Oct 5, 2021
bfb8243
ci(solr): do not test on MacOS due to VERY strange bash problems #7903
poikilotherm Oct 5, 2021
84aa193
refactor(solr): add bash v4+ requirement for update-fields.sh #7903
poikilotherm Oct 5, 2021
44ee9eb
ci(solr): skip update-fields.sh test for no input if on Github #7903
poikilotherm Oct 5, 2021
9fb2bb8
ci(solr): add Shellcheck as Github Action #7903
poikilotherm Oct 5, 2021
baef977
ci(solr): test adding a CentOS based shellspec run #7903
poikilotherm Oct 5, 2021
ea6a1b9
ci(solr): test adding a RockyLinux based shellspec run #7903
poikilotherm Oct 5, 2021
194a5e0
feat(solr): let update-fields.sh check for presence of ed and bc #7903
poikilotherm Oct 5, 2021
3e83032
fix(solr): make update-fields.sh MacOS compatible
poikilotherm Oct 26, 2021
c5b09ff
ci(solr): re-add MacOS shellspec test for update-fields.sh #7903
poikilotherm Oct 26, 2021
7995722
Merge branch 'develop' into 7903-solr-config-hack
poikilotherm Oct 27, 2021
72 changes: 72 additions & 0 deletions .github/workflows/shellspec.yml
@@ -0,0 +1,72 @@
name: "Shellspec"
on:
  push:
    paths:
      - tests/shell/**
      - conf/solr/**
      # add more when more specs are written relying on data
  pull_request:
    paths:
      - tests/shell/**
      - conf/solr/**
      # add more when more specs are written relying on data
env:
  SHELLSPEC_VERSION: 0.28.1
jobs:
  shellcheck:
    name: Shellcheck
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: shellcheck
        uses: reviewdog/action-shellcheck@v1
        with:
          github_token: ${{ secrets.github_token }}
          reporter: github-pr-review # Change reporter.
          fail_on_error: true
          exclude: "./tests/shell/*"
  shellspec-ubuntu:
    name: "Ubuntu"
    runs-on: ubuntu-latest
    steps:
      - name: Install shellspec
        run: curl -fsSL https://git.io/shellspec | sh -s ${{ env.SHELLSPEC_VERSION }} --yes
      - uses: actions/checkout@v2
      - name: Run Shellspec
        run: |
          cd tests/shell
          shellspec
  shellspec-centos7:
    name: "CentOS 7"
    runs-on: ubuntu-latest
    container:
      image: centos:7
    steps:
      - uses: actions/checkout@v2
      - name: Install shellspec
        run: |
          curl -fsSL https://github.com/shellspec/shellspec/releases/download/${{ env.SHELLSPEC_VERSION }}/shellspec-dist.tar.gz | tar -xz -C /usr/share
          ln -s /usr/share/shellspec/shellspec /usr/bin/shellspec
      - name: Install dependencies
        run: yum install -y ed
      - name: Run shellspec
        run: |
          cd tests/shell
          shellspec
  shellspec-rocky8:
    name: "RockyLinux 8"
    runs-on: ubuntu-latest
    container:
      image: rockylinux/rockylinux:8
    steps:
      - uses: actions/checkout@v2
      - name: Install shellspec
        run: |
          curl -fsSL https://github.com/shellspec/shellspec/releases/download/${{ env.SHELLSPEC_VERSION }}/shellspec-dist.tar.gz | tar -xz -C /usr/share
          ln -s /usr/share/shellspec/shellspec /usr/bin/shellspec
      - name: Install dependencies
        run: dnf install -y ed bc diffutils
      - name: Run shellspec
        run: |
          cd tests/shell
          shellspec
1 change: 0 additions & 1 deletion conf/docker-aio/1prep.sh
@@ -6,7 +6,6 @@
 mkdir -p testdata/doc/sphinx-guides/source/_static/util/
 cp ../solr/8.8.1/schema*.xml testdata/
 cp ../solr/8.8.1/solrconfig.xml testdata/
-cp ../solr/8.8.1/updateSchemaMDB.sh testdata/
 cp ../jhove/jhove.conf testdata/
 cp ../jhove/jhoveConfig.xsd testdata/
 cd ../../
1 change: 0 additions & 1 deletion conf/docker-aio/c8.dockerfile
@@ -64,7 +64,6 @@ COPY dv/install/ /opt/dv/
 COPY install.bash /opt/dv/
 COPY entrypoint.bash /opt/dv/
 COPY testdata /opt/dv/testdata
-COPY testdata/updateSchemaMDB.sh /opt/dv/testdata/
 COPY testscripts/* /opt/dv/testdata/
 COPY setupIT.bash /opt/dv
 WORKDIR /opt/dv
197 changes: 192 additions & 5 deletions conf/solr/8.8.1/schema.xml

Large diffs are not rendered by default.
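
Because the `schema.xml` diff is not rendered here, the following is a minimal sketch of the layout that `update-fields.sh` appears to expect: the four include-guard strings, each on a line of its own. Only the guard strings are taken from the script's variables. Wrapping them in XML comments, the `<schema>` attributes and the sample `<field>` and `<copyField>` entries are assumptions made for illustration, not a copy of the real schema.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical miniature schema; only the four guard strings come from update-fields.sh.
schema=$(mktemp)
cat > "${schema}" << 'EOF'
<schema name="demo" version="1.6">
    <!-- SCHEMA-FIELDS::BEGIN -->
    <field name="title" type="text_en" multiValued="false" stored="true" indexed="true"/>
    <!-- SCHEMA-FIELDS::END -->
    <!-- SCHEMA-COPY-FIELDS::BEGIN -->
    <copyField source="title" dest="_text_" maxChars="3000"/>
    <!-- SCHEMA-COPY-FIELDS::END -->
</schema>
EOF

# List every guard with its line number; each should show up exactly once.
guard_lines=$(grep -n -o -E 'SCHEMA-(COPY-)?FIELDS::(BEGIN|END)' "${schema}")
echo "${guard_lines}"
rm "${schema}"
```

Note that `SCHEMA-FIELDS::BEGIN` is not a substring of `SCHEMA-COPY-FIELDS::BEGIN`, so a plain `grep` for one guard does not accidentally count the other.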

161 changes: 0 additions & 161 deletions conf/solr/8.8.1/schema_dv_mdb_fields.xml

This file was deleted.

220 changes: 220 additions & 0 deletions conf/solr/8.8.1/update-fields.sh
@@ -0,0 +1,220 @@
#!/usr/bin/env bash

set -euo pipefail

#### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### ####
# This script will
# 1. take a file (or read it from STDIN) with all <field> and <copyField> definitions
# 2. and replace the sections between the include guards in a given schema.xml
#    file with those definitions
# The script validates the presence, uniqueness and order of the include guards.
#### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### ####


### Variables
# Internal use only (fork to change)
VERSION="0.1"
INPUT=""
FIELDS=""
COPY_FIELDS=""
TRIGGER_CHAIN=0
ED_DELETE_FIELDS="'a+,'b-d"
ED_DELETE_COPYFIELDS="'a+,'b-d"

SOLR_SCHEMA_FIELD_BEGIN_MARK="SCHEMA-FIELDS::BEGIN"
SOLR_SCHEMA_FIELD_END_MARK="SCHEMA-FIELDS::END"
SOLR_SCHEMA_COPYFIELD_BEGIN_MARK="SCHEMA-COPY-FIELDS::BEGIN"
SOLR_SCHEMA_COPYFIELD_END_MARK="SCHEMA-COPY-FIELDS::END"
MARKS_ORDERED="${SOLR_SCHEMA_FIELD_BEGIN_MARK} ${SOLR_SCHEMA_FIELD_END_MARK} ${SOLR_SCHEMA_COPYFIELD_BEGIN_MARK} ${SOLR_SCHEMA_COPYFIELD_END_MARK}"

### Common functions
function error {
echo "ERROR:" "$@" >&2
exit 2
}

function exists {
type "$1" >/dev/null 2>&1 && return 0
( IFS=:; for p in $PATH; do [ -x "${p%/}/$1" ] && return 0; done; return 1 )
}

function usage {
cat << EOF
$(basename "$0") ${VERSION}
Usage: $(basename "$0") [-hp] [ schema file ] [ source file ]

-h Print usage (this text)
-p Chained printing: write all metadata schema related <field>
and <copyField> present in Solr XML to stdout

Provide target Solr Schema XML via argument or \$SCHEMA env var.

Provide source file via argument, \$SOURCE env var or piped input
(wget/curl, chained). Source file = "-" means read STDIN.
EOF
exit 0
}

### Options
while getopts ":hp" opt; do
case $opt in
h) usage ;;
p) TRIGGER_CHAIN=1 ;;
\?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;;
:) echo "Option -$OPTARG requires an argument." >&2; exit 1 ;;
esac
done

# Check for recent Bash version
# shellcheck disable=SC2086
if [ ${BASH_VERSION%%.*} -lt 4 ]; then
error "Bash v4.x or later required"
fi

# Check for ed and bc being present
exists ed || error "Please ensure ed & bc are installed"
exists bc || error "Please ensure ed & bc are installed"

# remove all the parsed options
shift $((OPTIND-1))

# User overridable locations
SCHEMA=$(readlink -f "${SCHEMA:-${1:-schema.xml}}")
SOURCE=${SOURCE:-${2:-"-"}}


### VERIFY SCHEMA FILE EXISTS AND CONTAINS INCLUDE GUARDS ###
# Check that the schema file exists and is writable
if [ ! -w "${SCHEMA}" ]; then
error "Cannot find or write to an XML schema at ${SCHEMA}"
else
# Check schema file for include guards
CHECKS=$(
for MARK in ${MARKS_ORDERED}
do
grep -c "${MARK}" "${SCHEMA}" || error "Missing ${MARK} from ${SCHEMA}"
done
)

# Check guards are unique (count occurrences and sum calc via bc)
[ "$(echo -n "${CHECKS}" | tr '\n' '+' | sed -e 's#$#\n#' | bc)" -eq 4 ] || \
error "Some include guards are not unique in ${SCHEMA}"

# Check guards are in order (line number comparison via bc tricks)
CHECKS=$(
for MARK in ${MARKS_ORDERED}
do
grep -n "${MARK}" "${SCHEMA}" | cut -f 1 -d ":"
done
)
# Actual comparison of line numbers
[ "$(echo "${CHECKS}" | tr '\n' '<' | sed -e 's#<$#\n#' -e 's#\(<[0-9]\+\)<\([0-9]\+\)#\1 \&\& \2#' | bc)" -eq 1 ] || \
error "Include guards are not in correct order in ${SCHEMA}"

# Check guards are exclusively in their lines
# (no field or copyField on same line)
for MARK in ${MARKS_ORDERED}
do
grep "${MARK}" "${SCHEMA}" | grep -q -v -e '\(<field \|<copyField \)' \
|| error "Mark ${MARK} is not on an exclusive line"
done

# Check if there are no lines between the field marks (then skip delete in ed)
DISTANCE_FIELDS_MARKS=$( \
grep -n -e "\(${SOLR_SCHEMA_FIELD_BEGIN_MARK}\|${SOLR_SCHEMA_FIELD_END_MARK}\)" "${SCHEMA}" \
| cut -f 1 -d ":" | tr '\n' '<' | sed -e 's#<$#-1\n#' | bc
)
if [ "${DISTANCE_FIELDS_MARKS}" -eq 0 ]; then
ED_DELETE_FIELDS="#"
fi
# Check if there are no lines between the copyfield marks (then skip delete in ed)
DISTANCE_COPYFIELDS_MARKS=$( \
grep -n -e "\(${SOLR_SCHEMA_COPYFIELD_BEGIN_MARK}\|${SOLR_SCHEMA_COPYFIELD_END_MARK}\)" "${SCHEMA}" \
| cut -f 1 -d ":" | tr '\n' '<' | sed -e 's#<$#-1\n#' | bc
)
if [ "${DISTANCE_COPYFIELDS_MARKS}" -eq 0 ]; then
ED_DELETE_COPYFIELDS="#"
fi
#TODO
#-> IF NO ELEMENTS BETWEEN GUARDS, DO NOT DELETE TO AVOID ED ERRORS
fi


### READ DATA ###
# Switch to standard input if no file present or "-"
if [ -z "${SOURCE}" ] || [ "${SOURCE}" = "-" ]; then
# But ONLY if stdin of this script is attached to a pipe rather than a terminal
if [ ! -t 0 ]; then
SOURCE="/dev/stdin"
else
error "No data - either provide source file or piped input"
fi
else
# Always make the path absolute
SOURCE=$(readlink -f "${SOURCE}")
# Check the given file for readability and non-zero length
if [ ! -r "${SOURCE}" ] || [ ! -s "${SOURCE}" ]; then
error "Cannot read from or empty file ${SOURCE}"
fi
fi
# Read relevant parts only, filter nonsense and avoid huge memory usage
INPUT=$(grep -e "<\(field\|copyField\) .*/>" "${SOURCE}" | sed -e 's#^\s\+##' -e 's#\s\+$##' || true)


### DATA HANDLING ###
# Split input into different types
if [ -z "${INPUT}" ]; then
error "No <field> or <copyField> in input"
else
# Check for <field> definitions (if nomatch, avoid failing pipe)
FIELDS=$(mktemp)
echo "${INPUT}" | grep -e "<field .*/>" | sed -e 's#^# #' > "${FIELDS}" || true
# If file actually contains output, write to schema
if [ -s "${FIELDS}" ]; then
# Use an ed script to replace all <field>
cat << EOF | ed -s -v "${SCHEMA}"
# Mark field begin as 'a'
/${SOLR_SCHEMA_FIELD_BEGIN_MARK}/ka
# Mark field end as 'b'
/${SOLR_SCHEMA_FIELD_END_MARK}/kb
# Delete all between lines a and b
${ED_DELETE_FIELDS}
# Read fields file and paste after line a
'ar ${FIELDS}
# Write fields to schema
w
q
EOF
fi
rm "${FIELDS}"

# Check for <copyField> definitions (if nomatch, avoid failing pipe)
COPY_FIELDS=$(mktemp)
echo "${INPUT}" | grep -e "<copyField .*/>" | sed -e 's#^# #' > "${COPY_FIELDS}" || true
# If file actually contains output, write to schema
if [ -s "${COPY_FIELDS}" ]; then
# Use an ed script to replace all <copyField>
cat << EOF | ed -s "${SCHEMA}"
# Mark copyField begin as 'a'
/${SOLR_SCHEMA_COPYFIELD_BEGIN_MARK}/ka
# Mark copyField end as 'b'
/${SOLR_SCHEMA_COPYFIELD_END_MARK}/kb
# Delete all between lines a and b
${ED_DELETE_COPYFIELDS}
# Read copyFields file and paste after line a
'ar ${COPY_FIELDS}
# Write copyFields to schema
w
q
EOF
fi
rm "${COPY_FIELDS}"
fi


### CHAINING OUTPUT
# Scripts following this one might want to use the field definitions now present
if [ "${TRIGGER_CHAIN}" -eq 1 ]; then
# Print each section from its begin mark to its end mark, regardless of section length
sed -n "/${SOLR_SCHEMA_FIELD_BEGIN_MARK}/,/${SOLR_SCHEMA_FIELD_END_MARK}/p" "${SCHEMA}"
sed -n "/${SOLR_SCHEMA_COPYFIELD_BEGIN_MARK}/,/${SOLR_SCHEMA_COPYFIELD_END_MARK}/p" "${SCHEMA}"
fi
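
The guard validation in `update-fields.sh` pipes its checks through `bc`: it sums the four `grep -c` counts and expects 4, then turns the four line numbers into an expression of the form `a<b && c<d` and expects 1. As far as can be read from that pipeline, the order check compares begin and end within each pair only. The sketch below restates the idea in plain Bash arithmetic for readers who find the `tr | sed | bc` chain hard to follow. `check_guards` is a hypothetical helper, not part of the PR, and it is slightly stricter by choice: it requires all four guards to appear in ascending order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Guard strings as defined in update-fields.sh, in their required order.
MARKS_ORDERED="SCHEMA-FIELDS::BEGIN SCHEMA-FIELDS::END SCHEMA-COPY-FIELDS::BEGIN SCHEMA-COPY-FIELDS::END"

# check_guards (hypothetical): returns 0 if every guard occurs exactly once
# and the guards appear in ascending line order, 1 otherwise.
function check_guards {
    local schema="$1" previous=0 count line mark
    for mark in ${MARKS_ORDERED}; do
        # grep -c exits non-zero when there are no matches, hence "|| true"
        count=$(grep -c "${mark}" "${schema}" || true)
        [ "${count}" -eq 1 ] || return 1
        line=$(grep -n "${mark}" "${schema}" | cut -f 1 -d ":")
        [ "${line}" -gt "${previous}" ] || return 1
        previous=${line}
    done
    return 0
}

# Demo files, made up for illustration
good=$(mktemp); bad=$(mktemp)
printf '%s\n' '<!-- SCHEMA-FIELDS::BEGIN -->' '<!-- SCHEMA-FIELDS::END -->' \
    '<!-- SCHEMA-COPY-FIELDS::BEGIN -->' '<!-- SCHEMA-COPY-FIELDS::END -->' > "${good}"
# Same guards, but the field END guard precedes its BEGIN guard
printf '%s\n' '<!-- SCHEMA-FIELDS::END -->' '<!-- SCHEMA-FIELDS::BEGIN -->' \
    '<!-- SCHEMA-COPY-FIELDS::BEGIN -->' '<!-- SCHEMA-COPY-FIELDS::END -->' > "${bad}"

if check_guards "${good}"; then good_result="valid"; else good_result="invalid"; fi
if check_guards "${bad}"; then bad_result="valid"; else bad_result="invalid"; fi
echo "good: ${good_result}, bad: ${bad_result}"
rm "${good}" "${bad}"
```

The swapped-guard file is rejected by both approaches: the `bc` expression would have a false first comparison, and the helper fails its ascending-order test.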
85 changes: 0 additions & 85 deletions conf/solr/8.8.1/updateSchemaMDB.sh

This file was deleted.
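
The central step of `update-fields.sh`, which supersedes the deleted script, is the `ed` script that marks both guards (`ka`, `kb`), deletes the lines strictly between them (`'a+,'b-d`) and reads the new definitions in after the begin guard (`'ar`). As an illustration of that replace-between-guards technique, here is a sketch using `awk` instead of `ed`. The helper `replace_between` and the sample files are hypothetical, and unlike the `ed` script this version prints the result instead of editing the schema in place.

```shell
#!/usr/bin/env bash
set -euo pipefail

# replace_between (hypothetical, not part of update-fields.sh):
# prints FILE with everything strictly between the lines containing
# BEGIN_MARK and END_MARK replaced by the contents of NEW_FILE.
# The guard lines themselves are kept.
function replace_between {
    local file="$1" begin_mark="$2" end_mark="$3" new_file="$4"
    awk -v begin_mark="${begin_mark}" -v end_mark="${end_mark}" -v new_file="${new_file}" '
        index($0, begin_mark) {
            # Keep the begin guard, then emit the replacement lines
            print
            while ((getline line < new_file) > 0) print line
            close(new_file)
            skipping = 1
            next
        }
        # The end guard stops the skipping and is printed by the rule below
        index($0, end_mark) { skipping = 0 }
        !skipping { print }
    ' "${file}"
}

# Demo input, made up for illustration
schema=$(mktemp); fields=$(mktemp)
printf '%s\n' '<schema>' '<!-- SCHEMA-FIELDS::BEGIN -->' '<field name="old"/>' \
    '<!-- SCHEMA-FIELDS::END -->' '</schema>' > "${schema}"
printf '%s\n' '<field name="new1"/>' '<field name="new2"/>' > "${fields}"

result=$(replace_between "${schema}" "SCHEMA-FIELDS::BEGIN" "SCHEMA-FIELDS::END" "${fields}")
echo "${result}"
rm "${schema}" "${fields}"
```

Because the old content is skipped rather than deleted by address, this variant needs no special case for guards on adjacent lines, which the script handles by switching `ED_DELETE_FIELDS` to a comment.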
