-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for #7903 update schema.xml in place #7904
Conversation
I didn't run this but the changes look straight forward and I think with a Q/A check that it does what's intended can quickly verify that. The one thing I'd suggest checking for - if you don't specify a target on the command line, the script defaults to the /tmp dir and worked (creating the two new files there) - does the updated script handle that case some way (i.e. if schema.xml isn't in the target dir). Just warning that schema.xml doesn't exist in that dir and quitting (or similar is fine) but it would be good to make sure the user gets a useful message. |
@qqmyers That is a very good point you made. I did not realize this before, but the meaning of the TARGET directory modified ever so slightly from 'the location to write the included XML files' to 'the location of the schema.xml file'. In most installations this will not make a difference, but that may not be the case for everyone. Notably in @poikilotherm 's kubernetes Dockerfile for solr the included XML files will be generated in a separate directory and are included in the schema.xml via symlinks. Maybe we should leave the original script alone and create a new script instead? updateSchemaInline.sh for instance? On the other hand, with the changes in #7864, the old script no longer does what it was originally expected to do anyway. The changes in this PR would make the script work again as expected, if some changes are made to the Dockerfile. What would do you suggest? |
Since the original script won't work for Docker either, I'd suggest just having one script and getting the Dockerfile updated. That said, I haven't worked with the Docker version, and others who do may have a different/better idea. In any case, we should avoid delaying getting this out - it will simplify setup for those who haven't upgraded yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some minor tweaks.Thank you for working on and fixing this @Kris-LIBIS!
About Docker/K8s: yes, these community projects follow this repos lead, so they will need to adapt.
Will become part of #7245
@@ -228,7 +228,166 @@ | |||
<field name="filePersistentId" type="text_en" multiValued="false" stored="true" indexed="true"/> | |||
|
|||
<!-- Dynamic Dataverse fields from http://localhost:8080/api/admin/index/solr/schema --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please create more visible Search & Replace Guards here? Someone might delete that by accident, then things break.
Same for the <copyField>
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me. I was using whatever already was there to make sure the script works with schema.xml files that are already out there. Since that that is no concern, I'll add better markers in the schema.xml.
### Retrieval | ||
echo "Retrieve schema data from ${DATAVERSE_URL}/api/admin/index/solr/schema" | ||
TMPFILE=`mktemp` | ||
curl -f -sS "${DATAVERSE_URL}/api/admin/index/solr/schema${UNBLOCK_KEY}" > $TMPFILE | ||
|
||
### Fail gracefull if Dataverse is not ready yet. | ||
if [[ "`wc -l ${TMPFILE}`" < "3" ]]; then | ||
if [[ `wc -l < ${TMPFILE}` -lt 3 ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should start using $()
here and let shellcheck
bark at other things it finds.
### -- create temp file with new content | ||
TMPF=`mktemp` | ||
echo "$START" > ${TMPF} | ||
cat ${TMPFILE} | grep ".*<field" >> ${TMPF} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary anymore. Let's just save the filtered multiline string in a var to be used in the replacement text of sed
sed -e 's@'"$INCL"'@'"$END"'@' -i ${TARGET}/schema.xml | ||
|
||
### -- replace START->END with content of temp file | ||
sed -e '\@'"$START"'@,\@'"$END"'@!b' -e '\@'"$END"'@!d;r '"$TMPF" -e 'd' -i ${TARGET}/schema.xml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might become easier if just sending the variable into the command, right?
|
||
### Processing copyField entries | ||
echo "Replacing copyField lines" | ||
START='<!-- Dataverse copyField from http://localhost:8080/api/admin/index/solr/schema -->' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as outlined above: we should use guards around all of this that are self-explaining and less prone to be deleted by accident.
### -- create temp file with new content | ||
TMPCF=`mktemp` | ||
echo "$START" > ${TMPCF} | ||
cat ${TMPFILE} | grep ".*<copyField" >> ${TMPCF} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: maybe just send to a var instead of tmpfile.
Sorry for the inactivity. I had issues getting some of the requested changes to work, so I have left it as-is in our own installation. I haven't had much time to work on this and it seems that an alternative solution is on its way. I would suggest to close this PR and wait for #8130 instead. |
What this PR does / why we need it:
With PR #7865 the updateSchemaMDB.sh no longer works. This PR restores that functionality without breaking #7864 that the PR fixed. Some institutions may rely on the script to automate their metadata customization.
Which issue(s) this PR closes:
Closes #7903
Special notes for your reviewer:
This PR also removed the schema_dv_mdb_fields.xml file and included its contents directly in the schema.xml. The updated script will however still work with older schema.xml files that still have the include lines in them.
Suggestions on how to test this:
Perform some customization on the metadata blocks - removing or adding metadata fields and/or blocks - and run the script. The schema.xml should contain an updated list of and entries.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No UI changes.
Is there a release notes update needed for this change?:
Probably. Some of the changes for PR #7865 in release 5.5 release notes and documentation will need to be updated.
Additional documentation:
I have not yet looked at the documentation changes required, but overall it should be simplified.