Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #7903 update schema.xml in place #7904

Closed
wants to merge 6 commits into from
Closed

Fix for #7903 update schema.xml in place #7904

wants to merge 6 commits into from

Conversation

Kris-LIBIS
Copy link
Contributor

What this PR does / why we need it:

With PR #7865 the updateSchemaMDB.sh no longer works. This PR restores that functionality without breaking #7864 that the PR fixed. Some institutions may rely on the script to automate their metadata customization.

Which issue(s) this PR closes:

Closes #7903

Special notes for your reviewer:

This PR also removed the schema_dv_mdb_fields.xml file and included its contents directly in the schema.xml. The updated script will however still work with older schema.xml files that still have the include lines in them.

Suggestions on how to test this:

Perform some customization on the metadata blocks - removing or adding metadata fields and/or blocks - and run the script. The schema.xml should contain an updated list of and entries.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No UI changes.

Is there a release notes update needed for this change?:

Probably. Some of the changes for PR #7865 in release 5.5 release notes and documentation will need to be updated.

Additional documentation:

I have not yet looked at the documentation changes required, but overall it should be simplified.

@qqmyers
Copy link
Member

qqmyers commented May 26, 2021

I didn't run this but the changes look straight forward and I think with a Q/A check that it does what's intended can quickly verify that. The one thing I'd suggest checking for - if you don't specify a target on the command line, the script defaults to the /tmp dir and worked (creating the two new files there) - does the updated script handle that case some way (i.e. if schema.xml isn't in the target dir). Just warning that schema.xml doesn't exist in that dir and quitting (or similar is fine) but it would be good to make sure the user gets a useful message.

@Kris-LIBIS
Copy link
Contributor Author

@qqmyers That is a very good point you made. I did not realize this before, but the meaning of the TARGET directory modified ever so slightly from 'the location to write the included XML files' to 'the location of the schema.xml file'.

In most installations this will not make a difference, but that may not be the case for everyone. Notably in @poikilotherm 's kubernetes Dockerfile for solr the included XML files will be generated in a separate directory and are included in the schema.xml via symlinks.

Maybe we should leave the original script alone and create a new script instead? updateSchemaInline.sh for instance?

On the other hand, with the changes in #7864, the old script no longer does what it was originally expected to do anyway. The changes in this PR would make the script work again as expected, if some changes are made to the Dockerfile.

What would do you suggest?

@coveralls
Copy link

Coverage Status

Coverage remained the same at 19.324% when pulling 4681de3 on libis:develop into 52fee75 on IQSS:develop.

@qqmyers
Copy link
Member

qqmyers commented May 27, 2021

Since the original script won't work for Docker either, I'd suggest just having one script and getting the Dockerfile updated. That said, I haven't worked with the Docker version, and others who do may have a different/better idea. In any case, we should avoid delaying getting this out - it will simplify setup for those who haven't upgraded yet.

Copy link
Contributor

@poikilotherm poikilotherm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some minor tweaks.Thank you for working on and fixing this @Kris-LIBIS!

About Docker/K8s: yes, these community projects follow this repos lead, so they will need to adapt.

Will become part of #7245

@@ -228,7 +228,166 @@
<field name="filePersistentId" type="text_en" multiValued="false" stored="true" indexed="true"/>

<!-- Dynamic Dataverse fields from http://localhost:8080/api/admin/index/solr/schema -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please create more visible Search & Replace Guards here? Someone might delete that by accident, then things break.

Same for the <copyField> part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. I was using whatever already was there to make sure the script works with schema.xml files that are already out there. Since that that is no concern, I'll add better markers in the schema.xml.

### Retrieval
echo "Retrieve schema data from ${DATAVERSE_URL}/api/admin/index/solr/schema"
TMPFILE=`mktemp`
curl -f -sS "${DATAVERSE_URL}/api/admin/index/solr/schema${UNBLOCK_KEY}" > $TMPFILE

### Fail gracefull if Dataverse is not ready yet.
if [[ "`wc -l ${TMPFILE}`" < "3" ]]; then
if [[ `wc -l < ${TMPFILE}` -lt 3 ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should start using $() here and let shellcheck bark at other things it finds.

### -- create temp file with new content
TMPF=`mktemp`
echo "$START" > ${TMPF}
cat ${TMPFILE} | grep ".*<field" >> ${TMPF}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary anymore. Let's just save the filtered multiline string in a var to be used in the replacement text of sed

sed -e 's@'"$INCL"'@'"$END"'@' -i ${TARGET}/schema.xml

### -- replace START->END with content of temp file
sed -e '\@'"$START"'@,\@'"$END"'@!b' -e '\@'"$END"'@!d;r '"$TMPF" -e 'd' -i ${TARGET}/schema.xml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might become easier if just sending the variable into the command, right?


### Processing copyField entries
echo "Replacing copyField lines"
START='<!-- Dataverse copyField from http://localhost:8080/api/admin/index/solr/schema -->'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as outlined above: we should use guards around all of this that are self-explaining and less prone to be deleted by accident.

### -- create temp file with new content
TMPCF=`mktemp`
echo "$START" > ${TMPCF}
cat ${TMPFILE} | grep ".*<copyField" >> ${TMPCF}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: maybe just send to a var instead of tmpfile.

@Kris-LIBIS
Copy link
Contributor Author

Sorry for the inactivity. I had issues getting some of the requested changes to work, so I have left it as-is in our own installation. I haven't had much time to work on this and it seems that an alternative solution is on its way.

I would suggest to close this PR and wait for #8130 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update schema.xml in-place
4 participants