-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
7903 solr config hack #8130
7903 solr config hack #8130
Conversation
As of IQSS#7865 (merged DV 5.6), the Solr Schema XML file did not include <copyField>s for metadata schemas from an external file anymore. Instead they had been included. This broke the updateSchemaMDB.sh script, which is IQSS#7903 This commit will remove the other include construction for <field> definitions (plus remove the file) and include the content directly in the schema. It also introduced some docs about the whereabouts of these fields plus (most importantly) INCLUDE GUARDS. These are going to be used for new scripting of half-automatic index management. IQSS#7903
Removing the old updateSchemaMDB.sh script, as it's broken. It's also of limited use for the container use case it had been invented for. Replacing with a script capable of taking the solr schema bits and stuffing it into a prepared schema.xml file. IQSS#7903
…DB.sh in installer IQSS#7903
I started adding |
638f14e
to
37ea000
Compare
651b48c
to
d742cc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Sorry for the delay. For various reasons we could not cram this into 5.7. At this point the PR is waiting for another person to review it. But I just want to let you know that I played with it earlier and I like it. Specifically the fact that it finally solves the issue of the schema config left in a half-way state; with half of it still in an external file, with the other half moved back into the main schema.xml, defying the purpose of keeping external config files...
But I do have a serious issue with the lack of support for MacOS. With so many developers using MacOS, it is a must for us.
The part that you mentioned, about the bash version. Is this even really-really necessary:
if [ ${BASH_VERSION%%.*} -lt 4 ]; then
? MacOS 11 ships with 3.3. It feels wrong to require people to upgrade it, just to be able to index extra metadata.
And there seem to be other things that are failing. sed may be ok, but readlink seem to have incompatible command line args?
If you cannot make it work under MacOS, somebody here will have to do that part. It's not by any means rocket science. But, just want to mention that it would be an "expect further delays" kind of a deal.
But, to finish on a positive note, I am excited to see this being addressed finally.
- Replaced many GNU sed function with escaped chars in replacement with workaround for appending newline via echo - Replace one sed to take guards line numbers apart with awk because not compatible with BSD sed - Remove unnecessary restriction for bash 4 - Add check for presence of awk and sed - Remove unknown -v flag for BSD ed - Add grep filter for ed script comments, as comment commands not supported by BSD ed - Removed readlink -f for the sake of compatibility with MacOS. So be it. Bye bye nice error messages
736e70b
to
c5b09ff
Compare
Here we go party people... MacOS support for update-fields.sh has landed and shellspec does a wonderful job on CI. This was fun! Shellspec really helped during debugging and testing and I can be sure it works on all platforms in CI 😄 |
Thank you so much! Looking good so far...
|
The integration tests that have just failed - that's almost certainly something on the Jenkins side, and NOT in the PR... I'm trying to figure that out now, and confirm if we should wait for it to be resolved, or just move the PR into QA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said earlier, MacOS X was my only concern. It's working now, and I haven't heard any other objections, so I'm moving it along.
Thank you @poikilotherm !
(I know the PR isn't touching any code, so it's not that important - but, if you get a chance, please sync it up with develop... just in case?)
(It looks like the integration tests failed because the branch still has 5.6 in the pom.xml...) |
As requested, I merged latest EDIT: looks like now it's all green 😄 |
@poikilotherm Thanks for the contribution -a nice improvement. One quick question: how would you recommend a multi host installation, where solr and payara are on different hosts, run your script to update schema.xml? |
Glad you liked it @kcondon - thx for merging! 😄 (Glad this made it into 5.8 🥳 )
This depends on wheter you need to do this because of a metadata block change tied to a database migration or just for a small change in a metadata block. (And of course if you deployed in a way to use A/B testing/rollouts.) Lets say this is a multi-host installation without A/B. Then you will need to update any Dataverse installation / load metadata blocks / do database migrations first, then update the schema. As Solr is not required to deploy the Dataverse application (at least not as of writing this), you can use the API endpoint to retrieve the new blocks via EDIT: in case this question was about how to do this with multiple hosts because of Admin API = localhost only... |
@poikilotherm Thanks, that's helpful. The fact the script doesn't affect Solr means it could be run local to the web server, then the schema.xml moved to the Solr box, since the admin api is blocked by default. One could alternatively change the api policy temporarily to use unblock-key and run the curl command from the Solr box. Good point about A/B set up, I've used that approach elsewhere. |
What this PR does / why we need it:
updateSchemaMDB.sh
update-fields.sh
(single purpose tool, no side tasks like reload)Which issue(s) this PR closes:
Closes #7871
Closes #7903
Closes #7904
Closes #8025
Special notes for your reviewer:
Looks like a huge beast to review... But most of it is test data. Have no fear! 😄
Suggestions on how to test this:
Try to update the Solr schema.xml with the script. e. g. by following the docs advise.
curl "https://demo.dataverse.org/api/admin/index/solr/schema" | conf/solr/8.8.1/update-fields.sh conf/solr/8.8.1/schema.xml
Easier and quicker: take a look at running the Shellspec tests on your machine!
(You can easily install shellspec via homebrew on Mac, too...)
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope.
Is there a release notes update needed for this change?:
🔋 included
Additional documentation:
This is one half of a solution to the chicken-and-egg problem described in gdcc/dataverse-kubernetes#188. (The other half will be a new subcommand in gdcc/dvcli to convert TSV files to Solr schema without a running Dataverse instance.)
Pinging @Kris-LIBIS here as he opened #7904 which was never finished. He said they're relying on scripting in their installation, so this new script might be helpful. (We can discuss on Slack/Matrix, if you want! 😄 )