Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7903 solr config hack #8130

Merged
merged 20 commits into from
Nov 3, 2021
Merged

Conversation

poikilotherm
Copy link
Contributor

@poikilotherm poikilotherm commented Oct 4, 2021

What this PR does / why we need it:

  • Removes broken updateSchemaMDB.sh
  • Introduces new update-fields.sh (single purpose tool, no side tasks like reload)
  • Adds docs etc

Which issue(s) this PR closes:

Closes #7871
Closes #7903
Closes #7904
Closes #8025

Special notes for your reviewer:
Looks like a huge beast to review... But most of it is test data. Have no fear! 😄

Suggestions on how to test this:
Try to update the Solr schema.xml with the script. e. g. by following the docs advise.
curl "https://demo.dataverse.org/api/admin/index/solr/schema" | conf/solr/8.8.1/update-fields.sh conf/solr/8.8.1/schema.xml

Easier and quicker: take a look at running the Shellspec tests on your machine!

curl -fsSL https://git.io/shellspec | sh -s 0.28.1
cd tests/shell
shellspec

(You can easily install shellspec via homebrew on Mac, too...)

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope.

Is there a release notes update needed for this change?:
🔋 included

Additional documentation:
This is one half of a solution to the chicken-and-egg problem described in gdcc/dataverse-kubernetes#188. (The other half will be a new subcommand in gdcc/dvcli to convert TSV files to Solr schema without a running Dataverse instance.)

Pinging @Kris-LIBIS here as he opened #7904 which was never finished. He said they're relying on scripting in their installation, so this new script might be helpful. (We can discuss on Slack/Matrix, if you want! 😄 )

As of IQSS#7865 (merged DV 5.6), the Solr Schema XML file
did not include <copyField>s for metadata schemas from
an external file anymore. Instead they had been included.

This broke the updateSchemaMDB.sh script, which is IQSS#7903

This commit will remove the other include construction for
<field> definitions (plus remove the file) and include the
content directly in the schema.

It also introduced some docs about the whereabouts of these
fields plus (most importantly) INCLUDE GUARDS. These are
going to be used for new scripting of half-automatic index
management.

IQSS#7903
Removing the old updateSchemaMDB.sh script, as it's broken.
It's also of limited use for the container use case it had been
invented for.

Replacing with a script capable of taking the solr schema bits
and stuffing it into a prepared schema.xml file.

IQSS#7903
@poikilotherm poikilotherm marked this pull request as draft October 4, 2021 12:47
@poikilotherm poikilotherm marked this pull request as ready for review October 4, 2021 13:09
@poikilotherm poikilotherm added Feature: Admin Guide Feature: Metadata Small User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh labels Oct 4, 2021
@poikilotherm
Copy link
Contributor Author

I started adding shellspec tests and will create a Github workflow, too. That way we can test the script on Ubuntu and MacOS and make sure things break when someone accidentally deletes sth. like the include guards.

@poikilotherm poikilotherm force-pushed the 7903-solr-config-hack branch 2 times, most recently from 638f14e to 37ea000 Compare October 5, 2021 07:45
@poikilotherm poikilotherm force-pushed the 7903-solr-config-hack branch 5 times, most recently from 651b48c to d742cc1 Compare October 5, 2021 08:40
Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Sorry for the delay. For various reasons we could not cram this into 5.7. At this point the PR is waiting for another person to review it. But I just want to let you know that I played with it earlier and I like it. Specifically the fact that it finally solves the issue of the schema config left in a half-way state; with half of it still in an external file, with the other half moved back into the main schema.xml, defying the purpose of keeping external config files...
But I do have a serious issue with the lack of support for MacOS. With so many developers using MacOS, it is a must for us.
The part that you mentioned, about the bash version. Is this even really-really necessary:

if [ ${BASH_VERSION%%.*} -lt 4 ]; then

? MacOS 11 ships with 3.3. It feels wrong to require people to upgrade it, just to be able to index extra metadata.
And there seem to be other things that are failing. sed may be ok, but readlink seem to have incompatible command line args?
If you cannot make it work under MacOS, somebody here will have to do that part. It's not by any means rocket science. But, just want to mention that it would be an "expect further delays" kind of a deal.

But, to finish on a positive note, I am excited to see this being addressed finally.

- Replaced many GNU sed function with escaped chars in replacement
  with workaround for appending newline via echo
- Replace one sed to take guards line numbers apart with awk
  because not compatible with BSD sed
- Remove unnecessary restriction for bash 4
- Add check for presence of awk and sed
- Remove unknown -v flag for BSD ed
- Add grep filter for ed script comments, as comment commands
  not supported by BSD ed
- Removed readlink -f for the sake of compatibility with MacOS.
  So be it. Bye bye nice error messages
@poikilotherm poikilotherm force-pushed the 7903-solr-config-hack branch from 736e70b to c5b09ff Compare October 26, 2021 21:23
@poikilotherm
Copy link
Contributor Author

Here we go party people... MacOS support for update-fields.sh has landed and shellspec does a wonderful job on CI.

This was fun! Shellspec really helped during debugging and testing and I can be sure it works on all platforms in CI 😄

@landreev
Copy link
Contributor

Thank you so much! Looking good so far...

Here we go party people... MacOS support for update-fields.sh has landed and shellspec does a wonderful job on CI.

This was fun! Shellspec really helped during debugging and testing and I can be sure it works on all platforms in CI 😄

@landreev
Copy link
Contributor

The integration tests that have just failed - that's almost certainly something on the Jenkins side, and NOT in the PR... I'm trying to figure that out now, and confirm if we should wait for it to be resolved, or just move the PR into QA.

Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said earlier, MacOS X was my only concern. It's working now, and I haven't heard any other objections, so I'm moving it along.
Thank you @poikilotherm !
(I know the PR isn't touching any code, so it's not that important - but, if you get a chance, please sync it up with develop... just in case?)

@landreev
Copy link
Contributor

(It looks like the integration tests failed because the branch still has 5.6 in the pom.xml...)

@poikilotherm
Copy link
Contributor Author

poikilotherm commented Oct 27, 2021

As requested, I merged latest develop and pushed again.

EDIT: looks like now it's all green 😄

@djbrooke djbrooke added this to the 5.8 milestone Nov 1, 2021
@kcondon kcondon self-assigned this Nov 1, 2021
@kcondon
Copy link
Contributor

kcondon commented Nov 2, 2021

@poikilotherm Thanks for the contribution -a nice improvement. One quick question: how would you recommend a multi host installation, where solr and payara are on different hosts, run your script to update schema.xml?

@kcondon kcondon merged commit 538f0b4 into IQSS:develop Nov 3, 2021
@poikilotherm poikilotherm deleted the 7903-solr-config-hack branch November 3, 2021 15:07
@poikilotherm
Copy link
Contributor Author

poikilotherm commented Nov 3, 2021

Glad you liked it @kcondon - thx for merging! 😄 (Glad this made it into 5.8 🥳 )

How would you recommend a multi host installation, where solr and payara are on different hosts, run your script to update schema.xml?

This depends on wheter you need to do this because of a metadata block change tied to a database migration or just for a small change in a metadata block. (And of course if you deployed in a way to use A/B testing/rollouts.)

Lets say this is a multi-host installation without A/B. Then you will need to update any Dataverse installation / load metadata blocks / do database migrations first, then update the schema.

As Solr is not required to deploy the Dataverse application (at least not as of writing this), you can use the API endpoint to retrieve the new blocks via curl or wget and pipe the output into the script: curl "https://demo.dataverse.org/api/admin/index/solr/schema" | update-fields.sh schema.xml. This does not affect a running Solr. If you had Solr running, make it reload via API call (curl http://solrhost:8983/solr/admin/cores?action=RELOAD&core=collection1) or restart. Depending on your changes to the schema you might need to reindex - this hasn't changed from before.

EDIT: in case this question was about how to do this with multiple hosts because of Admin API = localhost only...
Either use unblock-key or copy your current schema.xml to a DV host and do the changes, then replace on Solr machines. You might also try your luck with a pipe to an ssh'ed remote script call...

@kcondon
Copy link
Contributor

kcondon commented Nov 3, 2021

@poikilotherm Thanks, that's helpful. The fact the script doesn't affect Solr means it could be run local to the web server, then the schema.xml moved to the Solr box, since the admin api is blocked by default. One could alternatively change the api policy temporarily to use unblock-key and run the curl command from the Solr box. Good point about A/B set up, I've used that approach elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Admin Guide Feature: Metadata User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh
Projects
None yet
5 participants