-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quick fix for #7864 (Simple search of most metadata fields broken) #7865
Quick fix for #7864 (Simple search of most metadata fields broken) #7865
Conversation
How will this affect installations that have customized the metadata blocks if the copyField tags are no longer updated based on the metadatablock? |
The problem and fix are independent of whether your are manually updating the schema*.xml files or using the updateSchemaMDB.sh script. In either case, having the entries in the separate schema_dv_mdb_copies.xml files won't work and the entries need to be copied into the schema.xml file itself. If you have custom metadata blocks, using the script and then adding an extra step to copy the results from the schema_dv_mdb_copies.xml into schema.xml would work. Editing the script to make it write directly into the schema.xml file would be another option to fix this, as would moving to using the solr api to let dataverse add the fields and copyField requests directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested to confirm the bug (assuming it's valid, it's a bad one, thanks for finding it!) nor played around with this quick fix but I think before we send this to QA, we need to add some documentation.
I'm aware of two places where updateSchemaMDB.sh is used:
- The last step after enabling a new custom metadata block: https://guides.dataverse.org/en/5.4.1/admin/metadatacustomization.html#updating-the-solr-schema
- In a scenario where Solr has been upgraded and the installation uses custom metadata blocks: https://github.com/IQSS/dataverse/blob/v5.4/doc/release-notes/5.4-release-notes.md#additional-release-steps
These are both still valid scenarios so I think we should explain (probably in the "Updating the Solr Schema" section of the "Metadata Customization" page) the extra step that is necessary, which is what you wrote in a comment:
"If you have custom metadata blocks, using the script and then adding an extra step to copy the results from the schema_dv_mdb_copies.xml into schema.xml would work."
That way, we have something to link to in either of the scenarios mentioned above.
And yes, longer term let's fix up updateSchemaMDB.sh do this extra step or switch to using the Solr API (#5989).
As a side note, a test for this bug would be nice ("is copyField working?") but we can wait for a future pull request.
We are relying on the updateSchemaMDB.sh script. We are currently in a phase where users are still discussing about the metadata schema and we do updates to the citation block on a regular base and sometimes even reset our pilot installation database and index from scratch. We are also running dataverse and solr in docker containers. The way it is currently setup, the schema files are stored in the container itself and any changes are lost when the container restarts. I therefore added a script that performs the metadata schema update at startup to get it synchronized with the current state of the metadata blocks. When a metadata block is updated, I just restart the solr container(s) and we're done. I also had the intention to use the script at least during installation of the production repository. BTW, the updateSchemaMDB.sh script as it is now, does not work - at least for us. I have already fixed it locally and was about to create an issue and PR, but I'm holding off for now. With this change in place, it needs to incorporate the changes into the schema.xml and I'd like to tackle that first. Edit: Issue #7871 created. |
@Kris-LIBIS - do you have time to fix the script for this? All that should be needed is to change creating the separate schema_dv_mdb_copies.xml file with scanning through schema.xml and replacing all of the lines between the |
@qqmyers - yes, that is what I was planning to do. But give me another week or so. I'm in the middle of upgrading our installation to 5.4.1 and the Solr upgrade is hurting me. I have to sort that out first. |
@Kris-LIBIS thanks for being willing to work on this! For now I co-assigned you to this pull request but you are also very welcome to make a fresh one or whatever works. I think review is done for now so I'm going to move it to the "community dev" column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good now that a release note has been added. Thanks @qqmyers
I'm going to go ahead and move this to QA as-is. It's important to get the word out.
@Kris-LIBIS we are still very much interested in your fixed to the script but please make a fresh pull request when you are ready.
@pdurbin I have a fix for the script running locally for testing. But I was wondering why don't we do the same for the field entries? We could as well add them directly to the schema.xml too. |
@Kris-LIBIS that's great news! Stopping now might be good because @poikilotherm and @pkiraly are having a little Solr-related hackathon on Friday. I'm sure you'd be very welcome to join! |
@Kris-LIBIS when I met with @poikilotherm and @pkiraly on Friday we didn't actually talk about the script much. The focus was on #5989 (Add new Solr fields via an API call). I think it would be best to go ahead and create a pull request for what you have working so we can take a look. Thanks! |
As of IQSS#7865 (merged DV 5.6), the Solr Schema XML file did not include <copyField>s for metadata schemas from an external file anymore. Instead they had been included. This broke the updateSchemaMDB.sh script, which is IQSS#7903 This commit will remove the other include construction for <field> definitions (plus remove the file) and include the content directly in the schema. It also introduced some docs about the whereabouts of these fields plus (most importantly) INCLUDE GUARDS. These are going to be used for new scripting of half-automatic index management. IQSS#7903
What this PR does / why we need it: Restores the ability to search across all metadata block fields in the simple search box
Which issue(s) this PR closes:
Closes #7864
Special notes for your reviewer:
Suggestions on how to test this: Add a value in a field such as OriginOfSources (see issue and the example there), typing your text in the search box should return no result. Using the PR, and restarting solr and reindexing if using an existing machine, the same search should find the dataset.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: