Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quick fix for #7864 (Simple search of most metadata fields broken) #7865

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented May 10, 2021

What this PR does / why we need it: Restores the ability to search across all metadata block fields in the simple search box

Which issue(s) this PR closes:

Closes #7864

Special notes for your reviewer:

Suggestions on how to test this: Add a value in a field such as OriginOfSources (see issue and the example there), typing your text in the search box should return no result. Using the PR, and restarting solr and reindexing if using an existing machine, the same search should find the dataset.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@pdurbin pdurbin requested a review from poikilotherm May 11, 2021 17:44
@pdurbin pdurbin self-assigned this May 11, 2021
@Kris-LIBIS
Copy link
Contributor

How will this affect installations that have customized the metadata blocks if the copyField tags are no longer updated based on the metadatablock?

@qqmyers
Copy link
Member Author

qqmyers commented May 12, 2021

The problem and fix are independent of whether your are manually updating the schema*.xml files or using the updateSchemaMDB.sh script. In either case, having the entries in the separate schema_dv_mdb_copies.xml files won't work and the entries need to be copied into the schema.xml file itself.

If you have custom metadata blocks, using the script and then adding an extra step to copy the results from the schema_dv_mdb_copies.xml into schema.xml would work.

Editing the script to make it write directly into the schema.xml file would be another option to fix this, as would moving to using the solr api to let dataverse add the fields and copyField requests directly.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested to confirm the bug (assuming it's valid, it's a bad one, thanks for finding it!) nor played around with this quick fix but I think before we send this to QA, we need to add some documentation.

I'm aware of two places where updateSchemaMDB.sh is used:

These are both still valid scenarios so I think we should explain (probably in the "Updating the Solr Schema" section of the "Metadata Customization" page) the extra step that is necessary, which is what you wrote in a comment:

"If you have custom metadata blocks, using the script and then adding an extra step to copy the results from the schema_dv_mdb_copies.xml into schema.xml would work."

That way, we have something to link to in either of the scenarios mentioned above.

And yes, longer term let's fix up updateSchemaMDB.sh do this extra step or switch to using the Solr API (#5989).

As a side note, a test for this bug would be nice ("is copyField working?") but we can wait for a future pull request.

@pdurbin pdurbin assigned qqmyers and unassigned pdurbin May 12, 2021
@pdurbin pdurbin changed the title Quick fix for #7864 Quick fix for #7864 (Simple search of most metadata fields broken) May 12, 2021
@Kris-LIBIS
Copy link
Contributor

Kris-LIBIS commented May 13, 2021

We are relying on the updateSchemaMDB.sh script. We are currently in a phase where users are still discussing about the metadata schema and we do updates to the citation block on a regular base and sometimes even reset our pilot installation database and index from scratch. We are also running dataverse and solr in docker containers. The way it is currently setup, the schema files are stored in the container itself and any changes are lost when the container restarts. I therefore added a script that performs the metadata schema update at startup to get it synchronized with the current state of the metadata blocks. When a metadata block is updated, I just restart the solr container(s) and we're done.

I also had the intention to use the script at least during installation of the production repository.

BTW, the updateSchemaMDB.sh script as it is now, does not work - at least for us. I have already fixed it locally and was about to create an issue and PR, but I'm holding off for now. With this change in place, it needs to incorporate the changes into the schema.xml and I'd like to tackle that first.

Edit: Issue #7871 created.

@qqmyers
Copy link
Member Author

qqmyers commented May 13, 2021

@Kris-LIBIS - do you have time to fix the script for this? All that should be needed is to change creating the separate schema_dv_mdb_copies.xml file with scanning through schema.xml and replacing all of the lines between the
<!-- Dataverse copyField from http://localhost:8080/api/admin/index/solr/schema -->
and
<!-- End: Dataverse-specific -->
with the same set of elements. (No wrapping element as in the copies files though.)

@Kris-LIBIS
Copy link
Contributor

@qqmyers - yes, that is what I was planning to do. But give me another week or so. I'm in the middle of upgrading our installation to 5.4.1 and the Solr upgrade is hurting me. I have to sort that out first.

@pdurbin
Copy link
Member

pdurbin commented May 13, 2021

@Kris-LIBIS thanks for being willing to work on this! For now I co-assigned you to this pull request but you are also very welcome to make a fresh one or whatever works. I think review is done for now so I'm going to move it to the "community dev" column.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now that a release note has been added. Thanks @qqmyers

I'm going to go ahead and move this to QA as-is. It's important to get the word out.

@Kris-LIBIS we are still very much interested in your fixed to the script but please make a fresh pull request when you are ready.

@kcondon kcondon self-assigned this May 14, 2021
@kcondon kcondon merged commit 2dbf9b7 into IQSS:develop May 14, 2021
@djbrooke djbrooke added this to the 5.5 milestone May 17, 2021
@Kris-LIBIS
Copy link
Contributor

Kris-LIBIS commented May 18, 2021

@pdurbin I have a fix for the script running locally for testing. But I was wondering why don't we do the same for the field entries? We could as well add them directly to the schema.xml too.

@pdurbin
Copy link
Member

pdurbin commented May 18, 2021

@Kris-LIBIS that's great news! Stopping now might be good because @poikilotherm and @pkiraly are having a little Solr-related hackathon on Friday. I'm sure you'd be very welcome to join!

@pdurbin
Copy link
Member

pdurbin commented May 24, 2021

@Kris-LIBIS when I met with @poikilotherm and @pkiraly on Friday we didn't actually talk about the script much. The focus was on #5989 (Add new Solr fields via an API call).

I think it would be best to go ahead and create a pull request for what you have working so we can take a look. Thanks!

@Kris-LIBIS
Copy link
Contributor

Issue #7903 and PR #7904 created.

@pdurbin pdurbin mentioned this pull request Jul 1, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this pull request Oct 4, 2021
As of IQSS#7865 (merged DV 5.6), the Solr Schema XML file
did not include <copyField>s for metadata schemas from
an external file anymore. Instead they had been included.

This broke the updateSchemaMDB.sh script, which is IQSS#7903

This commit will remove the other include construction for
<field> definitions (plus remove the file) and include the
content directly in the schema.

It also introduced some docs about the whereabouts of these
fields plus (most importantly) INCLUDE GUARDS. These are
going to be used for new scripting of half-automatic index
management.

IQSS#7903
@qqmyers qqmyers deleted the IQSS/7864_simple_search_misses_fields branch May 17, 2024 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simple search of most metadata fields broken
5 participants