Full re-index of solr data on prod #1067
These are the other
<doc>
<str name="key">/subjects/org:conseil_national_économique_(france)</str>
<str name="name">Conseil national économique (France)</str>
<str name="subject_type">org</str>
<arr name="text">
<str>Conseil national économique (France)</str>
<str>/subjects/org:conseil_national_économique_(france)</str>
</arr>
<str name="type">subject</str>
<int name="work_count">1</int>
</doc>
<doc>
<arr name="author_key">
<str>OL6941607A</str>
</arr>
<arr name="author_name">
<str>Carlos Arturo Jiménez</str>
</arr>
<bool name="has_fulltext">false</bool>
<str name="key">/books/OL25648663M</str>
<int name="last_modified_i">1419832732</int>
<arr name="seed">
<str>/books/OL25648663M</str>
<str>/works/OL15935579W</str>
<str>/subjects/politics_and_government</str>
<str>/subjects/presidents</str>
<str>/subjects/frente_sandinista_de_liberación_nacional</str>
<str>/subjects/assassination_attempts</str>
<str>/subjects/person:daniel_ortega</str>
<str>/subjects/person:carlos_arturo_jiménez</str>
<str>/subjects/place:nicaragua</str>
<str>/subjects/time:1979-1990</str>
<str>/authors/OL6941607A</str>
</arr>
<arr name="text">
<str>Nosotros no le decíamos presidente</str>
<str>Carlos Arturo Jiménez</str>
<str>/books/OL25648663M</str>
<str>OL6941607A</str>
</arr>
<str name="title">Nosotros no le decíamos presidente</str>
<str name="title_suggest">Nosotros no le decíamos presidente</str>
<str name="type">edition</str>
</doc>
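The `<doc>` elements above use Solr's XML response format. Each field is a typed element (`str`, `int`, `bool`, or `arr` for multi-valued fields), and its `name` attribute holds the field name. Below is a minimal sketch of turning such docs into Python dicts with the standard library's `xml.etree.ElementTree`, for example to tally documents by `type`. The `SAMPLE` string (an abbreviated copy of the two docs above) and the helper names are illustrative only, and the type mapping covers only the element kinds shown here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abbreviated copy of the two <doc> elements above, wrapped in a root element.
SAMPLE = """<result>
<doc>
  <str name="key">/subjects/org:conseil_national_économique_(france)</str>
  <str name="subject_type">org</str>
  <str name="type">subject</str>
  <int name="work_count">1</int>
</doc>
<doc>
  <arr name="author_key"><str>OL6941607A</str></arr>
  <bool name="has_fulltext">false</bool>
  <str name="key">/books/OL25648663M</str>
  <int name="last_modified_i">1419832732</int>
  <str name="type">edition</str>
</doc>
</result>"""


def convert(elem):
    """Convert one typed Solr value element to a Python value."""
    if elem.tag == "arr":
        # Multi-valued field: convert each child element.
        return [convert(child) for child in elem]
    text = elem.text or ""
    if elem.tag == "int":
        return int(text)
    if elem.tag == "bool":
        return text == "true"
    return text  # <str> (and any other tag) is kept as text


def parse_docs(xml_text):
    """Return one dict per <doc>, keyed by each field's name attribute."""
    root = ET.fromstring(xml_text)
    return [
        {field.get("name"): convert(field) for field in doc}
        for doc in root.iter("doc")
    ]


docs = parse_docs(SAMPLE)
print(Counter(d["type"] for d in docs))
```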
There is a set of official Docker images for Solr that we may want to consider using.
Unfortunately, none of them support our current version of Solr :/
That's because Solr 3.6 is so ancient it hasn't been supported for years. Solr can only read indexes written by the previous major release, so anything older requires a complete reindex. Since we're planning a reindex anyway, this seems like the perfect opportunity to upgrade to a more modern (and supported) version. As far as I know, we have a pretty vanilla installation and schema and don't use any exotic features that are likely to be version-dependent. The currently supported Solr releases are 7.7 and 8.1.
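The compatibility rule above reduces to a one-line check. A Solr release can open indexes written by its own major version (N) or the one immediately before it (N-1), so a larger gap forces a rebuild. Here is a minimal sketch, assuming major versions are compared as plain integers. `needs_full_reindex` is a hypothetical helper, not part of Solr.

```python
def needs_full_reindex(index_major: int, target_major: int) -> bool:
    """Return True if an index written by Solr `index_major` cannot be
    opened by Solr `target_major`. Only same-major (N) and previous-major
    (N-1) indexes are readable."""
    return target_major - index_major > 1


# The production index was written by Solr 3.x; compare candidate targets.
for target in (4, 7, 8):
    print(f"3 -> {target}: full reindex needed = {needs_full_reindex(3, target)}")
```

Going from 3.6 to either 7.7 or 8.1 spans several major releases, so under this rule a full reindex is unavoidable whichever target is chosen.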
Are you willing to bet on that assumption, though? :P Doing them together increases the risk that the reindex will have a bug and be unusable. I want to switch openlibrary to the reindex as soon as possible so that we can resolve a lot of the outdated-index issues we've been having. The next step is updating the schema to better support diacritics, etc. After that comes updating the Solr version, which would require an audit of everywhere the Solr API is used in our code to make sure the APIs in the latest version are still the same. The full reindex is mostly automated, so it takes a roughly fixed amount of time. Adding new features takes developer time (which is more valuable) and carries more uncertainty about how long it will take to add those features and make sure they work.
I'm certainly willing to test the hypothesis. Based on my review of the upgrade notes for all 5 (!) major versions, plus spot checks of the notes for dozens of point releases in between, I judge the risk to be small. Facets are probably the most volatile API-visible feature, but even there I didn't see anything that should affect us. Many of the changes concern clusters, replication, and other features that we don't use. Another advantage of a more modern version is that we get 7 years of performance improvements.
Fixing the search infrastructure is a high priority, but it's valuable to keep the historical perspective in mind. Many of these problems have existed for 5+ years. Another few weeks isn't going to make or break users' perceptions of search quality on OpenLibrary.
It needs to be fully automated and as lightweight as possible (preferably network independent) with no private side channel information required so that we can iterate on search improvements.
True, but we've already invested the time for the main features that we want. Testing time is also significant, and the more iterations we break this into, the more testing time is required. BTW, I'm not trying to talk anyone else into testing this; I'm happy to roll it into my own testing and performance improvements. It may make sense to defer a decision until we have more data, whether it supports this approach or not.
I still think lumping everything together is risky. Right now, we have 2 big changes: a full reindex (with lots of new code), and switching our production environment to Docker (lots of room for strange errors). Hooking this up to production is crucial to fully testing it. This is essentially a refactor: we want to keep roughly the same functionality while changing how the code and environment work. The more changes we pile on, the harder it will be to tell what is causing a bug if one appears. To ~quote Martin Fowler:
So doing this in 3 stages has the benefits of:
Doing this in 1 stage:
So I'm convinced that 3 stages is better ¯\_(ツ)_/¯
Gall's Law: In other words, baby steps, please
I reported on the results of my Solr 8.1 experiments many months ago but didn't update this issue, so to close the loop, re:
#2246 includes all the necessary (very minimal) schema updates to support a modern Solr. It also includes the multicore changes, which are required because single-core Solr no longer exists. The commits should be easy to identify from their commit messages, but I'm happy to break them out into a separate branch if that makes things easier.
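One practical consequence of the multicore change is that request handlers are addressed per core. A Solr 3.x single-core install answers at `/solr/select`, but modern Solr needs the core name in the path, as in `/solr/<core>/select`. Here is a minimal sketch using only the standard library. The base URL, the core name `openlibrary`, and the `select_url` helper are illustrative assumptions, not the names used in #2246.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed host and port
CORE = "openlibrary"  # hypothetical core name


def select_url(base, params, core=None):
    """Build a Solr select URL.

    core=None gives the old single-core form ({base}/select). Modern Solr
    has no default core, so the core name must be part of the path.
    """
    root = base.rstrip("/")
    path = f"{root}/{core}/select" if core else f"{root}/select"
    return f"{path}?{urlencode(params)}"


params = {"q": "type:work", "rows": 10, "wt": "json"}
print(select_url(BASE, params))        # Solr 3.6 single-core style
print(select_url(BASE, params, CORE))  # multicore style
```

Centralizing URL construction in one helper like this would also make the API audit discussed earlier easier, because only one place has to change.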
This opinion is 9 months old, so hopefully it has changed, but I think a key factor that may be overlooked is the testing cycle. Even the "minimal" reindex is a complete reboot that will require extensive human testing to confirm that things work as expected. Bug fixes will very likely require additional complete rebuilds of their own. Given this, I think it makes sense to bundle a reasonable amount of functionality into each of these heavyweight rebuilds.
This is deployed to prod ol-web3; monitoring for issues.
Monitoring is going well; next month we will do another re-index + deploy. There are hints of some perf issues, so we need to add more Graphite logging to check. This issue is done, though. More issues need to be created for those other things.
@cdrini Could you describe what "monitoring" means in this context, and how the new index was validated to be correct and complete? I've got to say that I'm finding this whole process quite opaque.
The correctness of the new index was tested mostly in #2222, and the index was connected to one of our web nodes for ~3 weeks. At this point, the biggest risk is mostly performance. That is what led to c702875, and I noticed some more performance peculiarities even after it, but we'll get more information as it goes. I closed this issue because a full re-index is running on production. The initial checklist on the issue had a number of issues, but I considered it done once the new index went to production and ran successfully, hooked up to prod, for weeks. I need to create an issue for the next small steps, which involve removing the "old" Solr entirely.
This will be an important step toward a more reliable Solr environment. Being able to create an identical Solr environment locally will get rid of a lot of confusion. It would also give us a path forward on #178 and #599, since we can spin up a new Solr, re-index it with the new settings, and then swap it with the old Solr without any downtime.
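For a standalone (non-SolrCloud) Solr, one way to do the zero-downtime swap described above is the CoreAdmin API's `SWAP` action. It exchanges the names of two cores, so clients that keep querying the live core name start hitting the freshly built index. Here is a minimal sketch that only builds the request URLs. The host, the core names `openlibrary` and `openlibrary_rebuild`, and the `core_admin_url` helper are hypothetical. SolrCloud does not support `SWAP`; there one would repoint a collection alias instead.

```python
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/cores"  # assumed standalone Solr


def core_admin_url(action, **params):
    """Build a CoreAdmin API request URL (standalone Solr only)."""
    query = urlencode({"action": action, **params})
    return f"{ADMIN}?{query}"


LIVE = "openlibrary"             # hypothetical core the site queries
REBUILD = "openlibrary_rebuild"  # hypothetical core holding the new index

# 1. Confirm the rebuilt core is loaded and inspect its index stats.
status = core_admin_url("STATUS", core=REBUILD)
# 2. Exchange the two cores' names; the site keeps querying LIVE by name.
swap = core_admin_url("SWAP", core=LIVE, other=REBUILD)
print(status)
print(swap)
```

Sending the same `SWAP` request again exchanges the names back, which gives a simple rollback path if the new index misbehaves.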
Subtasks
type: subject? This looks like it's used for /search/subjects, so these needed to be included.
type: edition? This looks like residuals of dead code for /search/editions (which does appear to work for the measly ~3.5K editions stored in solr).
/solr/process_stats.py looks like dead code.
Notes/Comments
NOT(type:work) AND NOT(type:author)?
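The `NOT(type:work) AND NOT(type:author)` question can be answered with a single faceted query that returns per-type counts rather than documents. Here is a minimal sketch of the request parameters. `rows`, `facet`, `facet.field`, and `wt` are standard Solr parameters, while the `inventory_params` helper is hypothetical. The URL path it is sent to depends on whether the instance is single-core (`/solr/select`) or multicore (`/solr/<core>/select`).

```python
from urllib.parse import urlencode


def inventory_params(exclude_types=("work", "author")):
    """Solr params that count documents per `type`, excluding the given types."""
    q = " AND ".join(f"NOT(type:{t})" for t in exclude_types)
    return {
        "q": q,                 # NOT(type:work) AND NOT(type:author)
        "rows": 0,              # return no documents, only counts
        "facet": "true",
        "facet.field": "type",  # one count per distinct `type` value
        "wt": "json",
    }


query_string = urlencode(inventory_params())
print(query_string)
```

In a JSON response, the counts appear under `facet_counts.facet_fields.type` as a flat list of alternating values and counts. That list would show at a glance whether anything beyond `subject` and `edition` docs is in the index.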