Refactor reindex_studio management command to support large instances #235
Comments
@ormsbee @pomegranited Does the revised spec above for incremental indexing of studio content make sense?
@bradenmacdonald @pomegranited I can't figure out if
But for now I'm assuming that we shouldn't delete the active index because if we do then it is not resumable.
@DanielVZ96 Sorry, you are right. I've updated it. The
cf comment
Refactor reindex_studio to allow incremental index building for large instances.
Part of: openedx/frontend-app-authoring#1334
Requirements:
- --reset flag that will create a new index, set its parameters (distinct_attribute, filterable_attributes, etc.), and swap it to become the active index, but not actually index any content.
- --init flag that is the same as --reset but only if no index currently exists. If an index exists, it should print a warning saying that "A rebuild of the index is required. Please run ./manage.py cms reindex_studio --experimental [--incremental]"
- --incremental flag that will add content to the current active index (NOT creating a temporary index, adding to that, and then swapping them). If there is no current index, it should create one automatically (same as running --reset). This script should be interruptable and resumable. It should also be easy to use and not require carefully specifying what courses to include in what order.

In "incremental mode", the script should:
- --reset is used, erase all rows from the "incremental indexes completed" database table

Summary:
The existing ./manage.py cms reindex_studio command will create a new search index matching the latest requirements, populate it with data from all courses/libraries in Studio, and then swap it to become the active index. This can be done anytime and works well for smaller instances; there will be no outage of search features during this time as any previously created index continues to be available until the new index is completely ready. This is not suitable for large instances (in terms of content, not users) because it may take many days for the index to complete, and if there's a problem it must start all over from scratch.

The new ./manage.py cms reindex_studio --incremental command will create a new search index matching the latest requirements (if necessary). Then, it will populate the index with content from courses/libraries - a process that can be paused and resumed as needed. This process may take several days. During this time, studio search will work without errors but results will be incomplete or missing entirely. This is recommended for large instances (in terms of content).

The new ./manage.py cms reindex_studio --init command is suitable to run during initial instance setup or during an upgrade and will work on any instance type/size. It will set up an empty index that's ready for content, but won't add any content to the index. Users will have to manually run one of the two above commands to populate the index if there are any existing courses/libraries.

In any case, as long as the index exists, newly changed content will get added to it as changes are made in studio. These commands are only necessary for mass indexing of existing content.
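
To make the flag semantics above concrete, here is a minimal, self-contained sketch of how the three flags might be dispatched and how the incremental pass could be made resumable. It is not the actual command: FakeSearch, run, index_one and the in-memory completed set (standing in for the "incremental indexes completed" table) are all hypothetical names, and the sketch assumes that --reset may be combined with --incremental and that --init clears the table whenever it creates an index. The existing full rebuild-and-swap path is left out.

```python
import argparse


class FakeSearch:
    """In-memory stand-in for the search engine (hypothetical, for illustration)."""

    def __init__(self):
        self.index = None  # None means "no active index exists"

    def reset(self):
        # Create an empty, configured index and make it the active one.
        self.index = {}


def run(argv, search, completed, contexts, index_one):
    """Dispatch on --reset / --init / --incremental.

    completed: set of already-indexed keys (stands in for the DB table).
    contexts:  iterable of every course/library key in Studio.
    index_one: callable(index, key) that indexes one course/library.
    """
    parser = argparse.ArgumentParser(prog="reindex_studio")
    parser.add_argument("--reset", action="store_true")
    parser.add_argument("--init", action="store_true")
    parser.add_argument("--incremental", action="store_true")
    opts = parser.parse_args(argv)

    if opts.init:
        if search.index is None:
            # Same as --reset, but only because no index exists yet.
            search.reset()
            completed.clear()
        else:
            print(
                "A rebuild of the index is required. Please run "
                "./manage.py cms reindex_studio --experimental [--incremental]"
            )
        return

    # --incremental creates the index automatically if there is none.
    if opts.reset or (opts.incremental and search.index is None):
        search.reset()
        completed.clear()  # progress is meaningless for a fresh index

    if opts.incremental:
        for key in contexts:
            if key in completed:
                continue  # finished in an earlier (interrupted) run
            index_one(search.index, key)
            # Record progress only after the item is fully indexed, so an
            # interruption at worst repeats the item that was in flight.
            completed.add(key)
```

Because progress is recorded per course/library, rerunning the same command after an interruption picks up where it left off, without the operator having to say which courses to include or in what order.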
Future
We may simplify this and only support the --incremental mode in the future. Or maybe we should just change to that now?