-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: let es_index.create return versioned index name #1660
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a smart little solution to the problem.
Pending the index management functionality this has my approval.
For safety, I added a review reqest for @lukavdplas. I'm not that familiar with the indexing procedure. So any possible interactions this may cause come to mind?
Ah, I'd asked you because you did the most recent changes in this file. Thanks for the approval, but let's see if @lukavdplas still spots anything we both may have missed indeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think this solution makes a lot of sense.
I added one comment about using --add
, which I think will not work in this version. Feel free to merge if I'm wrong about that. Otherwise, bonus points if you also add a unit test for that option 🙃
@@ -93,20 +99,26 @@ def create(client: Elasticsearch, corpus: Corpus, add: bool = False, clear: bool | |||
settings=settings, | |||
mappings=es_mapping, | |||
) | |||
return index_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're using the --add
flag (to add documents to an existing index), this function won't reach this line, but return None
on line 69. I expect that will cause an issue when populate()
is called with versioned_index_name = None
.
This should be straightforward to fix: just return the name of the existing index in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and luckily, this is already caught by a unit test. :-)
"""get the name of the current corpus' associated index""" | ||
alias = corpus.es_alias or corpus.es_index | ||
indices = client.indices.get(index="{}".format(alias)) | ||
return max(sorted(indices.keys())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sorting alphabetically? That would mean that if my_corpus_9
and my_corpus_10
both exist, you'd use my_corpus_9
, no?
For now, this PR should fix #1656 . Perhaps not the prettiest solution, but I wanted to avoid repeated requests to the Elasticsearch client just to reaffirm the current version number. Perhaps in the long run, some database bookkeeping of index versions would not be a bad idea, but let's discuss this at a later point.