Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: let es_index.create return versioned index name #1660

Merged
merged 3 commits into from
Sep 25, 2024

Conversation

BeritJanssen
Copy link
Contributor

For now, this PR should fix #1656 . Perhaps not the prettiest solution, but I wanted to avoid repeated requests to the Elasticsearch client just to reaffirm the current version number. Perhaps in the long run, some database bookkeeping of index versions would not be a bad idea, but let's discuss this at a later point.

Copy link
Contributor

@JeltevanBoheemen JeltevanBoheemen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a smart little solution to the problem.
Pending the index management functionality this has my approval.

For safety, I added a review reqest for @lukavdplas. I'm not that familiar with the indexing procedure. So any possible interactions this may cause come to mind?

@BeritJanssen
Copy link
Contributor Author

Ah, I'd asked you because you did the most recent changes in this file. Thanks for the approval, but let's see if @lukavdplas still spots anything we both may have missed indeed.

Copy link
Contributor

@lukavdplas lukavdplas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think this solution makes a lot of sense.

I added one comment about using --add, which I think will not work in this version. Feel free to merge if I'm wrong about that. Otherwise, bonus points if you also add a unit test for that option 🙃

@@ -93,20 +99,26 @@ def create(client: Elasticsearch, corpus: Corpus, add: bool = False, clear: bool
settings=settings,
mappings=es_mapping,
)
return index_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're using the --add flag (to add documents to an existing index), this function won't reach this line, but return None on line 69. I expect that will cause an issue when populate() is called with versioned_index_name = None.

This should be straightforward to fix: just return the name of the existing index in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and luckily, this is already caught by a unit test. :-)

@BeritJanssen BeritJanssen merged commit a512140 into develop Sep 25, 2024
1 check passed
@BeritJanssen BeritJanssen deleted the bugfix/es-index-version branch September 25, 2024 12:49
"""get the name of the current corpus' associated index"""
alias = corpus.es_alias or corpus.es_index
indices = client.indices.get(index="{}".format(alias))
return max(sorted(indices.keys()))
Copy link
Contributor

@lukavdplas lukavdplas Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sorting alphabetically? That would mean that if my_corpus_9 and my_corpus_10 both exist, you'd use my_corpus_9, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: cannot create new index / will write to wrong index version in production
3 participants