Relevancy index with Logistic regression and cosine distance #411

Open
wants to merge 7 commits into
base: master

vbhavank
Collaborator

This pull request adds a RelevancyIndex plugin that uses logistic regression and cosine distance to re-rank IQR results during 'refine'.


lgtm-com bot commented Jul 14, 2020

This pull request introduces 1 alert when merging c8a8f8c into 9697837 - view on LGTM.com

new alerts:

  • 1 for Unused import

vbhavank requested a review from Purg on July 14, 2020 18:46
python/smqtk/algorithms/nn_index/faiss.py (resolved)
@@ -232,6 +232,7 @@ def __init__(self, descriptor_set, idx2uid_kvs, uid2idx_kvs,
self.random_seed = int(random_seed)
# Index value for the next added element. Reset to 0 on a build.
self._next_index = 0
self._distance_metric = distance_m
Member

This doesn't seem to be used where I would have expected it: in selecting the faiss.METRIC_* value during index construction here (L312). The index still always uses L2.

Collaborator Author

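The intention here is to L2-normalize the vectors before we build an index with regular Euclidean distance. For unit-length vectors, squared Euclidean distance equals 2 - 2·cos(θ), so ranking by L2 gives the same order as ranking by cosine distance. By doing so, we effectively build the index with cosine distance.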
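To illustrate the normalization argument in this comment, here is a minimal NumPy sketch (not the code in this PR). It checks that, after L2-normalization, squared Euclidean distance equals twice the cosine distance, so both produce the same neighbor ranking. The helper name `l2_normalize` and the random test data are assumptions made for the example.

```python
import numpy as np


def l2_normalize(vectors):
    """Scale each row to unit Euclidean length (all-zero rows are left unchanged)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero
    return vectors / norms


# Hypothetical data: 50 database descriptors and one query, 8 dimensions each.
rng = np.random.default_rng(0)
database = rng.normal(size=(50, 8))
query = rng.normal(size=(1, 8))

# Squared L2 distance between the normalized vectors.
sq_l2 = np.sum((l2_normalize(database) - l2_normalize(query)) ** 2, axis=1)

# Cosine distance computed directly on the raw (un-normalized) vectors.
cos_sim = (database @ query.T).ravel() / (
    np.linalg.norm(database, axis=1) * np.linalg.norm(query)
)
cosine_distance = 1.0 - cos_sim

# ||a - b||^2 = 2 - 2*cos(theta) when ||a|| = ||b|| = 1.
identity_holds = np.allclose(sq_l2, 2.0 * cosine_distance)
same_ranking = np.array_equal(np.argsort(sq_l2), np.argsort(cosine_distance))
print(identity_holds, same_ranking)
```

Note that the distances an L2 index returns for normalized vectors are still (squared) Euclidean values, not cosine distances, so only the ordering carries over directly.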

@@ -435,6 +436,11 @@ def _build_index(self, descriptors):

faiss_index = self._index_factory_wrapper(d, self.factory_string)
# noinspection PyArgumentList
if self._distance_metric:
Member

This seems to be treating the input string value as a boolean. Is the intent here to just always normalize descriptors when using FAISS? If so, then this behavior should be documented in the class doc-string.

I also see this logic repeated below. I know this isn't a huge amount of code, but it might make sense to break it out into an encapsulated function that both call.
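One way to address both points in this comment is sketched below. This is a hypothetical stand-in class, not SMQTK's actual FAISS index: the class name `IndexSketch`, the metric strings `"l2"` and `"cosine"`, and the `build`/`update` methods are all assumptions for illustration. The metric string is compared explicitly rather than tested for truthiness (any non-empty string, including `"l2"`, is truthy), and the normalization lives in one static method that both code paths call.

```python
import numpy as np


class IndexSketch:
    """Hypothetical stand-in for an index class; NOT the SMQTK API.

    When ``distance_metric`` is "cosine", descriptors are L2-normalized
    before being stored, so Euclidean search ranks by cosine distance.
    """

    def __init__(self, distance_metric="l2"):
        metric = str(distance_metric).lower()
        if metric not in ("l2", "cosine"):  # assumed set of metric names
            raise ValueError("Unsupported distance metric: %r" % distance_metric)
        self._distance_metric = metric
        self._data = None

    @staticmethod
    def _normalize(vectors):
        """Scale each row to unit length (all-zero rows are left unchanged)."""
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        return vectors / norms

    def _prepare(self, vectors):
        """Single place where the metric decides whether to normalize."""
        vectors = np.asarray(vectors, dtype=np.float32)
        if self._distance_metric == "cosine":  # explicit comparison
            return self._normalize(vectors)
        return vectors

    def build(self, vectors):
        self._data = self._prepare(vectors)

    def update(self, vectors):
        self._data = np.vstack([self._data, self._prepare(vectors)])


cosine_index = IndexSketch("cosine")
cosine_index.build([[3.0, 4.0]])
cosine_index.update([[0.0, 2.0]])

l2_index = IndexSketch("l2")
l2_index.build([[3.0, 4.0]])
```

With this layout, `cosine_index` stores unit-length rows while `l2_index` keeps the raw values, and an unrecognized metric string fails at construction instead of silently enabling normalization.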

Collaborator Author

vbhavank Aug 14, 2020

Made the normalize function a static method.

python/smqtk/algorithms/relevancy_index/logistic_reg.py (3 review threads, outdated and resolved)

lgtm-com bot commented Aug 14, 2020

This pull request introduces 2 alerts when merging c719e45 into 9697837 - view on LGTM.com

new alerts:

  • 2 for Unused import

Removed unused imports in logistic_reg