Search updates #99

Closed
3 of 5 tasks
Tracked by #107
nsheff opened this issue Feb 28, 2023 · 10 comments

Comments

@nsheff
Contributor

nsheff commented Feb 28, 2023

Brainstorming some ideas for bringing the search to the next level:

  • Number of GEO entries: do we have enough?
  • Re-indexing, and automated re-indexing
  • Add metadata description to index
  • Only 11-14 results are showing. Enable paging? Adjust cutoff?
  • Show similarity score on results page
@nleroy917
Member

nleroy917 commented Mar 1, 2023

Re: pagination, I am a bit stumped since the qdrant.search() API doesn't return a count. A workaround is to just set the limit to an incredibly high value and count what gets returned:

count = len(qdrant.search(
    collection_name=(
        query.collection_name or DEFAULT_QDRANT_COLLECTION_NAME
    ),
    query_vector=query_vec,
    limit=1_000_000,  # must be an int (1e99 is a float); high enough to get everything above the threshold
    offset=offset,
    score_threshold=score_threshold,
))

This would perform very poorly, however. I guess we could still paginate but adopt an "infinite scroll" type scenario: just let users keep paging, setting each new offset to the previous offset + limit, until a search returns 0 results. With a low enough score_threshold you could in theory just page forever.
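
A rough sketch of that "keep paging until the results are of length 0" idea, for illustration only. It assumes a hypothetical `search(limit, offset)` callable (not part of qdrant; it stands in for a wrapper around `qdrant.search()` with the collection, query vector, and `score_threshold` already bound). The `iter_pages` name and the `max_pages` guard are likewise illustrative additions.

```python
from typing import Callable, Iterator, List, Optional


def iter_pages(
    search: Callable[[int, int], List],
    limit: int = 100,
    max_pages: Optional[int] = None,
) -> Iterator[List]:
    """Yield successive pages of results until a search comes back empty.

    `search(limit, offset)` is a hypothetical stand-in for a wrapper around
    qdrant.search(); it only needs to return a list of hits.
    """
    offset = 0
    pages_served = 0
    while max_pages is None or pages_served < max_pages:
        results = search(limit, offset)
        if not results:  # length 0: nothing left above the threshold
            return
        yield results
        pages_served += 1
        offset += limit  # next offset = previous offset + limit


# Usage with a fake in-memory "index" of 250 hits instead of a real collection.
fake_hits = list(range(250))


def fake_search(limit: int, offset: int) -> List[int]:
    return fake_hits[offset:offset + limit]


page_sizes = [len(page) for page in iter_pages(fake_search, limit=100)]
```

Each page costs one bounded search, so no total count is ever requested; the optional `max_pages` cap is one way to stop a very low threshold from paging through the whole collection.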

@nleroy917
Member

New UI:

image

Settings modal

image

@nsheff
Contributor Author

nsheff commented Mar 1, 2023

> Re: pagination, I am a bit stumped since the qdrant.search() API doesn't return a count. A workaround is to just set the limit to an incredibly high value and count what gets returned

No -- use the limit and offset of the search. So, by default you search with limit = 100 and offset = 0. You display these results with a "page 2" button. If the user clicks "page 2", then you repeat the search with limit = 100 and offset = 100.

Offset = (page-1) * limit
limit is always 100.

There is no "number of results" in this case.
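
The offset arithmetic above could be sketched as follows. This is only an illustration: `page_to_offset` and `fetch_page` are hypothetical helper names, and `search` is an assumed callable accepting `limit` and `offset` keyword arguments (for example, a thin wrapper around `qdrant.search()`).

```python
from typing import Callable, List

PAGE_SIZE = 100  # "limit is always 100"


def page_to_offset(page: int, limit: int = PAGE_SIZE) -> int:
    """Map a 1-based page number to a search offset: (page - 1) * limit."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * limit


def fetch_page(search: Callable[..., List], page: int, limit: int = PAGE_SIZE) -> List:
    """Repeat the same search for the requested page; no total count is needed."""
    return search(limit=limit, offset=page_to_offset(page, limit))


# Usage with a stand-in search over 150 fake hits.
fake_hits = list(range(150))


def fake_search(limit: int, offset: int) -> List[int]:
    return fake_hits[offset:offset + limit]


page_1 = fetch_page(fake_search, page=1)  # limit = 100, offset = 0
page_2 = fetch_page(fake_search, page=2)  # limit = 100, offset = 100
```

One possible design choice, not something stated in the thread: a page that comes back with fewer than `limit` hits could be used as the signal to hide the "next page" button.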

nsheff modified the milestones: Version 0.5.1, Version 0.6.0 Mar 2, 2023
nleroy917 added a commit that referenced this issue Mar 3, 2023
@nleroy917
Member

I feel pretty good about the updates here. Maybe some UI/UX/styling things could be done, but the functionality is basically there.

@nleroy917
Member

Two things should be completed:

  1. A GitHub action should be created to run once a day (or week) to index the PEPs
  2. We should augment how we are mining descriptions of all PEPs (@khoroshevskyi mentioned he might be able to do this)

@nsheff
Contributor Author

nsheff commented Mar 9, 2023

When I click on some of the search results that come up for me, it says "not found".

I think this may just reflect that the re-indexing is still needed, is that right?

@nleroy917
Member

This is related to pepkit/pepembed#2

@khoroshevskyi it would be nice to get this stuff fleshed out prior to the latest release of PEPhub

@nleroy917
Member

Seems like this still needs to occur

@nleroy917
Member

Still open because of the indexing strategy we are still using.

nleroy917 modified the milestones: Version 0.6.x, 0.8.x Jun 7, 2023
@nleroy917
Member

I'll close this, since the re-indexing is really a pepembed issue and is being worked on over here: pepkit/pepembed#2
