Advanced search #727

Open
ionelmc opened this issue Oct 17, 2015 · 20 comments
Labels: feature request; search (Elasticsearch, search filters, and so on)

Comments

@ionelmc

ionelmc commented Oct 17, 2015

Sorry if this has been asked before.

It would be really nice if I could make metadata searches like these on PyPI:

  • Give me all packages that depend on X
  • Give me all packages that don't depend on Y
  • Give me all packages that depend on X>1.2.3
  • Give me all packages that don't depend on Y<1.2.3
  • Give me all packages that have classifier A :: B :: C
  • Give me all packages that "work" on Python X.Y (but there are multiple ways to declare compatibility? maybe some compound index)
  • Give me all packages that have a release after 2015-01-02

And so on ...

@ionelmc
Author

ionelmc commented Dec 31, 2015

Not really. Use-case: give me all packages that do X but don't depend on Y, cause Y is broken/whatever.

I frequently look for packages that don't have heavy dependencies. E.g.: I want a plotting library that doesn't depend on matplotlib.

To make the banana analogy: you asked for a banana, you got it, but there's a monkey and the whole forest attached to that banana.

@alexwlchan
Contributor

Other criteria that would be useful:

  • Give me all packages that are available under license L
  • Give me all packages that are available under licenses other than L

@ionelmc
Author

ionelmc commented Jan 1, 2016

Another idea:

  • Give me all packages that don't have license L and don't have any dependencies with license L (the "can't use GPL" problem many people have)
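A minimal sketch of that "no license L anywhere in the dependency tree" check, assuming dependency and license metadata were fully available (which it currently is not for every package). The `PACKAGES` table, its field names, and both function names are made up for illustration; nothing here is Warehouse code.

```python
# Hypothetical in-memory metadata: name -> license and direct dependencies.
PACKAGES = {
    "app-plot": {"license": "MIT", "requires": ["numlib"]},
    "numlib": {"license": "BSD", "requires": []},
    "gpl-plot": {"license": "GPL", "requires": ["numlib"]},
    "wrapper": {"license": "MIT", "requires": ["gpl-plot"]},
}


def licenses_in_closure(name, packages):
    """Collect the licenses of `name` and of all its transitive dependencies."""
    seen = set()
    licenses = set()
    stack = [name]
    while stack:
        current = stack.pop()
        # Skip packages already visited (handles cycles) and packages we
        # have no metadata for (an assumption: unknown means "ignore").
        if current in seen or current not in packages:
            continue
        seen.add(current)
        licenses.add(packages[current]["license"])
        stack.extend(packages[current]["requires"])
    return licenses


def free_of_license(banned, packages):
    """Names of packages whose whole dependency tree avoids `banned`."""
    return sorted(
        name
        for name in packages
        if banned not in licenses_in_closure(name, packages)
    )
```

With this sample data, `free_of_license("GPL", PACKAGES)` keeps `app-plot` and `numlib` and drops `wrapper`, because `wrapper` pulls in `gpl-plot` indirectly. Treating packages with missing metadata as harmless is a design choice of the sketch; a stricter search might exclude them instead.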

@toddrjen

toddrjen commented Jun 6, 2016

I think some of these would be easier if there was a boolean search that could be used to find packages that don't match a particular result. So rather than:

  • Give me all packages that are available under license L
  • Give me all packages that are available under licenses other than L

You could just have

  • Give me all packages that are available under license L

And use the boolean "not" operator to exclude results matching that.
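One way to picture the "not" operator suggested above is as a wrapper around any existing filter, so each criterion only has to be written once. This is a rough sketch over an in-memory list; `has_license`, `negate`, `search`, and the sample records are illustrative names, not part of PyPI or Warehouse.

```python
from typing import Callable, Dict, List

Package = Dict[str, str]
Predicate = Callable[[Package], bool]


def has_license(name: str) -> Predicate:
    """Filter: package is available under the given license."""
    return lambda pkg: pkg.get("license") == name


def negate(predicate: Predicate) -> Predicate:
    """Boolean NOT: match exactly the packages `predicate` rejects."""
    return lambda pkg: not predicate(pkg)


def search(packages: List[Package], predicate: Predicate) -> List[str]:
    """Return the names of the packages accepted by `predicate`."""
    return [pkg["name"] for pkg in packages if predicate(pkg)]


# Hypothetical sample data.
SAMPLE = [
    {"name": "alpha", "license": "MIT"},
    {"name": "beta", "license": "GPL"},
    {"name": "gamma", "license": "BSD"},
]
```

Here `search(SAMPLE, has_license("GPL"))` finds `beta`, and `search(SAMPLE, negate(has_license("GPL")))` finds everything else, so "licenses other than L" needs no separate filter.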

@nlhkabu added the "feature request" and "requires triaging" (maintainers need to do initial inspection of issue) labels Jul 2, 2016
@nlhkabu added the "search" (Elasticsearch, search filters, and so on) label May 13, 2017
@di removed the "requires triaging" (maintainers need to do initial inspection of issue) label Dec 7, 2017
@brainwane added this to the "5: Shut Down Legacy PyPI" milestone Dec 7, 2017
@brainwane
Contributor

A related issue about exclusion in search: #1971.

@brainwane
Contributor

@waseem18 is writing up a bit of a proposal on how to do this.

@brainwane
Contributor

@waseem18 it would be great to get to see your work in progress! Feel free to share it in a GitHub gist and link to it here, or put it right into a comment. It's fine if it's rough.

@waseem18
Contributor

waseem18 commented Mar 1, 2018

@brainwane I'll post a comment describing what I plan to do and how, and then start on it after getting feedback.

@waseem18
Contributor

waseem18 commented Mar 4, 2018

Below is a rough UI mockup of how Advanced Search might look.

[Image: rough Advanced Search UI mockup]

  • I tried placing the Advanced Search button/link below the Filter projects section. As this is a rough UI, this might not be the best place for the link, and we can discuss its placement.
  • The section on the right contains the Advanced Search title and, below that, some search options with their respective input elements. The UI for this section can also be improved.
  • Search by entry point (#1677) can also be included in the Advanced Search section.
  • We can start by implementing Search by entry point (#1677) and then proceed with the other options.

@brainwane @nlhkabu Will be happy to receive your feedback / suggestions on this.

@pradyunsg
Contributor

I have to point out that information about a package's dependencies is not statically available for source distributions. Thus, this information is only partially available right now. There's an open issue on this repository about this.

IIRC, Warehouse stores the install_requires (I don't remember the exact name) metadata for packages that upload a wheel first.

@waseem18
Contributor

waseem18 commented Mar 5, 2018

Thanks for the information @pradyunsg. I was unaware of #474 and #2502 and had been looking into the JSON of packages. Your comment put me on the right track.

I've gone through #474 and #2502 and found that, as of now, it's not trivial to implement Advanced Search.

And as mentioned on #474

it looks like out of ~120k packages in the PyPi index, only ~17k have a non null info->requires_dist field
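To make the consequence concrete, here is a small sketch of a "depends on X" check against data shaped like the `info` → `requires_dist` field quoted above. The two sample payloads are trimmed and invented, and `depends_on` is a hypothetical helper; the name extraction is a simplified regex, not a full requirement parser. The point is that a null field has to be reported as "unknown" rather than "no".

```python
import re
from typing import Optional

# Trimmed, made-up payloads in the shape of a package's JSON metadata.
SAMPLE_WITH = {
    "info": {
        "name": "demo",
        "requires_dist": ["requests (>=2.0)", "six; python_version < '3'"],
    }
}
SAMPLE_WITHOUT = {"info": {"name": "legacy", "requires_dist": None}}

# Leading project name of a requirement string (simplified on purpose).
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    """Compare names case-insensitively, treating -, _ and . alike."""
    return re.sub(r"[-_.]+", "-", name).lower()


def depends_on(payload: dict, target: str) -> Optional[bool]:
    """True/False if the metadata answers the question, None if it is missing."""
    requires = payload.get("info", {}).get("requires_dist")
    if requires is None:
        return None  # no dependency metadata recorded for this package
    wanted = _normalize(target)
    for requirement in requires:
        match = _NAME.match(requirement)
        if match and _normalize(match.group(1)) == wanted:
            return True
    return False
```

For the samples, `depends_on(SAMPLE_WITH, "Requests")` is `True`, while `depends_on(SAMPLE_WITHOUT, "requests")` is `None`. A "does not depend on Y" search would have to decide what to do with that third state for the majority of packages.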

Glad that PEP 566 has been accepted, which paves the way for having metadata for packages that upload a wheel first.

@pradyunsg
Contributor

Thanks for the information @pradyunsg

Glad to be of help. :)

@brainwane modified the milestones: "5: Shut Down Legacy PyPI", "6. Post Legacy Shutdown" Mar 6, 2018
@brainwane
Contributor

In today's Warehouse core developers' meeting we decided to pare down our near-future milestones on our development roadmap so they really only contain the essential bugfixes and features we need to launch, replace legacy PyPI, and shut down the old site.

So I'm moving this issue into a milestone further in the future; sorry for the wait. And I would love for @waseem18 to make further progress on it, if he would like to!

@waseem18
Contributor

waseem18 commented Mar 6, 2018

I would be happy to work on this @brainwane.
I'll keep a close eye on the issues this one depends on so that we can start once they are resolved.

The same applies to #1677.

@nlhkabu
Contributor

nlhkabu commented Mar 10, 2018

hi @waseem18 thanks for your work on this so far.

A couple of UX ideas:

  1. I think it could be better to have the advanced search appear below the main search bar - something like this:

[Image: screenshot from 2018-03-10 11-07-41]

  2. It would be awesome if we could develop some kind of advanced search syntax - similar to GitHub's:
    https://help.github.com/articles/searching-issues-and-pull-requests/

What do you think?

@waseem18
Contributor

Thanks for the UX ideas @nlhkabu. The suggested UX looks really great.

I'll implement the UI the way you suggested once work on Advanced Search has started.

@brainwane
Contributor

#3452 (comment) has a suggestion from @drunkwcodes:

Maybe introducing https://github.com/nepsilon/search-query-parser and letting users to type search queries like "Framework:Django" in the search bar will help.

Because we are familiar with Google search and Github search.
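As a rough illustration of the "Framework:Django" idea (written in Python rather than with the JavaScript library linked above), the search-bar text could be split into free-text terms and `field:value` filters before anything is sent to the search backend. `split_query` is a hypothetical helper and the syntax rules here are assumptions: quotes group words, and field names are lower-cased.

```python
import shlex


def split_query(raw):
    """Split search-bar input into (free-text terms, field -> values)."""
    try:
        # shlex honours quotes, so license:"BSD License" stays one token.
        tokens = shlex.split(raw)
    except ValueError:
        # Unbalanced quotes or a stray apostrophe: fall back to whitespace.
        tokens = raw.split()

    terms = []
    filters = {}
    for token in tokens:
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters.setdefault(field.lower(), []).append(value)
        else:
            terms.append(token)
    return terms, filters
```

For example, `split_query('plotting Framework:Django license:"BSD License"')` yields the term `plotting` plus a `framework` and a `license` filter. The sketch accepts any field name, so the result would still need to be checked against a list of supported fields.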


@honzakral I'd appreciate your assessment on what we need to configure or what components/extensions we need to add to our ElasticSearch setup to get more advanced search in Warehouse, if you have time to give your opinion!

@honzakral
Contributor

There are no additional components needed from the installation part, as long as all the fields you'd want to query exist on the documents. Then it's a matter of extracting those conditions in a structured way (either by parsing text input or by processing a more complex/broken down form), validating them (by providing a whitelist of options) and adding conditions to the search. Something like:

# get search object from current code
search = get_search()

# create Query objects from form data ...
for query_filter in parse_and_validate_filters(form_data):
    # and apply each one to the search
    # (named query_filter so it does not shadow the builtin filter)
    search = search.filter(query_filter)

To create a filter there would be somewhere (I'd assume a Form object of some sort) logic to convert the input to Query:

from elasticsearch_dsl import Q

assert parse_input('version>=1') == Q('range', version={'gte': 1})
assert parse_input('version>=1,<3') == Q('range', version={'gte': 1, 'lt': 3})
assert parse_input('Framework:django') == Q('match', framework='django')
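For illustration, a `parse_input` along those lines might look like the sketch below. To stay dependency-free it returns plain dicts in the Elasticsearch query format (the structure the `Q(...)` objects above should serialize to) rather than `Q` objects. The field whitelists are invented, and versions are treated as plain integers, which real version strings are not.

```python
import re

# Hypothetical whitelists; real field names depend on the indexed documents.
MATCH_FIELDS = {"framework", "license"}
RANGE_FIELDS = {"version"}
RANGE_OPS = {">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}

_RANGE = re.compile(r"^([A-Za-z_]+)([<>].*)$")
_CLAUSE = re.compile(r"^(>=|<=|>|<)(\d+)$")


def parse_input(text):
    """Turn one filter expression into an Elasticsearch-style query dict."""
    field, sep, value = text.partition(":")
    if sep:
        # "Field:value" becomes a match query on a whitelisted field.
        field = field.lower()
        if field not in MATCH_FIELDS or not value:
            raise ValueError(f"unsupported filter: {text!r}")
        return {"match": {field: value}}

    found = _RANGE.match(text)
    if not found or found.group(1).lower() not in RANGE_FIELDS:
        raise ValueError(f"unsupported filter: {text!r}")

    # "version>=1,<3" becomes a single range query with several bounds.
    bounds = {}
    for clause in found.group(2).split(","):
        parsed = _CLAUSE.match(clause.strip())
        if not parsed:
            raise ValueError(f"bad range clause: {clause!r}")
        bounds[RANGE_OPS[parsed.group(1)]] = int(parsed.group(2))
    return {"range": {found.group(1).lower(): bounds}}
```

Anything outside the whitelists raises `ValueError`, which is one way to do the validation step described above; a form-based UI could do the same check per input field instead of parsing text.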

Alternatively you could also use FacetedSearch abstraction which is already part of elasticsearch-dsl (0) and that has the ability to use filters as well as calculate/display facets which is always a nice addition to search.

I would be happy to talk more and provide any help with the elasticsearch part of this

0 - http://elasticsearch-dsl.readthedocs.io/en/latest/faceted_search.html#example

@brainwane
Contributor

@honzakral If you're open to actually making the improvement in Warehouse yourself, that would be great! If not, I totally understand, and will ask @robhudson whether he has time. :)

@honzakral
Contributor

I would love to work on it, but not sure about the time. I will try to make time at PyCon sprints, will update the issue then. Thanks for the ping


9 participants