ARXIVNG-372 refactored author search as part of simple and advanced search interfaces #122

Merged
merged 6 commits into from
Mar 23, 2018

Conversation

erickpeirson
Contributor

@erickpeirson erickpeirson commented Mar 22, 2018

  • Author search is now part of advanced and simple searches
  • Instead of directing author searches from the abs page and from search results to a separate author search interface, we should direct them to the simple search interface with: searchtype=author&query={surname}, {forename} (already implemented for search results).
  • Multiple author names (conjunction) can be specified, delimited with semicolons (;).
  • Each author name may be a literal -- if enclosed in quotes, the search will be for exact matches only.
  • Whether a literal or not, author names with commas are unpacked as {surname}, {forename} -> {forename} {surname} before executing queries.
  • Includes an opportunistic first pass at "Refactor and/or modularize search.services.index" (#112).
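
The parsing rules in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `AuthorTerm`, `unpack_name`, `parse_author_query`, and `author_search_params` are hypothetical names, and the sketch assumes that quotes wrap a whole semicolon-delimited segment.

```python
from typing import List, NamedTuple
from urllib.parse import urlencode


class AuthorTerm(NamedTuple):
    """One author name from a query; `literal` means exact matches only."""
    name: str
    literal: bool


def unpack_name(name: str) -> str:
    """Rewrite '{surname}, {forename}' as '{forename} {surname}'."""
    if "," not in name:
        return name.strip()
    surname, forename = (part.strip() for part in name.split(",", 1))
    return f"{forename} {surname}".strip()


def parse_author_query(query: str) -> List[AuthorTerm]:
    """Split on semicolons, detect quoted literals, then unpack commas."""
    terms = []
    for raw in query.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty segments such as a trailing ';'
        # Assumption: a literal is a segment wrapped in double quotes.
        literal = len(raw) >= 2 and raw[0] == raw[-1] == '"'
        if literal:
            raw = raw[1:-1]
        terms.append(AuthorTerm(unpack_name(raw), literal))
    return terms


def author_search_params(surname: str, forename: str) -> str:
    """Build the query string for a link into the simple search interface."""
    return urlencode({"searchtype": "author", "query": f"{surname}, {forename}"})
```

For example, `parse_author_query('Joe Bloggs; "Doe, Jane"')` yields a non-literal term `Joe Bloggs` and a literal term `Jane Doe`, and `author_search_params("Doe", "Jane")` produces `searchtype=author&query=Doe%2C+Jane`.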

Also delivers ARXIVNG-338. To use (from root of repo):

pip install -U pipenv
pipenv install
pipenv run nose2 --with-coverage     # Here's how you would run the test suite.
pipenv run pylint search     # Here's how you would run pylint
pipenv run mypy --ignore-missing-imports -p search   # And mypy.
pipenv run pydocstyle --convention=numpy --add-ignore=D401 search .   # And pydocstyle.

# When running Python, env vars go before the pipenv invocation, like this:
FLASK_APP=app.py pipenv run python create_index.py     

# So you'd start Flask like this:
FLASK_DEBUG=1 FLASK_APP=app.py pipenv run flask run

Note that this doesn't impact how you would run ES on your machine. For example, if you are using docker-compose you would continue to do this precisely as you did before.

docker-compose build
docker-compose up

@eawoods
Contributor

eawoods commented Mar 22, 2018

I got through the pipenv steps through "run flask run" (which is just my new favorite thing and I am always going to use the Forrest Gump voice while pasting that in). Works like a charm!
However, the create_index command is coming back with Connection refused, I assume because I didn't start up Elasticsearch. Which method of doing so is the best one for using pipenv? Does pipenv simply replace the "pip install requirements" part, and I would docker-compose up to get elasticsearch as before?

@eawoods
Contributor

eawoods commented Mar 22, 2018

Never mind, flailed at my own questions, tried docker-compose up and it works just fine. Suggestion: make sure that's clear in the instructions. I'm likely to forget the exact steps next time too.
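
For anyone who hits the same "Connection refused" error, a quick pre-flight check along these lines can confirm whether Elasticsearch is reachable before running create_index.py. It is only a sketch: `service_is_up` is a hypothetical helper, and the host and port in the comment (localhost:9200, the Elasticsearch default HTTP port) are assumptions about how docker-compose publishes the service here.

```python
import socket


def service_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Assumed usage (host and port are guesses, adjust to your setup):
#   if not service_is_up("localhost", 9200):
#       print("Elasticsearch is not reachable; run `docker-compose up` first.")
```

If the check fails, start the containers with docker-compose up and run the create_index command again.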

@eawoods
Contributor

eawoods commented Mar 22, 2018

Me again. :)
Will continue testing, but first pass: somehow the paper title field has been dropped from results display, at least on my instance. Scanned the code changes but can't figure out why.

@erickpeirson
Contributor Author

pipenv instructions: updated -- better?

@erickpeirson
Contributor Author

Huh, that's weird about the title. Here's what it looks like on my machine (with a fresh index):

image

@eawoods
Contributor

eawoods commented Mar 23, 2018

@erickpeirson Must be my index, then, somehow. Will purge and rebuild index presently.
Mine looks like this:
screen shot 2018-03-23 at 9 37 55 am

Instructions: much better, thank you!! I will follow along to the best of my ability and cry for help if I get lost. :)

@mhl10
Contributor

mhl10 commented Mar 23, 2018

When I index the following set of papers:

hep-ph/0402236
hep-ph/0609048
hep-ph/0411394
1710.02357
1410.6043
1509.07688
1608.00420
0712.1442
1003.3986

and then search for Jurgen Korner in simple search with the author field, the first two hits look good. The third hit is for Janos Korner; subsequent hits are for Jürgen Körner, which I expected to rank higher. Similarly, if I search for Jürgen Körner, the first three hits are good, the fourth hit is for János Körner, and subsequent hits are for Jurgen Korner.
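
To illustrate the expectation: if names are compared on an accent-folded key, both spellings of the name are the same author string, so the open question is ranking rather than matching. The sketch below shows such a key in plain Python. `fold` is a hypothetical helper, and whether the index actually folds accents this way (for example with an Elasticsearch ASCII-folding token filter) is an assumption, not something confirmed in this thread.

```python
import unicodedata


def fold(text: str) -> str:
    """Casefold and strip combining marks, e.g. 'Körner' -> 'korner'."""
    # NFKD splits 'ö' into 'o' plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

Under this key, `fold("Jürgen Körner")` and `fold("Jurgen Korner")` are equal, while `fold("János Körner")` differs from both, which is why I expected the Janos/János hit to rank below every Jurgen/Jürgen hit.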

@eawoods
Copy link
Contributor

eawoods commented Mar 23, 2018

You were on fire with this, @erickpeirson!
Overall, author field is functioning incredibly well AND names in "all fields" are much better also.
I searched for "john" in both author and all fields and got two results in our standard metadata, one with John (and Johnson) as a last name and one with John as a first name. AND....same results on the "all fields" simple search box. Names in quotes (and not in quotes) improved overall.

Wildcards not working with capitalized terms - same as Jref bug.
Otherwise my test cases were all excellent.

Possible note to provide in help text about searching for authors with name1; name2 : this works in Authors field much better than in All Fields (it still works, but with varied results).

Oh - possible missing match? In the following sequence:
Search for --> results
Della Ceca --> 0807.1067v2 and 0510845v1 (author listed as R Della Ceca)
R Ceca --> same
Della --> same
dell* --> same
cec* --> same, plus a match for Cecilia Tarantino (as expected)
R Della Ceca --> same
R Della --> same
Robert Della Ceca --> no results (should at least have the two R Della Ceca results as partials? Yes?)
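
One possible reading of the sequence above, sketched as a toy model: if every query token must match some token of the indexed name, then "Robert" can never match the stored initial "R", so the whole query fails. The code below is not the search service's logic. `matches_all` and `matches_with_initials` are hypothetical, and the second function only illustrates what treating a stored initial as a partial match could look like.

```python
from fnmatch import fnmatchcase
from typing import List


def tokens(name: str) -> List[str]:
    """Casefold a name and split it on whitespace."""
    return name.casefold().split()


def matches_all(query: str, indexed_name: str) -> bool:
    """True if every query token (wildcards allowed) matches a name token."""
    name_tokens = tokens(indexed_name)
    return all(
        any(fnmatchcase(nt, qt) for nt in name_tokens)
        for qt in tokens(query)
    )


def matches_with_initials(query: str, indexed_name: str) -> bool:
    """Like matches_all, but a one-letter indexed token also accepts any
    query token starting with that letter (so 'R' accepts 'Robert')."""
    name_tokens = tokens(indexed_name)
    return all(
        any(
            fnmatchcase(nt, qt) or (len(nt) == 1 and qt.startswith(nt))
            for nt in name_tokens
        )
        for qt in tokens(query)
    )
```

With the indexed name "R Della Ceca", `matches_all` accepts "R Ceca", "dell*", and "cec*" but rejects "Robert Della Ceca", mirroring the results listed above; `matches_with_initials` accepts all four.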

@erickpeirson
Contributor Author

image

@erickpeirson
Contributor Author

image

@erickpeirson
Contributor Author

image

@erickpeirson
Contributor Author

erickpeirson commented Mar 23, 2018

@eawoods What do you think of adding some small help text under the search box describing how to search by author name? E.g.

Author names: You may search by surname, forename or forename surname. Separate multiple authors on the same paper with semicolons, e.g. Joe Bloggs; Doe, Jane. Wildcards may also be used, provided that they are not at the beginning of the name.

@eawoods
Contributor

eawoods commented Mar 23, 2018

@erickpeirson Sounds terrific! Are you thinking that particular help text would show underneath the box all the time, or only when Author(s) is selected as the field type? (It might be worth having something like that, or a variant of it, available for both all fields and authors.)

P.S. there are a couple of author search help lines in the Tips box in my ARXIVNG-384 PR but I am all about the idea of having helper text right underneath the box for authors in particular.

Contributor

@eawoods eawoods left a comment

Ready to go! Numerous permutations of requests are returning as expected (including arm*; ire* and many multiple name cases).

@erickpeirson erickpeirson merged commit 139debe into develop Mar 23, 2018
@bmaltzan bmaltzan deleted the task/ARXIVNG-372 branch April 1, 2021 17:31