Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix #4265] Port Document search API for Elasticsearch 6.x #4309

Merged
merged 9 commits into from
Jul 6, 2018

Conversation

safwanrahman
Copy link
Member

This is the first etaration of making a API for Documentation Search. Its not finalized yet and the work is not finished yet.
Need more work on highlighting as well as test.
Needed to do some refactor to keep it DRY.
@ericholscher Can you look for a slight review?

@safwanrahman safwanrahman self-assigned this Jun 27, 2018
@safwanrahman safwanrahman added the PR: work in progress Pull request is not ready for full review label Jun 27, 2018
@safwanrahman
Copy link
Member Author

Currently the API looks something like this!
image

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of my questions are around whether we're using proper querysets, or we have things that look like querysets, but are actually search results that we're filtering. Could use a bit more explanation, and code comments if that is what we're doing.

@@ -60,14 +61,19 @@ class Meta(object):
content = fields.TextField(attr='processed_json.content')
path = fields.TextField(attr='processed_json.path')

# Fields to perform search with weight
search_fields = ['title^10', 'headers^5', 'content']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used by the DocType class directly, or are we storing it here to add later? If the prior, we should probably set it in an __init__ method or somewhere else, so it doesn't get confused with the other data here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not actually used by DocType class, but we are using it in class method. I dont know if keeping it in __init__ method will make it work in class method!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, good point. I guess it's fine for now then.


def get_queryset(self):
query = self.request.query_params.get('query')
queryset = PageDocument.search(query=query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does search() return a queryset? Seems like it would return a search object or similar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. it returns a Search object.

@@ -67,6 +67,7 @@
url(r'^api/', include(v1_api.urls)),
url(r'^api/v2/', include('readthedocs.restapi.urls')),
url(r'^api-auth/', include('rest_framework.urls', namespace='rest_framework')),
url(r'^api/search/', PageSearchAPIView.as_view()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely continue to live in v2 for now. Perhaps as api/v2/indocsearch or something, so it can live beside the old docsearch during rollout?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! I added it for testing purpose actually!

project_slug = request.query_params.get('project')
project_slug_list = get_project_slug_list_or_404(project_slug=project_slug,
user=request.user)
return queryset.filter('terms', project=project_slug_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we're conflating querysets and Search queries here again. It feels a bit odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! I understand. Its not queryset, but a Search object but it is similar to queryset.

@safwanrahman
Copy link
Member Author

I think most of my questions are around whether we're using proper querysets, or we have things that look like querysets, but are actually search results that we're filtering.

@ericholscher We actually have a Search object that act closely similar to queryset. So we can use the Search object as queryset to easily implement the API!

@safwanrahman safwanrahman removed the PR: work in progress Pull request is not ready for full review label Jun 30, 2018
@safwanrahman
Copy link
Member Author

@ericholscher Its ready for a final review.
r?

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We still need to wire this up to the front-end JS right?

@@ -48,7 +48,7 @@
url(r'index_search/',
search_views.index_search,
name='index_search'),
url(r'search/$', views.search_views.search, name='api_search'),
# url(r'search/$', views.search_views.search, name='api_search'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not remove this yet, until we've deployed a full version to replace it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know, but with this, the reverse function act differently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without having comment, its returns something.

resolve('/api/v2/docsearch/')
>>> ResolverMatch(func=readthedocs.restapi.views.search_views.search, args=(), kwargs={}, url_name=api_search, app_names=[], namespaces=[])

But it should return

>>> ResolverMatch(func=readthedocs.search.api.PageSearchAPIView, args=(), kwargs={}, url_name=doc_search, app_names=[], namespaces=[])

I have no idea why this is happening. maybe a bug in django?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Believe it's a bad regex. It should be r'^search/$' -- now it's just catching anything that ends in /api/v2/<anythinghere>search/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ericholscher . I did not suspect the regex can be wrong! It worked!

# Run bool query with should, so it returns result where either of the query matches
bool_query = Bool(should=all_queries)
search = search.query(bool_query)
search = search.query(query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we shouldn't be overwriting the object here. Will we return something invalid if there is no query because we have the same object name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have actually a mixed feeling about this. Do you have any idea about how to do this withtout overriding the search method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean mostly just the name of the object. search = search.query(query) -- is that similar to a queryset = queryset.filter()? It just feels like a weird bit of syntax to rewrite the name of the object over again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. search = search.query(query) -- is that similar to a queryset = queryset.filter(). If we do not overwrite the same object, then another variable need to be assigned and check if that is not None

@@ -66,6 +66,8 @@
api_urls = [
url(r'^api/', include(v1_api.urls)),
url(r'^api/v2/', include('readthedocs.restapi.urls')),
# Keep the `doc_search` at root level, so the test does not fail for other API
url(r'^api/v2/docsearch/$', PageSearchAPIView.as_view(), name='doc_search'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. What tests were failing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@safwanrahman
Copy link
Member Author

@ericholscher Anything left for fixing?

@safwanrahman
Copy link
Member Author

@ericholscher while I was working in the frontend, I realized that the link of document is needed. I have implemented that. Can you please make a review?

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with a few small nits, and fixing the tests on py3. Glad we caught the link addition from the client side, I had a feeling we'd need more to render the HTML properly :)

context['projects_info'] = self.get_projects_info()
return context

def _get_all_projects(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same logic as in get_project_slug_list_or_404? It should probably use the util function for this I think? Perhaps it could be get_project_list_or_404 and we can pass slug_only param to it or something?


def get_serializer_context(self):
context = super(PageSearchAPIView, self).get_serializer_context()
context['projects_info'] = self.get_projects_info()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

projects_info I think is a bad name. It seems like we're just getting the docs_url from it for now, so we should probably be more clear about that. Perhaps context['project_urls'] or something?

@safwanrahman
Copy link
Member Author

@ericholscher Fixed python3 compatibility and changes as you reviewed. r?

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. 👍

@ericholscher ericholscher merged commit d6638b9 into readthedocs:search_upgrade Jul 6, 2018
@safwanrahman safwanrahman deleted the search_api branch July 7, 2018 19:23
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this pull request Jul 16, 2018
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this pull request Jul 16, 2018
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants