Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix #4268] Adding Documentation for upgraded Search #4467

Merged
merged 8 commits into from
Sep 7, 2018

Conversation

safwanrahman
Copy link
Member

I have added instruction for preparing the local development for search. Its first step towards documentation of search as proposed in #4268.
@ericholscher r?

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good start. Needs some spell checking, and I think I touched on the grammar issues I found.

Search
============

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search result. Documentations are indexed in Elasticsearch index and the search is made through API. All the Search Code is Opensource and lives in `Github Repository`_. Currently we are using `Elasticsearch 6.3`_ version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result -> results

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentations -> Documents


Installing and running Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You must need to install and run `Elasticsearch 6.3`_ version in your local development machine. You can get installation instuction `here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instuction is misspelled.

^^^^^^^^
For using search, you need to index data to Elasticsearch Index. Run `reindex_elasticsearch` management command::

./manage.py reindex_elasticsearch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this take arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

./manage.py search_index which is provided by DED takes argument. our implementation of management command do not take arguments.

@safwanrahman safwanrahman changed the title Adding Documentation for upgraded Search [Fix #4268] Adding Documentation for upgraded Search Aug 7, 2018
@safwanrahman
Copy link
Member Author

@ericholscher fixed the grammer and added some docs about architecture. Is it ready to merge?

Search
============

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...instead of the built-in Sphinx search...

============

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
results. Documents are indexed in Elasticsearch index and the search is made through API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...in the Eleasticearch index...through the API.


Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
results. Documents are indexed in Elasticsearch index and the search is made through API.
All the Search Code is Opensource and lives in `Github Repository`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...the GitHub Repository.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And "open source", not Opensource

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
results. Documents are indexed in Elasticsearch index and the search is made through API.
All the Search Code is Opensource and lives in `Github Repository`_.
Currently we are using `Elasticsearch 6.3`_ version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the Elasticsearch 6.3 version.


Installing and running Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You must need to install and run `Elasticsearch 6.3`_ version in your local development machine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to install and run the Elasticsearch 6.3 version on your local development machine. You can get the installation instructions here.

You must need to install and run `Elasticsearch 6.3`_ version in your local development machine.
You can get installation instruction
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
Otherwise, you can also start a Elasticsearch Docker container by running following command::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running the following command


Auto Indexing
^^^^^^^^^^^^^
By default, Auto Indexing is turned off in development mode. To turn in on, change the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To turn it on

^^^^^^^^^^^^^
By default, Auto Indexing is turned off in development mode. To turn in on, change the
`ELASTICSEARCH_DSL_AUTOSYNC` settings to `True` in `readthedocs/settings/dev.py` file.
After that, whenever a documentation successfully build, or project gets added,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builds

------------
The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper result to users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

results

The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper result to users.
We use `django-elasticsearch-dsl`_ package mostly for the keep the search working.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to keep the


Indexing
^^^^^^^^
All the Sphinx documents are indexed into elasticsearch after build gets successfully finish.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the build is successful.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

After any build gets successfully finished, `HTMLFile` objects are created for each of the
`HTML` file of the build version and delete the old version's `HTMLFile` object. Signal_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files and the old version's HTMLFile object is deleted.

files. Both of the signals are dispatched with the list of the instances of `HTMLFile`
in `instance_list` parameter.

We listen the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listen to the


How we index projects
~~~~~~~~~~~~~~~~~~~~~
We also index project informations in our search index so that user can search for projects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

information ... so that they user can...

~~~~~~~~~~~~~~~~~~~~~~

`elasticsearch-dsl`_ provide model like wrapper for `Elasticsearch document`_.
As per requirements of `django-elasticsearch-dsl`_, its stored in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's stored

Elasticsearch Document
~~~~~~~~~~~~~~~~~~~~~~

`elasticsearch-dsl`_ provide model like wrapper for `Elasticsearch document`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provides a model-like wrapper for the

As per requirements of `django-elasticsearch-dsl`_, its stored in
`readthedocs/search/documents.py` file.

**ProjectDocument:** Its used for indexing projects. Signal listener of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's

`django-elasticsearch-dsl`_ listen the `post_save` singal of `Project` model and
then index/delete into Elasticsearch.

**PageDocument**: Its used for indexing documentation of projects. By default, the auto
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's

**PageDocument**: Its used for indexing documentation of projects. By default, the auto
indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`.
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production.
As mentioned above, our `Search` app listens the `bulk_post_create` and `bulk_post_delete`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listens to the

As mentioned above, our `Search` app listens the `bulk_post_create` and `bulk_post_delete`
signals and index/delete documentations into Elasticsearch. The signal listeners are in
the `readthedocs/search/signals.py` file. Both of the signals are dispatched
after successful documentation build.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after a successful

@safwanrahman
Copy link
Member Author

@RichardLitt I have fixed all the issue you have mentioned. can you check again?


Installing and running Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You need to install and run the `Elasticsearch 6.3`_ version on your local development machine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...run Elastichsearch version 6.3_ on your...

This might be a bit more fluent.

You need to install and run the `Elasticsearch 6.3`_ version on your local development machine.
You can get the installation instructions
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
Otherwise, you can also start a Elasticsearch Docker container by running the following command::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start an Elasticsearch


Indexing into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^
For using search, you need to index data to Elasticsearch Index. Run `reindex_elasticsearch`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data to the Elasticsearch

Auto Indexing
^^^^^^^^^^^^^
By default, Auto Indexing is turned off in development mode. To turn it on, change the
`ELASTICSEARCH_DSL_AUTOSYNC` settings to `True` in `readthedocs/settings/dev.py` file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the readthedocs... file

------------
The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper results to users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other part is responsible for querying the Index to show the proper results to users.

The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper results to users.
We use `django-elasticsearch-dsl`_ package mostly to the keep the search working.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the


Indexing
^^^^^^^^
All the Sphinx documents are indexed into elasticsearch after the build is successful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elasticsearch should always be capitalized.

How we index documentations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

After any build gets successfully finished, `HTMLFile` objects are created for each of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any build is successfully finished

~~~~~~~~~~~~~~~~~~~~~~~~~~~

After any build gets successfully finished, `HTMLFile` objects are created for each of the
`HTML` files and the old version's `HTMLFile` object is deleted. Signal_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Signal_

~~~~~~~~~~~~~~~~~~~~~
We also index project information in our search index so that the user can search for projects
from the main site. `django-elasticsearch-dsl`_ listen `post_create` and `post_delete` signals of
`Project` model and index/delte into Elasticsearch accordingly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

Elasticsearch Document
~~~~~~~~~~~~~~~~~~~~~~

`elasticsearch-dsl`_ provides model like wrapper for the `Elasticsearch document`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provides a model-like wrapper


`elasticsearch-dsl`_ provides model like wrapper for the `Elasticsearch document`_.
As per requirements of `django-elasticsearch-dsl`_, it is stored in
`readthedocs/search/documents.py` file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stored in a

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be the

`readthedocs/search/documents.py` file.

**ProjectDocument:** It is used for indexing projects. Signal listener of
`django-elasticsearch-dsl`_ listen the `post_save` singal of `Project` model and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listens to the

indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`.
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production.
As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete`
signals and index/delete documentations into Elasticsearch. The signal listeners are in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexes/deleted documentation into...

Copy link
Member

@RichardLitt RichardLitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there! Sorry about all of these comments.

@safwanrahman
Copy link
Member Author

@RichardLitt I have updated the issues you have mentioned.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. We could definitely add a bit more detail in some places, but I think this is a solid start to work from. 👍

For using search, you need to index data to the Elasticsearch Index. Run `reindex_elasticsearch`
management command::

./manage.py reindex_elasticsearch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth noting here why we implemented our own version, instead of using the detail one provided by django-elasticsearch-dsl.

in `instance_list` parameter.

We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application and
index/delete the documentation content from the `HTMLFile` instances.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it's probably worth mentioning why we designed it this way, because by default it was doing an HTTP request per object.

the `readthedocs/search/signals.py` file. Both of the signals are dispatched
after a successful documentation build.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably useful to mention how we parse the data from the JSON files to get the actual indexed data.

@RichardLitt
Copy link
Member

@safwanrahman Thanks. :) Sorry for being so persnickety!

@safwanrahman
Copy link
Member Author

@ericholscher updated with fixes. Possible to merge?

@agjohnson agjohnson added the PR: work in progress Pull request is not ready for full review label Aug 27, 2018
@agjohnson agjohnson added this to the Search improvements milestone Aug 27, 2018
@ericholscher ericholscher merged commit 88ff413 into readthedocs:search_upgrade Sep 7, 2018
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this pull request Sep 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: work in progress Pull request is not ready for full review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants