Usage with Elasticsearch

Shane Harvey edited this page Jan 10, 2017 · 7 revisions

NOTE: in mongo-connector versions < 2.3, the elastic doc manager was packaged as part of mongo-connector and only supports Elastic 1.x. In mongo-connector versions >= 2.3, the doc managers for Elastic 1.x and 2.x are available as plugins. For more information on how to install the elastic doc managers, please see the Elastic doc manager documentation for the version of Elastic you prefer. These doc managers will only work with mongo-connector 2.3.0+.

Elastic 1.x doc manager: https://github.com/mongodb-labs/elastic-doc-manager

Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager

Once the Elastic doc manager of your choice is installed, the following instructions apply to running it:

Installation

New in Mongo Connector 2.5.0

The install command is different depending on the version of Elasticsearch you are targeting.

New in elastic2-doc-manager 0.3.0: support for Elasticsearch 5.x. Install with pip install 'mongo-connector[elastic5]', but continue to use elastic2_doc_manager as the doc manager module name.

Elasticsearch Version               Install Command
Elasticsearch 1.x                   pip install 'mongo-connector[elastic]'
Amazon Elasticsearch 1.x Service    pip install 'mongo-connector[elastic-aws]'
Elasticsearch 2.x                   pip install 'mongo-connector[elastic2]'
Amazon Elasticsearch 2.x Service    pip install 'mongo-connector[elastic2-aws]'
Elasticsearch 5.x                   pip install 'mongo-connector[elastic5]'

The Basics

Mongo Connector can replicate to Elasticsearch using the Elastic DocManager. The most basic usage is the following:

mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager

Or, if you are using the Elastic2 DocManager:

mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager

Old usage (before the 2.0 release):

mongo-connector -m localhost:27017 -t localhost:9200 -d <your-doc-manager-folder>/elastic_doc_manager.py

This assumes that a MongoDB replica set is running on port 27017 and that Elasticsearch is running on port 9200, both on the local machine.
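When mongo-connector is launched from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list from the -m, -t, and -d flags shown above; the helper name build_command is hypothetical, and starting the process is left to the caller.

```python
import shlex


def build_command(mongo_url, target_url, doc_manager):
    """Return the mongo-connector argument list for one source and one target."""
    return [
        "mongo-connector",
        "-m", mongo_url,     # MongoDB replica set to read from
        "-t", target_url,    # Elasticsearch endpoint to write to
        "-d", doc_manager,   # doc manager module name
    ]


cmd = build_command("localhost:27017", "localhost:9200", "elastic2_doc_manager")
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing cmd to subprocess.run would start the connector, provided mongo-connector is installed and on the PATH.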

Elasticsearch Indexes, Mappings, and Types

Mongo Connector gives each MongoDB database its own index in Elasticsearch. Each MongoDB collection becomes its own mapping type. For example, documents from the collection kittens in the database animals will be put into the animals index in Elasticsearch with a mapping type of kittens. Mongo Connector also stores metadata in another index, called mongodb_meta by default (this can be configured by setting meta_index_name in the args document in the doc manager config).
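The naming rule can be illustrated with a few lines of Python. This is a sketch of the rule as described here, not the doc manager's actual code, and the function name index_and_type is hypothetical.

```python
def index_and_type(namespace):
    """Split a MongoDB namespace "<database>.<collection>" into (index, mapping type)."""
    # Split on the first dot only, since collection names may contain dots.
    database, _, collection = namespace.partition(".")
    if not database or not collection:
        raise ValueError("expected '<database>.<collection>', got %r" % namespace)
    return database, collection


print(index_and_type("animals.kittens"))  # ('animals', 'kittens')
```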

You can set up all the indexes you want in advance, or you can have Mongo Connector create them automatically for you. If you want Mongo Connector to be able to create indexes automatically, make sure that action.auto_create_index is set to true in your elasticsearch.yml.

Using Elasticsearch for Geo Queries

Elasticsearch, like MongoDB, supports geographical field types and queries. In order to make geo queries to Elasticsearch, you must set up a mapping on Elasticsearch manually, before running mongo-connector. The dynamic mappings that Mongo Connector creates on first insert are not enough for Elasticsearch to detect geo field types. Please refer to the Elasticsearch documentation on setting up geospatial mapping types for points and shapes.
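As an illustration, the following sketch builds a mapping body that declares one field as a geo_point. The field name location is hypothetical, and the body follows the typed mapping layout of Elasticsearch 1.x and 2.x; check the Elasticsearch documentation for the version you run before relying on it.

```python
import json


def geo_point_mapping(mapping_type, field):
    """Build a put-mapping body that declares `field` as a geo_point."""
    return {mapping_type: {"properties": {field: {"type": "geo_point"}}}}


# "kittens" is the mapping type (the MongoDB collection);
# "location" is an assumed field name holding the coordinates.
body = geo_point_mapping("kittens", "location")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a put-mapping request for the kittens type on the animals index, after the index exists and before mongo-connector starts.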

Support for GridFS

New in Mongo Connector 2.0

Starting in version 2.0, Mongo Connector can replicate files stored in GridFS to Elasticsearch using the attachment mapping type. In order for this to work, you need to do the following:

  1. Install the attachment plugin.

  2. Create the index where you will store your GridFS documents:

     curl -XPUT http://localhost:9200/myindex
    
  3. Create a mapping corresponding to the MongoDB collection where your GridFS files are stored, and add a field called content with a type of attachment. For example, if your GridFS files are in the fs collection in MongoDB, you will want to create an fs mapping in Elasticsearch:

     curl -XPUT http://localhost:9200/myindex/collection.fs/_mapping -d'{
         "fs": {
             "properties": {
                 "content": {"type": "attachment"}
     }}}'
    

Managing Refresh Behavior

Mongo Connector does not force a refresh for every write operation. The Elasticsearch administrator may configure refresh behavior to increase overall performance. You can configure how often Elasticsearch indexes are refreshed by changing refresh_interval, either in the index module settings or through the index settings.
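For example, an update-settings request body that sets refresh_interval can be built as follows. This is a minimal sketch: the value "30s" is an arbitrary example, and the helper name refresh_settings is hypothetical.

```python
import json


def refresh_settings(interval):
    """Build an update-settings body that sets the index refresh interval."""
    return {"index": {"refresh_interval": interval}}


body = json.dumps(refresh_settings("30s"))
print(body)  # {"index": {"refresh_interval": "30s"}}
```

The body would be sent with PUT to the _settings endpoint of the index you want to change.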

Mongo Connector also provides the --auto-commit-interval option to override any configuration in Elasticsearch, though configuring refresh behavior in Elasticsearch should be preferred to this option. The option takes a number: the maximum number of seconds allowed before a write must be committed. An argument of 0 means that every write operation is committed immediately:

# commit every write immediately (this was the old behavior)
mongo-connector --auto-commit-interval=0 -d elastic_doc_manager -t localhost:9200

Customizing Behavior of the Elastic Client

The Elastic DocManager wraps the Python Elastic client and allows you to pass arbitrary options to its constructor in the mongo-connector config file. The constructor options are passed in a JSON object under the key args.clientOptions. For example, if you wish to set the timeout option to 200:

  "docManagers": [
    {
      "docManager": "elastic_doc_manager",
      "targetURL": "localhost:9200",
      "args": {
        "clientOptions": {"timeout": 200}
      }
    }
  ]

This results in the Elastic client in elastic_doc_manager.py being created as:

Elasticsearch(hosts=["localhost:9200"], timeout=200)
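The way the config entry turns into constructor arguments can be sketched as below. This illustrates the mapping described above and is not the doc manager's source; client_kwargs is a hypothetical helper, and the real client is not imported so that the snippet runs without Elasticsearch installed.

```python
def client_kwargs(doc_manager_config):
    """Combine targetURL and args.clientOptions into constructor keyword arguments."""
    options = doc_manager_config.get("args", {}).get("clientOptions", {})
    kwargs = {"hosts": [doc_manager_config["targetURL"]]}
    kwargs.update(options)  # each client option becomes one keyword argument
    return kwargs


config = {
    "docManager": "elastic_doc_manager",
    "targetURL": "localhost:9200",
    "args": {"clientOptions": {"timeout": 200}},
}
print(client_kwargs(config))  # {'hosts': ['localhost:9200'], 'timeout': 200}
```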

Connecting to Multiple Elasticsearch Hosts

New in Elastic Doc Managers 0.3.0: you can connect to multiple Elasticsearch hosts by passing a list as targetURL:

  "docManagers": [
    {
      "docManager": "elastic2_doc_manager",
      "targetURL": ["host1:9200", "host2:9200"],
      "args": {
        "clientOptions": {"timeout": 200}
      }
    }
  ]

This results in the Elastic client in elastic2_doc_manager.py being created as:

Elasticsearch(hosts=["host1:9200", "host2:9200"], timeout=200)
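Since targetURL may be either a single string or a list, code that reads the config has to accept both forms. The following is a minimal sketch of that normalization; the helper name normalize_hosts is hypothetical and the doc managers may handle it differently.

```python
def normalize_hosts(target_url):
    """Return targetURL as a list of hosts, whether it was a string or a list."""
    if isinstance(target_url, str):
        return [target_url]
    return list(target_url)


print(normalize_hosts("localhost:9200"))
print(normalize_hosts(["host1:9200", "host2:9200"]))
```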

To configure the client to connect to "host1:9200" and "host2:9200" and then sniff new hosts:

  "docManagers": [
    {
      "docManager": "elastic2_doc_manager",
      "targetURL": ["host1:9200", "host2:9200"],
      "args": {
        "clientOptions": {
           "sniff_on_start": true,
           "sniff_on_connection_fail": true,
           "sniffer_timeout": 60
        }
      }
    }
  ]

Signing Requests to AWS Elasticsearch Service

You must install the Elastic DocManager with extra dependencies to use this feature:

pip install 'mongo-connector[elastic-aws]'

Or, if you are using the Elastic2 DocManager:

pip install 'mongo-connector[elastic2-aws]'

At the time of this writing, Amazon Elasticsearch Service provides Elasticsearch versions 1.5.2 and 2.3, which are supported by elastic-doc-manager (for 1.x) and elastic2-doc-manager (for 2.x), respectively. The args option for each doc manager accepts an aws object with a required region_name key and, optionally, aws_access_key_id and aws_secret_access_key and/or profile_name corresponding to your AWS credentials. More specifically, a boto3.session.Session is created with the keyword arguments from the aws object.

Note: If aws_access_key_id and aws_secret_access_key are unspecified, the credentials found in the ~/.aws/credentials file (if any) for the user running mongo-connector will be used instead, which may be the behavior that you want, particularly if you've already used the AWS CLI.
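The handling of the aws object can be sketched as follows. The validation step and the helper name aws_session_kwargs are assumptions made for illustration; only the final, commented-out call reflects what is described above, and it requires boto3 to be installed.

```python
def aws_session_kwargs(aws_args):
    """Check the aws object and return keyword arguments for the boto3 session."""
    if "region_name" not in aws_args:
        raise ValueError("the aws object requires a region_name key")
    # Credentials are optional: when they are absent, boto3 falls back to
    # other sources such as ~/.aws/credentials.
    return dict(aws_args)


kwargs = aws_session_kwargs({"region_name": "us-east-1", "profile_name": "default"})
print(kwargs)

# With boto3 installed, the session would then be created as:
# import boto3
# session = boto3.session.Session(**kwargs)
```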

Example AWS Usage

The following is an example of using elastic-doc-manager with an Amazon Elasticsearch Service domain running Elasticsearch 1.5.2. The same args apply to both elastic-doc-manager and elastic2-doc-manager.

...
  "docManagers": [
    {
      "docManager": "elastic_doc_manager",
      "targetURL": "https://search-my-domain-29tg824978g24924t42.us-east-1.es.amazonaws.com/",
      "args": {
        "aws": {
            "region_name": "us-east-1",
            "aws_access_key_id": "ACCESS_ID",
            "aws_secret_access_key": "SECRET_KEY"
        }
      }
    }
  ]
  ...

As stated above, the aws_access_key_id and aws_secret_access_key arguments are optional if you are using credentials stored in ~/.aws/credentials for the user running the mongo-connector instance.

Common Issues

OperationFailed: TransportError(404, u'index_not_found_exception')...

This error will occur if mongo-connector makes a request to an Elastic index that doesn't exist. If action.auto_create_index is true in your elasticsearch.yml, then mongo-connector can create indexes automatically for you. However, you can also create all the indexes ahead of time yourself. You'll need to create:

  • A mongodb_meta index (used internally by mongo-connector)
  • One index for each database being copied by mongo-connector

Do this before running mongo-connector to avoid a TransportError(404, u'index_not_found_exception').
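A short script can list and create the required indexes. This is a sketch under stated assumptions: the database names animals and plants are placeholders, the helper names are hypothetical, the metadata index uses the default name, and create_index needs a reachable Elasticsearch server, so its call is left commented out.

```python
import urllib.request


def required_indexes(databases, meta_index="mongodb_meta"):
    """List the indexes to create: the metadata index plus one per database."""
    return [meta_index] + list(databases)


def create_index(base_url, index):
    """Send PUT <base_url>/<index> to create an empty index; return the HTTP status."""
    url = "%s/%s" % (base_url.rstrip("/"), index)
    request = urllib.request.Request(url, method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.status


names = required_indexes(["animals", "plants"])
print(names)  # ['mongodb_meta', 'animals', 'plants']

# To create them on a local server:
# for name in names:
#     create_index("http://localhost:9200", name)
```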
