Configuration Options
You can use a custom configuration file to specify some options to mongo-connector.
This page details all the options that can be specified in Mongo Connector's configuration file. You can also look at an example. Looking at the tests might also help in understanding the configuration options.
Mongo Connector uses JSON as the format for its configuration file. We'll use MongoDB "dot-notation" for the configuration option names themselves. For example, we'll use the name authentication.password to mean:
{"authentication": {"password": XXX}}
Please note that any option that starts with __ will be ignored. For example,
"namespaces": {
"__include": ["test.talks"]
},
Here, the __include option is ignored.
You can tell mongo-connector what configuration file to use via the -c option (this will also be shown with --help). To invoke mongo-connector with a configuration file option, run:
mongo-connector -c config.json
(presuming your configuration file is called config.json and is in the directory from which you invoke mongo-connector)
Although JSON itself doesn't provide a syntax for comments, Mongo Connector allows its JSON configuration file to have comments, which are defined as any key in an object that is prefixed by two underscores (__). For example:
{
"__comment": "this is a comment"
}
Command-line equivalent: -m, --main
Default: localhost:27017
The address of the replica set or sharded cluster from which to replicate. This may be any MongoDB connection string.
Command-line equivalent: -o, --oplog-ts
Default: oplog.timestamp
The path to the oplog progress file. Note: backslashes must be escaped, e.g. "C:\\path\\to\\oplog.timestamp".
Command-line equivalent: --no-dump
Default: false
Do not dump collections from MongoDB to the remote system prior to tailing the MongoDB oplog. With this option, mongo-connector starts tailing the oplog from the oldest entry in the oplog.
Command-line equivalent: --batch-size
Default: -1
Number of records processed from the oplog before updating the timestamp file.
Command-line equivalent: -v, --verbose
Default: 1 (only output warnings and errors)
The verbosity of Mongo Connector. Note that the command-line option only turns debug-level logging on or off. In the config file, verbosity may be set according to the following table:
Verbosity | Log Level |
---|---|
0 | ERROR |
1 | WARNING |
2 | INFO |
3 | DEBUG |
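For instance, to log informational messages and above, a config file could set verbosity to 2. This is a minimal sketch based on the table above:

```json
{
  "verbosity": 2
}
```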
Command-line equivalent: --continue-on-error
Default: false
Whether to continue tailing the oplog after an error occurred while dumping a collection. This doesn't affect the connector's behavior while already tailing the oplog.
Command-line equivalent: -i, --fields
Default: all fields
Comma-separated list of fields to read from MongoDB documents. This option can be used to select just a few fields out of every document. Note that the _id field, and the ns and _ts fields for Solr, will always be included. This option is mutually exclusive with the exclude_fields option.
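As an illustration, a config file entry that replicates only two fields might look like the sketch below. It assumes the option is written under the key fields (the name used in the text above) as a JSON array rather than a comma-separated string; title and author are hypothetical field names.

```json
{
  "fields": ["title", "author"]
}
```

With such a setting, each replicated document would carry title and author in addition to the always-included _id field.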
Command-line equivalent: -e, --exclude_fields
Default: empty
Comma-separated list of fields to exclude from MongoDB documents. This option can be used to omit just a few fields from every document. Note that the _id field, and the ns and _ts fields for Solr, will always be included. This option is mutually exclusive with the fields option.
Command-line equivalent: --tz-aware
Default: false
Whether Dates read from MongoDB should be timezone-aware.
Command-line equivalents: --logfile, -s, --enable-syslog
Default: file
Where to direct Mongo Connector logs. This may be one of "file", "syslog", or "stream".
Command-line equivalent: --logfile
Default: mongo-connector.log
The path to Mongo Connector's log file. This option only applies if logging.type is "file". Note: backslashes must be escaped, e.g. "C:\\path\\to\\mongo-connector.log".
Command-line equivalent: --logfile-when
Default: midnight
The type of period defining when Mongo Connector should rotate its log file. This must be one of:
- S (second)
- M (minute)
- H (hour)
- D (day)
- W0 - W6 (days of the week, numbered 0 - 6)
- midnight
For more details, see the Python documentation for TimedRotatingFileHandler.
This option only applies if logging.type is "file".
Command-line equivalent: --logfile-interval
Default: 1
How frequently the log file should be rotated. Specifically, how many units of logging.rotationWhen should occur before rotation. This option cannot be used if logging.rotationWhen is any of W0 - W6.
For more details, see the Python documentation for TimedRotatingFileHandler.
This option only applies if logging.type is "file".
Command-line equivalent: --logfile-backups
Default: 7
How many rotated log files to keep around.
This option only applies if logging.type is "file".
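Putting the file-logging options together, a configuration might look like the following sketch. The keys logging.type and logging.rotationWhen are named in the text above; filename, rotationInterval, and rotationBackups are assumed key names for the log path, rotation interval, and backup count, so check them against your version of Mongo Connector. The values shown are the documented defaults.

```json
{
  "logging": {
    "type": "file",
    "filename": "mongo-connector.log",
    "rotationWhen": "midnight",
    "rotationInterval": 1,
    "rotationBackups": 7
  }
}
```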
Command-line equivalent: --syslog-host
Default: localhost:512
Address of the syslog. This can include a host and port like "localhost:512" or, on Unix/Linux, be a Unix domain socket such as "/dev/log".
This option only applies if logging.type is "syslog".
Command-line equivalent: --syslog-facility
Default: user
The syslog facility to use.
This option only applies if logging.type is "syslog".
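A syslog setup could be sketched as follows. Only logging.type is named in the text above; host and facility are assumed key names for the syslog address and facility. The Unix domain socket path is the one mentioned in the description.

```json
{
  "logging": {
    "type": "syslog",
    "host": "/dev/log",
    "facility": "user"
  }
}
```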
Command-line equivalent: -a, --admin-username
Default: (no default)
The username that Mongo Connector should use to log into MongoDB.
Command-line equivalent: -p, --password
Default: (no default)
The password for authentication.adminUsername. This option cannot be used with authentication.passwordFile.
Command-line equivalent: -f, --password-file
Default: (no default)
A path to a file that contains the password for authentication.adminUsername. This option cannot be used with authentication.password.
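Because authentication.password and authentication.passwordFile cannot be combined, a config file sets only one of them. The sketch below uses a password file; the username and file name are hypothetical placeholders.

```json
{
  "authentication": {
    "adminUsername": "connector_user",
    "passwordFile": "mongo-connector.pwd"
  }
}
```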
Command-line equivalent: --ssl-certfile
Default: (no default)
A path to the SSL certificate that Mongo Connector should use to identify the local connection to MongoDB.
Command-line equivalent: --ssl-keyfile
Default: (no default)
A path to the private key for ssl.sslCertfile. This option isn't necessary if ssl.sslCertfile already has the private key included.
Command-line equivalent: --ssl-certificate-policy
Default: ignored
Policy for validating SSL certificates provided from the other end of the connection (i.e., to MongoDB). Must be one of:
- required - Require and validate the remote certificate.
- optional - The same as required, unless the server was configured to use anonymous ciphers.
- ignored - Remote SSL certificates are ignored completely.
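An SSL section that requires and validates the server certificate might look like this sketch. The key ssl.sslCertfile is named in the text above; sslKeyfile and sslCertificatePolicy are assumed key names for the private key path and the validation policy, and both file paths are hypothetical.

```json
{
  "ssl": {
    "sslCertfile": "/path/to/client.pem",
    "sslKeyfile": "/path/to/client.key",
    "sslCertificatePolicy": "required"
  }
}
```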
Default: Include all namespaces except system and GridFS collections.
NEW in 2.5.0: The namespaces configuration option controls which MongoDB namespaces are replicated and how. By default, Mongo Connector will replicate all namespaces except for system and GridFS collections. Namespaces should be given as database_name.collection_name. Each namespace may contain a single wildcard (*) which matches any characters. For example, db_*.foo matches db_bar.foo and db_a.foo.
Command-line equivalent: -x, --exclude-namespace-set
To prevent replication of a set of namespaces, add "db.collection": false to the "namespaces" config object.
Example:
{
"namespaces": {
"db.excluded_collection": false,
"excluded_database.*": false,
"*.exclude_collection_from_every_database": false
}
}
Command line: -x 'db.excluded_collection,excluded_database.*,*.exclude_collection_from_every_database'
Command-line equivalent: -n, --namespace-set
To replicate only a specific set of namespaces, add "db.collection": true, "db.collection": "db.collection", or "db.collection": {} to the "namespaces" config object. Included namespaces support additional options such as renaming, GridFS, and filtering fields in documents.
Config file usage:
{
"namespaces": {
"db.included_collection1": true,
"db.included_collection2": {},
"included_wildcard_db.*": true
}
}
Command line usage: -n 'db.included_collection1,db.included_collection2,included_wildcard_db.*'
To rename a namespace, add "db.collection": "db.new_collection" or "db.collection": {"rename": "db.new_collection"}. By default, no renaming will occur. Renaming works with wildcard (*) namespaces with the following limitation: if the source namespace contains a wildcard in the collection name, then the destination must also contain a wildcard in the collection name. The same is true for a wildcard in a database name.
Renamed namespaces can also specify fields to include or exclude.
Note: mongo-connector 2.5.0 does not support renaming GridFS collections.
Config file usage:
{
"namespaces": {
"renamed_database.collection1": "new_database.new_collection1",
"renamed_database.collection2": {
"rename": "new_database.new_collection2"
},
"renamed_wildcard_db.*": {
"rename": "new_database_name.*"
}
}
}
Note: when replicating to Elasticsearch, the MongoDB database name, which will become the Elasticsearch index name, is always made lowercase.
Command-line equivalent: --gridfs-set
GridFS collections are not replicated by default. To include a GridFS collection, add "gridfs": true to the options for that namespace. For example, if GridFS metadata is stored in the test.fs.files collection, and chunks are stored in the test.fs.chunks collection, add "test.fs": {"gridfs": true}. To include all GridFS collections in the test database, add "test.*": {"gridfs": true}.
Config file usage:
{
"namespaces": {
"gridfs_db.collection": {"gridfs": true},
"gridfs_wildcard_db.*": {"gridfs": true}
}
}
Command line usage: --gridfs-set 'gridfs_db.collection,gridfs_wildcard_db.*'
Command-line equivalent: None
By default, all fields in all documents are replicated in each included namespace. The "includeFields" and "excludeFields" options can be used to limit the fields per namespace. To include only a specific set of fields in a namespace, add "includeFields": <list of fields to include> to the options. To exclude only a specific set of fields in a namespace, add "excludeFields": <list of fields to exclude> to the options.
Note: the _id field will always be included. It is not possible to both include and exclude fields on the same namespace.
Note: mongo-connector does not support filtering fields inside arrays; you can only include or exclude the entire array field.
Config file usage:
{
"namespaces": {
"db.included_collection": true,
"db.filtered_collection1": {
"includeFields": ["included_field", "included.nested.field"]
},
"db.filtered_collection2": {
"excludeFields": ["excluded_field", "excluded.nested.field"]
},
"filtered_database.*": {
"includeFields": ["included_field", "included.nested.field"]
},
"filtered_renamed_database.*": {
"rename": "new_filtered_database.*",
"includeFields": ["included_field", "included.nested.field"]
}
}
}
Command-line equivalent: -n, --namespace-set
Default: all namespaces
DEPRECATED in 2.5.0: List of collections to read from MongoDB. Collection names should be given as database_name.collection_name.
By default, Mongo Connector will replicate all namespaces except for system and GridFS collections.
Usage examples: -n test.test,alpha.bar,db_1.foo on the command line or ["test.test", "alpha.bar", "db_1.foo"] in a config file.
Command-line equivalent: -x, --exclude-namespace-set
Default: no namespaces
DEPRECATED in 2.5.0: List of collections not to read from MongoDB. Collection names should be given as database_name.collection_name.
By default, Mongo Connector will not exclude any namespace.
Usage examples: -x test.test,alpha.bar,db_1.foo on the command line or ["test.test", "alpha.bar", "db_1.foo"] in a config file.
Command-line equivalent: -g, --dest-namespace-set
Default: no mapping
DEPRECATED in 2.5.0: Comma-separated list of new names to use for each collection. Each namespace provided in namespaces.include will be renamed respectively at the destination according to this list. This option may only be used with namespaces.include, and both options must include the same number of names. By default, no renaming will occur. For example:
{
"namespaces": {
"include": ["company.employees"],
"mapping": {
"company.employees": "company.new_employees"
}
}
}
Command line usage: -n company.employees -g company.new_employees
The company.employees collection from MongoDB will be renamed and sent to the target system as company.new_employees instead.
Note that when replicating to Elasticsearch, the MongoDB database name, which will become the Elasticsearch index name, is always made lowercase.
Command-line equivalent: --gridfs-set
Default: empty
DEPRECATED in 2.5.0: Comma-separated list of GridFS root collections. For example, if GridFS metadata is stored in the test.fs.files collection, and chunks are stored in the test.fs.chunks collection, pass test.fs as the namespace.
Mongo Connector may use more than one DocManager at a time to support replicating to more than one location simultaneously. An array of DocManagers should be provided, even if that array only contains one DocManager configuration. Here we use <index> in the configuration key name to mean "at any index within the array". For example, docManagers.0.docManager means:
{"docManagers": [{"docManager": XXX}]}
Command-line equivalent: -d, --doc-manager
Default: doc_manager_simulator
Module name of the DocManager to use. Included in Mongo Connector are mongo_doc_manager, solr_doc_manager, and doc_manager_simulator. To write your own DocManager, see Writing Your Own DocManager.
The elastic_doc_manager is included in mongo-connector versions < 2.3, and only supports Elastic 1.x. For mongo-connector versions >= 2.3, doc managers for Elastic 1.x and 2.x are available as plugins.
Elastic 1.x doc manager: https://github.com/mongodb-labs/elastic-doc-manager
Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager
Command-line equivalent: -t, --target-url
Default: (no default)
URL to pass to the DocManager. For example, this should point to the base REST endpoint for a Solr core, or should be a MongoDB connection string, or the base REST endpoint for Elasticsearch.
Command-line equivalent: -u, --unique-key
Default: _id
What to call the _id field from the MongoDB document in the target system. This is useful for certain systems that call their primary key something else (e.g., Solr uses id instead) or when the primary key field is configurable (e.g., Elasticsearch's _id path mapping).
Command-line equivalent: --auto-commit-interval
Default: no auto commit
Interval in seconds between when the DocManager forces the end system to flush changes. This doesn't apply to every system.
Command-line equivalent: (none)
Default: 1000
The number of documents that are sent in a single batch to the remote system.
Command-line equivalent: (none)
Default: (no default)
Any arbitrary keyword arguments to pass to the constructor of the DocManager. What arguments can be passed should be documented by the author of the DocManager.
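To tie the DocManager options together, the sketch below replicates to two targets at once. The docManagers array and the docManager key come from the text above; targetURL and uniqueKey are assumed key names for the target URL and unique key options, and both addresses are hypothetical.

```json
{
  "docManagers": [
    {
      "docManager": "solr_doc_manager",
      "targetURL": "http://localhost:8983/solr",
      "uniqueKey": "id"
    },
    {
      "docManager": "mongo_doc_manager",
      "targetURL": "localhost:27018"
    }
  ]
}
```

Each array element configures one DocManager independently, so options such as the unique key can differ per target.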