tap-mongodb is a Singer tap for extracting data from a MongoDB or AWS DocumentDB database. The tap supports extracting records directly from the database (incremental replication mode, the default) and extracting change events from the database's Change Streams API (log-based replication mode).
Built with the Meltano Tap SDK for Singer Taps.
Install from GitHub:
pipx install git+https://github.com/MeltanoLabs/tap-mongodb.git@main
Setting | Type | Required | Default | Description |
---|---|---|---|---|
database | string | True | - | Database from which records will be extracted. |
mongodb_connection_string | password | False | - | MongoDB connection string. See the MongoDB documentation for specification. The username and password included in this string must be url-encoded - the tap will not url-encode it. |
documentdb_credential_json_string | password | False | - | JSON string with keys 'username', 'password', 'engine', 'host', 'port', 'dbClusterIdentifier' or 'dbName', 'ssl'. See example and structure in the AWS documentation here. The password from this JSON object will be url-encoded by the tap before opening the database connection. The intent of this setting is to enable management of an AWS DocumentDB database credential via AWS SecretsManager. |
documentdb_credential_json_extra_options | string | False | - | JSON string containing key-value pairs which will be added to the connection string options when using documentdb_credential_json_string. For example, when set to the string {"tls":"true","tlsCAFile":"my-ca-bundle.pem"} , the options tls=true&tlsCAFile=my-ca-bundle.pem will be passed to the MongoClient. |
datetime_conversion | string | False | datetime | Parameter passed to MongoClient 'datetime_conversion' parameter. See documentation at https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes for details. The default value is 'datetime', which will throw a bson.errors.InvalidBson error if a document contains a date outside the range of datetime.MINYEAR (year 1) to datetime.MAXYEAR (9999). |
prefix | string | False | '' | An optional prefix which will be added to the name of each stream. |
start_date | date_iso8601 | False | 1970-01-01 | Start date - used for incremental replication only. In log-based replication mode, this setting is ignored. |
add_record_metadata | boolean | False | False | When true, _sdc metadata fields will be added to records produced by the tap. |
allow_modify_change_streams | boolean | False | False | In AWS DocumentDB (unlike MongoDB), change streams must be enabled specifically (see the documentation here). If the tap attempts to open a change stream against a collection on which change streams have not been enabled, an OperationFailure error will be raised. If this property is set to True, when this error is seen, the tap will execute an admin command to enable change streams and then retry the read operation. Note: this may incur new costs in AWS DocumentDB. |
operation_types | list(string) | False | create,delete,insert,replace,update | List of MongoDB change stream operation types to include in tap output. The default behavior is to limit to document-level operation types. See full list of operation types in the MongoDB documentation. Note that the list of allowed_values for this property includes some values not available to all MongoDB versions. |
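The url-encoding requirement for credentials and the way documentdb_credential_json_extra_options turns into connection string options can be illustrated with a short Python sketch. This is not the tap's actual implementation: the helper name build_connection_string and the sample credential values are hypothetical, and only the standard library is used.

```python
import json
from urllib.parse import quote_plus, urlencode


def build_connection_string(credential_json: str, extra_options_json: str = "{}") -> str:
    """Hypothetical helper: assemble a MongoDB URI from a credential JSON string."""
    cred = json.loads(credential_json)
    options = json.loads(extra_options_json)
    # Username and password must be url-encoded so that characters such as
    # '@' or '/' do not break the URI.
    user = quote_plus(cred["username"])
    password = quote_plus(cred["password"])
    uri = f"mongodb://{user}:{password}@{cred['host']}:{cred['port']}/"
    if options:
        # {"tls": "true", "tlsCAFile": "my-ca-bundle.pem"}
        # becomes tls=true&tlsCAFile=my-ca-bundle.pem
        uri += "?" + urlencode(options)
    return uri


# Made-up credential, for illustration only.
credential = json.dumps(
    {"username": "tap_user", "password": "p@ss/word", "host": "localhost", "port": 27017}
)
extra = '{"tls":"true","tlsCAFile":"my-ca-bundle.pem"}'
uri = build_connection_string(credential, extra)
print(uri)
```

When supplying mongodb_connection_string yourself, the same quote_plus step is what you would apply to the username and password before placing them in the string, since the tap does not do it for that setting.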
This Singer tap will automatically import any environment variables from the working directory's .env file if --config=ENV is provided, so that config values are picked up when a matching environment variable is set either in the terminal context or in the .env file.
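As a rough illustration of what "a matching environment variable" means, the sketch below maps setting names to environment variable names. It assumes the usual Meltano SDK convention of prefixing each setting with the upper-cased plugin name (TAP_MONGODB_); that prefix, the helper config_from_env, and the sample values are assumptions for illustration, not taken from this document.

```python
# Assumption: settings are matched against variables named
# TAP_MONGODB_<SETTING_NAME_IN_UPPER_CASE>.
ENV_PREFIX = "TAP_MONGODB_"


def config_from_env(environ, setting_names):
    """Hypothetical helper: collect settings that have a matching environment variable."""
    config = {}
    for name in setting_names:
        value = environ.get(ENV_PREFIX + name.upper())
        if value is not None:
            config[name] = value
    return config


# A stand-in for os.environ, with made-up values.
fake_env = {
    "TAP_MONGODB_DATABASE": "test_database",
    "TAP_MONGODB_PREFIX": "raw",
    "HOME": "/tmp",
}
settings = ["database", "prefix", "start_date"]
resolved = config_from_env(fake_env, settings)
print(resolved)
```

Settings without a matching variable (start_date here) are simply left out, so their defaults would apply.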
In order to run tap-mongodb in incremental replication mode, the credential used must have read privileges on the collections from which you wish to extract records. If your credential has the readAnyDatabase@admin permission, for example, or read@test_database (where test_database is the database setting in the tap's configuration), that should be sufficient.
Collection-level read permissions are untested but are expected to work as well:
privileges: [
{resource: {db: "test_database", collection: "TestOrders"}, actions: ["find"]}
]
The above collection-level read permission should allow the tap to extract from the test_database.TestOrders collection in incremental replication mode.
In order to run tap-mongodb in log-based replication mode, which extracts records via the database's Change Streams API, note that MongoDB and AWS DocumentDB have different permission requirements.
In MongoDB, the credential must have both the find and changeStream privilege actions on a collection in order to use tap-mongodb in log-based replication mode. The readAnyDatabase@admin built-in role provides this for all databases, while read@test_database provides the necessary access for all collections in the test_database database.
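For MongoDB, a custom role carrying both required actions could be described as in the sketch below. It only builds the command document for MongoDB's createRole command (the privilege action is named changeStream in MongoDB); the helper name, the role name tapLogBasedReader, and the choice of collection-level grants are hypothetical, and this document does not confirm that collection-level grants have been tested with the tap.

```python
def change_stream_role(role_name, database, collections):
    """Hypothetical helper: build a createRole command granting find and changeStream."""
    privileges = [
        {
            "resource": {"db": database, "collection": collection},
            "actions": ["find", "changeStream"],
        }
        for collection in collections
    ]
    # An empty "roles" list means the new role inherits from no other role.
    return {"createRole": role_name, "privileges": privileges, "roles": []}


command = change_stream_role("tapLogBasedReader", "test_database", ["TestOrders"])
print(command)
```

The resulting document would be run by an administrator against the database that should own the role, and the role then granted to the credential used by the tap.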
You can easily run tap-mongodb by itself or in a pipeline using Meltano.
tap-mongodb --version
tap-mongodb --help
tap-mongodb --config CONFIG --discover > ./catalog.json
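Once discovery has written catalog.json, a few lines of Python are enough to see which streams were found. The sketch assumes the standard Singer catalog layout (a top-level streams list whose entries carry a tap_stream_id); the helper name and the sample catalog are made up so that the example runs without a database.

```python
import json


def stream_ids(catalog_text):
    """Hypothetical helper: return the tap_stream_id of every stream in a Singer catalog."""
    catalog = json.loads(catalog_text)
    return [stream["tap_stream_id"] for stream in catalog.get("streams", [])]


# Stand-in for the contents of ./catalog.json, with a made-up stream.
sample_catalog = json.dumps(
    {"streams": [{"tap_stream_id": "TestOrders", "schema": {"type": "object"}}]}
)
ids = stream_ids(sample_catalog)
print(ids)
```

With a real catalog, you would pass the text read from ./catalog.json instead of the sample string.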
Follow these instructions to contribute to this project.
pipx install poetry
poetry install
Create tests within the tap_mongodb/tests subfolder and then run:
poetry run pytest
You can also test the tap-mongodb CLI interface directly using poetry run:
poetry run tap-mongodb --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-mongodb
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke tap-mongodb --version
# OR run a test `elt` pipeline:
meltano run tap-mongodb target-jsonl
See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.