Table of Contents:
- Bots Documentation
- General remarks
- Initialization parameters
- Common parameters
- Collectors
- API
- Generic URL Fetcher
- Generic URL Stream Fetcher
- Generic Mail URL Fetcher
- Generic Mail Attachment Fetcher
- Generic Mail Body Fetcher
- GitHub API
- Fileinput
- MISP Generic
- Request Tracker
- Rsync
- Shodan Stream
- TCP
- XMPP collector
- Alien Vault OTX
- Blueliv Crimeserver
- Calidog Certstream
- McAfee openDXL
- Microsoft Azure
- Microsoft Interflow
- Stomp
- Parsers
- Experts
- Abusix
- ASN Lookup
- CSV Converter
- Copy Extra
- Cymru Whois
- Deduplicator
- Domain Suffix
- DO-Portal
- Field Reducer Bot
- Filter
- Format Field
- Generic DB Lookup
- Gethostbyname
- IDEA
- MaxMind GeoIP
- MISP
- Modify
- McAfee Active Response Hash lookup
- McAfee Active Response IP lookup
- McAfee Active Response URL lookup
- National CERT contact lookup by CERT.AT
- Recorded Future IP Risk
- Reverse DNS
- RFC1918
- RipeNCC Abuse Contact
- Sieve
- Taxonomy
- Tor Nodes
- Url2FQDN
- Wait
- Outputs
By default, all bots are started when you start the whole botnet. However, a bot can be disabled, which means that it will not start every time you start the botnet, but you can still start and stop it explicitly. To disable a bot, add `"enabled": false` to your runtime.conf. Be aware that this is not a normal parameter (like the others described in this file): it is set outside of the `parameters` object in runtime.conf. Check the User-Guide for an example.
There are two different types of parameters: the initialization parameters are needed to start the bot, the runtime parameters are needed by the bot itself during runtime.
The initialization parameters are at the first level, the runtime parameters live in the `parameters` sub-dictionary:
{
    "bot-id": {
        "parameters": {
            runtime parameters...
        },
        initialization parameters...
    }
}
For example:
{
    "abusech-feodo-domains-collector": {
        "parameters": {
            "provider": "Abuse.ch",
            "name": "Abuse.ch Feodo Domains",
            "http_url": "http://example.org/feodo-domains.txt"
        },
        "name": "Generic URL Fetcher",
        "group": "Collector",
        "module": "intelmq.bots.collectors.http.collector_http",
        "description": "collect report messages from remote hosts using http protocol",
        "enabled": true,
        "run_mode": "scheduled"
    }
}
This configuration resides in the file runtime.conf
in your IntelMQ's configuration directory for each configured bot.
- `name` and `description`: The name and description of the bot as found in the BOTS file, not used by the bot itself.
- `group`: Can be `"Collector"`, `"Parser"`, `"Expert"` or `"Output"`. Only used for visualization by other tools.
- `module`: The executable (should be in `$PATH`) which will be started.
- `enabled`: If the parameter is set to `true` (which is NOT the default value if it is missing, as a protection), the bot will start when the botnet is started (`intelmqctl start`). If the parameter is set to `false`, the bot will not be started by `intelmqctl start`, however you can run the bot independently using `intelmqctl start <bot_id>`. Check the User-Guide for more details.
- `run_mode`: There are two run modes, "continuous" (the default) or "scheduled". In the first case, the bot will run forever until it is stopped or exits because of errors (depending on the configuration). In the latter case, the bot will stop after one successful run. This is especially useful when scheduling bots via cron or systemd. Default is `continuous`. Check the User-Guide for more details.
Feed parameters: Common configuration options for all collectors.
- `name`: Name for the feed (`feed.name`). In IntelMQ versions smaller than 2.2 the parameter name `feed` is also supported.
- `accuracy`: Accuracy for the data of the feed (`feed.accuracy`).
- `code`: Code for the feed (`feed.code`).
- `documentation`: Link to documentation for the feed (`feed.documentation`).
- `provider`: Name of the provider of the feed (`feed.provider`).
- `rate_limit`: time interval (in seconds) between fetching data, if applicable.

HTTP parameters: Common URL fetching parameters used in multiple bots.

- `http_timeout_sec`: A tuple of floats or only one float describing the timeout of the HTTP connection. Can be a tuple of two floats (read and connect timeout) or just one float (applies to both timeouts). The default is 30 seconds in defaults.conf; if not given, no timeout is used. See also https://requests.readthedocs.io/en/master/user/advanced/#timeouts
- `http_timeout_max_tries`: An integer depicting how often a connection is retried when a timeout occurred. Defaults to 3 in defaults.conf.
- `http_username`: username for basic authentication.
- `http_password`: password for basic authentication.
- `http_proxy`: proxy to use for HTTP
- `https_proxy`: proxy to use for HTTPS
- `http_user_agent`: user agent to use for the request.
- `http_verify_cert`: path to trusted CA bundle or directory, `false` to ignore verifying SSL certificates, or `true` (default) to verify SSL certificates
- `ssl_client_certificate`: SSL client certificate to use.
- `ssl_ca_certificate`: Optional string of path to trusted CA certificate. Only used by some bots.
- `http_header`: HTTP request headers

Cache parameters: Common Redis cache parameters used in multiple bots (mainly lookup experts):

- `redis_cache_host`: Hostname of the Redis database.
- `redis_cache_port`: Port of the Redis database.
- `redis_cache_db`: Database number.
- `redis_cache_ttl`: TTL used for caching.
- `redis_cache_password`: Optional password for the Redis database (default: none).
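To illustrate how these common groups fit together, the following is a rough sketch of a collector's `parameters` block in runtime.conf combining feed, HTTP and cache options. All values are placeholders chosen for this example, not recommended defaults:

```
"parameters": {
    "name": "Example Feed",
    "provider": "Example Provider",
    "rate_limit": 3600,
    "http_timeout_sec": 30,
    "http_timeout_max_tries": 3,
    "http_verify_cert": true,
    "redis_cache_host": "127.0.0.1",
    "redis_cache_port": 6379,
    "redis_cache_db": 5,
    "redis_cache_ttl": 86400
}
```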
Multithreading is disabled for all Collectors, as this would lead to duplicated data.

Requires the `pika` python library, minimum version 1.0.0.

- `name`: `intelmq.bots.collectors.amqp.collector_amqp`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect data from (remote) AMQP servers, for both IntelMQ as well as external data
- Feed parameters (see above)
- `connection_attempts`: The number of connection attempts to the defined server, defaults to 3
- `connection_heartbeat`: Heartbeat to server, in seconds, defaults to 3600
- `connection_host`: Name/IP for the AMQP server, defaults to 127.0.0.1
- `connection_port`: Port for the AMQP server, defaults to 5672
- `connection_vhost`: Virtual host to connect, on an HTTP(S) connection would be http:/IP/
- `expect_intelmq_message`: Boolean, whether the data is from IntelMQ or not. Default: `false`. If true, the data can be any Report or Event and will be passed to the next bot as is. Otherwise a new report is created with the raw data.
- `password`: Password for authentication on your AMQP server
- `queue_name`: The name of the queue to fetch data from
- `username`: Username for authentication on your AMQP server
- `use_ssl`: Use SSL for the connection, make sure to also set the correct port, usually 5671 (`true`/`false`)

Currently only fetching from a queue is supported; this can be extended in the future. Messages are acknowledged at the AMQP server after they have been sent to the pipeline.
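A minimal sketch of a `parameters` block for this collector could look as follows; host, credentials and the queue name are placeholders:

```
"parameters": {
    "connection_host": "127.0.0.1",
    "connection_port": 5672,
    "connection_vhost": "/",
    "username": "intelmq",
    "password": "secret",
    "queue_name": "remote-intelmq-queue",
    "expect_intelmq_message": true,
    "use_ssl": false
}
```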
- `name`: `intelmq.bots.collectors.api.collector`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect report messages from an HTTP REST API
- Feed parameters (see above)
- `port`: Optional, integer. Default: 5000. The local port the API will be available at.

The API is available at `/intelmq/push`.
The `tornado` library is required.

- `name`: `intelmq.bots.collectors.http.collector_http`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect report messages from remote hosts using the HTTP protocol
- Feed parameters (see above)
- HTTP parameters (see above)
- `extract_files`: Optional, boolean or list of strings. If true, the retrieved (compressed) file or archive will be uncompressed/unpacked and the contained files extracted. If the parameter is a list of strings, only the files matching those filenames are extracted. Extraction handles gzipped files and both compressed and uncompressed tar archives as well as zip archives.
- `http_url`: location of the information resource (e.g. https://feodotracker.abuse.ch/blocklist/?download=domainblocklist)
- `http_url_formatting`: (`bool`|`JSON`, default: `false`) If `true`, `{time[format]}` will be replaced by the current time in the local timezone formatted by the given format. E.g. if the URL is `http://localhost/{time[%Y]}`, then the resulting URL is `http://localhost/2019` for the year 2019. (Python's Format Specification Mini-Language is used for this.) You may use a `JSON` specifying time-delta parameters to shift the current time accordingly. For example use `{"days": -1}` for yesterday's date; the URL `http://localhost/{time[%Y-%m-%d]}` will get translated to "http://localhost/2018-12-31" for the 1st Jan of 2019.

Zipped files are automatically extracted if detected.
Every extracted file is sent in its own report. Each of these reports has a field named `extra.file_name` with the name of the file in the archive the content was extracted from.
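As an illustration of `http_url_formatting`, here is a hedged sketch of a `parameters` block that fetches yesterday's file; the URL is a placeholder:

```
"parameters": {
    "http_url": "http://localhost/data-{time[%Y-%m-%d]}.csv",
    "http_url_formatting": {"days": -1},
    "extract_files": true,
    "rate_limit": 86400
}
```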
- `name`: `intelmq.bots.collectors.http.collector_http_stream`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: Opens a streaming connection to the URL and sends the received lines.
- Feed parameters (see above)
- HTTP parameters (see above)
- `strip_lines`: boolean, whether single lines should be stripped (removing whitespace from the beginning and the end of the line)

If the stream is interrupted, the connection will be aborted using the timeout parameter. Then, an error will be thrown and rate_limit applies if not null.
The parameter `http_timeout_max_tries` is of no use in this collector.
- `name`: `intelmq.bots.collectors.mail.collector_mail_url`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect messages from mailboxes, extract URLs from the messages and download the report messages from the URLs.
- Feed parameters (see above)
- HTTP parameters (see above)
- `mail_host`: FQDN or IP of mail server
- `mail_user`: user account of the email account
- `mail_password`: password associated with the user account
- `mail_port`: IMAP server port, optional (default: 143 without SSL, 993 for SSL)
- `mail_ssl`: whether the mail account uses SSL (default: `true`)
- `folder`: folder in which to look for mails (default: `INBOX`)
- `subject_regex`: regular expression to look for in the subject
- `url_regex`: regular expression of the feed URL to search for in the mail body
- `sent_from`: filter messages by sender
- `sent_to`: filter messages by recipient
- `ssl_ca_certificate`: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.

The resulting reports contain the following special fields:
- `feed.url`: The URL the data was downloaded from
- `extra.email_subject`: The subject of the email
- `extra.email_from`: The email's from address
- `extra.email_message_id`: The email's message ID
- `extra.file_name`: The file name of the downloaded file (extracted from the HTTP Response Headers if possible).

For line-based inputs the bot can split up large reports into smaller chunks.
This is particularly important for setups that use Redis as a message queue, which has a per-message size limitation of 512 MB.
To configure chunking, set `chunk_size` to a value in bytes.
`chunk_replicate_header` determines whether the header line should be repeated for each chunk that is passed on to a parser bot.
Specifically, to configure a large file input to work around Redis' size limitation, set `chunk_size` to something like `384000000`, i.e., ~384 MB.
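A hedged sketch of a `parameters` block for this collector, including chunking; host, credentials and regular expressions are placeholders:

```
"parameters": {
    "mail_host": "imap.example.org",
    "mail_user": "reports@example.org",
    "mail_password": "secret",
    "mail_ssl": true,
    "folder": "INBOX",
    "subject_regex": "^Example Report",
    "url_regex": "http://example.org/reports/.*\\.csv",
    "chunk_size": 384000000,
    "chunk_replicate_header": true
}
```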
- `name`: `intelmq.bots.collectors.mail.collector_mail_attach`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect messages from mailboxes, download the report messages from the attachments.
- Feed parameters (see above)
- `extract_files`: Optional, boolean or list of strings. See the documentation of the Generic URL Fetcher for more details.
- `mail_host`: FQDN or IP of mail server
- `mail_user`: user account of the email account
- `mail_password`: password associated with the user account
- `mail_port`: IMAP server port, optional (default: 143 without SSL, 993 for SSL)
- `mail_ssl`: whether the mail account uses SSL (default: `true`)
- `folder`: folder in which to look for mails (default: `INBOX`)
- `subject_regex`: regular expression to look for in the subject
- `attach_regex`: regular expression of the name of the attachment
- `attach_unzip`: whether to unzip the attachment. Only extracts the first file. Deprecated, use `extract_files` instead.
- `sent_from`: filter messages by sender
- `sent_to`: filter messages by recipient
- `ssl_ca_certificate`: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.

The resulting reports contain the following special fields:
- `extra.email_subject`: The subject of the email
- `extra.email_from`: The email's from address
- `extra.email_message_id`: The email's message ID
- `extra.file_name`: The file name of the attachment, or the file name in the attached archive if the attachment is extracted.
- `name`: `intelmq.bots.collectors.mail.collector_mail_body`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect messages from mailboxes, forwards the bodies as reports. Each non-empty body with a matching content type is sent as an individual report.
- Feed parameters (see above)
- `mail_host`: FQDN or IP of mail server
- `mail_user`: user account of the email account
- `mail_password`: password associated with the user account
- `mail_port`: IMAP server port, optional (default: 143 without SSL, 993 for SSL)
- `mail_ssl`: whether the mail account uses SSL (default: `true`)
- `folder`: folder in which to look for mails (default: `INBOX`)
- `subject_regex`: regular expression to look for in the subject
- `sent_from`: filter messages by sender
- `sent_to`: filter messages by recipient
- `ssl_ca_certificate`: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.
- `content_types`: Which bodies to use based on the content type. Default: `true`/`['html', 'plain']` for all. Possible values:
  - string with comma separated values, e.g. `['html', 'plain']`
  - `true`, `false`, `null`: same as the default value
  - a single string, e.g. `'plain'`

The resulting reports contain the following special fields:
- `extra.email_subject`: The subject of the email
- `extra.email_from`: The email's from address
- `extra.email_message_id`: The email's message ID
- `name`: `intelmq.bots.collectors.github_api.collector_github_contents_api`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: Collects files matched by regex from a GitHub repository via the GitHub API. Optionally with GitHub credentials, which are used for Basic HTTP authentication.
- Feed parameters (see above)
- `basic_auth_username`: GitHub account username (optional)
- `basic_auth_password`: GitHub account password (optional)
- `repository`: GitHub target repository (`<USER>/<REPOSITORY>`)
- `regex`: Valid regex of target files within the repository (defaults to `.*.json`)
- `extra_fields`: Comma-separated list of extra fields from the GitHub contents API

The optional authentication parameters provide a higher limit for GitHub API requests. With GitHub user authentication, requests are rate limited to 5000 per hour, otherwise to 60 requests per hour.
The collector recursively searches for `regex`-defined files in the provided `repository`.
Additionally, it adds extra file metadata defined by `extra_fields`.
The bot always sets the URL from which the file was downloaded as `feed.url`.
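A hedged sketch of a `parameters` block for this collector; the repository, credentials and the chosen extra fields are placeholders for this example:

```
"parameters": {
    "basic_auth_username": "example-user",
    "basic_auth_password": "example-token",
    "repository": "example-org/example-feeds",
    "regex": ".*.json",
    "extra_fields": "sha,size"
}
```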
- `name`: `intelmq.bots.collectors.file.collector_file`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: This bot is capable of reading files from the local file system. This is handy for testing purposes, or when you need to react to spontaneous events. In combination with the Generic CSV Parser this should work great.
- Feed parameters (see above)
- `path`: path to file
- `postfix`: The postfix (file ending) of the files to look for. For example `.csv`.
- `delete_file`: whether to delete the file after reading (default: `false`)

The resulting reports contain the following special fields:
- `feed.url`: The URI using the `file://` scheme and localhost, with the full path to the processed file.
- `extra.file_name`: The file name (without path) of the processed file.

Additionally, for line-based inputs the bot can split up large reports into smaller chunks.
This is particularly important for setups that use Redis as a message queue, which has a per-message size limitation of 512 MB.
To configure chunking, set `chunk_size` to a value in bytes.
`chunk_replicate_header` determines whether the header line should be repeated for each chunk that is passed on to a parser bot.
Specifically, to configure a large file input to work around Redis' size limitation, set `chunk_size` to something like `384000000`, i.e., ~384 MB.

The bot loops over all files in `path` and tests if their file name matches *postfix, e.g. `*.csv`. If yes, the file will be read and inserted into the queue.
If `delete_file` is set, the file will be deleted after processing. If deletion is not possible, the bot will stop.
To prevent data loss, the bot also stops when no `postfix` is set and `delete_file` is set. This cannot be overridden.
The bot always sets the file name as `feed.url`.
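A hedged sketch of a `parameters` block for this collector; the path is a placeholder:

```
"parameters": {
    "path": "/opt/intelmq/var/lib/bots/file-collector/",
    "postfix": ".csv",
    "delete_file": false,
    "chunk_size": 384000000,
    "chunk_replicate_header": true
}
```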
Requires the rsync executable
- `name`: `intelmq.bots.collectors.rsync.collector_rsync`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: The bot downloads a file via rsync and then loads data from the downloaded file. The downloaded file is located in `var/lib/bots/rsync_collector`.
- Feed parameters (see above)
- `file`: Name of the downloaded file.
- `rsync_path`: Path to the file. It can be "/home/username/directory" or "username@remote_host:/home/username/directory".
- `temp_directory`: Path of a temporary state directory to use for rsync'd files. Optional. Default: `/opt/intelmq/var/run/rsync_collector/`.
- `name`: `intelmq.bots.collectors.misp.collector`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect messages from MISP, a malware information sharing platform server.
- Feed parameters (see above)
- `misp_url`: URL of MISP server (with trailing '/')
- `misp_key`: MISP Authkey
- `misp_tag_to_process`: MISP tag for events to be processed
- `misp_tag_processed`: MISP tag for processed events, optional

Generic parameters used in this bot:
- `http_verify_cert`: Verify the TLS certificate of the server, boolean (default: `true`)

This collector will search for events on a MISP server that have a `to_process` tag attached to them (see the `misp_tag_to_process` parameter) and collect them for processing by IntelMQ. Once the MISP event has been processed, the `to_process` tag is removed from the MISP event and a `processed` tag is then attached (see the `misp_tag_processed` parameter).
NB. The MISP tags must be configured to be 'exportable' otherwise they will not be retrieved by the collector.
- `name`: `intelmq.bots.collectors.rt.collector_rt`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: Request Tracker Collector fetches attachments from an RTIR instance.

You need the rt-library >= 1.9 from nic.cz, available via pypi: `pip3 install rt`

This rt bot will connect to RT and inspect the given `search_queue` for tickets matching all criteria in `search_*`.
Any matches will be inspected. For each match, all (RT) attachments of the matching RT tickets are iterated over and within this loop, the first matching filename in the attachment is processed.
If none of the filename matches apply, the content of the first (RT) "history" item is matched against the URL regex.

- Feed parameters (see above)
- HTTP parameters (see above)
- `extract_attachment`: Optional, boolean or list of strings. See the documentation of the Generic URL Fetcher parameter `extract_files` for more details.
- `extract_download`: Optional, boolean or list of strings. See the documentation of the Generic URL Fetcher parameter `extract_files` for more details.
- `uri`: URL of the REST interface of the RT
- `user`: RT username
- `password`: RT password
- `search_not_older_than`: Absolute time (use ISO format) or relative time, e.g. `3 days`.
- `search_owner`: owner of the ticket to search for (default: `nobody`)
- `search_queue`: queue of the ticket to search for (default: `Incident Reports`)
- `search_status`: status of the ticket to search for (default: `new`)
- `search_subject_like`: part of the subject of the ticket to search for (default: `Report`)
- `set_status`: status to set the ticket to after processing (default: `open`). `false` or `null` to not set a different status.
- `take_ticket`: whether to take the ticket (default: `true`)
- `url_regex`: regular expression of a URL to search for in the ticket
- `attachment_regex`: regular expression of an attachment in the ticket
- `unzip_attachment`: whether to unzip a found attachment. Only the first file in the archive is used. Deprecated in favor of `extract_attachment`.

The parameter `http_timeout_max_tries` is of no use in this collector.

The resulting reports contain the following special fields:
- `rtir_id`: The ticket ID
- `extra.email_subject` and `extra.ticket_subject`: The subject of the ticket
- `extra.email_from` and `extra.ticket_requestors`: Comma-separated list of the requestors' email addresses.
- `extra.ticket_owner`: The ticket's owner name
- `extra.ticket_status`: The ticket's status
- `extra.ticket_queue`: The ticket's queue
- `extra.file_name`: The name of the extracted file, the name of the downloaded file or the attachment's filename without the `.gz` postfix.
- `time.observation`: The creation time of the ticket or attachment.

The parameters prefixed with `search_` allow configuring the ticket search.
Empty strings and `null` as values for search parameters are ignored.
Attachments can optionally be unzipped; remote files are downloaded with the `http_*` settings applied (see defaults.conf).
If `url_regex` or `attachment_regex` are empty strings, false or null, they are ignored.
Optionally, the RT bot can "take" RT tickets (i.e. the `user` is assigned this ticket now) and/or the status can be changed (leave `set_status` empty in case you don't want to change the status). Please note however that you MUST do one of the following: either "take" the ticket or set the status (`set_status`). Otherwise, the search will find the ticket every time and you will have generated an endless loop.
In case a resource needs to be fetched and this resource is permanently not available (status code is 4xx), the ticket status will be set according to the configuration to avoid processing the ticket over and over. For temporary failures the status is not modified; instead, the ticket will be skipped in this run.
To find only tickets newer than a given absolute or relative time, you can use the `search_not_older_than` parameter. Absolute time specifications can be anything parseable by dateutil; best use an ISO format.
Relative times must be given in this format: `[number] [timespan]s`, e.g. `3 days`. `timespan` can be hour, day, week, month or year. A trailing 's' is supported for all timespans. Relative times are subtracted from the current time directly before the search is performed.
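A hedged sketch of a `parameters` block for this collector; URL, credentials and regular expressions are placeholders:

```
"parameters": {
    "uri": "https://rt.example.org/REST/1.0",
    "user": "intelmq",
    "password": "secret",
    "search_queue": "Incident Reports",
    "search_status": "new",
    "search_subject_like": "Report",
    "search_not_older_than": "3 days",
    "take_ticket": true,
    "set_status": "open",
    "attachment_regex": ".*\\.csv\\.gz$",
    "url_regex": "https://example.org/report/.*\\.csv"
}
```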
- `name`: `intelmq.bots.collectors.rsync.collector_rsync`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: Syncs a file via rsync and reads the file.
- Feed parameters (see above)
- `file`: The filename to process, combined with `rsync_path`.
- `temp_directory`: The temporary directory for rsync, by default `$VAR_STATE_PATH/rsync_collector`. `$VAR_STATE_PATH` is `/var/run/intelmq/` or `/opt/intelmq/var/run/`.
- `rsync_path`: The path of the file to process
Requires the `shodan` library to be installed.

- `name`: `intelmq.bots.collectors.shodan.collector_stream`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: Queries the Shodan Streaming API
- Feed parameters (see above)
- HTTP parameters (see above). Only the proxy is used (requires `shodan-python > 1.8.1`). The certificate is always verified.
- `countries`: A list of countries to query for. If it is a string, it will be split by `,`.
- `name`: `intelmq.bots.collectors.tcp.collector`
- `lookup`: no
- `public`: yes
- `cache (redis db)`: none
- `description`: The TCP collector is responsible for receiving events on a TCP port (e.g. from the TCP Output of another IntelMQ instance). Might not be working on Python 3.4.6.
- `ip`: IP of the destination server
- `port`: port of the destination server
- `name`: `intelmq.bots.collectors.xmpp.collector`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: This bot can connect to an XMPP server and one room, in order to receive reports from it. TLS is used by default. rate_limit is ineffective here. The bot can either pass the body or the whole event.

The SleekXMPP library needs to be installed on your system:
pip3 install -r intelmq/bots/collectors/xmpp/REQUIREMENTS.txt

- Feed parameters (see above)
- `xmpp_server`: The domain name of the server of the XMPP account (part after the @ sign)
- `xmpp_user`: The username of the XMPP account the collector shall use (part before the @ sign)
- `xmpp_password`: The password of the XMPP account
- `xmpp_room`: The room which has to be joined by the XMPP collector (full address [email protected])
- `xmpp_room_nick`: The username / nickname the collector shall use within the room
- `xmpp_room_password`: The password which might be required to join a room
- `use_muc`: If this parameter is `true`, the bot will join the room `xmpp_room`.
- `xmpp_userlist`: An array of usernames whose messages will (not) be processed.
- `xmpp_whitelist_mode`: If `true` the list provided in `xmpp_userlist` is a whitelist, else it is a blacklist. In case of a whitelist, only messages from the configured users will be processed, otherwise their messages are not processed. Default is `false` / blacklist.
- `ca_certs`: A path to a file containing the CAs which should be used (default: `/etc/ssl/certs/ca-certificates.crt`)
- `strip_message`: If `true`, trailing white space will be removed from the message. Does not happen if `pass_full_xml` is set to `true` (default: `true`)
- `pass_full_xml`: If this parameter is set to `true`, the collector will read the full XMPP XML message and add it to the pipeline. This is useful if data from other systems like AbuseHelper should be processed. (default: `false`)
- `name`: `intelmq.bots.collectors.alienvault_otx.collector`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: collect report messages from the Alien Vault OTX API

Install the library from GitHub, as there is no package on PyPI:
pip3 install -r intelmq/bots/collectors/alienvault_otx/REQUIREMENTS.txt

- Feed parameters (see above)
- `api_key`: API Key
- `modified_pulses_only`: get only modified pulses instead of all, set it to true or false, default false
- `interval`: if "modified_pulses_only" is set, defines the time in hours (integer value) to get pulses modified since then, default 24 hours
- `name`: `intelmq.bots.collectors.blueliv.collector_crimeserver`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: collect report messages from the Blueliv API

For more information visit https://github.com/Blueliv/api-python-sdk

Install the required library:
pip3 install -r intelmq/bots/collectors/blueliv/REQUIREMENTS.txt

- Feed parameters (see above)
- `api_key`: location of information resource, see https://map.blueliv.com/?redirect=get-started#signup
- `api_url`: The optional API endpoint, by default `https://freeapi.blueliv.com`.
A bot to collect data from the Certificate Transparency Log (CTL). This bot works based on the certstream library (https://github.com/CaliDog/certstream-python).

- `name`: `intelmq.bots.collectors.calidog.collector_certstream`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: collect data from the Certificate Transparency Log
- Feed parameters (see above)
- `name`: `intelmq.bots.collectors.eset.collector`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: collect data from the ESET ETI TAXII server

For more information visit https://www.eset.com/int/business/services/threat-intelligence/

Install the required `cabby` library:
pip3 install -r intelmq/bots/collectors/eset/REQUIREMENTS.txt

- Feed parameters (see above)
- `username`: Your username
- `password`: Your password
- `endpoint`: `eti.eset.com`
- `time_delta`: The time span to look back, in seconds. Default `3600`.
- `collection`: The collection to fetch.
- `name`: `intelmq.bots.collectors.opendxl.collector`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: collect messages via openDXL
- Feed parameters (see above)
- `dxl_config_file`: location of the configuration file containing the required information to connect
- `dxl_topic`: the name of the DXL topic to subscribe to
Iterates over all blobs in all containers in an Azure storage. The Cache is required to memorize which files have already been processed (TTL needs to be high enough to cover the oldest files available!).
This bot significantly changed in a backwards-incompatible way in IntelMQ Version 2.2.0 to support current versions of the Microsoft Azure Python libraries.
- `name`: `intelmq.bots.collectors.microsoft.collector_azure`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: 5
- `description`: collect blobs from Microsoft Azure using their library
- Cache parameters (see above)
- Feed parameters (see above)
- `connection_string`: connection string as given by Microsoft
- `container_name`: name of the container to connect to
Iterates over all files available via this API. Make sure to limit the files to be downloaded with the parameters, otherwise you will get a lot of data!
The cache is used to remember which files have already been downloaded. Make sure the TTL is high enough, higher than `not_older_than`.

- `name`: `intelmq.bots.collectors.microsoft.collector_interflow`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: 5
- `description`: collect files from Microsoft Interflow using their API
- Feed parameters (see above)
- `api_key`: API key generated in their portal
- `file_match`: an optional regular expression to match file names
- `not_older_than`: an optional relative (minutes) or absolute time (UTC is assumed) expression to determine the oldest time of a file to be downloaded
- `redis_cache_*` and especially `redis_cache_ttl`: Settings for the cache where file names of downloaded files are saved. The cache's TTL must always be bigger than `not_older_than`.

Files are automatically ungzipped if the filename ends with `.gz`.
- `name`: `intelmq.bots.collectors.stomp.collector`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: collect messages from a STOMP server

Install the `stomp.py` library from PyPI:
pip3 install -r intelmq/bots/collectors/stomp/REQUIREMENTS.txt

- Feed parameters (see above)
- `exchange`: exchange point
- `port`: 61614
- `server`: hostname, e.g. "n6stream.cert.pl"
- `ssl_ca_certificate`: path to CA file
- `ssl_client_certificate`: path to client cert file
- `ssl_client_certificate_key`: path to client cert key file
Collects tweets from `target_timelines`: up to `tweet_count` tweets from each user and up to `timelimit` back in time. The tweet text is sent separately and, if allowed, links to pastebin are followed and the text is sent in a separate report.

- `name`: `intelmq.bots.collectors.twitter.collector_twitter`
- `lookup`: yes
- `public`: yes
- `cache (redis db)`: none
- `description`: Collects tweets
- Feed parameters (see above)
- `target_timelines`: screen_names of twitter accounts to be followed
- `tweet_count`: number of tweets to be taken from each account
- `timelimit`: maximum age of the tweets collected, in seconds
- `follow_urls`: list of screen_names for which URLs will be followed
- `exclude_replies`: exclude replies of the followed screen_names
- `include_rts`: whether to include retweets by a given screen_name
- `consumer_key`: Twitter API login data
- `consumer_secret`: Twitter API login data
- `access_token_key`: Twitter API login data
- `access_token_secret`: Twitter API login data
- `name`: `intelmq.bots.collectors.api.collector_api`
- `lookup`: no
- `public`: no
- `cache (redis db)`: none
- `description`: Bot for collecting data using an API; you need to POST JSON to the /intelmq/push endpoint

Example usage:
curl -X POST http://localhost:5000/intelmq/push -H 'Content-Type: application/json' --data '{"source.ip": "127.0.0.101", "classification.type": "backdoor"}'

- Feed parameters (see above)
- `port`: 5000
This list is not complete. Look at `intelmq/bots/BOTS` or the list of parsers shown in the manager. Most parsers do not need configuration parameters.
TODO
- `name`: `intelmq.bots.parsers.anubisnetworks.parser`
- `lookup`: no
- `public`: yes
- `cache (redis db)`: none
- `description`: parses data from the AnubisNetworks Cyberfeed Stream

The feed format changes over time. The parser supports at least data from 2016 and 2020.
Events with the Malware "TestSinkholingLoss" are ignored, as they are for the feed provider's internal purpose only and should not be processed at all.

- `use_malware_familiy_as_classification_identifier`: default: `true`. Use the `malw.family` field as `classification.type`. If `false`, check if it is the same as `malw.variant`. If it is the same, it is ignored. Otherwise it is saved as `extra.malware.family`.
Lines starting with `'#'` will be ignored. Headers won't be interpreted.

- `"columns"`: A list of strings or a string of comma-separated values with field names. The names must match the harmonization's field names. Empty column specifications and columns named `"__IGNORE__"` are ignored. E.g. `"columns": [ "", "source.fqdn", "extra.http_host_header", "__IGNORE__" ]` is equivalent to `"columns": ",source.fqdn,extra.http_host_header,"`. The first and the last column are not used in this example.
  It is possible to specify multiple columns using the `|` character, e.g. `"columns": "source.url|source.fqdn|source.ip"`. First, the bot will try to parse the value as a URL; if that fails, it will try to parse it as an FQDN; if that fails, as an IP; if that also fails, an error will be raised. Some use cases:
  - mixed data set, e.g. URL/FQDN/IP/NETMASK: `"columns": "source.url|source.fqdn|source.ip|source.network"`
  - parse a value and ignore it if it fails: `"columns": "source.url|__IGNORE__"`
- `"column_regex_search"`: Optional. A dictionary mapping field names (as given per the columns parameter) to regular expressions. The field is evaluated using `re.search`. E.g. to get the ASN out of `AS1234` use: `{"source.asn": "[0-9]*"}`. Make sure to properly escape any backslashes in your regular expression (see also #1579).
- `"default_url_protocol"`: For URLs you can give a default protocol which will be prepended to the data.
- `"delimiter"`: separation character of the CSV, e.g. `","`
- `"skip_header"`: Boolean, skip the first line of the file, optional. Lines starting with `#` will be skipped additionally, make sure you do not skip more lines than needed!
- `time_format`: Optional. If `"timestamp"`, `"windows_nt"` or `"epoch_millis"`, the time will be converted first. With the default `null`, fuzzy time parsing will be used.
- `"type"`: set the `classification.type` statically, optional
- `"data_type"`: sets the data of a specific type, currently only `"json"` is a supported value. An example: `{ "columns": [ "source.ip", "source.url", "extra.tags"], "data_type": "{\"extra.tags\":\"json\"}" }` will ensure `extra.tags` is treated as `json`.
- `"filter_text"`: only process lines containing or not containing the specified text, to be used in conjunction with `filter_type`
- `"filter_type"`: value can be whitelist or blacklist. If `whitelist`, only lines containing the text in `filter_text` will be processed; if `blacklist`, only lines NOT containing the text will be processed. To process ipset format files use `{ "filter_text": "ipset add ", "filter_type": "whitelist", "columns": [ "__IGNORE__", "__IGNORE__", "__IGNORE__", "source.ip"] }`
- `"type_translation"`: If the source does have a field with information for `classification.type`, but it does not correspond to IntelMQ's types, you can map them to the correct ones. The `type_translation` field can hold a dictionary, or a string with a JSON dictionary which maps the feed's values to IntelMQ's. Example: `{"malware_download": "malware-distribution"}`
- `"columns_required"`: A list of true/false for each column. By default, it is true for every column.
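A hedged sketch of a `parameters` block for this parser combining several of the options above; the column names and values are placeholders for an assumed three-column feed:

```
"parameters": {
    "columns": ["time.source", "source.ip", "source.fqdn|__IGNORE__"],
    "delimiter": ",",
    "skip_header": true,
    "default_url_protocol": "http://",
    "type": "malware-distribution"
}
```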
- `name`: `intelmq.bots.parsers.calidog.parser_certstream`
- `lookup`: no
- `public`: yes
- `cache (redis db)`: none
- `description`: parses data from the Certificate Transparency Log

For each domain in the `leaf_cert.all_domains` object, one event with the domain in `source.fqdn` (and `source.ip` as fallback) is produced.
The seen-date is saved in `time.source` and the classification type is `other`.

- Feed parameters (see above)
- `name`: `intelmq.bots.parsers.eset.parser`
- `lookup`: no
- `public`: yes
- `cache (redis db)`: none
- `description`: Parses data from the ESET ETI TAXII server

Supported collections:
- "ei.urls (json)"
- "ei.domains v2 (json)"
- `name`: `intelmq.bots.parsers.cymru.parser_cap_program`
- `public`: no
- `cache (redis db)`: none
- `description`: Parses data from Cymru's CAP program feed.

There are two different feeds available:
- `infected_$date.txt` ("old")
- `$certname_$date.txt` ("new")

The new feed will replace the old one at some point in time; currently you need to fetch both. The parser handles both formats.
As little information on the format is available, the mappings might not be correct in all cases.
Some reports are not implemented at all, as there is no data available to check if the parsing is correct at all. If you do get errors like `Report ... not implement` or similar, please open an issue and report the (anonymized) example data. Thanks.
The information about the event could be better in many cases, but as Cymru does not want to be associated with the report, we can't add comments to the events in the parser, because then the source would be easily identifiable for the recipient.
http://www.team-cymru.com/bogon-reference.html
- `name`: `intelmq.bots.parsers.cymru.parser_full_bogons`
- `public`: no
- `cache (redis db)`: none
- `description`: Parses data from the full bogons feed.
- `name`: `intelmq.bots.parsers.github_feed.parser`
- `description`: Parses feeds available publicly on GitHub (should receive from the `github_api` collector)
- `name`: `intelmq.bots.parsers.hibp.parser_callback`
- `public`: no
- `cache (redis db)`: none
- `description`: Parses data from the Have I Been Pwned feed.

Parses the data from a callback of a Have I Been Pwned Enterprise Subscription.
Parses breaches and pastes and creates one event per e-mail address. The e-mail address is stored in `source.account`. `classification.type` is `leak` and `classification.identifier` is `breach` or `paste`.
- `"columns"`: A list of strings or a string of comma-separated values with field names. The names must match the harmonization's field names. Empty column specifications and columns named `"__IGNORE__"` are ignored. E.g. `"columns": [ "", "source.fqdn", "extra.http_host_header", "__IGNORE__" ]` is equivalent to `"columns": ",source.fqdn,extra.http_host_header,"`. The first and the last column are not used in this example.
  It is possible to specify multiple columns using the `|` character, e.g. `"columns": "source.url|source.fqdn|source.ip"`. First, the bot will try to parse the value as a URL; if that fails, it will try to parse it as an FQDN; if that fails, as an IP; if that also fails, an error will be raised. Some use cases:
  - mixed data set, e.g. URL/FQDN/IP/NETMASK: `"columns": "source.url|source.fqdn|source.ip|source.network"`
  - parse a value and ignore it if it fails: `"columns": "source.url|__IGNORE__"`
- `"ignore_values"`: A list of strings or a string of comma-separated values which will not be considered when assigning to the corresponding fields given in `columns`. E.g. `"ignore_values": [ "", "unknown", "Not listed" ]` is equivalent to `"ignore_values": ",unknown,Not listed,"`. The following configuration will lead to assigning all values to malware.name and extra.SBL except `unknown` and `Not listed` respectively: `"columns": [ "source.url", "malware.name", "extra.SBL" ], "ignore_values": [ "", "unknown", "Not listed" ]`. The parameters columns and ignore_values must have the same length.
- `"attribute_name"`: Filter the table with a table attribute, to be used in conjunction with `attribute_value`, optional. E.g. `class`, `id`, `style`.
- `"attribute_value"`: String. To filter all tables with the attribute `class='details'` use `"attribute_name": "class", "attribute_value": "details"`
- `"table_index"`: Index of the table if multiple tables are present. If `attribute_name` and `attribute_value` are given, the index refers to the tables remaining after filtering with the table attribute. Default: `0`.
- `"split_column"`: Padded column to be split to get values, to be used in conjunction with `split_separator` and `split_index`, optional.
- `"split_separator"`: Delimiter string for the padded column.
- `"split_index"`: Index of the unpadded string in the list returned from splitting `split_column` with `split_separator` as delimiter string. Default: `0`. E.g. with `"split_column": "source.fqdn", "split_separator": " ", "split_index": 1`, a column corresponding to `source.fqdn` with the value `[D] lingvaworld.ru` will be assigned as `"source.fqdn": "lingvaworld.ru"`.
- `"skip_table_head"`: Boolean, skip the first row of the table, optional. Default: `true`.
- `"default_url_protocol"`: For URLs you can give a default protocol which will be prepended to the data. Default: `"http://"`.
- `"time_format"`: Optional. If `"timestamp"`, `"windows_nt"` or `"epoch_millis"`, the time will be converted first. With the default `null`, fuzzy time parsing will be used.
- `"type"`: set the `classification.type` statically, optional
- `"html_parser"`: The HTML parser to use, by default "html.parser"; can also be e.g. "lxml". Have a look at https://www.crummy.com/software/BeautifulSoup/bs4/doc/
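A hedged sketch of a `parameters` block for this parser; the attribute value and columns are placeholders for an assumed three-column table:

```
"parameters": {
    "columns": ["time.source", "source.ip", "__IGNORE__"],
    "attribute_name": "class",
    "attribute_value": "details",
    "table_index": 0,
    "skip_table_head": true,
    "default_url_protocol": "http://",
    "html_parser": "html.parser"
}
```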
- `name`: `intelmq.bots.parsers.mcafee.parser_atd_file`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: parses file hash information off ATD reports
- Feed parameters (see above)
- `verdict_severity`: min report severity to parse

- `name`: `intelmq.bots.parsers.mcafee.parser_atd_file`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: parses IP addresses off ATD reports
- Feed parameters (see above)
- `verdict_severity`: min report severity to parse

- `name`: `intelmq.bots.parsers.mcafee.parser_atd_file`
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: parses URLs off ATD reports
- Feed parameters (see above)
- `verdict_severity`: min report severity to parse
- `name`: `intelmq.bots.parsers.microsoft.parser_ctip`
- `public`: no
- `cache (redis db)`: none
- `description`: Parses data from the Microsoft CTIP feed

It can parse the JSON format provided by the Interflow interface (lists of dictionaries) as well as the format provided by the Azure interface (one dictionary per line). The provided data differs between the two formats/providers.
- `name`: `intelmq.bots.parsers.misp.parser`
- `public`: no
- `cache (redis db)`: none
- `description`: Parses MISP events

MISP events collected by the MISPCollectorBot are passed to this parser for processing. Supported MISP event categories and attribute types are defined in the `SUPPORTED_MISP_CATEGORIES` and `MISP_TYPE_MAPPING` class constants.
- `name`: `intelmq.bots.parsers.n6.parser_n6stomp`
- `public`: no
- `cache (redis db)`: none
- `description`: Convert n6 data into IntelMQ format.

No parameters.

Test messages are ignored; this is logged with debug logging level.
The parser also contains a mapping for the classification (resulting in taxonomy, type and identifier).
The `name` field is normally used as `malware.name`. If that fails due to disallowed characters, these characters are removed and the original value is saved as `event_description.text`. This can happen for names like `"further iocs: text with invalid ’ char"`.
If an n6 message contains multiple IP addresses, multiple events are generated, resulting in events that differ only in the address information.
- `name`: `intelmq.bots.parsers.twitter.parser`
- `public`: no
- `cache (redis db)`: none
- `description`: Extracts URLs from text, fuzzy, aimed at parsing tweets

- `domain_whitelist`: domains to be filtered out
- `substitutions`: semicolon-delimited list (of even length) of pairs of substitutions (for example: '[.];.;,;.' substitutes '[.]' for '.' and ',' for '.')
- `classification_type`: string with a valid classification type as defined in the data harmonization
- `default_scheme`: Default scheme for URLs if not given. See also the next section.

The dependency `url-normalize` changed its behavior in version 1.4.0 from using `http://` as the default scheme to `https://`. Version 1.4.1 added the possibility to specify it. Thus you can only use the `default_scheme` parameter with a current version of this library >= 1.4.1; with 1.4.0 you will always get `https://` as the default scheme, and for older versions < 1.4.0 `http://` is used.
This does not affect URLs which already include the scheme.
- `name`: `intelmq.bots.parsers.shadowserver.parser`
- `public`: yes
- `description`: Parses different reports from Shadowserver.

- `feedname`: Optional, the name of the feed, see the list below for possible values.
- `overwrite`: If an existing `feed.name` should be overwritten.
There are two possibilities for the bot to determine which feed the data belongs to, in order to determine the correct mapping of the columns:

Since IntelMQ version 2.1 the parser can detect the feed based on metadata provided by the collector.
When processing a report, this bot takes `extra.file_name` from the report and looks up in `config.py` how the report should be parsed.
If this lookup is not possible, and the feed name is not given as a parameter, the feed cannot be parsed.
The field `extra.file_name` has the following structure:
`%Y-%m-%d-${report_name}[-suffix].csv`, where the suffix can be something like `country-geo`. For example, some possible filenames are `2019-01-01-scan_http-country-geo.csv` or `2019-01-01-scan_tftp.csv`. The important part is `${report_name}`, between the date and the suffix.
Since version 2.1.2 the date in the filename is optional, so filenames like `scan_tftp.csv` are also detected.

If the method above is not possible, and for upgraded instances, the feed can be set with the `feedname` parameter.
Feed names are derived from the subjects of the Shadowserver e-mails.
A list of possible feeds can be found in the table below in the column "feed name".
These are the supported feed names and their corresponding file names for automatic detection:
feed name | file name |
---|---|
Accessible-ADB | scan_adb |
Accessible-AFP | scan_afp |
Accessible-ARD | scan_ard |
Accessible-Cisco-Smart-Install | cisco_smart_install |
Accessible-CoAP | scan_coap |
Accessible-CWMP | scan_cwmp |
Accessible-FTP | scan_ftp |
Accessible-Hadoop | scan_hadoop |
Accessible-HTTP | scan_http |
Accessible-RDP | scan_rdp |
Accessible-Rsync | scan_rsync |
Accessible-SMB | scan_smb |
Accessible-Telnet | scan_telnet |
Accessible-Ubiquiti-Discovery-Service | scan_ubiquiti |
Accessible-VNC | scan_vnc |
Amplification-DDoS-Victim | ddos_amplification |
Blacklisted-IP (deprecated) | blacklist |
Blocklist | blocklist |
Compromised-Website | compromised_website |
Darknet | darknet |
DNS-Open-Resolvers | scan_dns |
Drone | botnet_drone |
Drone-Brute-Force | drone_brute_force |
HTTP-Scanners | hp_http_scan |
ICS-Scanners | hp_ics_scan |
IPv6-Sinkhole-HTTP-Drone | sinkhole6_http |
Microsoft-Sinkhole | microsoft_sinkhole |
NTP-Monitor | scan_ntpmonitor |
NTP-Version | scan_ntp |
Open-Chargen | scan_chargen |
Open-DB2-Discovery-Service | scan_db2 |
Open-Elasticsearch | scan_elasticsearch |
Open-IPMI | scan_ipmi |
Open-IPP | scan_ipp |
Open-LDAP | scan_ldap |
Open-LDAP-TCP | scan_ldap_tcp |
Open-mDNS | scan_mdns |
Open-Memcached | scan_memcached |
Open-MongoDB | scan_mongodb |
Open-MQTT | scan_mqtt |
Open-MSSQL | scan_mssql |
Open-NATPMP | scan_nat_pmp |
Open-NetBIOS-Nameservice | scan_netbios |
Open-Netis | ? |
Open-Portmapper | scan_portmapper |
Open-QOTD | scan_qotd |
Open-Redis | scan_redis |
Open-SNMP | scan_snmp |
Open-SSDP | scan_ssdp |
Open-TFTP | scan_tftp |
Open-XDMCP | scan_xdmcp |
Outdated-DNSSEC-Key | outdated_dnssec_key |
Outdated-DNSSEC-Key-IPv6 | outdated_dnssec_key_v6 |
Sandbox-URL | cwsandbox_url |
Sinkhole-HTTP-Drone | sinkhole_http_drone |
Spam-URL | spam_url |
SSL-FREAK-Vulnerable-Servers | scan_ssl_freak |
SSL-POODLE-Vulnerable-Servers | scan_ssl_poodle |
Vulnerable-ISAKMP | scan_isakmp |
The parser consists of two files:
- `config.py`
- `parser.py`

Both files are required for the parser to work properly.
Add a new feed format and conversions, if required, to the file `config.py`. Don't forget to update the `feed_idx` dict; it is required to look up the correct configuration.
Look at the documentation in the bot's `config.py` file for more information.
- `name`: `intelmq.bots.parsers.shodan.parser`
- `public`: yes
- `description`: Parses data from Shodan (search, stream etc.).

The parser is by far not complete, as there are a lot of fields in a big nested structure. There is a minimal mode available which only parses the important/most useful fields and also saves everything in `extra.shodan`, keeping the original structure. When not using the minimal mode, it may be useful to ignore errors, as many parsing errors can happen with the incomplete mapping.

- `ignore_errors`: Boolean (default true)
- `minimal_mode`: Boolean (default false)
- `name`: `intelmq.bots.parsers.zoneh.parser`
- `public`: yes
- `description`: Parses data from ZoneH.

This bot is designed to consume defacement reports from zone-h.org. It expects fields normally present in CSV files distributed by email.
- `name`: abusix
- `lookup`: dns
- `public`: yes
- `cache (redis db)`: 5
- `description`: RIPE abuse contacts resolving through DNS TXT queries
- `notes`: https://abusix.com/contactdb.html
- Cache parameters (see in section common parameters)

This bot can optionally use the python module querycontacts by Abusix itself: https://pypi.org/project/querycontacts/
pip3 install querycontacts
If the package is not installed, our own routines are used.
- `name`: ASN lookup
- `lookup`: local database
- `public`: yes
- `cache (redis db)`: none
- `description`: IP to ASN

- `database`: Path to the downloaded database.

- Install the `pyasn` module:
pip3 install pyasn
- Download the database and convert it:
# cd /tmp/
# pyasn_util_download.py --latest
# pyasn_util_convert.py --single <downloaded_filename.bz2> ipasn.dat
Note: the '<' '>' characters are only syntactic markings, no shell redirection is necessary.
- Copy the database to IntelMQ:
# mkdir /opt/intelmq/var/lib/bots/asn_lookup
# mv /tmp/ipasn.dat /opt/intelmq/var/lib/bots/asn_lookup/
# chown -R intelmq.intelmq /opt/intelmq/var/lib/bots/asn_lookup
- `name`: `intelmq.bots.experts.csv_converter.expert`
- `lookup`: no
- `public`: yes
- `cache (redis db)`: none
- `description`: Converts an event to CSV format, saved in the `output` field.

- `delimiter`: String, default `","`
- `fieldnames`: Comma-separated list of field names, e.g. `"time.source,classification.type,source.ip"`

To use the CSV-converted data in an output bot - for example in a file output - use the configuration parameter `single_key` of the output bot and set it to `output`.
- `name`: cymru-whois
- `lookup`: Cymru DNS
- `public`: yes
- `cache (redis db)`: 5
- `description`: IP to geolocation, ASN, BGP prefix

Public documentation: https://www.team-cymru.com/IP-ASN-mapping.html#dns

- Cache parameters (see in section common parameters)
- `overwrite`: Overwrite existing fields. Default: `True` if not given (for backwards compatibility, will change in version 3.0.0)
This bot adds the public suffix of the domain to the event. For more information on the public suffix list see: https://publicsuffix.org/list/ Only rules for ICANN domains are processed. The list can (and should) contain Unicode data; punycode conversion is done during reading.
Note that the public suffix is not the same as the top level domain (TLD). E.g. `co.uk` is a public suffix, but the TLD is `uk`.
Privately registered suffixes (such as `blogspot.co.at`), which are part of the public suffix list too, are ignored.

- `name`: domain suffix
- `lookup`: no
- `public`: yes
- `cache (redis db)`: -
- `description`: extracts the domain suffix from the FQDN

- `field`: either `"fqdn"` or `"reverse_dns"`
- `suffix_file`: path to the suffix file
A short summary of how the rules are processed:

The simple rules, e.g. `com`, `at`, `gv.at`: `example.com` leads to `com`, `example.gv.at` leads to `gv.at`.

Wildcard rules, e.g. `*.example.com`: `www.example.com` leads to `www.example.com`.

And additionally the exception rules, together with the above wildcard rule, e.g. `!www.example.com`: `www.example.com` does now not lead to `www.example.com`, but to `example.com`.
- `name`: deduplicator
- `lookup`: redis cache
- `public`: yes
- `cache (redis db)`: 6
- `description`: Bot responsible for ignoring duplicated messages. The bot can be configured to perform deduplication by looking only at specific fields of the message.

- Cache parameters (see in section common parameters)
- `bypass`: true or false value to bypass the deduplicator. When set to true, messages will not be deduplicated. Default: false
- `filter_type`: type of the filtering, which can be "blacklist" or "whitelist". The filter type defines how the Deduplicator bot will interpret the parameter `filter_keys` in order to decide whether an event has already been seen or not, i.e., whether it is a duplicated event or a completely new event.
  - "whitelist" configuration: only the keys listed in `filter_keys` will be considered to verify if an event is duplicated or not.
  - "blacklist" configuration: all keys except those in `filter_keys` will be considered to verify if an event is duplicated or not.
- `filter_keys`: string with multiple keys separated by comma. Please note that the `time.observation` key will not be considered even if defined, because the system always ignores that key.

The bot with the following configuration will detect duplication based only on the `source.ip` and `destination.ip` keys:
"parameters": {
"redis_cache_db": 6,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,
"redis_cache_port": 6379,
"redis_cache_ttl": 86400,
"filter_type": "whitelist",
"filter_keys": "source.ip,destination.ip",
}
The bot with the following configuration will detect duplication based on all keys, except `source.ip` and `destination.ip`:
"parameters": {
"redis_cache_db": 6,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,
"redis_cache_port": 6379,
"redis_cache_ttl": 86400,
"filter_type": "blacklist",
"filter_keys": "source.ip,destination.ip",
}
To flush the deduplicator's cache, you can use the `redis-cli` tool. Enter the database used by the bot and submit the `flushdb` command:
redis-cli -n 6
flushdb
- `name`: do_portal
- `lookup`: yes
- `public`: no
- `cache (redis db)`: none
- `description`: The DO portal expert retrieves the contact information from a DO portal instance: https://github.com/certat/do-portal/

- `mode`: Either `replace` or `append` the new abuse contacts in case there are existing ones.
- `portal_url`: The URL to the portal, without the API path. The used URL is `$portal_url + '/api/1.0/ripe/contact?cidr=%s'`.
- `portal_api_key`: The API key of the user to be used. Must have sufficient privileges.
- `name`: reducer
- `lookup`: none
- `public`: yes
- `cache (redis db)`: none
- `description`: The field reducer bot is capable of removing fields from events.

- `type`: either `"whitelist"` or `"blacklist"`
- `keys`: Can be a JSON-list of field names (`["raw", "source.account"]`) or a string with a comma-separated list of field names (`"raw,source.account"`).

With type `"whitelist"`, only the fields in `keys` will be passed along. With type `"blacklist"`, the fields in `keys` will be removed from events.
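A hedged sketch of a whitelist configuration for this bot; the key list is a placeholder:

```
"parameters": {
    "type": "whitelist",
    "keys": "time.source,source.ip,classification.type"
}
```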
The filter bot is capable of filtering specific events.
- `name`: filter
- `lookup`: none
- `public`: yes
- `cache (redis db)`: none
- `description`: filter messages (drop or pass messages) FIXME

- `filter_key`: key from the data harmonization
- `filter_value`: value for the key
- `filter_action`: action when a message matches the criteria (possible actions: keep/drop)
- `filter_regex`: determines whether the `filter_value` shall be treated as a regular expression or not. If this attribute is not empty, the bot uses python's "search" function to evaluate the filter.
- `not_before`: events before this time will be dropped
- `not_after`: events after this time will be dropped

Both parameters accept string values describing absolute or relative time:
- absolute
  - basically anything parseable by the datetime parser, e.g. "2015-09-12T06:22:11+00:00"
  - `time.source` taken from the event will be compared to this value to decide the filter behavior
- relative
  - accepted string formatted like this "<integer number> <epoch>", where epoch could be any of the following strings (optionally with a trailing 's'): hour, day, week, month, year
  - `time.source` taken from the event will be compared to the value (now - relative) to decide the filter behavior

Examples of time filter definition:
- `"not_before": "2015-09-12T06:22:11+00:00"`: events older than the specified time will be dropped
- `"not_after": "6 months"`: just events older than 6 months will be passed through the pipeline
- `_default`: default path, according to the configuration
- `action_other`: Negation of the default path
- `filter_match`: For all events the filter matched on
- `filter_no_match`: For all events the filter does not match
| action | match | `_default` | `action_other` | `filter_match` | `filter_no_match` |
|---|---|---|---|---|---|
| keep | ✓ | ✓ | ✗ | ✓ | ✗ |
| keep | ✗ | ✗ | ✓ | ✗ | ✓ |
| drop | ✓ | ✗ | ✓ | ✓ | ✗ |
| drop | ✗ | ✓ | ✗ | ✗ | ✓ |

In `DEBUG` logging level, one can see that the message is sent to both matching paths, also if one of the paths is not configured. Of course the message is only delivered to the configured paths.
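A hedged sketch of a `parameters` block that keeps only events of one classification type and drops events older than six months; the values are placeholders:

```
"parameters": {
    "filter_key": "classification.type",
    "filter_value": "scanner",
    "filter_action": "keep",
    "not_before": "6 months"
}
```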
name:
Format Fieldlookup:
nonecache (redis db):
nonedescription:
String method operations on column values
strip_columns
- A list of strings or a string of comma-separated values with field names. The names must match the harmonization's field names. E.g.is equivalent to:"columns": [ "malware.name", "extra.tags" ],
"columns": "malware.name,extra.tags"
strip_chars
- a set of characters to remove as leading/trailing characters(default:
- `replace_column`: key from data harmonization
- `old_value`: the string to search for
- `new_value`: the string to replace the old value with
- `replace_count`: number specifying how many occurrences of the old value you want to replace (default: `1`)
- `split_column`: key from data harmonization
- `split_separator`: specifies the separator to use when splitting the string (default: `,`)
Order of operation: strip -> replace -> split. These three methods can be combined, such as first strip and then split.
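A small sketch of possible `parameters` for this bot, assuming you want to strip and then split a (hypothetical) `extra.tags` field:

"parameters": {
    "strip_columns": "extra.tags",
    "strip_chars": " ",
    "split_column": "extra.tags",
    "split_separator": ","
}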
This bot is capable of enriching IntelMQ events via lookups to a database. Currently only PostgreSQL and SQLite are supported.
If more than one result is returned, a ValueError is raised.
name: intelmq.bots.experts.generic_db_lookup.expert
lookup: database
public: yes
cache (redis db): none
description: This bot is capable of enriching IntelMQ events via lookups to a database.
- `engine`: `postgresql` or `sqlite`
- `database`: string, defaults to "intelmq"; database name or the SQLite filename
- `table`: defaults to "contacts"
- `host`: string, defaults to "localhost"
- `password`: string
- `port`: integer, defaults to 5432
- `sslmode`: string, defaults to "require"
- `user`: defaults to "intelmq"
- `match_fields`: defaults to `{"source.asn": "asn"}`. The value is a key-value mapping of an arbitrary number of IntelMQ field names to table column names. The values are compared with `=` only.
- `overwrite`: defaults to `false`. Is applied per field.
- `replace_fields`: defaults to `{"contact": "source.abuse_contact"}`. `replace_fields` is again a key-value mapping of an arbitrary number of table column names to IntelMQ field names.
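A possible `parameters` sketch using the documented defaults; the database credentials are placeholders:

"parameters": {
    "engine": "postgresql",
    "host": "localhost",
    "port": 5432,
    "database": "intelmq",
    "user": "intelmq",
    "password": "<database password>",
    "sslmode": "require",
    "table": "contacts",
    "match_fields": {"source.asn": "asn"},
    "replace_fields": {"contact": "source.abuse_contact"},
    "overwrite": false
}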
name: gethostbyname
lookup: DNS
public: yes
cache (redis db): none
description: DNS name (FQDN) to IP
none
Converts the event to IDEA format and saves it as JSON in the field output
. All other fields are not modified.
Documentation about IDEA: https://idea.cesnet.cz/en/index
name: intelmq.bots.experts.idea.expert
lookup: no
public: yes
cache (redis db): none
description: The bot does a best effort translation of events into the IDEA format.
- `test_mode`: add `Test` category to mark all outgoing IDEA events as informal (meant to simplify setting up and debugging new IDEA producers) (default: `true`)
name: intelmq.bots.experts.maxmind_geoip.expert
lookup: local database
public: yes
cache (redis db): none
description: IP to geolocation
The bot requires MaxMind's `geoip2` Python library; version 2.2.0 has been tested.
To download the database a free license key is required. More information can be found at https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
You may want to use a shell script provided in the contrib directory to keep the database up to date: contrib/cron-jobs/update-geoip-data
- `database`: Path to the local database, e.g. `"/opt/intelmq/var/lib/bots/maxmind_geoip/GeoLite2-City.mmdb"`
- `overwrite`: boolean
- `use_registered`: boolean. MaxMind has two country ISO codes: one for the physical location of the address and one for the registered location. Default is `false` (backwards-compatibility). See also certtools#1344 for a short explanation.
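For illustration, a minimal `parameters` sketch for this bot; the database path is the example path from above:

"parameters": {
    "database": "/opt/intelmq/var/lib/bots/maxmind_geoip/GeoLite2-City.mmdb",
    "overwrite": false,
    "use_registered": false
}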
Queries a MISP instance for the source.ip
and adds the MISP Attribute UUID and MISP Event ID of the newest attribute found.
name: intelmq.bots.experts.misp.expert
lookup: yes
public: no
cache (redis db): none
description: IP address to MISP attribute and event
- `misp_key`: MISP Authkey
- `misp_url`: URL of MISP server (with trailing '/')

Generic parameters used in this bot:
- `http_verify_cert`: Verify the TLS certificate of the server, boolean (default: `true`)
name: intelmq.bots.experts.mcafee.expert_mar
lookup: yes
public: no
cache (redis db): none
description: Queries occurrences of hashes within local environment
- Feed parameters (see above)
- `dxl_config_file`: location of file containing required information to connect to DXL bus
- `lookup_type`: One of:
  - `Hash`: looks up `malware.hash.md5`, `malware.hash.sha1` and `malware.hash.sha256`
  - `DestSocket`: looks up `destination.ip` and `destination.port`
  - `DestIP`: looks up `destination.ip`
  - `DestFQDN`: looks up `destination.fqdn`
name: intelmq.bots.experts.mcafee.expert_mar_ip
lookup: yes
public: no
cache (redis db): none
description: Queries occurrences of connection attempts to destination ip/port within local environment
- Feed parameters (see above)
- `dxl_config_file`: location of file containing required information to connect to DXL bus
name: intelmq.bots.experts.mcafee.expert_mar_url
lookup: yes
public: no
cache (redis db): none
description: Queries occurrences of FQDN lookups within local environment

- Feed parameters (see above)
- `dxl_config_file`: location of file containing required information to connect to DXL bus
name: modify
lookup: local config
public: yes
cache (redis db): none
description: modify expert bot allows you to change arbitrary field values of events just using a configuration file
- `configuration_path`: filename
- `case_sensitive`: boolean, default: true
- `maximum_matches`: Maximum number of matches. Processing stops after the limit is reached. Default: no limit (`null`, `0`).
- `overwrite`: Overwrite any existing fields by matching rules. Default if the parameter is given: `true`, for backwards compatibility. Default will change to `false` in version 3.0.0.
The modify expert bot allows you to change arbitrary field values of events just using a configuration file. Thus it is possible to adapt certain values or add new ones only by changing JSON files, without touching the code of many other bots.
The configuration is called modify.conf
and looks like this:
[
{
"rulename": "Standard Protocols http",
"if": {
"source.port": "^(80|443)$"
},
"then": {
"protocol.application": "http"
}
},
{
"rulename": "Spamhaus Cert conficker",
"if": {
"malware.name": "^conficker(ab)?$"
},
"then": {
"classification.identifier": "conficker"
}
},
{
"rulename": "bitdefender",
"if": {
"malware.name": "bitdefender-(.*)$"
},
"then": {
"malware.name": "{matches[malware.name][1]}"
}
},
{
"rulename": "urlzone",
"if": {
"malware.name": "^urlzone2?$"
},
"then": {
"classification.identifier": "urlzone"
}
},
{
"rulename": "default",
"if": {
"feed.name": "^Spamhaus Cert$"
},
"then": {
"classification.identifier": "{msg[malware.name]}"
}
}
]
In our example above we have five groups labeled `Standard Protocols http`, `Spamhaus Cert conficker`, `bitdefender`, `urlzone` and `default`.
All sections will be considered, in the given order (from top to bottom).
Each rule consists of conditions and actions. Conditions and actions are dictionaries holding the field names of events and regex-expressions to match values (selection) or set values (action). All matching rules will be applied in the given order. The actions are only performed if all selections apply.
If the value for a condition is an empty string, the bot checks if the field does not exist. This is useful to apply default values for empty fields.
You can set the value of the field to a string literal or number.
In addition you can use the standard Python string format syntax
to access the values from the processed event as msg
and the match groups
of the conditions as matches
, see the bitdefender example above.
Group 0 ([0]
) contains the full matching string. See also the documentation on re.Match.group
.
Note that matches
will also contain the match groups
from the default conditions if there were any.
We have an event with feed.name = Spamhaus Cert
and malware.name = confickerab
. The expert loops over all sections in the file and eventually enters section Spamhaus Cert
. First, the default condition is checked, it matches! OK, going on. Otherwise the expert would have selected a different section that has not yet been considered. Now, go through the rules, until we hit the rule conficker
. We combine the conditions of this rule with the default conditions, and both rules match! So we can apply the action: classification.identifier
is set to conficker
, the trivial name.
Assume we have an event with feed.name = Spamhaus Cert
and malware.name = feodo
. The default condition matches, but no others. So the default action is applied. The value for classification.identifier
will be set to feodo
by {msg[malware.name]}
.
If the rule is a string, a regex-search is performed, also for numeric values (str()
is called on them). If the rule is numeric for numeric values, a simple comparison is done. If other types are mixed, a warning will be thrown.
For boolean values, the comparison value needs to be true
or false
as in JSON they are written all-lowercase.
name: national_cert_contact_certat
lookup: https
public: yes
cache (redis db): none
description: https://contacts.cert.at offers an IP address to national CERT contact (and cc) mapping. See https://contacts.cert.at for more info.
- `filter`: (true/false) act as a filter for AT.
- `overwrite_cc`: set to true if you want to overwrite any potentially existing cc fields in the event.
This bot tags events with the risk score found in Recorded Future's large IP risk list.
name: recordedfuture_iprisk
lookup: local database
public: no
cache (redis db): none
description: Record risk score associated to source and destination IP if they are present. Assigns 0 to IP addresses not in the RF list.
- `database`: Location of the CSV file obtained from the Recorded Future API (a script is provided to download the large IP set)
- `overwrite`: set to true if you want to overwrite any potentially existing risk score fields in the event.
For both `source.ip` and `destination.ip` the corresponding risk score is fetched from a local database created from Recorded Future's API. The score is recorded in `extra.rf_iprisk.source` and `extra.rf_iprisk.destination`. If a lookup for an IP fails, a score of 0 is recorded.
See https://www.recordedfuture.com/products/api/ and speak with your recorded future representative for more information.
The list is obtained from the Recorded Future API and needs a valid API token. The large list contains all IPs with a risk score of 25 or more. If an IP is not present in the database, a risk score of 0 is given.
A script is supplied that may be run as the intelmq user to update the database. The script needs to be edited to use a valid API token.
Download database:
mkdir /opt/intelmq/var/lib/bots/recordedfuture_iprisk
cd /tmp/
curl -H "X-RFToken: [API Token]" --output rfiprisk.dat.gz "https://api.recordedfuture.com/v2/ip/risklist?format=csv%2Fsplunk&gzip=true&list=large"
gunzip rfiprisk.dat.gz
mv rfiprisk.dat /opt/intelmq/var/lib/bots/recordedfuture_iprisk/rfiprisk.dat
chown intelmq.intelmq -R /opt/intelmq/var/lib/bots/recordedfuture_iprisk
For both `source.ip` and `destination.ip` the PTR record is fetched and the first valid result is used for `source.reverse_dns`/`destination.reverse_dns`.
name: reverse-dns
lookup: DNS
public: yes
cache (redis db): 8
description: IP to domain
- Cache parameters (see in section common parameters)
- `cache_ttl_invalid_response`: The TTL for cached invalid responses.
- `overwrite`: Overwrite existing fields. Default: `True` if not given (for backwards compatibility, will change in version 3.0.0)
Several RFCs define ASNs, IP Addresses and Hostnames (and TLDs) reserved for documentation.
Events or fields of events can be dropped if they match the criteria of either being reserved for documentation (e.g. AS 64496, domain `example.com`) or belonging to a local area network (e.g. `192.168.0.0/24`). These checks can be applied to URLs, IP addresses, FQDNs and ASNs.
It is configurable if the whole event should be dropped ("policies") or just the field removed, as well as which fields should be checked.
Sources:
- https://tools.ietf.org/html/rfc1918
- https://tools.ietf.org/html/rfc2606
- https://tools.ietf.org/html/rfc3849
- https://tools.ietf.org/html/rfc4291
- https://tools.ietf.org/html/rfc5737
- https://en.wikipedia.org/wiki/IPv4
- https://en.wikipedia.org/wiki/Autonomous_system_(Internet)
name: rfc1918
lookup: none
public: yes
cache (redis db): none
description: removes events or single fields with invalid data
- `fields`: string, comma-separated list of fields, e.g. `destination.ip,source.asn,source.url`. Supported fields are:
  - `destination.asn` & `source.asn`
  - `destination.fqdn` & `source.fqdn`
  - `destination.ip` & `source.ip`
  - `destination.url` & `source.url`
- `policy`: string, comma-separated list of policies, e.g. `del,drop,drop`. `drop` will cause the entire event to be removed if the field matches, `del` causes the field to be removed.
With the example parameter values given above, this means that:
- If a `destination.ip` value is part of a reserved network block, the field will be removed (policy "del").
- If a `source.asn` value is in the range of reserved AS numbers, the event will be removed altogether (policy "drop").
- If a `source.url` value contains a host with either an IP address part of a reserved network block, or a reserved domain name (or with a reserved TLD), the event will be dropped (policy "drop").
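The behaviour described above corresponds to a `parameters` block like the following sketch:

"parameters": {
    "fields": "destination.ip,source.asn,source.url",
    "policy": "del,drop,drop"
}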
Online RIPE Abuse Contact and Geolocation Finder for IP addresses and Autonomous Systems.
name: ripencc-abuse-contact
lookup: HTTPS API
public: yes
cache (redis db): 10
description: IP to abuse contact
- Cache parameters (see in section common parameters)
- `mode`: either `append` (default) or `replace`
- `query_ripe_db_asn`: Query for ASNs at `http://rest.db.ripe.net/abuse-contact/as%s.json`, default `true`
- `query_ripe_db_ip`: Query for IPs at `http://rest.db.ripe.net/abuse-contact/%s.json`, default `true`
- `query_ripe_stat_asn`: Query for ASNs at `https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s`, default `true`
- `query_ripe_stat_ip`: Query for IPs at `https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s`, default `true`
- `query_ripe_stat_geolocation`: Query for IPs at `https://stat.ripe.net/data/maxmind-geo-lite/data.json?resource=%s`, default `true`
name: sieve
lookup: none
public: yes
cache (redis db): none
description: Filtering with a sieve-based configuration language
- `file`: Path to sieve file. Syntax can be validated with `intelmq_sieve_expert_validator`.
The sieve bot is used to filter and/or modify events based on a set of rules. The rules are specified in an external configuration file and with a syntax similar to the Sieve language used for mail filtering.
Each rule defines a set of matching conditions on received events. Events can be matched based on keys and values in the event. If the processed event matches a rule's conditions, the corresponding actions are performed. Actions can specify whether the event should be kept or dropped in the pipeline (filtering actions) or if keys and values should be changed (modification actions).
To use this bot, you need to install the required dependencies:
pip3 install -r intelmq/bots/experts/sieve/REQUIREMENTS.txt
The following excerpts illustrate some of the basic features of the sieve file format:
if :exists source.fqdn {
keep // aborts processing of subsequent rules and forwards the event.
}
if :notexists source.abuse_contact || source.abuse_contact =~ '.*@example.com' {
drop // aborts processing of subsequent rules and drops the event.
}
if source.ip << '192.0.0.0/24' {
add! comment = 'bogon' // sets the field comment to this value and overwrites existing values
path 'other-path' // the message is sent to the given path
}
if classification.type == ['phishing', 'malware'] && source.fqdn =~ '.*\.(ch|li)$' {
add! comment = 'domainabuse'
keep
} elif classification.type == 'scanner' {
add! comment = 'ignore'
drop
} else {
remove comment
}
The sieve file contains an arbitrary number of rules of the form:
if EXPRESSION {
ACTIONS
} elif EXPRESSION {
ACTIONS
} else {
ACTIONS
}
Each rule specifies one or more expressions to match an event based on its keys and values. Event keys are specified as strings without quotes. String values must be enclosed in single quotes. Numeric values can be specified as integers or floats and are unquoted. IP addresses and network ranges (IPv4 and IPv6) are specified with quotes. The following operators may be used to match events:
- `:exists` and `:notexists` match if a given key exists, for example: `if :exists source.fqdn { ... }`
- `==` and `!=` match for equality of strings and numbers, for example: `if feed.name != 'acme-security' || feed.accuracy == 100 { ... }`
- `:contains` matches on substrings.
- `=~` matches strings based on the given regex. `!~` is the inverse regex match.
- Numerical comparisons are evaluated with `<`, `<=`, `>`, `>=`.
- `<<` matches if an IP address is contained in the specified network range: `if source.ip << '10.0.0.0/8' { ... }`
- Values to match against can also be specified as a list, in which case any one of the values will result in a match: `if source.ip == ['8.8.8.8', '8.8.4.4'] { ... }`
In this case, the event will match if it contains a key source.ip
with
either value 8.8.8.8
or 8.8.4.4
.
With inequality operators, the behavior is the same, so it matches if any expression does not match:
if source.ip != ['8.8.8.8', '8.8.4.4'] { ... }
Events with values like 8.8.8.8
or 8.8.4.4
will match, as they are always unequal to the other value.
The result is not that the field must be unequal to all given values.
If part of a rule matches the given conditions, the actions enclosed in {
and
}
are applied. By default, all events that are matched or not matched by rules
in the sieve file will be forwarded to the next bot in the pipeline, unless the
drop
action is applied.
- `add` adds a key value pair to the event. This action only applies if the key is not yet defined in the event. If the key is already defined, the action is ignored. Example: `add comment = 'hello, world'`
- `add!` same as above, but will force overwrite the key in the event.
- `update` modifies an existing value for a key. Only applies if the key is already defined. If the key is not defined in the event, this action is ignored. Example: `update feed.accuracy = 50`
- `remove` removes a key/value from the event. Action is ignored if the key is not defined in the event. Example: `remove extra.comments`
- `keep` sends the message to the next bot in the pipeline (same as the default behaviour), and stops sieve file processing. Example: `keep`
- `path` sets the path (named queue) the message should be sent to (implicitly or with the command `keep`). The named queue needs to be configured in the pipeline, see the User Guide for more information. Example: `path 'named-queue'`
- `drop` marks the event to be dropped. The event will not be forwarded to the next bot in the pipeline. The sieve file processing is interrupted upon reaching this action. No other actions may be specified besides the `drop` action within `{` and `}`.
Comments may be used in the sieve file: all characters after //
and until the end of the line will be ignored.
Use the following command to validate your sieve files:
$ intelmq.bots.experts.sieve.validator
usage: intelmq.bots.experts.sieve.validator [-h] sievefile
Validates the syntax of sievebot files.
positional arguments:
sievefile Sieve file
optional arguments:
-h, --help show this help message and exit
name: taxonomy
lookup: no
public: yes
cache (redis db): none
description: Adds the `classification.taxonomy` field according to the RSIT taxonomy.
Please note that there is a slight mismatch of IntelMQ's taxonomy to the upstream taxonomy, but it should not matter here much.
None.
Information on the "Reference Security Incident Taxonomy" can be found here: https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force
For brevity, "type" means classification.type
and "taxonomy" means classification.taxonomy
.
- If taxonomy is missing, and type is given, the according taxonomy is set.
- If neither taxonomy nor type is given, taxonomy is set to "other" and type to "unknown".
- If taxonomy is given, but type is not, type is set to "unknown".
name: tor-nodes
lookup: local database
public: yes
cache (redis db): none
description: check if IP is tor node
- `database`: Path to the database
Use the included script `update-tor-nodes` to download the database.
This bot extracts the host from the `source.url` and `destination.url` fields and writes it to `source.fqdn` or `destination.fqdn` if it is a hostname, or `source.ip` or `destination.ip` if it is an IP address.
name: url2fqdn
lookup: none
public: yes
cache (redis db): none
description: writes domain name from URL to FQDN or IP address
- `overwrite`: boolean, replace existing FQDN / IP address?
name: wait
lookup: none
public: yes
cache (redis db): none
description: Waits for some time or until a queue size is lower than a given number.
- `queue_db`: Database number of the database, default `2`. Converted to integer.
- `queue_host`: Host of the database, default `localhost`.
- `queue_name`: Name of the queue to be watched, default `null`. This is not the name of a bot but the queue's name.
- `queue_password`: Password for the database, default `None`.
- `queue_polling_interval`: Interval to poll the list length in seconds. Converted to float.
- `queue_port`: Port of the database, default `6379`. Converted to integer.
- `queue_size`: Maximum size of the queue, default `0`. Compared by <=. Converted to integer.
- `sleep_time`: Time to sleep before sending the event.
Only one of the two modes is possible. If a queue name is given, the queue mode is active. If the sleep_time is a number, sleep mode is active. Otherwise the dummy mode is active, the events are just passed without an additional delay.
Note that SIGHUPs and reloads interrupt the sleeping.
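As a sketch, a queue-mode configuration could look like this; the queue name is a hypothetical example and must match an existing queue in your pipeline:

"parameters": {
    "queue_name": "cymru-whois-expert-queue",
    "queue_host": "localhost",
    "queue_port": 6379,
    "queue_db": 2,
    "queue_size": 10000,
    "queue_polling_interval": 5.0
}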
Sends data to an AMQP server. See https://www.rabbitmq.com/tutorials/amqp-concepts.html for more details on the AMQP topic exchange.
Requires the pika
python library.
name: intelmq.bots.outputs.amqptopic.output
lookup: to the amqp server
public: yes
cache: no
description: Sends the event to a specified topic of an AMQP server
- connection_attempts : The number of connection attempts to defined server, defaults to 3
- connection_heartbeat : Heartbeat to server, in seconds, defaults to 3600
- connection_host : Name/IP for the AMQP server, defaults to 127.0.0.1
- connection_port : Port for the AMQP server, defaults to 5672
- connection_vhost : Virtual host to connect, on an http(s) connection would be http:/IP/
- content_type : Content type to deliver to AMQP server, currently only supports "application/json"
- delivery_mode : 1 - Non-persistent, 2 - Persistent. On persistent mode, messages are delivered to 'durable' queues and will be saved to disk.
- exchange_durable : If set to True, the exchange will survive broker restart, otherwise will be a transient exchange.
- exchange_name : The name of the exchange to use
- exchange_type : Type of the exchange, e.g. `topic`, `fanout` etc.
- keep_raw_field : If set to True, the message 'raw' field will be sent
- password : Password for authentication on your AMQP server
- require_confirmation : If set to True, an exception will be raised if a confirmation error is received
- routing_key : The routing key for your amqptopic
- single_key : Only send the field instead of the full event (expecting a field name as string)
- username : Username for authentication on your AMQP server
- use_ssl : Use SSL for the connection, make sure to also set the correct port, usually 5671 (`true`/`false`)
- message_hierarchical_output: Convert the message to hierarchical JSON, default: false
- message_with_type : Include the type in the sent message, default: false
- message_jsondict_as_string: Convert fields of type JSONDict (extra) as string, default: false
If no authentication should be used, leave username or password empty or `null`.
- Useful to send events to a RabbitMQ exchange topic to be further processed in other platforms.
If routing key or exchange name are invalid or non existent, the message is accepted by the server but we receive no confirmation. If parameter require_confirmation is True and no confirmation is received, an error is raised.
The destination exchange and queue need to exist beforehand, with your preferred settings (e.g. durable, lazy queue). If the error message says that the message is "unroutable", the queue doesn't exist.
This output bot discards all incoming messages.
name: blackhole
lookup: no
public: yes
cache: no
description: discards messages
name: intelmq.bots.outputs.elasticsearch.output
lookup: yes
public: yes
cache: no
description: Output Bot that sends events to Elasticsearch
Only Elasticsearch version 7 is supported.
- `elastic_host`: Name/IP for the Elasticsearch server, defaults to 127.0.0.1
- `elastic_port`: Port for the Elasticsearch server, defaults to 9200
- `elastic_index`: Index for the Elasticsearch output, defaults to intelmq
- `rotate_index`: If set, will index events using the date information associated with the event. Options: 'never', 'daily', 'weekly', 'monthly', 'yearly'. Using 'intelmq' as the elastic_index, the following are examples of the generated index names:
  'never' --> intelmq
  'daily' --> intelmq-2018-02-02
  'weekly' --> intelmq-2018-42
  'monthly' --> intelmq-2018-02
  'yearly' --> intelmq-2018
- `http_username`: HTTP basic authentication username
- `http_password`: HTTP basic authentication password
- `use_ssl`: Whether to use SSL/TLS when connecting to Elasticsearch. Default: False
- `http_verify_cert`: Whether to require verification of the server's certificate. Default: False
- `ssl_ca_certificate`: An optional path to a certificate bundle to use for verifying the server
- `ssl_show_warnings`: Whether to show warnings if the server's certificate cannot be verified. Default: True
- `replacement_char`: If set, dots ('.') in field names will be replaced with this character prior to indexing. This is for backward compatibility with ES 2.X. Default: null. Recommended for ES2.X: '_'
- `flatten_fields`: In ES, some query and aggregations work better if the fields are flat and not JSON. Here you can provide a list of fields to convert. Can be a list of strings (fieldnames) or a string with field names separated by a comma (,). E.g. `extra,field2` or `['extra', 'field2']`. Default: ['extra']
See contrib/elasticsearch/elasticmapper
for a utility for creating Elasticsearch mappings and templates.
If using rotate_index
, the resulting index name will be of the form [elastic_index]-[event date].
To query all intelmq indices at once, use an alias (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html), or a multi-index query.
The data in ES can be retrieved with the HTTP-Interface:
> curl -XGET 'http://localhost:9200/intelmq/events/_search?pretty=True'
name: file
lookup: no
public: yes
cache (redis db): none
description: output messages (reports or events) to file
Multithreading is disabled for this bot, as this would lead to corrupted files.
- `encoding_errors_mode`: By default `'strict'`, see for more details and options: https://docs.python.org/3/library/functions.html#open For example with `'backslashreplace'` all characters which cannot be properly encoded will be written escaped with backslashes.
- `file`: file path of output file. Missing directories will be created if possible with the mode 755.
- `format_filename`: Boolean if the filename should be formatted (default: `false`).
- `hierarchical_output`: If true, the resulting dictionary will be hierarchical (field names split by dot).
- `single_key`: if `none`, the whole event is saved (default); otherwise the bot saves only contents of the specified key. In case of `raw` the data is base64 decoded.
The filename can be formatted using pythons string formatting functions if format_filename
is set. See https://docs.python.org/3/library/string.html#formatstrings
For example:
- The filename `.../{event[source.abuse_contact]}.txt` will be (for example) `.../[email protected]`.
- `.../{event[time.source]:%Y-%m-%d}` results in the date of the event used as filename.
If the field used in the format string is not defined, None
will be used as fallback.
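Putting this together, a hypothetical `parameters` sketch that writes one file per feed name could look like this (the path and the chosen field are examples only):

"parameters": {
    "file": "/opt/intelmq/var/lib/bots/file-output/{event[feed.name]}.txt",
    "format_filename": true,
    "hierarchical_output": false
}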
name: files
lookup: no
public: yes
cache (redis db): none
description: saving of messages as separate files
- `dir`: output directory (default `/opt/intelmq/var/lib/bots/files-output/incoming`)
- `tmp`: temporary directory (must reside on the same filesystem as `dir`) (default: `/opt/intelmq/var/lib/bots/files-output/tmp`)
- `suffix`: extension of created files (default `.json`)
- `hierarchical_output`: if `true`, use nested dictionaries; if `false`, use flat structure with dot separated keys (default)
- `single_key`: if `none`, the whole event is saved (default); otherwise the bot saves only contents of the specified key
name: intelmq.bots.outputs.mcafee.output_esm_ip
lookup: yes
public: no
cache (redis db): none
description: Writes information out to McAfee ESM watchlist
- Feed parameters (see above)
- `esm_ip`: IP address of ESM instance
- `esm_user`: username of user entitled to write to watchlist
- `esm_pw`: password of user
- `esm_watchlist`: name of the watchlist to write to
- `field`: name of the IntelMQ field to be written to ESM
name: intelmq.bots.outputs.misp.output_feed
lookup: no
public: no
cache (redis db): none
description: Create a directory layout in the MISP Feed format
The PyMISP library >= 2.4.119.1 is required, see REQUIREMENTS.txt.
- Feed parameters (see above)
- `misp_org_name`: Org name which creates the event, string
- `misp_org_uuid`: Org UUID which creates the event, string
- `output_dir`: Output directory path, e.g. `/opt/intelmq/var/lib/bots/mispfeed-output`. Will be created if it does not exist and possible.
- `interval_event`: The output bot creates one event per each interval, all data in this time frame is part of this event. Default "1 hour", string.
Configure the destination directory of this feed as feed in MISP, either as local location, or served via a web server. See the MISP documentation on Feeds for more information
name: intelmq.bots.outputs.misp.output_api
lookup: no
public: no
cache (redis db): none
description: Connect to a MISP instance and add event as MISPObject if not there already.
The PyMISP library >= 2.4.120 is required, see REQUIREMENTS.txt.
- Feed parameters (see above)
- `add_feed_provider_as_tag`: boolean (use `true` when in doubt)
- `add_feed_name_as_tag`: boolean (use `true` when in doubt)
- `misp_additional_correlation_fields`: list of fields for which the correlation flags will be enabled (in addition to those which are in significant_fields)
- `misp_additional_tags`: list of tags to set, not to be searched for when looking for duplicates
- `misp_key`: string, API key for accessing MISP
- `misp_publish`: boolean, if a new MISP event should be set to "publish". Expert setting as MISP may really make it "public"! (Use `false` when in doubt.)
- `misp_tag_for_bot`: string, used to mark MISP events
- `misp_to_ids_fields`: list of fields for which the `to_ids` flags will be set
- `misp_url`: string, URL of the MISP server
- `significant_fields`: list of intelmq field names
The `significant_fields` values will be searched for in all MISP attribute values, and if all values are found in the same MISP event, no new MISP event will be created.
Instead, if the existing MISP events have the same `feed.provider` and match closely, their timestamp will be updated.
If a new MISP event is inserted, the `significant_fields` and the `misp_additional_correlation_fields` will be the attributes where correlation is enabled.
Make sure to build the IntelMQ Botnet in a way the rate of incoming events is what MISP can handle, as IntelMQ can process many more events faster than MISP (which is by design as MISP is for manual handling). Also remove the fields of the IntelMQ events with an expert bot that you do not want to be inserted into MISP.
(More details can be found in the docstring of
output_api.py
).
Saves events in a MongoDB either as hierarchical structure or flat with full key names. time.observation
and time.source
are saved as datetime objects, not as ISO formatted string.
name: mongodb
lookup: no
public: yes
cache (redis db): none
description: MongoDB is the bot responsible to send events to a MongoDB database
- `collection`: MongoDB collection
- `database`: MongoDB database
- `db_user`: Database user that should be used if you enabled authentication
- `db_pass`: Password associated to `db_user`
- `host`: MongoDB host (FQDN or IP)
- `port`: MongoDB port, default: 27017
- `hierarchical_output`: Boolean (default true); as MongoDB does not allow saving keys with dots, we split the dictionary in sub-dictionaries.
- `replacement_char`: String (default `'_'`) used as replacement character for the dots in key names if hierarchical output is not used.
pip3 install pymongo>=2.7.1
The bot has been tested with pymongo versions 2.7.1, 3.4 and 3.10.1 (server versions 2.6.10 and 3.6.8).
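A possible `parameters` sketch for a local MongoDB with authentication enabled; credentials and the collection name are placeholders:

"parameters": {
    "host": "localhost",
    "port": 27017,
    "database": "intelmq",
    "collection": "events",
    "db_user": "intelmq",
    "db_pass": "<database password>",
    "hierarchical_output": true
}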
name: intelmq.bots.outputs.redis.output
lookup: to the Redis server
public: yes
cache (redis db): none
description: Output Bot that sends events to a remote Redis server/queue.
- `redis_db`: remote server database, e.g.: 2
- `redis_password`: remote server password
- `redis_queue`: remote server list (queue), e.g.: "remote-server-queue"
- `redis_server_ip`: remote server IP address, e.g.: 127.0.0.1
- `redis_server_port`: remote server port, e.g.: 6379
- `redis_timeout`: connection timeout, in milliseconds, e.g.: 50000
- `hierarchical_output`: whether output should be sent in hierarchical JSON format (default: false)
- `with_type`: send the `__type` field (default: true)
- Can be used to send events to be processed in another system. E.g.: send events to Logstash.
- In a multi tenant installation can be used to send events to external/remote IntelMQ instance. Any expert bot queue can receive the events.
- In a complex configuration can be used to create logical sets in IntelMQ-Manager.
name: restapi
lookup: no
public: yes
cache (redis db): none
description: REST API is the bot responsible to send events to a REST API listener through POST
- `auth_token`: the user name / HTTP header key
- `auth_token_name`: the password / HTTP header value
- `auth_type`: one of: `"http_basic_auth"`, `"http_header"`
- `hierarchical_output`: boolean
- `host`: destination URL
- `use_json`: boolean
Sends a MIME Multipart message containing the text and the event as CSV for every single event.
name: smtp
lookup: no
public: yes
cache (redis db): none
description: Sends events via SMTP
- `fieldnames`: a list of field names to be included in the email, comma separated string or list of strings
- `mail_from`: string. Supports formatting, see below
- `mail_to`: string of email addresses, comma separated. Supports formatting, see below
- `smtp_host`: string
- `smtp_password`: string or null, password for authentication on your SMTP server
- `smtp_port`: port
- `smtp_username`: string or null, username for authentication on your SMTP server
- `ssl`: boolean
- `starttls`: boolean
- `subject`: string. Supports formatting, see below
- `text`: string or null. Supports formatting, see below
For several strings you can use values from the event using the standard Python string format syntax.
Access the event's values with `{ev[source.ip]}` and similar. Any non-existing fields will result in `None`.
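To illustrate the formatting support, a hypothetical `parameters` sketch could look like this; addresses, host and the field selection are placeholders:

"parameters": {
    "smtp_host": "mail.example.com",
    "smtp_port": 25,
    "starttls": true,
    "ssl": false,
    "mail_from": "[email protected]",
    "mail_to": "{ev[source.abuse_contact]}",
    "subject": "Incident report for {ev[source.ip]}",
    "text": "Please find the event data below.",
    "fieldnames": "time.source,source.ip,classification.type"
}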
Authentication is optional. If both username and password are given, these mechanisms are tried: CRAM-MD5, PLAIN, and LOGIN.
Client certificates are not supported. If http_verify_cert
is true, TLS certificates are checked.
name: sql
lookup: no
public: yes
cache (redis db): none
description: SQL is the bot responsible to send events to a PostgreSQL or SQLite database
notes: When activating autocommit, transactions are not used: http://initd.org/psycopg/docs/connection.html#connection.autocommit
The parameters marked with 'PostgreSQL' will be sent to libpq via psycopg2. Check the [libpq parameter documentation](https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS) for the versions you are using.
- `autocommit`: psycopg's autocommit mode, optional, default True
- `connect_timeout`: database connect_timeout, optional, default 5 seconds
- `engine`: 'postgresql' or 'sqlite'
- `database`: PostgreSQL database or SQLite file
- `host`: PostgreSQL host
- `jsondict_as_string`: save JSONDict fields as JSON string, boolean. Default: true (like in versions before 1.1)
- `port`: PostgreSQL port
- `user`: PostgreSQL user
- `password`: PostgreSQL password
- `sslmode`: PostgreSQL sslmode, can be `'disable'`, `'allow'`, `'prefer'` (default), `'require'`, `'verify-ca'` or `'verify-full'`. See PostgreSQL docs: https://www.postgresql.org/docs/current/static/libpq-connect.html#libpq-connect-sslmode
- `table`: name of the database table into which events are to be inserted
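A `parameters` sketch for a PostgreSQL setup; credentials and the table name are placeholders, and the database name matches the PostgreSQL example further below:

"parameters": {
    "engine": "postgresql",
    "host": "localhost",
    "port": 5432,
    "database": "intelmq-events",
    "user": "intelmq",
    "password": "<database password>",
    "sslmode": "require",
    "table": "events",
    "autocommit": true,
    "jsondict_as_string": true
}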
Similarly to PostgreSQL, you can use intelmq_psql_initdb
to create initial SQL statements
from harmonization.conf
. The script will create the required table layout
and save it as /tmp/initdb.sql
.
Create the new database (you can ignore all errors since SQLite doesn't know all SQL features generated for PostgreSQL):
sqlite3 your-db.db
sqlite> .read /tmp/initdb.sql
You have two basic choices to run PostgreSQL:
- on the same machine as intelmq, then you could use Unix sockets if available on your platform
- on a different machine. In which case you would need to use a TCP connection and make sure you give the right connection parameters to each psql or client call.
Make sure to consult your PostgreSQL documentation about how to allow network connections and authentication in case 2.
Any supported version of PostgreSQL should work (v>=9.2 as of Oct 2016)[1].
If you use PostgreSQL server v >= 9.4, it gives you the possibility to use the time-zone formatting string "OF" for date-times and the GiST index for the CIDR type. This may be useful depending on how you plan to use the events that this bot writes into the database.
Use intelmq_psql_initdb
to create initial SQL statements
from harmonization.conf
. The script will create the required table layout
and save it as /tmp/initdb.sql
You need a PostgreSQL database-user to own the result database.
The recommendation is to use the name intelmq
.
There may already be such a user for the PostgreSQL database-cluster
to be used by other bots. (For example from setting up
the expert/certbund_contact bot.)
Therefore if still necessary: create the database-user
as postgresql superuser, which usually is done via the system user postgres
:
createuser --no-superuser --no-createrole --no-createdb --encrypted --pwprompt intelmq
Create the new database:
createdb --encoding='utf-8' --owner=intelmq intelmq-events
(The encoding parameter should ensure the right encoding on platform where this is not the default.)
Now initialize it as database-user intelmq
(in this example
a network connection to localhost is used, so you would get to test
if the user intelmq
can authenticate):
psql -h localhost intelmq-events intelmq </tmp/initdb.sql
name: intelmq.bots.outputs.stomp.output
lookup: yes
public: yes
cache (redis db): none
description: This output bot will push data to any STOMP stream. STOMP stands for Streaming Text Oriented Messaging Protocol. See: https://en.wikipedia.org/wiki/Streaming_Text_Oriented_Messaging_Protocol
Install the stomp.py library, e.g. `apt install python3-stomp.py` or `pip install stomp.py`.
You need a CA certificate, client certificate and key file from the organization / server you are connecting to. Also you will need a so called "exchange point".
- `exchange`: The exchange to push at
- `heartbeat`: default: 60000
- `message_hierarchical_output`: Boolean, default: false
- `message_jsondict_as_string`: Boolean, default: false
- `message_with_type`: Boolean, default: false
- `port`: Integer, default: 61614
- `server`: Host or IP address of the STOMP server
- `single_key`: Boolean or string (field name), default: false
- `ssl_ca_certificate`: path to CA file
- `ssl_client_certificate`: path to client cert file
- `ssl_client_certificate_key`: path to client cert key file
name: intelmq.bots.outputs.tcp.output
lookup: no
public: yes
cache (redis db): none
description: TCP is the bot responsible to send events to a TCP port (Splunk, another IntelMQ, etc.).
Multithreading is disabled for this bot.
- `counterpart_is_intelmq`: Boolean. If you are sending to an IntelMQ TCP collector, set this to True, otherwise, e.g. with filebeat, set it to false.
- `ip`: IP of destination server
- `hierarchical_output`: true for a nested JSON, false for a flat JSON (when sending to a TCP collector).
- `port`: port of destination server
- `separator`: separator of messages, e.g. "\n", optional. When sending to a TCP collector, this parameter shouldn't be present. In that case, the output waits until every message is acknowledged by the "Ok" message the TCP collector bot implements.
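For illustration, a `parameters` sketch for sending newline-separated flat JSON to a non-IntelMQ receiver such as filebeat; the IP and port are placeholders:

"parameters": {
    "ip": "192.0.2.10",
    "port": 5000,
    "counterpart_is_intelmq": false,
    "hierarchical_output": false,
    "separator": "\n"
}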
name: intelmq.bots.outputs.touch.output
lookup: no
public: yes
cache (redis db): none
description: Touches a file for every event received.
- `path`: Path to the file to touch.
name: intelmq.bots.outputs.udp.output
lookup: no
public: yes
cache (redis db): none
description: Output Bot that sends events to a remote UDP server.
Multithreading is disabled for this bot.
- `field_delimiter`: If the format is 'delimited' this will be added between fields. String, default: `"|"`
- `format`: Can be `'json'` or `'delimited'`. The JSON format outputs the event 'as-is'. Delimited will deconstruct the event and print each field:value pair separated by the field delimiter. See examples below.
- `header`: Header text to be sent in the UDP datagram, string.
- `keep_raw_field`: boolean, default: false
- `udp_host`: Destination server's host name or IP address
- `udp_port`: Destination port
Consider the following event:
{"raw": "MjAxNi8wNC8yNV8xMTozOSxzY2hpenppbm8ub21hcmF0aG9uLmNvbS9na0NDSnVUSE0vRFBlQ1pFay9XdFZOSERLbC1tWFllRk5Iai8sODUuMjUuMTYwLjExNCxzdGF0aWMtaXAtODUtMjUtMTYwLTExNC5pbmFkZHIuaXAtcG9vbC5jb20uLEFuZ2xlciBFSywtLDg5NzI=", "source": {"asn": 8972, "ip": "85.25.160.114", "url": "http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/", "reverse_dns": "static-ip-85-25-160-114.inaddr.ip-pool.com"}, "classification": {"type": "malware"}, "event_description": {"text": "Angler EK"}, "feed": {"url": "http://www.malwaredomainlist.com/updatescsv.php", "name": "Malware Domain List", "accuracy": 100.0}, "time": {"observation": "2016-04-29T10:59:34+00:00", "source": "2016-04-25T11:39:00+00:00"}}
With the following Parameters:
- field_delimiter : |
- format : json
- Header : header example
- keep_raw_field : true
- ip : 127.0.0.1
- port : 514
Resulting line in syslog:
Apr 29 11:01:29 header example {"raw": "MjAxNi8wNC8yNV8xMTozOSxzY2hpenppbm8ub21hcmF0aG9uLmNvbS9na0NDSnVUSE0vRFBlQ1pFay9XdFZOSERLbC1tWFllRk5Iai8sODUuMjUuMTYwLjExNCxzdGF0aWMtaXAtODUtMjUtMTYwLTExNC5pbmFkZHIuaXAtcG9vbC5jb20uLEFuZ2xlciBFSywtLDg5NzI=", "source": {"asn": 8972, "ip": "85.25.160.114", "url": "http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/", "reverse_dns": "static-ip-85-25-160-114.inaddr.ip-pool.com"}, "classification": {"type": "malware"}, "event_description": {"text": "Angler EK"}, "feed": {"url": "http://www.malwaredomainlist.com/updatescsv.php", "name": "Malware Domain List", "accuracy": 100.0}, "time": {"observation": "2016-04-29T10:59:34+00:00", "source": "2016-04-25T11:39:00+00:00"}}
With the following Parameters:
- field_delimiter : |
- format : delimited
- Header : IntelMQ-event
- keep_raw_field : false
- ip : 127.0.0.1
- port : 514
Resulting line in syslog:
Apr 29 11:17:47 localhost IntelMQ-event|source.ip: 85.25.160.114|time.source:2016-04-25T11:39:00+00:00|feed.url:http://www.malwaredomainlist.com/updatescsv.php|time.observation:2016-04-29T11:17:44+00:00|source.reverse_dns:static-ip-85-25-160-114.inaddr.ip-pool.com|feed.name:Malware Domain List|event_description.text:Angler EK|source.url:http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/|source.asn:8972|classification.type:malware|feed.accuracy:100.0
name: intelmq.bots.outputs.xmpp.output
lookup: yes
public: yes
cache (redis db): none
description: The XMPP output is capable of sending messages to XMPP rooms and as direct messages.
The SleekXMPP library needs to be installed on your system:
pip3 install -r intelmq/bots/collectors/xmpp/REQUIREMENTS.txt
- `xmpp_user`: The username of the XMPP account the output shall use (part before the @ sign)
- `xmpp_server`: The domain name of the server of the XMPP account (part after the @ sign)
- `xmpp_password`: The password of the XMPP account
- `xmpp_to_user`: The username of the receiver
- `xmpp_to_server`: The domain name of the receiver
- `xmpp_room`: The room which has to be joined by the output (full address of the room)
- `xmpp_room_nick`: The username / nickname the output shall use within the room.
- `xmpp_room_password`: The password which might be required to join a room
- `use_muc`: If this parameter is `true`, the bot will join the room `xmpp_room`.
- `ca_certs`: A path to a file containing the CAs which should be used