Merged
Changes from 13 commits
7 changes: 7 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,10 @@
## 3.2.0
- Feat: ECS compatibility [#55](https://github.com/logstash-plugins/logstash-input-imap/pull/55)
* added (optional) `headers_target` configuration option
* added (optional) `attachments_target` configuration option
- Fix: plugin should not close `$stdin` while stopping
- Fix: make sure the 'Date' header is skipped regardless of the `lowercase_headers` setting

## 3.1.0
- Adds an option to recursively search the message parts for attachment and inline attachment filenames. If the save_attachments option is set to true, the content of attachments is included in the `attachments.data` field. The attachment data can then be used by the Elasticsearch Ingest Attachment Processor Plugin.
[#48](https://github.com/logstash-plugins/logstash-input-imap/pull/48)
50 changes: 48 additions & 2 deletions docs/index.asciidoc
@@ -26,6 +26,15 @@ Read mails from IMAP server
Periodically scan an IMAP folder (`INBOX` by default) and move any read messages
to the trash.

[id="plugins-{type}s-{plugin}-ecs"]
==== Compatibility with the Elastic Common Schema (ECS)

The plugin includes sensible defaults that change based on <<plugins-{type}s-{plugin}-ecs_compatibility,ECS compatibility mode>>.
When ECS compatibility is disabled, mail headers and attachments are targeted at the root level.
When targeting an ECS version, headers and attachments target `@metadata` sub-fields unless configured otherwise in order
to avoid conflict with ECS fields.
See <<plugins-{type}s-{plugin}-headers_target>> and <<plugins-{type}s-{plugin}-attachments_target>>.

[id="plugins-{type}s-{plugin}-options"]
==== Imap Input Configuration Options

@@ -34,12 +43,15 @@ This plugin supports the following configuration options plus the <<plugins-{typ
[cols="<,<,<",options="header",]
|=======================================================================
|Setting |Input type|Required
| <<plugins-{type}s-{plugin}-attachments_target>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-check_interval>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-content_type>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-delete>> |<<boolean,boolean>>|No
| <<plugins-{type}s-{plugin}-ecs_compatibility>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-expunge>> |<<boolean,boolean>>|No
| <<plugins-{type}s-{plugin}-fetch_count>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-folder>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-headers_target>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-host>> |<<string,string>>|Yes
| <<plugins-{type}s-{plugin}-lowercase_headers>> |<<boolean,boolean>>|No
| <<plugins-{type}s-{plugin}-password>> |<<password,password>>|Yes
@@ -58,6 +70,16 @@ input plugins.

&nbsp;

[id="plugins-{type}s-{plugin}-attachments_target"]
===== `attachments_target`

* Value type is <<string,string>>
* Default value depends on whether <<plugins-{type}s-{plugin}-ecs_compatibility>> is enabled:
** ECS Compatibility disabled: `"[attachments]"`
** ECS Compatibility enabled: `"[@metadata][input][imap][attachments]"`

The name of the field under which mail attachments information will be added, if <<plugins-{type}s-{plugin}-save_attachments>> is set.

[id="plugins-{type}s-{plugin}-check_interval"]
===== `check_interval`

@@ -72,8 +94,7 @@ input plugins.
* Value type is <<string,string>>
* Default value is `"text/plain"`

For multipart messages, use the first part that has this
content-type as the event message.
For multipart messages, use the first part that has this content-type as the event message.

[id="plugins-{type}s-{plugin}-delete"]
===== `delete`
@@ -83,6 +104,21 @@ content-type as the event message.



[id="plugins-{type}s-{plugin}-ecs_compatibility"]
===== `ecs_compatibility`

* Value type is <<string,string>>
* Supported values are:
** `disabled`: does not use ECS-compatible field names (for example, `From` header field is added to the event)
** `v1`, `v8`: avoids field names that might conflict with Elastic Common Schema (for example, the `From` header is added as metadata)
* Default value depends on which version of Logstash is running:
** When Logstash provides a `pipeline.ecs_compatibility` setting, its value is used as the default
** Otherwise, the default value is `disabled`.

Controls this plugin's compatibility with the {ecs-ref}[Elastic Common Schema (ECS)].
The value of this setting affects the _default_ value of <<plugins-{type}s-{plugin}-headers_target>> and
<<plugins-{type}s-{plugin}-attachments_target>>.

[id="plugins-{type}s-{plugin}-expunge"]
===== `expunge`

@@ -107,6 +143,16 @@ content-type as the event message.



[id="plugins-{type}s-{plugin}-headers_target"]
===== `headers_target`

* Value type is <<string,string>>
* Default value depends on whether <<plugins-{type}s-{plugin}-ecs_compatibility>> is enabled:
** ECS Compatibility disabled: no default value for this setting
** ECS Compatibility enabled: `"[@metadata][input][imap][headers]"`

The name of the field under which mail headers will be added.

[id="plugins-{type}s-{plugin}-host"]
===== `host`

90 changes: 67 additions & 23 deletions lib/logstash/inputs/imap.rb
@@ -3,13 +3,22 @@
require "logstash/namespace"
require "logstash/timestamp"
require "stud/interval"
require "socket" # for Socket.gethostname
require 'fileutils'

require 'logstash/plugin_mixins/ecs_compatibility_support'
require 'logstash/plugin_mixins/ecs_compatibility_support/target_check'
require 'logstash/plugin_mixins/validator_support/field_reference_validation_adapter'

# Read mails from IMAP server
#
# Periodically scan an IMAP folder (`INBOX` by default) and move any read messages
# to the trash.
class LogStash::Inputs::IMAP < LogStash::Inputs::Base

include LogStash::PluginMixins::ECSCompatibilitySupport(:disabled, :v1, :v8 => :v1)

extend LogStash::PluginMixins::ValidatorSupport::FieldReferenceValidationAdapter

config_name "imap"

default :codec, "plain"
@@ -24,15 +33,23 @@ class LogStash::Inputs::IMAP < LogStash::Inputs::Base

config :folder, :validate => :string, :default => 'INBOX'
config :fetch_count, :validate => :number, :default => 50
config :lowercase_headers, :validate => :boolean, :default => true
config :check_interval, :validate => :number, :default => 300

config :lowercase_headers, :validate => :boolean, :default => true

config :headers_target, :validate => :field_reference # ECS default: [@metadata][input][imap][headers]

config :delete, :validate => :boolean, :default => false
config :expunge, :validate => :boolean, :default => false

config :strip_attachments, :validate => :boolean, :default => false
config :save_attachments, :validate => :boolean, :default => false

# For multipart messages, use the first part that has this
# content-type as the event message.
# Legacy default: [attachments]
# ECS default: [@metadata][input][imap][attachments]
config :attachments_target, :validate => :field_reference

# For multipart messages, use the first part that has this content-type as the event message.
config :content_type, :validate => :string, :default => "text/plain"

# Whether to use IMAP uid to track last processed message
@@ -41,6 +58,28 @@ class LogStash::Inputs::IMAP < LogStash::Inputs::Base
# Path to file with last run time metadata
config :sincedb_path, :validate => :string, :required => false

def initialize(*params)
super

if original_params.include?('headers_target')
@headers_target = normalize_field_ref(@headers_target)
else
@headers_target = '[@metadata][input][imap][headers]' if ecs_compatibility != :disabled
end

if original_params.include?('attachments_target')
@attachments_target = normalize_field_ref(@attachments_target)
else
@attachments_target = ecs_compatibility != :disabled ? '[@metadata][input][imap][attachments]' : '[attachments]'
end
end

def normalize_field_ref(target)
# so we can later event.set("#{target}[#{name}]", ...)
target.match?(/\A[^\[\]]+\z/) ? "[#{target}]" : target
end
private :normalize_field_ref

def register
require "net/imap" # in stdlib
require "mail" # gem 'mail'
@@ -63,14 +102,16 @@ def register
# Ensure that the filepath exists before writing, since it's deeply nested.
FileUtils::mkdir_p datapath
@sincedb_path = File.join(datapath, ".sincedb_" + Digest::MD5.hexdigest("#{@user}_#{@host}_#{@port}_#{@folder}"))
@logger.debug? && @logger.debug("Generated sincedb path", sincedb_path: @sincedb_path)
end
if File.directory?(@sincedb_path)
raise ArgumentError.new("The \"sincedb_path\" argument must point to a file, received a directory: \"#{@sincedb_path}\"")
end
@logger.info("Using \"sincedb_path\": \"#{@sincedb_path}\"")

if File.exist?(@sincedb_path)
if File.directory?(@sincedb_path)
raise ArgumentError.new("The \"sincedb_path\" argument must point to a file, received a directory: \"#{@sincedb_path}\"")
end
@logger.debug? && @logger.debug("Found existing sincedb path", sincedb_path: @sincedb_path)
@uid_last_value = File.read(@sincedb_path).to_i
@logger.info("Loading \"uid_last_value\": \"#{@uid_last_value}\"")
@logger.debug? && @logger.debug("Loaded from sincedb", uid_last_value: @uid_last_value)
end

@content_type_re = Regexp.new("^" + @content_type)
@@ -145,7 +186,7 @@ def check_mail(queue)
# Always save @uid_last_value so when tracking is switched from
# "NOT SEEN" to "UID" we will continue from first unprocessed message
if @uid_last_value
@logger.info("Saving \"uid_last_value\": \"#{@uid_last_value}\"")
@logger.debug? && @logger.debug("Saving to sincedb", uid_last_value: @uid_last_value)
File.write(@sincedb_path, @uid_last_value)
end
end
@@ -164,7 +205,8 @@ def parse_attachments(mail)

def parse_mail(mail)
# Add a debug message so we can track what message might cause an error later
@logger.debug? && @logger.debug("Working with message_id", :message_id => mail.message_id)
@logger.debug? && @logger.debug("Processing mail", message_id: mail.message_id)

# TODO(sissel): What should a multipart message look like as an event?
# For now, just take the plain-text part and set it as the message.
if mail.parts.count == 0
@@ -186,32 +228,35 @@ def parse_mail(mail)
# Add fields: Add message.header_fields { |h| h.name=> h.value }
mail.header_fields.each do |header|
# 'header.name' can sometimes be a Mail::Multibyte::Chars, get it in String form
name = @lowercase_headers ? header.name.to_s.downcase : header.name.to_s
name = header.name.to_s

# Assume we already processed the 'date' above.
next if name == "Date"

name = name.downcase if @lowercase_headers

# Call .decoded on the header in case it's in encoded-word form.
# Details at:
# https://github.com/mikel/mail/blob/master/README.md#encodings
# http://tools.ietf.org/html/rfc2047#section-2
value = transcode_to_utf8(header.decoded.to_s)

# Assume we already processed the 'date' above.
next if name == "Date"

case (field = event.get(name))
targeted_name = "#{@headers_target}[#{name}]"

Contributor

Example: pre-normalizing the @headers_target isn't strictly necessary, as the syntax allows for Composite References

Suggested change
targeted_name = "#{@headers_target}[#{name}]"
targeted_name = "[#{@headers_target}][#{name}]"

@kares kares Nov 23, 2021

Contributor Author

thanks, would still rather keep the 'canonical' form here - subjectively find it less confusing than the composite.
we end up using what the user set with the exception of the bare foo form being wrapped ([foo]), the normalization code is similar to what we have in the field reference validator.

it's also clearer this way to handle the case when headers_target is nil -> [name]
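
For reference, a minimal sketch of the normalization described above, runnable outside Logstash. `normalize_field_ref` mirrors the method added in this diff; `targeted_name` is a hypothetical helper introduced only for illustration. The `nil` case relies on Ruby interpolating `nil` as an empty string.

```ruby
# Wrap a bare field name ("foo") in brackets; leave bracketed references untouched.
def normalize_field_ref(target)
  target.match?(/\A[^\[\]]+\z/) ? "[#{target}]" : target
end

# Hypothetical helper (not part of the plugin): builds the per-header field
# reference the same way parse_mail does in this diff.
# When headers_target is nil, "#{nil}" interpolates to "", leaving "[name]".
def targeted_name(headers_target, name)
  "#{headers_target}[#{name}]"
end

puts targeted_name(normalize_field_ref("foo"), "subject")
# => [foo][subject]
puts targeted_name(normalize_field_ref("[@metadata][input][imap][headers]"), "subject")
# => [@metadata][input][imap][headers][subject]
puts targeted_name(nil, "subject")
# => [subject]
```

With the canonical form, the same string concatenation covers the bare, bracketed, and unset cases without branching.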

@kares kares Nov 23, 2021

Contributor Author

this brings up a good question though, when user explicitly sets a headers_target => ''
think we should treat it as a way to disable the target -> user does not want headers (already present in #56 but I think it makes sense to ship here) instead of treating it as if headers_target was not set and setting stuff at top level.

or does this sound confusing?
(would give us a way for users to opt-out of attachments_target once we ship email.* field support in the future)

Contributor

I agree that setting headers_target => '' should read as an explicit instruction to not place the headers anywhere, and distinctly different than not supplying a headers_target instruction. We will need to call this out in the docs, and will need to add specs to ensure that it passes validation.

When the email RFC gets finalized, we will need a separate way to opt into the RFC format, and when this future mode is active we can also inject better defaults for related options.
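
A rough sketch of the three states discussed in this thread (option not set, set to `''`, set to a field reference). This is an assumption about how the semantics could look, not the plugin's code: `resolve_headers_target`, the `:skip` sentinel, and the `ECS_HEADERS_DEFAULT` constant are all hypothetical names.

```ruby
# Hypothetical constant holding the ECS-mode default shown in this PR's docs.
ECS_HEADERS_DEFAULT = '[@metadata][input][imap][headers]'

# Hypothetical resolver: distinguishes "option not supplied" from "supplied as ''".
#   params      - Hash of the user-supplied settings (string keys)
#   ecs_enabled - true when ecs_compatibility is anything other than :disabled
def resolve_headers_target(params, ecs_enabled)
  if params.key?('headers_target')
    target = params['headers_target']
    return :skip if target.empty? # explicit '' => do not store headers anywhere
    target
  else
    # not supplied: ECS default when enabled, nil (event root) otherwise
    ecs_enabled ? ECS_HEADERS_DEFAULT : nil
  end
end

p resolve_headers_target({ 'headers_target' => '' }, true)
p resolve_headers_target({}, true)
p resolve_headers_target({}, false)
```

Keeping "skip" distinct from `nil` is what lets an unset option keep its legacy root-level behavior while an explicit `''` opts out entirely.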

Contributor Author

> I agree that setting headers_target => '' should read as an explicit instruction to not place the headers anywhere, and distinctly different than not supplying a headers_target instruction. We will need to call this out in the docs, and will need to add specs to ensure that it passes validation.

✔️ (pending 6.x compatibility headers_target => '' logstash-plugins/logstash-mixin-validator_support#2)

there's one extra change since last review - not skipping the date header (despite the header being duplicate information as it's also used for event.timestamp = LogStash::Timestamp.new(mail.date.to_time)) - initially considered it a bug but the date header was present with the plugin's defaults due to:

  • lowercase_headers being on by default
  • the plugin was checking the header name to == "Date" after the lower-case operation

so while it looks like a bug the "fix" (to properly skip the header regardless of the lowercase_headers setting) might break pipelines thus an alternate resolution to never skip the date headers seems more appropriate.
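
To make the ordering issue concrete, a small standalone sketch comparing the two checks. `skipped_before?` and `skipped_after?` are hypothetical names that isolate the comparison from the old and new `parse_mail` code in this diff.

```ruby
# Old order: lowercase the name first, then compare it with "Date".
# With lowercase_headers on (the default), "date" never equals "Date".
def skipped_before?(header_name, lowercase_headers)
  name = lowercase_headers ? header_name.downcase : header_name
  name == "Date"
end

# Order in this diff: compare the raw name first, lowercase afterwards.
# The lowercase_headers argument is kept only for symmetry; it does not affect the result.
def skipped_after?(header_name, lowercase_headers)
  header_name == "Date"
end

p skipped_before?("Date", true)   # => false (header was kept under defaults)
p skipped_before?("Date", false)  # => true
p skipped_after?("Date", true)    # => true
```

This is why skipping the header consistently would change the events that default configurations produce today.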

case (field = event.get(targeted_name))
when String
# promote string to array if a header appears multiple times
# (like 'received')
event.set(name, [field, value])
# promote string to array if a header appears multiple times (like 'received')
event.set(targeted_name, [field, value])
when Array
field << value
event.set(name, field)
event.set(targeted_name, field)
when nil
event.set(name, value)
event.set(targeted_name, value)
end
end

# Add attachments
if attachments && attachments.length > 0
event.set('attachments', attachments)
event.set(@attachments_target, attachments)
end

decorate(event)
@@ -221,7 +266,6 @@ def parse_mail(mail)

def stop
Stud.stop!(@run_thread)
$stdin.close

Contributor

😬 whoops. Good catch.

end

private
5 changes: 3 additions & 2 deletions logstash-input-imap.gemspec
@@ -1,7 +1,7 @@
Gem::Specification.new do |s|

s.name = 'logstash-input-imap'
s.version = '3.1.0'
s.version = '3.2.0'
s.licenses = ['Apache License (2.0)']
s.summary = "Reads mail from an IMAP server"
s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
@@ -21,11 +21,12 @@ Gem::Specification.new do |s|

# Gem dependencies
s.add_runtime_dependency "logstash-core-plugin-api", ">= 1.60", "<= 2.99"
s.add_runtime_dependency 'logstash-mixin-ecs_compatibility_support', '~> 1.3'
s.add_runtime_dependency 'logstash-mixin-validator_support', '~> 1.0'
s.add_runtime_dependency 'logstash-codec-plain'
s.add_runtime_dependency 'mail', '~> 2.6.3'
s.add_runtime_dependency 'mime-types', '2.6.2'
s.add_runtime_dependency 'stud', '~> 0.0.22'

s.add_development_dependency 'logstash-devutils'
s.add_development_dependency 'insist'
end