Merged
28 commits
cc31da1
WIP: fingerprint processor
ycombinator Oct 22, 2019
ecc4772
Implementing SHA256 fingerprinter
ycombinator Oct 23, 2019
0d2af8c
Sort source fields
ycombinator Oct 23, 2019
00a3575
Refactoring
ycombinator Oct 23, 2019
a6afbc9
Add TODO
ycombinator Oct 23, 2019
7521e11
Convert time fields to UTC
ycombinator Oct 29, 2019
d8b5958
Removing unnecessary function
ycombinator Oct 29, 2019
a6b6156
Adding SHA1
ycombinator Oct 29, 2019
85ef943
WIP: add encoding
ycombinator Oct 29, 2019
8cda4be
Cleanup
ycombinator Oct 29, 2019
3c75d3b
Running mage fmt
ycombinator Oct 29, 2019
da29e8d
More methods + consolidating tests
ycombinator Oct 29, 2019
52e5110
Fleshing out tests
ycombinator Oct 30, 2019
92af70c
Adding test for target field
ycombinator Oct 30, 2019
b1981e8
Adding documentation
ycombinator Oct 30, 2019
3a2825c
Adding CHANGELOG entry
ycombinator Oct 30, 2019
9a0be57
Fixing test
ycombinator Oct 30, 2019
b2ecab6
Converting tests to map
ycombinator Oct 30, 2019
217e318
Isolating tests
ycombinator Oct 30, 2019
6479e84
Use io.Writer to stream in fields
ycombinator Oct 30, 2019
661f891
Implement ignore_missing setting
ycombinator Oct 31, 2019
ce27088
Replace table with definition list
ycombinator Oct 31, 2019
2d17110
Adding `ignore_missing` to doc
ycombinator Oct 31, 2019
ba390ad
using io.Fprintf
ycombinator Oct 31, 2019
de73d0d
Use common.StringSet
ycombinator Oct 31, 2019
a6be2ab
Adding typed errors
ycombinator Nov 1, 2019
c65da9a
Adding more typed errors
ycombinator Nov 1, 2019
5f577f4
Adding license header
ycombinator Nov 1, 2019
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -314,6 +314,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Add `keep_null` setting to allow Beats to publish null values in events. {issue}5522[5522] {pull}13928[13928]
- Add shared_credential_file option in aws related config for specifying credential file directory. {issue}14157[14157] {pull}14178[14178]
- GA the `script` processor. {pull}14325[14325]
- Add `fingerprint` processor. {issue}11173[11173] {pull}14205[14205]

*Auditbeat*

21 changes: 21 additions & 0 deletions libbeat/docs/processors-using.asciidoc
@@ -1184,6 +1184,27 @@ The following settings are supported:
empty array (`[]`) or an empty object (`{}`) are considered
empty values. Default is `false`.

[[fingerprint]]
=== Generate a fingerprint of an event

The `fingerprint` processor generates a fingerprint of an event based on a
specified subset of its fields.

[source,yaml]
-----------------------------------------------------
processors:
  - fingerprint:
      fields: ["field1", "field2", ...]
-----------------------------------------------------

The following settings are supported:

`fields`:: List of fields to use as the source for the fingerprint.
`ignore_missing`:: (Optional) Whether to ignore missing fields. Default is `false`.
`target_field`:: (Optional) Field in which the generated fingerprint should be stored. Default is `fingerprint`.
`method`:: (Optional) Algorithm to use for computing the fingerprint. Must be one of: `md5`, `sha1`, `sha256`, `sha384`, `sha512`. Default is `sha256`.
`encoding`:: (Optional) Encoding to use on the fingerprint value. Must be one of `hex`, `base32`, or `base64`. Default is `hex`.

[[include-fields]]
=== Keep fields from events

36 changes: 36 additions & 0 deletions libbeat/processors/fingerprint/config.go
@@ -0,0 +1,36 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package fingerprint

// Config for fingerprint processor.
type Config struct {
	Method        hashMethod     `config:"method"`                     // Hash function to use for fingerprinting
	Fields        []string       `config:"fields" validate:"required"` // Source fields to compute fingerprint from
	TargetField   string         `config:"target_field"`               // Target field for the fingerprint
	Encoding      encodingMethod `config:"encoding"`                   // Encoding to use for target field value
	IgnoreMissing bool           `config:"ignore_missing"`             // Ignore missing fields?
}

func defaultConfig() Config {
	return Config{
		Method:        hashes["sha256"],
		TargetField:   "fingerprint",
		Encoding:      encodings["hex"],
		IgnoreMissing: false,
	}
}
46 changes: 46 additions & 0 deletions libbeat/processors/fingerprint/encode.go
@@ -0,0 +1,46 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package fingerprint

import (
	"encoding/base32"
	"encoding/base64"
	"encoding/hex"
	"strings"
)

type encodingMethod func([]byte) string

var encodings = map[string]encodingMethod{
	"hex":    hex.EncodeToString,
	"base32": base32.StdEncoding.EncodeToString,
	"base64": base64.StdEncoding.EncodeToString,
}

// Unpack creates the encodingMethod from the given string
func (e *encodingMethod) Unpack(str string) error {
	str = strings.ToLower(str)

	m, found := encodings[str]
	if !found {
		return makeErrUnknownEncoding(str)
	}

	*e = m
	return nil
}
79 changes: 79 additions & 0 deletions libbeat/processors/fingerprint/errors.go
@@ -0,0 +1,79 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package fingerprint

import (
	"errors"
	"fmt"
)

var errNoFields = errors.New("must specify at least one field")

type (
	errUnknownEncoding    struct{ encoding string }
	errUnknownMethod      struct{ method string }
	errConfigUnpack       struct{ cause error }
	errComputeFingerprint struct{ cause error }
	errMissingField       struct {
		field string
		cause error
	}
	errNonScalarField struct{ field string }
)

func makeErrUnknownEncoding(encoding string) errUnknownEncoding {
	return errUnknownEncoding{encoding}
}
func (e errUnknownEncoding) Error() string {
	return fmt.Sprintf("invalid encoding [%s]", e.encoding)
}

func makeErrUnknownMethod(method string) errUnknownMethod {
	return errUnknownMethod{method}
}
func (e errUnknownMethod) Error() string {
	return fmt.Sprintf("invalid fingerprinting method [%s]", e.method)
}

func makeErrConfigUnpack(cause error) errConfigUnpack {
	return errConfigUnpack{cause}
}
func (e errConfigUnpack) Error() string {
	return fmt.Sprintf("failed to unpack %v processor configuration: %v", processorName, e.cause)
}

func makeErrComputeFingerprint(cause error) errComputeFingerprint {
	return errComputeFingerprint{cause}
}
func (e errComputeFingerprint) Error() string {
	return fmt.Sprintf("failed to compute fingerprint: %v", e.cause)
}

func makeErrMissingField(field string, cause error) errMissingField {
	return errMissingField{field, cause}
}
func (e errMissingField) Error() string {
	return fmt.Sprintf("failed to find field [%v] in event: %v", e.field, e.cause)
}

func makeErrNonScalarField(field string) errNonScalarField {
	return errNonScalarField{field}
}
func (e errNonScalarField) Error() string {
	return fmt.Sprintf("cannot compute fingerprint using non-scalar field [%v]", e.field)
}
111 changes: 111 additions & 0 deletions libbeat/processors/fingerprint/fingerprint.go
@@ -0,0 +1,111 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package fingerprint

import (
	"fmt"
	"hash"
	"io"
	"time"

	"github.com/elastic/beats/libbeat/beat"
	"github.com/elastic/beats/libbeat/common"
	"github.com/elastic/beats/libbeat/processors"
	jsprocessor "github.com/elastic/beats/libbeat/processors/script/javascript/module/processor"
)

func init() {
	processors.RegisterPlugin("fingerprint", New)
	jsprocessor.RegisterPlugin("Fingerprint", New)
}

const processorName = "fingerprint"

type fingerprint struct {
	config Config
	fields []string
	hash   hash.Hash
}

// New constructs a new fingerprint processor.
func New(cfg *common.Config) (processors.Processor, error) {
	config := defaultConfig()
	if err := cfg.Unpack(&config); err != nil {
		return nil, makeErrConfigUnpack(err)
	}

	fields := common.MakeStringSet(config.Fields...)

	p := &fingerprint{
		config: config,
		hash:   config.Method(),
		fields: fields.ToSlice(),
	}

	return p, nil
}

// Run enriches the given event with fingerprint information
func (p *fingerprint) Run(event *beat.Event) (*beat.Event, error) {
	hashFn := p.hash
	hashFn.Reset()

	err := p.writeFields(hashFn, event.Fields)
	if err != nil {
		return nil, makeErrComputeFingerprint(err)
	}

	hash := hashFn.Sum(nil)
	encodedHash := p.config.Encoding(hash)

	if _, err = event.PutValue(p.config.TargetField, encodedHash); err != nil {
		return nil, makeErrComputeFingerprint(err)
	}

	return event, nil
}

func (p *fingerprint) String() string {
	return fmt.Sprintf("%v=[method=[%v]]", processorName, p.config.Method)
}

func (p *fingerprint) writeFields(to io.Writer, eventFields common.MapStr) error {
	for _, k := range p.fields {
		v, err := eventFields.GetValue(k)
		if err != nil {
			if p.config.IgnoreMissing {
				continue
			}
			return makeErrMissingField(k, err)
		}

Do we want to have an option to ignore missing fields in case we have at least one field present?

Contributor Author

Sorry, but is this the same as your suggestion in #14205 (comment) or something different?

Almost. My original suggestion was to add support for ignoring missing fields. But apparently we can get other error types as well. Would it make sense to treat those other types as 'missing' too?

Contributor Author

I see. So if we can't "get" a field for whatever reason, we treat it as missing and then, if the `ignore_missing` option is set, ignore it. Hmm, I think this makes sense, but let me look into what other types of errors (besides common.ErrKeyNotFound) might be returned here.

Contributor Author

The only other error that can be returned here is if we try to get a value for a nested field (e.g. a.b.c), but the ancestor path to the field (a.b) does not resolve to a map. To me this also feels like a missing field as, even in this case, we still could not find a.b.c for the user. So I'm okay with collapsing the two error cases into one and adding ignore_missing handling to it.
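
To illustrate the two failure modes discussed in this thread, here is a minimal standalone sketch using plain `map[string]interface{}` values instead of `common.MapStr`. The `getValue` and `selectFields` helpers and both error variables are hypothetical names for this note; they only approximate how a dotted-path lookup can fail, either because a key is absent or because an ancestor such as `a.b` is not a map. With `ignoreMissing` set, both cases are skipped in the same way.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical errors standing in for the two lookup failures.
var (
	errKeyNotFound = errors.New("key not found")
	errNotAMap     = errors.New("ancestor path does not resolve to a map")
)

// getValue walks a dotted path such as "a.b.c" through nested maps.
func getValue(m map[string]interface{}, path string) (interface{}, error) {
	parts := strings.Split(path, ".")
	cur := m
	for i, part := range parts {
		v, ok := cur[part]
		if !ok {
			return nil, errKeyNotFound
		}
		if i == len(parts)-1 {
			return v, nil
		}
		next, ok := v.(map[string]interface{})
		if !ok {
			return nil, errNotAMap
		}
		cur = next
	}
	return nil, errKeyNotFound
}

// selectFields returns the names that resolved. Any lookup error is
// treated as "missing": skipped when ignoreMissing is true, fatal otherwise.
func selectFields(event map[string]interface{}, fields []string, ignoreMissing bool) ([]string, error) {
	var found []string
	for _, k := range fields {
		if _, err := getValue(event, k); err != nil {
			if ignoreMissing {
				continue
			}
			return nil, fmt.Errorf("failed to find field [%v] in event: %v", k, err)
		}
		found = append(found, k)
	}
	return found, nil
}

func main() {
	event := map[string]interface{}{
		"a": map[string]interface{}{"b": "leaf"},
		"x": 1,
	}
	fields := []string{"a.b", "a.b.c", "missing", "x"}

	found, _ := selectFields(event, fields, true)
	fmt.Println(found) // [a.b x]

	_, err := selectFields(event, fields, false)
	fmt.Println(err)
}
```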


		i := v
		switch vv := v.(type) {
		case map[string]interface{}, []interface{}, common.MapStr:
			return makeErrNonScalarField(k)
		case time.Time:
			// Ensure we consistently hash times in UTC.
			i = vv.UTC()
		}

		fmt.Fprintf(to, "|%v|%v", k, i)
	}

	io.WriteString(to, "|")
	return nil
}