Merge custom and core multi_fields array #982
Changes from 4 commits
eae6bf5
9898651
f0746ae
d1953cf
2ca5584
23cc12c
cb0e5b9
6eb01c0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -171,6 +171,26 @@ def nest_fields(field_array): | |
| return schema_root | ||
|
|
||
|
|
||
| def array_of_dicts_to_set(array_vals): | ||
| ret_set = set() | ||
| for dict_val in array_vals: | ||
| ret_set.add(frozenset(dict_val.items())) | ||
| return ret_set | ||
|
|
||
|
|
||
| def set_of_sets_to_array(set_vals): | ||
| ret_list = [] | ||
| for set_info in set_vals: | ||
| ret_list.append(dict(set_info)) | ||
| return sorted(ret_list, key=lambda k: k['name']) | ||
|
|
||
|
|
||
| def dedup_and_merge_lists(list_a, list_b): | ||
| list_a_set = array_of_dicts_to_set(list_a) | ||
| list_b_set = array_of_dicts_to_set(list_b) | ||
| return set_of_sets_to_array(list_a_set | list_b_set) | ||
|
Member
Minor issue I stumbled across while testing this out. Not sure it would be a blocker to merging, but worth noting the behavior.
The union will remove exact duplicate items:
But if the sets are not exact duplicates, it could lead to duplicate field names:
schema include file:
---
- name: file
title: File
group: 2
short: Fields describing files.
description: >
Custom file
fields:
- name: path
multi_fields:
- name: caseless
type: keyword
normalizer: lowercase
- name: text
type: keyword <= I imagine this would only happen by accident 😃
Resulting intermediate state:
multi_fields:
- flat_name: file.path.caseless
ignore_above: 1024
name: caseless
normalizer: lowercase
type: keyword
- flat_name: file.path.text
ignore_above: 1024
name: text
type: keyword
- flat_name: file.path.text
name: text
norms: false
type: text
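The behavior described in this comment can be reproduced with a minimal sketch. The three helpers below are condensed from the diff above; the sample `core` and `custom_*` lists are illustrative assumptions, not taken from the ECS schema files.

```python
def array_of_dicts_to_set(array_vals):
    # Each dict becomes a frozenset of its (key, value) pairs so it is hashable.
    return {frozenset(d.items()) for d in array_vals}


def set_of_sets_to_array(set_vals):
    # Convert back to dicts and sort by the 'name' key, as in the diff.
    return sorted((dict(s) for s in set_vals), key=lambda k: k['name'])


def dedup_and_merge_lists(list_a, list_b):
    return set_of_sets_to_array(
        array_of_dicts_to_set(list_a) | array_of_dicts_to_set(list_b))


# Illustrative data (assumed for this sketch).
core = [{'name': 'text', 'type': 'text', 'norms': False}]
custom_same = [{'name': 'text', 'type': 'text', 'norms': False}]
custom_diff = [
    {'name': 'caseless', 'type': 'keyword', 'normalizer': 'lowercase'},
    {'name': 'text', 'type': 'keyword'},
]

# Exact duplicates collapse into a single entry.
exact = dedup_and_merge_lists(core, custom_same)

# Entries that share a name but differ elsewhere both survive the union.
clash = dedup_and_merge_lists(core, custom_diff)
names = [f['name'] for f in clash]
```

Because the union compares whole dictionaries, two entries named `text` with different `type` values are treated as distinct items.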
Contributor
Author
Oh, good catch. What do we think the expected behavior should be in this scenario? I could put in a check to ensure that two of the same
Contributor
IMO we should dedupe on
Member
@webmat do you have any thoughts? I recall back in #864, logic was removed from the tooling to allow
Perhaps we simply make sure to note that users need to be aware of introducing such duplicate fields?
Contributor
I agree with @madirey. We should keep it simple and only ensure we have unique multi-field names. The
But to take a concrete example, let's say someone has tuned a normalizer that works well for user agent strings, I want them to be able to replace the default
multi_fields:
- name: text
  norms: false
  type: text
  normalizer: ua_normalizer
I think I have a preference for merging the pre-existing multi-field definitions of the same name, as this is more in line with how everything else is handled with custom fields. And it has the bonus of allowing a more terse custom definition:
- name: text
  normalizer: ua_normalizer
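The merge-by-name approach proposed in this comment might look roughly like the sketch below. The helper `merge_multi_fields_by_name` is a hypothetical name introduced here for illustration; it is not part of this PR or the ECS tooling, and the sample data is assumed.

```python
import copy


def merge_multi_fields_by_name(core, custom):
    """Hypothetical helper: merge two multi_fields lists keyed on 'name'.

    Custom entries override or extend the core entry of the same name;
    entries with new names are appended.
    """
    merged = {mf['name']: copy.deepcopy(mf) for mf in core}
    for mf in custom:
        # Start from the core definition if one exists, then apply overrides.
        merged.setdefault(mf['name'], {}).update(mf)
    return sorted(merged.values(), key=lambda mf: mf['name'])


# Assumed sample data: a core 'text' multi-field and a terse custom override.
core = [{'name': 'text', 'type': 'text', 'norms': False}]
custom = [{'name': 'text', 'normalizer': 'ua_normalizer'}]

result = merge_multi_fields_by_name(core, custom)
```

Keying on `name` guarantees one entry per multi-field name, and the custom definition only needs to list the attributes it changes.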
||
|
|
||
|
|
||
| def merge_fields(a, b): | ||
| """Merge ECS field sets with custom field sets.""" | ||
| a = copy.deepcopy(a) | ||
|
|
@@ -184,6 +204,14 @@ def merge_fields(a, b): | |
| a[key].setdefault('field_details', {}) | ||
| a[key]['field_details'].setdefault('normalize', []) | ||
| a[key]['field_details']['normalize'].extend(b[key]['field_details'].pop('normalize')) | ||
| if 'multi_fields' in b[key]['field_details']: | ||
| a[key].setdefault('field_details', {}) | ||
| a[key]['field_details'].setdefault('multi_fields', set()) | ||
|
ebeahan marked this conversation as resolved.
Outdated
|
||
| a[key]['field_details']['multi_fields'] = dedup_and_merge_lists( | ||
| a[key]['field_details']['multi_fields'], b[key]['field_details']['multi_fields']) | ||
| # if we don't do this then the update call below will overwrite a's field_details, with the original | ||
| # contents of b, which undoes our merging the multi_fields | ||
| del b[key]['field_details']['multi_fields'] | ||
| a[key]['field_details'].update(b[key]['field_details']) | ||
| # merge schema details | ||
| if 'schema_details' in b[key]: | ||
|
|
||
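The code comment in the `merge_fields` hunk notes that `multi_fields` must be deleted from `b` before the `update` call. A small sketch of why, using plain list concatenation as a stand-in for the real merge and assumed sample dictionaries:

```python
# Assumed sample data standing in for a[key]['field_details'] and
# b[key]['field_details'].
core_details = {
    'name': 'path',
    'multi_fields': [{'name': 'text', 'type': 'text'}],
}
custom_details = {
    'name': 'path',
    'multi_fields': [{'name': 'caseless', 'type': 'keyword'}],
}

# Naive version: merge the lists, then update() with the untouched custom dict.
naive = dict(core_details)
naive['multi_fields'] = core_details['multi_fields'] + custom_details['multi_fields']
naive.update(custom_details)  # replaces 'multi_fields' with the custom list only

# Careful version: leave 'multi_fields' out of the dict passed to update().
careful = dict(core_details)
careful['multi_fields'] = core_details['multi_fields'] + custom_details['multi_fields']
remaining = {k: v for k, v in custom_details.items() if k != 'multi_fields'}
careful.update(remaining)  # the merged list is preserved
```

`dict.update` replaces values key by key, so any key still present in the custom dictionary overwrites the merged result.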