@mergify mergify bot commented Sep 17, 2024

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

This is an automatic backport of pull request #40742 done by [Mergify](https://mergify.com).

- Rework the input's invalid JSON sanitization feature to support configurable sanitizers
- Migrate existing sanitizers to the new format
- Add a new `replace_all` sanitizer
- Add a new `invalid_json_messages_total` input metric to count the number of messages that contain invalid JSON strings.

The `replace_all` sanitizer replaces every substring that matches the regular expression `pattern` with the fixed literal string `replacement`.

Here is a sample configuration:

```yaml
sanitizers:
- type: replace_all
  spec:
    pattern: '\[\s*([^\[\]{},\s]+(?:\s+[^\[\]{},\s]+)*)\s*\]'
    replacement: "{}"
```

For example, if the diagnostic settings send the following message:

```json
{
    "AppImage": "orcas/postgres_standalone_16_u18:38.1.240825",
    "AppType": "PostgreSQL",
    "properties": [
        218 B blob data
    ]
}
```

With the sample configuration above, the input replaces the invalid JSON and rewrites the message as follows:

```json
{
    "AppImage": "orcas/postgres_standalone_16_u18:38.1.240825",
    "AppType": "PostgreSQL",
    "properties": {}
}
```
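To make the behavior concrete, here is a minimal Python sketch of the substitution that the sample configuration describes. It is an illustration only, not the input's actual implementation: the `replace_all` function below is a hypothetical stand-in, and Python's `re` module is assumed to treat this particular pattern the same way the input's regex engine does.

```python
import json
import re

# Pattern and replacement copied from the sample configuration.
PATTERN = r'\[\s*([^\[\]{},\s]+(?:\s+[^\[\]{},\s]+)*)\s*\]'
REPLACEMENT = "{}"


def replace_all(message: str, pattern: str, replacement: str) -> str:
    """Hypothetical stand-in: replace every match of `pattern` with `replacement`."""
    # A callable replacement keeps `replacement` literal, so backslashes or
    # group references inside it are never expanded.
    return re.sub(pattern, lambda _match: replacement, message)


raw = '''{
    "AppImage": "orcas/postgres_standalone_16_u18:38.1.240825",
    "AppType": "PostgreSQL",
    "properties": [
        218 B blob data
    ]
}'''

sanitized = replace_all(raw, PATTERN, REPLACEMENT)
parsed = json.loads(sanitized)  # parses now that the placeholder is gone
print(parsed["properties"])
```

One property of this pattern is worth keeping in mind: it also matches a valid single-element array such as `[1]`, while a comma-separated array such as `[1, 2]` is left alone. The pattern is therefore only safe for fields where replacing such arrays with `{}` is acceptable.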

The `replace_all` sanitizer aims to restore JSON syntax validity by replacing invalid, unfixable JSON with literal values that make sense for the context.
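The sketch below shows one way a chain of configurable sanitizers and an invalid-message counter could fit together. Everything in it is an assumption made for illustration: the `Input` class, `make_replace_all`, and the choice to run sanitizers in order only on messages that fail JSON validation are hypothetical, and the counter merely mirrors the name of the `invalid_json_messages_total` metric.

```python
import json
import re
from typing import Callable, List

# A sanitizer is modeled here as a function from message text to message text.
Sanitizer = Callable[[str], str]

# Pattern copied from the sample configuration.
PATTERN = r'\[\s*([^\[\]{},\s]+(?:\s+[^\[\]{},\s]+)*)\s*\]'


def make_replace_all(pattern: str, replacement: str) -> Sanitizer:
    """Build a hypothetical replace_all sanitizer from its spec values."""
    compiled = re.compile(pattern)
    # The callable replacement keeps `replacement` literal.
    return lambda message: compiled.sub(lambda _match: replacement, message)


def is_valid_json(message: str) -> bool:
    try:
        json.loads(message)
        return True
    except json.JSONDecodeError:
        return False


class Input:
    """Hypothetical input: counts invalid messages, then sanitizes them."""

    def __init__(self, sanitizers: List[Sanitizer]) -> None:
        self.sanitizers = sanitizers
        self.invalid_json_messages_total = 0  # mirrors the metric name

    def process(self, message: str) -> str:
        if is_valid_json(message):
            return message  # assumption: valid messages pass through untouched
        self.invalid_json_messages_total += 1
        for sanitize in self.sanitizers:  # assumption: applied in configured order
            message = sanitize(message)
        return message


inp = Input([make_replace_all(PATTERN, "{}")])
fixed = inp.process('{"properties": [ 218 B blob data ]}')
print(fixed, inp.invalid_json_messages_total)
```

Validating before sanitizing means a broad pattern never touches messages that were already well-formed, which limits the blast radius of a sanitizer like the one above.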

Invalid JSON sanitization was introduced when we encountered Azure services that produce malformed JSON documents.

Here is a list of previous occurrences of this problem:

- #34092
- https://github.com/elastic/azuremarketplacedev/issues/190

In all cases, we first contacted Azure support, but they didn't fix the invalid JSON.

---------

Co-authored-by: Gabriel Pop <[email protected]>
(cherry picked from commit 168b13a)
@mergify mergify bot added the backport label Sep 17, 2024
@mergify mergify bot requested a review from a team as a code owner September 17, 2024 10:35
@mergify mergify bot assigned zmoog Sep 17, 2024
@botelastic botelastic bot added the needs_team label Sep 17, 2024
botelastic bot commented Sep 17, 2024

This pull request doesn't have a Team:<team> label.

@zmoog zmoog merged commit dd47d36 into 8.x Sep 17, 2024
@zmoog zmoog deleted the mergify/bp/8.x/pr-40742 branch September 17, 2024 14:56
@khushijain21 khushijain21 mentioned this pull request Jun 23, 2025