Skip to content

Commit

Permalink
[extension/filestorage] replace path-unsafe characters in component n…
Browse files Browse the repository at this point in the history
…ames (#20896)

Fixes #3148

This change was initially created to only fix #20731 by replacing the
slash `/` characters in component names with a tilde `~`. After that, we
decided that it is a breaking change, and so it's better to fix other
characters as well in a single breaking change.
  • Loading branch information
andrzej-stencel authored Oct 3, 2023
1 parent 949f3e6 commit 91e56e4
Show file tree
Hide file tree
Showing 5 changed files with 170 additions and 2 deletions.
16 changes: 16 additions & 0 deletions .chloggen/replace-slashes-in-component-names.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: extension/filestorage

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Replace path-unsafe characters in component names

# One or more tracking issues related to the change
issues: [3148]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
38 changes: 38 additions & 0 deletions extension/storage/filestorage/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -95,3 +95,41 @@ processors:
exporters:
nop:
```

## Feature Gates

See the [Collector feature gates](https://github.com/open-telemetry/opentelemetry-collector/blob/main/featuregate/README.md#collector-feature-gates) for an overview of feature gates in the collector.

### `extension.filestorage.replaceUnsafeCharacters`

When enabled, characters that are not safe in file names are replaced in component name using the extension before creating the file name to store data by the extension.

For example, for a Filelog receiver named `filelog/logs/json`, the data is stored:

- in path `receiver_filelog_logs/json` with the feature flag disabled (note that this is a file named `json` inside directory named `receiver_filelog_logs`).
- in file `receiver_filelog_logs~007Ejson` with the feature flag enabled.

This replacement is done to prevent surprising behavior or errors in the File Storage extension.

The feature replaces all usafe characters with a tilde `~` and the character's [Unicode number][unicode_chars] in hex.
The only safe characters are: uppercase and lowercase ASCII letters `A-Z` and `a-z`, digits `0-9`, dot `.`, hyphen `-`, underscore `_`.

Changing the state of this feature gate may change the path to the file that the extension is writing component's data to. This may lead to loss of the data stored in the original path.

Before enabling this feature gate, ideally make sure that all component names that use the File Storage extension have names that only contain safe characters.
In case you want to keep using unsafe characters in your component names, you may want to rename the files used for storage before enabling this feature gate.
For example, `mv ./receiver_filelog_logs/json ./receiver_filelog_logs~007Ejson`.

For more details, see the following issues:

- [File storage extension - invalid file name characters must be encoded #3148](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/3148)
- [[filestorage] receiver name sanitization #20731](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/20731)

The schedule for this feature gate is:

- Introduced in v0.87.0 (October 2023) as `alpha` - disabled by default.
- Moved to `beta` in January 2024 - enabled by default.
- Moved to `stable` in April 2024 - cannot be disabled.
- Removed three releases after `stable`.

[unicode_chars]: https://en.wikipedia.org/wiki/List_of_Unicode_characters
52 changes: 51 additions & 1 deletion extension/storage/filestorage/extension.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,13 +7,23 @@ import (
"context"
"fmt"
"path/filepath"
"strings"

"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/extension"
"go.opentelemetry.io/collector/extension/experimental/storage"
"go.opentelemetry.io/collector/featuregate"
"go.uber.org/zap"
)

var replaceUnsafeCharactersFeatureGate = featuregate.GlobalRegistry().MustRegister(
"extension.filestorage.replaceUnsafeCharacters",
featuregate.StageAlpha,
featuregate.WithRegisterDescription("When enabled, characters that are not safe in file paths are replaced in component name using the extension. For example, the data for component `filelog/logs/json` will be stored in file `receiver_filelog_logs~007Ejson` and not in `receiver_filelog_logs/json`."),
featuregate.WithRegisterReferenceURL("https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/3148"),
featuregate.WithRegisterFromVersion("v0.87.0"),
)

type localFileStorage struct {
cfg *Config
logger *zap.Logger
Expand Down Expand Up @@ -49,7 +59,10 @@ func (lfs *localFileStorage) GetClient(_ context.Context, kind component.Kind, e
} else {
rawName = fmt.Sprintf("%s_%s_%s_%s", kindString(kind), ent.Type(), ent.Name(), name)
}
// TODO sanitize rawName

if replaceUnsafeCharactersFeatureGate.IsEnabled() {
rawName = sanitize(rawName)
}
absoluteName := filepath.Join(lfs.cfg.Directory, rawName)
client, err := newClient(lfs.logger, absoluteName, lfs.cfg.Timeout, lfs.cfg.Compaction)

Expand Down Expand Up @@ -84,3 +97,40 @@ func kindString(k component.Kind) string {
return "other" // not expected
}
}

// sanitize replaces characters in name that are not safe in a file path
func sanitize(name string) string {
// Replace all unsafe characters with a tilde followed by the unsafe character's Unicode hex number.
// https://en.wikipedia.org/wiki/List_of_Unicode_characters
// For example, the slash is replaced with "~002F", and the tilde itself is replaced with "~007E".
// We perform replacement on the tilde even though it is a safe character to make sure that the sanitized component name
// never overlaps with a component name that does not reqire sanitization.
var sanitized strings.Builder
for _, character := range name {
if isSafe(character) {
sanitized.WriteString(string(character))
} else {
sanitized.WriteString(fmt.Sprintf("~%04X", character))
}
}
return sanitized.String()
}

func isSafe(character rune) bool {
// Safe characters are the following:
// - uppercase and lowercase letters A-Z, a-z
// - digits 0-9
// - dot `.`
// - hyphen `-`
// - underscore `_`
switch {
case character >= 'a' && character <= 'z',
character >= 'A' && character <= 'Z',
character >= '0' && character <= '9',
character == '.',
character == '-',
character == '_':
return true
}
return false
}
64 changes: 64 additions & 0 deletions extension/storage/filestorage/extension_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,12 @@ import (
"sync"
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/extension/experimental/storage"
"go.opentelemetry.io/collector/extension/extensiontest"
"go.opentelemetry.io/collector/featuregate"
)

func TestExtensionIntegrity(t *testing.T) {
Expand Down Expand Up @@ -185,6 +187,68 @@ func TestTwoClientsWithDifferentNames(t *testing.T) {
require.Equal(t, myBytes2, data)
}

func TestSanitize(t *testing.T) {
testCases := []struct {
name string
componentName string
sanitizedName string
}{
{
name: "safe characters",
componentName: `.UPPERCASE-lowercase_1234567890`,
sanitizedName: `.UPPERCASE-lowercase_1234567890`,
},
{
name: "name with a slash",
componentName: `logs/json`,
sanitizedName: `logs~002Fjson`,
},
{
name: "name with a tilde",
componentName: `logs~json`,
sanitizedName: `logs~007Ejson`,
},
{
name: "popular unsafe characters",
componentName: `tilde~slash/backslash\colon:asterisk*questionmark?quote'doublequote"angle<>pipe|exclamationmark!percent%space `,
sanitizedName: `tilde~007Eslash~002Fbackslash~005Ccolon~003Aasterisk~002Aquestionmark~003Fquote~0027doublequote~0022angle~003C~003Epipe~007Cexclamationmark~0021percent~0025space~0020`,
},
}

for _, testCase := range testCases {
t.Run(testCase.name, func(t *testing.T) {
assert.Equal(t, testCase.sanitizedName, sanitize(testCase.componentName))
})
}
}

func TestComponentNameWithUnsafeCharacters(t *testing.T) {
err := featuregate.GlobalRegistry().Set("extension.filestorage.replaceUnsafeCharacters", true)
require.NoError(t, err)

tempDir := t.TempDir()

f := NewFactory()
cfg := f.CreateDefaultConfig().(*Config)
cfg.Directory = tempDir

extension, err := f.CreateExtension(context.Background(), extensiontest.NewNopCreateSettings(), cfg)
require.NoError(t, err)

se, ok := extension.(storage.Extension)
require.True(t, ok)

client, err := se.GetClient(
context.Background(),
component.KindReceiver,
newTestEntity("my/slashed/component*"),
"",
)

require.NoError(t, err)
require.NotNil(t, client)
}

func TestGetClientErrorsOnDeletedDirectory(t *testing.T) {
ctx := context.Background()

Expand Down
2 changes: 1 addition & 1 deletion extension/storage/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ require (
go.opentelemetry.io/collector/component v0.86.0
go.opentelemetry.io/collector/confmap v0.86.0
go.opentelemetry.io/collector/extension v0.86.0
go.opentelemetry.io/collector/featuregate v1.0.0-rcv0015
go.uber.org/zap v1.26.0
)

Expand All @@ -31,7 +32,6 @@ require (
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
go.opentelemetry.io/collector/config/configtelemetry v0.86.0 // indirect
go.opentelemetry.io/collector/featuregate v1.0.0-rcv0015 // indirect
go.opentelemetry.io/collector/pdata v1.0.0-rcv0015 // indirect
go.opentelemetry.io/otel v1.18.0 // indirect
go.opentelemetry.io/otel/metric v1.18.0 // indirect
Expand Down

0 comments on commit 91e56e4

Please sign in to comment.