Skip to content

NET-5186 Fix issue where consul-dataplane attempts to write to a read-only file system location#253

Merged
nathancoleman merged 11 commits intomainfrom
setcap-envoy-only
Aug 31, 2023
Merged

NET-5186 Fix issue where consul-dataplane attempts to write to a read-only file system location#253
nathancoleman merged 11 commits intomainfrom
setcap-envoy-only

Conversation

@nathancoleman
Copy link
Member

@nathancoleman nathancoleman commented Aug 30, 2023

Running setcap on the consul-dataplane binary results in a process that cannot see the TMPDIR envar intended to influence where the envoy bootstrap config is written to. This is due to a previously-uknown-to-us side effect of glibc, described here.

As a result, it attempts to write to the /tmp directory instead of /consul/connect-inject -- configured for injected sidecars here -- and gets a read-only file system error.

We previously considered dropping setcap for the consul-dataplane binary; however, the envoy binary that is forked from dataplane cannot itself have a capability that the consul-dataplane process does not have.

Instead, I've opted to keep setcap net_bind_service for both envoy and consul-dataplane. We can work around the fact that TMPDIR is invisible to consul-dataplane by not writing to the file system at all and using envoy --config-yaml <yaml or json string> where we were previously using envoy --config-path <string>.

@nathancoleman nathancoleman added pr/no-changelog This PR does not introduce a user-facing change that should be reflected in the changelog backport/1.1 Changes are backported to 1.1 backport/1.2 labels Aug 30, 2023
nathancoleman

This comment was marked as outdated.

The net_bind_service capability is lost when running the envoy process from a consul-dataplane process that does not have the capability
…s.TempDir

This amounts to using `envoy --config-yaml` -- which also accepts JSON -- instead of `envoy --config-path`
@nathancoleman nathancoleman marked this pull request as ready for review August 31, 2023 17:39
@nathancoleman nathancoleman requested a review from a team as a code owner August 31, 2023 17:39
Copy link
Member Author

@nathancoleman nathancoleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal review

args := append(
[]string{
"--config-path", cfgPath,
"--config-yaml", cfgYaml,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alongside the setcap changes applied in #238 , this is the meat of the change. Instead of writing the config to disk at os.TempDir() and then passing the filepath to the envoy process, the config yaml -- or JSON in our case -- is passed directly. JSON is allowed here according to the envoy docs.

"resource_api_version": "V3"
},
}
}`)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aimed to test with some more robust json config vs. just "hello world" to ensure that the config retains its integrity when being passed through as a command line argument.

Comment on lines +87 to +88
require.Empty(t, envoyOut.String())
require.Empty(t, envoyErr.String())
Copy link
Member Author

@nathancoleman nathancoleman Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the process hit an error case, such as a missing flag expected by fake-envoy, you would never know about it in this test other than seeing the output file never being written.

With this change, we check for the output file with just an assert -- which allows the test to continue on -- and then make assertions about the output from Envoy so that errors become obvious when you're developing tests.

require.Eventually(t, func() bool {
return p.cmd.Process.Signal(syscall.Signal(0)) == os.ErrProcessDone
err := p.cmd.Process.Signal(syscall.Signal(0))
return errors.Is(err, os.ErrProcessDone)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was something the linter suggested as an improvement in order to support wrapped errors

set -e

config_path=""
config_yaml=""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fake envoy process will just stash the value of the --config-yaml argument it receives and write it to a file later so that the calling test can make assertions on it. The changes in this test are just around handling --config-yaml instead of --config-path.

@nathancoleman nathancoleman changed the title NET-5186 Fix issue where TMPDIR envar is not visible from consul-dataplane binary NET-5186 Fix issue where consul-dataplane attempts to write to a read-only file system location Aug 31, 2023
@nathancoleman nathancoleman requested a review from skpratt August 31, 2023 17:59
@nathancoleman
Copy link
Member Author

Applying the do not merge label while I do some deeper testing. Would love to get reviews on this in the meantime though and have no indication at time of writing that this solution won't work in all cases.

Copy link
Contributor

@curtbushko curtbushko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try it out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/1.1 Changes are backported to 1.1 pr/no-changelog This PR does not introduce a user-facing change that should be reflected in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants