feat(cmd/tracerunner): $container_pid variable for pods
Signed-off-by: Lorenzo Fontana <[email protected]>
fntlnz committed Dec 24, 2018
1 parent b77dcae commit b4a9c9c
Showing 2 changed files with 34 additions and 103 deletions.
25 changes: 14 additions & 11 deletions README.md
@@ -38,33 +38,36 @@ kubectl trace run ip-180-12-0-152.ec2.internal -f read.bt

**Run a program against a Pod**

In this case we are running our tracing program in the same pid namespace (`man 7 pid_namespaces`) as the default container of the pod named
`caturday-566d99889-8glv9`. Our trace program will also share the same rootfs with that container.

That pod runs a Go program located at `/caturday`; the program has a function called `main.counterValue` that returns an integer
every time it is called.

The purpose of this program is to load a `uretprobe` on the `/caturday` binary so that every time the `main.counterValue` function is called
we get the return value out.

Since `kubectl trace` for pods is just a helper that resolves the context of a container's Pod, you will always be in the root namespaces,
but in this case you will have a variable `$container_pid` containing the pid of the container's root process as seen from the root pid namespace.

You can then reach the `/caturday` binary via `/proc/$container_pid/exe`, like this:

```
kubectl trace run -e 'uretprobe:/caturday:"main.counterValue" { printf("%d\n", retval) }' pod/caturday-566d99889-8glv9 -a -n caturday
kubectl trace run -e 'uretprobe:/proc/$container_pid/exe:"main.counterValue" { printf("%d\n", retval) }' pod/caturday-566d99889-8glv9 -a -n caturday
```

**Important note** The fact that the trace program runs in the same pid namespace and under the same chroot **doesn't mean** that the trace program
is contained in that container, so if another caturday binary from the same image (or ELF binary) is running on the same machine you will be dumping results from both.

### Running against a Pod vs against a Node

In general, you run kprobes/kretprobes, tracepoints, software, hardware and profile events against nodes using the `node/node-name` syntax,
or just the node name, since node is the default.

When you want to actually probe an userspace program with an uprobe/uretprobe or use an user-level stattic tracepoint (usdt) your best
When you want to actually probe a userspace program with a uprobe/uretprobe or use a user-level static tracepoint (USDT), your best
bet is to run it against a pod using the `pod/pod-name` syntax.

It's always important to remember that running a program against a pod, as of now, is just a facilitator to find the binary you want to probe, you are in the same root filesystem so if your binary is in `/mybinary` it's easier to find it from there. You could do the same thing when running in a Node by
knowing the pid of your binary to get it from the proc filesystem like `/proc/12345/exe` but that would require extra machine access to actually find
the pid. So, running against a pod **doesn't mean** that your bpftrace program will be contained in that pod but just that it will run from the same root filesystem.
It's always important to remember that running a program against a pod is, as of now, just a way to find the process id of the binary you want to probe
in the root pid namespace.

You could do the same thing when running against a Node by finding the pid of your process yourself after entering the node through another channel, e.g. ssh.

So, running against a pod **doesn't mean** that your bpftrace program will be contained in that pod, only that some knowledge of the container's
context is passed to your program. For now only the root process id is supported, via the `$container_pid` variable.
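
To make the role of `$container_pid` concrete, here is a minimal Go sketch of the substitution idea: the placeholder in the program text is swapped for the pid resolved for the container. The helper name `substituteContainerPid` and the pid `12345` are assumptions made for illustration and are not taken from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// containerPidVar is the placeholder described in the text above.
const containerPidVar = "$container_pid"

// substituteContainerPid returns program with every occurrence of the
// placeholder replaced by pid. Hypothetical helper, for illustration only.
func substituteContainerPid(program, pid string) string {
	return strings.Replace(program, containerPidVar, pid, -1)
}

func main() {
	// Raw string literal: the \n stays as two characters for bpftrace to interpret.
	program := `uretprobe:/proc/$container_pid/exe:"main.counterValue" { printf("%d\n", retval) }`

	// 12345 is an assumed pid of the container's root process.
	fmt.Println(substituteContainerPid(program, "12345"))
}
```

After substitution the probe targets `/proc/12345/exe`, which is the same path you would type by hand if you already knew the pid on the node.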


### More bpftrace programs
112 changes: 20 additions & 92 deletions pkg/cmd/tracerunner.go
@@ -1,24 +1,18 @@
package cmd

import (
"encoding/hex"
"fmt"
"io"
"math/rand"
"io/ioutil"
"os"
"os/exec"
"path"
"path/filepath"
"strings"
"syscall"

"github.com/fntlnz/mountinfo"
"github.com/spf13/cobra"
"golang.org/x/sys/unix"
)

const runFolder = "/var/run"

type TraceRunnerOptions struct {
podUID string
containerName string
@@ -70,93 +64,33 @@ func (o *TraceRunnerOptions) Complete(cmd *cobra.Command, args []string) error {
}

func (o *TraceRunnerOptions) Run() error {
if o.inPod == false {
c := exec.Command(o.bpftraceBinaryPath, o.programPath)
c.Stdout = os.Stdout
c.Stdin = os.Stdin
c.Stderr = os.Stderr
return c.Run()
}

pid, err := findPidByPodContainer(o.podUID, o.containerName)
if err != nil {
return err
}
if pid == nil {
return fmt.Errorf("pid not found")
}
if len(*pid) == 0 {
return fmt.Errorf("invalid pid found")
}

// pid found, enter its process namespace
pidns := path.Join("/proc", *pid, "/ns/pid")
pidnsfd, err := syscall.Open(pidns, syscall.O_RDONLY, 0666)
if err != nil {
return fmt.Errorf("error retrieving process namespace %s %v", pidns, err)
}
defer syscall.Close(pidnsfd)
syscall.RawSyscall(unix.SYS_SETNS, uintptr(pidnsfd), 0, 0)

rootfs := path.Join("/proc", *pid, "root")
bpftracebinaryName, err := temporaryFileName("bpftrace")
if err != nil {
return err
}
temporaryProgramName := fmt.Sprintf("%s-%s", bpftracebinaryName, "program.bt")

binaryPathProcRootfs := path.Join(rootfs, bpftracebinaryName)
if err := copyFile(o.bpftraceBinaryPath, binaryPathProcRootfs, 0755); err != nil {
return err
}

programPathProcRootfs := path.Join(rootfs, temporaryProgramName)
if err := copyFile(o.programPath, programPathProcRootfs, 0644); err != nil {
return err
}

if err := syscall.Chroot(rootfs); err != nil {
os.Remove(binaryPathProcRootfs)
return err
programPath := o.programPath
if o.inPod == true {
pid, err := findPidByPodContainer(o.podUID, o.containerName)
if err != nil {
return err
}
if pid == nil {
return fmt.Errorf("pid not found")
}
if len(*pid) == 0 {
return fmt.Errorf("invalid pid found")
}
f, err := ioutil.ReadFile(programPath)
if err != nil {
return err
}
r := strings.Replace(string(f), "$container_pid", *pid, -1)
if err := ioutil.WriteFile(programPath, []byte(r), 0755); err != nil {
return err
}
programPath = path.Join(os.TempDir(), "program-container.bt")
}

defer os.Remove(bpftracebinaryName)

c := exec.Command(bpftracebinaryName, temporaryProgramName)

c := exec.Command(o.bpftraceBinaryPath, o.programPath)
c.Stdout = os.Stdout
c.Stdin = os.Stdin
c.Stderr = os.Stderr

return c.Run()
}

func copyFile(src, dest string, mode os.FileMode) error {
in, err := os.Open(src)
if err != nil {
return fmt.Errorf("bpftrace binary not found in host: %v", err)
}
defer in.Close()

out, err := os.OpenFile(dest, os.O_RDWR|os.O_CREATE|os.O_TRUNC, mode)

if err != nil {
return fmt.Errorf("unable to create file in destination: %v", err)
}
defer out.Close()

if _, err = io.Copy(out, in); err != nil {
return fmt.Errorf("unable to copy file to destination: %v", err)
}

err = out.Sync()

if err != nil {
return err
}
return nil
}

func findPidByPodContainer(podUID, containerName string) (*string, error) {
d, err := os.Open("/proc")

@@ -200,9 +134,3 @@ func findPidByPodContainer(podUID, containerName string) (*string, error) {

return nil, fmt.Errorf("no process found for specified pod and container")
}

func temporaryFileName(prefix string) (string, error) {
randBytes := make([]byte, 16)
rand.Read(randBytes)
return filepath.Join(runFolder, prefix+hex.EncodeToString(randBytes)), nil
}
