Merge pull request #9534 from gyuho/test-tls
functional-tester: enable TLS, phase 1
gyuho authored Apr 6, 2018
2 parents 72ba557 + a0b094c commit c91a61b
Showing 12 changed files with 1,976 additions and 533 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG-3.4.md
@@ -30,7 +30,7 @@ See [code changes](https://github.com/coreos/etcd/compare/v3.3.0...v3.4.0) and [
- Furthermore, when `--auto-compaction-mode=periodic --auto-compaction-retention=30m` and writes per minute are about 1000, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revisions 30000, 33000, and 36000 every 3 minutes, while `v3.3.3` *or later* compacts revisions 30000, 60000, and 90000 every 30 minutes.
- Improve [lease expire/revoke operation performance](https://github.com/coreos/etcd/pull/9418), address [lease scalability issue](https://github.com/coreos/etcd/issues/9496).
- Make [Lease `Lookup` non-blocking with concurrent `Grant`/`Revoke`](https://github.com/coreos/etcd/pull/9229).
- Improve functional tester coverage: use [proxy layer to run network fault tests in CIs](https://github.com/coreos/etcd/pull/9081), enable [TLS](https://github.com/coreos/etcd/issues/8943), add [liveness mode](https://github.com/coreos/etcd/issues/9230), [shuffle test sequence](https://github.com/coreos/etcd/issues/9381).
- Improve [functional tester](https://github.com/coreos/etcd/tree/master/tools/functional-tester) coverage: use [proxy layer to run network fault tests in CI](https://github.com/coreos/etcd/pull/9081), enable [TLS both for server and client](https://github.com/coreos/etcd/pull/9534), add [liveness mode](https://github.com/coreos/etcd/issues/9230), and [shuffle test sequence](https://github.com/coreos/etcd/issues/9381).

### Breaking Changes

14 changes: 1 addition & 13 deletions tools/functional-tester/README.md
@@ -2,19 +2,7 @@

etcd functional test suite tests the functionality of an etcd cluster with a focus on failure resistance under high pressure. It sets up an etcd cluster and injects failures into the cluster by killing the process or isolating the network of the process. It expects the etcd cluster to recover within a short amount of time after the fault is fixed.

etcd functional test suite has two components: etcd-agent and etcd-tester. etcd-agent runs on every test machine, and etcd-tester is a single controller of the test. etcd-tester controls all the etcd-agents to start etcd clusters and simulate various failure cases.

## Requirements

The environment of the cluster must be stable enough that the etcd test suite can assume most failures are generated by the suite itself.

## etcd agent

etcd agent is a daemon on each machine. It can start, stop, restart, isolate, and terminate an etcd process. The agent exposes this functionality via HTTP RPC.

## etcd tester

etcd functional tester controls the progress of the functional tests. It calls the RPC of the etcd agent to simulate various test cases. For example, it can start a three-member cluster by sending three start RPC calls to three different etcd agents. It can make one of the members fail by sending a stop RPC call to one etcd agent.
etcd functional test suite has two components: etcd-agent and etcd-tester. etcd-agent runs on every test machine, and etcd-tester is a single controller of the test. etcd-tester controls the agents: it starts, stops, and terminates etcd processes, injects failures, and so on.

### Run locally

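The agent/tester split described in the README can be sketched as a small state machine per agent plus one controller that drives all of them. This is an in-process illustration only: the names `fakeAgent`, `tester`, `StartEtcd`, and `StopEtcd` are hypothetical, and the real components exchange `rpcpb` requests and responses over RPC. The state check loosely mirrors how the real agent rejects an operation that is not valid after its last one.

```go
package main

import "fmt"

// Operation is a hypothetical stand-in for the tester-to-agent RPC operations.
type Operation int

const (
	StartEtcd Operation = iota
	StopEtcd
)

// fakeAgent manages one (simulated) etcd process on one machine.
type fakeAgent struct {
	name    string
	running bool
}

// Handle applies op and rejects transitions that make no sense in the
// current state.
func (a *fakeAgent) Handle(op Operation) error {
	switch op {
	case StartEtcd:
		if a.running {
			return fmt.Errorf("%s: etcd is already running", a.name)
		}
		a.running = true
	case StopEtcd:
		if !a.running {
			return fmt.Errorf("%s: etcd is not running", a.name)
		}
		a.running = false
	default:
		return fmt.Errorf("%s: unknown operation %d", a.name, op)
	}
	return nil
}

// tester is the single controller that drives every agent.
type tester struct{ agents []*fakeAgent }

// startCluster sends one start call per agent, e.g. three calls for a
// three-member cluster.
func (t *tester) startCluster() error {
	for _, a := range t.agents {
		if err := a.Handle(StartEtcd); err != nil {
			return err
		}
	}
	return nil
}

// running counts members whose etcd process is up.
func (t *tester) running() int {
	n := 0
	for _, a := range t.agents {
		if a.running {
			n++
		}
	}
	return n
}

func main() {
	t := &tester{agents: []*fakeAgent{{name: "s1"}, {name: "s2"}, {name: "s3"}}}
	if err := t.startCluster(); err != nil {
		panic(err)
	}
	// Inject a failure by stopping one member, then recover it.
	if err := t.agents[0].Handle(StopEtcd); err != nil {
		panic(err)
	}
	fmt.Println("running after fault:", t.running())
	if err := t.agents[0].Handle(StartEtcd); err != nil {
		panic(err)
	}
	fmt.Println("running after recovery:", t.running())
}
```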
161 changes: 150 additions & 11 deletions tools/functional-tester/agent/handler.go
@@ -17,9 +17,11 @@ package agent
import (
"errors"
"fmt"
"io/ioutil"
"net/url"
"os"
"os/exec"
"path/filepath"
"syscall"
"time"

@@ -72,6 +74,7 @@ func (srv *Server) handleInitialStartEtcd(req *rpcpb.Request) (*rpcpb.Response,
return &rpcpb.Response{
Success: false,
Status: fmt.Sprintf("%q is not valid; last server operation was %q", rpcpb.Operation_InitialStartEtcd.String(), srv.last.String()),
Member: req.Member,
}, nil
}

@@ -84,16 +87,22 @@ func (srv *Server) handleInitialStartEtcd(req *rpcpb.Request) (*rpcpb.Response,
}
srv.lg.Info("created base directory", zap.String("path", srv.Member.BaseDir))

if err = srv.createEtcdFile(); err != nil {
if err = srv.saveEtcdLogFile(); err != nil {
return nil, err
}

srv.creatEtcdCmd()

err = srv.startEtcdCmd()
if err != nil {
if err = srv.saveTLSAssets(); err != nil {
return nil, err
}
if err = srv.startEtcdCmd(); err != nil {
return nil, err
}
srv.lg.Info("started etcd", zap.String("command-path", srv.etcdCmd.Path))
if err = srv.loadAutoTLSAssets(); err != nil {
return nil, err
}

// wait some time for etcd listener start
// before setting up proxy
@@ -104,10 +113,12 @@ func (srv *Server) handleInitialStartEtcd(req *rpcpb.Request) (*rpcpb.Response,

return &rpcpb.Response{
Success: true,
Status: "successfully started etcd!",
Status: "start etcd PASS",
Member: srv.Member,
}, nil
}

// TODO: support TLS
func (srv *Server) startProxy() error {
if srv.Member.EtcdClientProxy {
advertiseClientURL, advertiseClientURLPort, err := getURLAndPort(srv.Member.Etcd.AdvertiseClientURLs[0])
@@ -133,7 +144,7 @@ func (srv *Server) startProxy() error {
}

if srv.Member.EtcdPeerProxy {
advertisePeerURL, advertisePeerURLPort, err := getURLAndPort(srv.Member.Etcd.InitialAdvertisePeerURLs[0])
advertisePeerURL, advertisePeerURLPort, err := getURLAndPort(srv.Member.Etcd.AdvertisePeerURLs[0])
if err != nil {
return err
}
@@ -200,7 +211,7 @@ func (srv *Server) stopProxy() {
}
}

func (srv *Server) createEtcdFile() error {
func (srv *Server) saveEtcdLogFile() error {
var err error
srv.etcdLogFile, err = os.Create(srv.Member.EtcdLogPath)
if err != nil {
@@ -225,6 +236,128 @@ func (srv *Server) creatEtcdCmd() {
srv.etcdCmd.Stderr = srv.etcdLogFile
}

func (srv *Server) saveTLSAssets() error {
// if started with manual TLS, stores TLS assets
// from tester/client to disk before starting etcd process
// TODO: not implemented yet
if !srv.Member.Etcd.ClientAutoTLS {
if srv.Member.Etcd.ClientCertAuth {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientCertAuth is %v", srv.Member.Etcd.ClientCertAuth)
}
if srv.Member.Etcd.ClientCertFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientCertFile is %q", srv.Member.Etcd.ClientCertFile)
}
if srv.Member.Etcd.ClientKeyFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientKeyFile is %q", srv.Member.Etcd.ClientKeyFile)
}
if srv.Member.Etcd.ClientTrustedCAFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientTrustedCAFile is %q", srv.Member.Etcd.ClientTrustedCAFile)
}
}
if !srv.Member.Etcd.PeerAutoTLS {
if srv.Member.Etcd.PeerClientCertAuth {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerClientCertAuth is %v", srv.Member.Etcd.PeerClientCertAuth)
}
if srv.Member.Etcd.PeerCertFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerCertFile is %q", srv.Member.Etcd.PeerCertFile)
}
if srv.Member.Etcd.PeerKeyFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerKeyFile is %q", srv.Member.Etcd.PeerKeyFile)
}
if srv.Member.Etcd.PeerTrustedCAFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerTrustedCAFile is %q", srv.Member.Etcd.PeerTrustedCAFile)
}
}

// TODO
return nil
}

func (srv *Server) loadAutoTLSAssets() error {
// if started with auto TLS, sends back TLS assets to tester/client
if srv.Member.Etcd.ClientAutoTLS {
// in case of slow disk
time.Sleep(time.Second)

fdir := filepath.Join(srv.Member.Etcd.DataDir, "fixtures", "client")

srv.lg.Info(
"loading client TLS assets",
zap.String("dir", fdir),
zap.String("endpoint", srv.EtcdClientEndpoint),
)

certPath := filepath.Join(fdir, "cert.pem")
if !fileutil.Exist(certPath) {
return fmt.Errorf("cannot find %q", certPath)
}
certData, err := ioutil.ReadFile(certPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", certPath, err)
}
srv.Member.ClientCertData = string(certData)

keyPath := filepath.Join(fdir, "key.pem")
if !fileutil.Exist(keyPath) {
return fmt.Errorf("cannot find %q", keyPath)
}
keyData, err := ioutil.ReadFile(keyPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", keyPath, err)
}
srv.Member.ClientKeyData = string(keyData)

srv.lg.Info(
"loaded client TLS assets",
zap.String("client-cert-path", certPath),
zap.Int("client-cert-length", len(certData)),
zap.String("client-key-path", keyPath),
zap.Int("client-key-length", len(keyData)),
)
}
if srv.Member.Etcd.PeerAutoTLS {
// in case of slow disk
time.Sleep(time.Second)

fdir := filepath.Join(srv.Member.Etcd.DataDir, "fixtures", "peer")

srv.lg.Info(
"loading peer TLS assets",
zap.String("dir", fdir),
zap.String("endpoint", srv.EtcdClientEndpoint),
)

certPath := filepath.Join(fdir, "cert.pem")
if !fileutil.Exist(certPath) {
return fmt.Errorf("cannot find %q", certPath)
}
certData, err := ioutil.ReadFile(certPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", certPath, err)
}
srv.Member.PeerCertData = string(certData)

keyPath := filepath.Join(fdir, "key.pem")
if !fileutil.Exist(keyPath) {
return fmt.Errorf("cannot find %q", keyPath)
}
keyData, err := ioutil.ReadFile(keyPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", keyPath, err)
}
srv.Member.PeerKeyData = string(keyData)

srv.lg.Info(
"loaded peer TLS assets",
zap.String("peer-cert-path", certPath),
zap.Int("peer-cert-length", len(certData)),
zap.String("peer-key-path", keyPath),
zap.Int("peer-key-length", len(keyData)),
)
}
return nil
}

// start but do not wait for it to complete
func (srv *Server) startEtcdCmd() error {
return srv.etcdCmd.Start()
@@ -233,12 +366,17 @@ func (srv *Server) startEtcdCmd() error {
func (srv *Server) handleRestartEtcd() (*rpcpb.Response, error) {
srv.creatEtcdCmd()

srv.lg.Info("restarting etcd")
err := srv.startEtcdCmd()
if err != nil {
var err error
if err = srv.saveTLSAssets(); err != nil {
return nil, err
}
if err = srv.startEtcdCmd(); err != nil {
return nil, err
}
srv.lg.Info("restarted etcd", zap.String("command-path", srv.etcdCmd.Path))
if err = srv.loadAutoTLSAssets(); err != nil {
return nil, err
}

// wait some time for etcd listener start
// before setting up proxy
@@ -251,7 +389,8 @@ func (srv *Server) handleRestartEtcd() (*rpcpb.Response, error) {

return &rpcpb.Response{
Success: true,
Status: "successfully restarted etcd!",
Status: "restart etcd PASS",
Member: srv.Member,
}, nil
}

@@ -293,7 +432,7 @@ func (srv *Server) handleFailArchive() (*rpcpb.Response, error) {
}
srv.lg.Info("archived data", zap.String("base-dir", srv.Member.BaseDir))

if err = srv.createEtcdFile(); err != nil {
if err = srv.saveEtcdLogFile(); err != nil {
return nil, err
}
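In this phase, `saveTLSAssets` only validates: when auto TLS is off for a side, any manual TLS setting is an error, because manual TLS is not implemented yet. The eight near-identical checks can be expressed once per side. The sketch below is a hypothetical restructuring (`tlsSide` and `checkManualTLSUnset` are illustrative names, not part of the commit) that keeps the same rule.

```go
package main

import "fmt"

// tlsSide is a hypothetical view of one side (client or peer) of the
// TLS-related Etcd fields touched by this commit.
type tlsSide struct {
	AutoTLS       bool
	CertAuth      bool
	CertFile      string
	KeyFile       string
	TrustedCAFile string
}

// checkManualTLSUnset applies the phase-1 rule from saveTLSAssets: with auto
// TLS off, any manual TLS setting is rejected as not yet implemented.
func checkManualTLSUnset(side string, s tlsSide) error {
	if s.AutoTLS {
		return nil // auto TLS: manual fields are not checked
	}
	if s.CertAuth {
		return fmt.Errorf("manual TLS setup is not implemented yet, but %s cert auth is %v", side, s.CertAuth)
	}
	// Ordered slice (not a map) so the first offending field is deterministic.
	for _, f := range []struct{ name, val string }{
		{"CertFile", s.CertFile},
		{"KeyFile", s.KeyFile},
		{"TrustedCAFile", s.TrustedCAFile},
	} {
		if f.val != "" {
			return fmt.Errorf("manual TLS setup is not implemented yet, but %s %s is %q", side, f.name, f.val)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkManualTLSUnset("client", tlsSide{AutoTLS: true, CertFile: "ignored.pem"}))
	fmt.Println(checkManualTLSUnset("peer", tlsSide{KeyFile: "peer-key.pem"}))
}
```

Passing one `tlsSide` per side keeps the client and peer rules identical by construction, which is exactly where copy-pasted blocks tend to drift.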

20 changes: 18 additions & 2 deletions tools/functional-tester/rpcpb/etcd_config.go
@@ -30,8 +30,19 @@ var etcdFields = []string{

"ListenClientURLs",
"AdvertiseClientURLs",
"ClientAutoTLS",
"ClientCertAuth",
"ClientCertFile",
"ClientKeyFile",
"ClientTrustedCAFile",

"ListenPeerURLs",
"InitialAdvertisePeerURLs",
"AdvertisePeerURLs",
"PeerAutoTLS",
"PeerClientCertAuth",
"PeerCertFile",
"PeerKeyFile",
"PeerTrustedCAFile",

"InitialCluster",
"InitialClusterState",
@@ -72,12 +83,17 @@ func (cfg *Etcd) Flags() (fs []string) {
default:
panic(fmt.Errorf("field %q (%v) cannot be parsed", name, fv.Type().Kind()))
}

fname := field.Tag.Get("yaml")

// TODO: remove this
if fname == "initial-corrupt-check" {
fname = "experimental-" + fname
}
fs = append(fs, fmt.Sprintf("--%s=%s", fname, sv))

if sv != "" {
fs = append(fs, fmt.Sprintf("--%s=%s", fname, sv))
}
}
return fs
}
63 changes: 42 additions & 21 deletions tools/functional-tester/rpcpb/etcd_config_test.go
@@ -21,34 +21,55 @@ import (

func TestEtcdFlags(t *testing.T) {
cfg := &Etcd{
Name: "s1",
DataDir: "/tmp/etcd-agent-data-1/etcd.data",
WALDir: "/tmp/etcd-agent-data-1/etcd.data/member/wal",
HeartbeatIntervalMs: 100,
ElectionTimeoutMs: 1000,
ListenClientURLs: []string{"127.0.0.1:1379"},
AdvertiseClientURLs: []string{"127.0.0.1:13790"},
ListenPeerURLs: []string{"127.0.0.1:1380"},
InitialAdvertisePeerURLs: []string{"127.0.0.1:13800"},
InitialCluster: "s1=127.0.0.1:13800,s2=127.0.0.1:23800,s3=127.0.0.1:33800",
InitialClusterState: "new",
InitialClusterToken: "tkn",
SnapshotCount: 10000,
QuotaBackendBytes: 10740000000,
PreVote: true,
InitialCorruptCheck: true,
Name: "s1",
DataDir: "/tmp/etcd-agent-data-1/etcd.data",
WALDir: "/tmp/etcd-agent-data-1/etcd.data/member/wal",

HeartbeatIntervalMs: 100,
ElectionTimeoutMs: 1000,

ListenClientURLs: []string{"https://127.0.0.1:1379"},
AdvertiseClientURLs: []string{"https://127.0.0.1:13790"},
ClientAutoTLS: true,
ClientCertAuth: false,
ClientCertFile: "",
ClientKeyFile: "",
ClientTrustedCAFile: "",

ListenPeerURLs: []string{"https://127.0.0.1:1380"},
AdvertisePeerURLs: []string{"https://127.0.0.1:13800"},
PeerAutoTLS: true,
PeerClientCertAuth: false,
PeerCertFile: "",
PeerKeyFile: "",
PeerTrustedCAFile: "",

InitialCluster: "s1=https://127.0.0.1:13800,s2=https://127.0.0.1:23800,s3=https://127.0.0.1:33800",
InitialClusterState: "new",
InitialClusterToken: "tkn",

SnapshotCount: 10000,
QuotaBackendBytes: 10740000000,

PreVote: true,
InitialCorruptCheck: true,
}

exp := []string{
"--name=s1",
"--data-dir=/tmp/etcd-agent-data-1/etcd.data",
"--wal-dir=/tmp/etcd-agent-data-1/etcd.data/member/wal",
"--heartbeat-interval=100",
"--election-timeout=1000",
"--listen-client-urls=127.0.0.1:1379",
"--advertise-client-urls=127.0.0.1:13790",
"--listen-peer-urls=127.0.0.1:1380",
"--initial-advertise-peer-urls=127.0.0.1:13800",
"--initial-cluster=s1=127.0.0.1:13800,s2=127.0.0.1:23800,s3=127.0.0.1:33800",
"--listen-client-urls=https://127.0.0.1:1379",
"--advertise-client-urls=https://127.0.0.1:13790",
"--auto-tls=true",
"--client-cert-auth=false",
"--listen-peer-urls=https://127.0.0.1:1380",
"--initial-advertise-peer-urls=https://127.0.0.1:13800",
"--peer-auto-tls=true",
"--peer-client-cert-auth=false",
"--initial-cluster=s1=https://127.0.0.1:13800,s2=https://127.0.0.1:23800,s3=https://127.0.0.1:33800",
"--initial-cluster-state=new",
"--initial-cluster-token=tkn",
"--snapshot-count=10000",