Skip to content

Race condition in dispatcher code #2182

@jtlisi

Description

@jtlisi

I noticed a mild race condition in the dispatch package.

I wrote a small test in the dispatch package to demonstrate a race condition in the dispatch code:

Test Added to `dispatch/dispatch_test.go:

func TestDispatcherRace(t *testing.T) {
	logger := log.NewNopLogger()
	marker := types.NewMarker(prometheus.NewRegistry())
	alerts, err := mem.NewAlerts(context.Background(), marker, time.Hour, logger)
	if err != nil {
		t.Fatal(err)
	}
	defer alerts.Close()

	timeout := func(d time.Duration) time.Duration { return time.Duration(0) }
	dispatcher := NewDispatcher(alerts, nil, nil, marker, timeout, logger, NewDispatcherMetrics(prometheus.NewRegistry()))
	go dispatcher.Run()
	dispatcher.Stop()
}

Test Output:

$ go test ./dispatch -run TestDispatcherRace -race
==================
WARNING: DATA RACE
Write at 0x00c0002e0580 by goroutine 26:
  github.com/prometheus/alertmanager/dispatch.(*Dispatcher).Run()
      /Users/jtlisi/tmp/alertmanager/dispatch/dispatch.go:112 +0x1d7

Previous read at 0x00c0002e0580 by goroutine 20:
  github.com/prometheus/alertmanager/dispatch.TestDispatcherRace()
      /Users/jtlisi/tmp/alertmanager/dispatch/dispatch.go:254 +0x677
  testing.tRunner()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:909 +0x199

Goroutine 26 (running) created at:
  github.com/prometheus/alertmanager/dispatch.TestDispatcherRace()
      /Users/jtlisi/tmp/alertmanager/dispatch/dispatch_test.go:537 +0x58e
  testing.tRunner()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:909 +0x199

Goroutine 20 (finished) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:960 +0x651
  testing.runTests.func1()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:1202 +0xa6
  testing.tRunner()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:909 +0x199
  testing.runTests()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:1200 +0x521
  testing.(*M).Run()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:1117 +0x2ff
  main.main()
      _testmain.go:58 +0x223
==================
FAIL
FAIL	github.com/prometheus/alertmanager/dispatch	0.271s
FAIL

Although it will not happen as things stand, in theory simultaneous call to Stop() and Run() will cause a panic. The fix is simple. The operations on the cancel field in the Stop and Run functions require a lock.

Let me know if I'm mistaken. Otherwise, I can open a small PR with the fix.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions