Run functions resiliently in Go, catching and restarting panics

VividCortex/robustly


robustly

Robustly runs code resiliently, recovering from occasional errors. It also gives you the ability to probabilistically inject panics into your application, configuring them at runtime at crash sites of your choosing. We use it at VividCortex to ensure that unexpected problems don't disable our agent programs.


Getting Started

go get github.com/VividCortex/robustly

Then import the package and wrap your entry point with Run:

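package main

import (
    "github.com/VividCortex/robustly"
)

func main() {
    // Run blocks, restarting somefunc whenever it panics.
    // Passing nil for the options selects the defaults.
    robustly.Run(func() { somefunc() }, nil)
}

func somefunc() {
    for {
        // do something here that may panic
    }
}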

API Documentation

View the GoDoc generated documentation here.

Robustly's Purpose

Robustly is designed to help make Go programs more resilient to errors you don't discover until they're in the field. It is not a general-purpose approach and shouldn't be overused, but in specific conditions it can be valuable.


Imagine, for example, that you are writing a program designed to process events at a high rate, such as 50,000 per second. The program is stateful, and its value comes from observing the event stream for relatively long periods, such as several minutes, to learn its behavior. Now imagine that you introduce a subtle bug into the program, which will happen extremely rarely -- once in a million. Although rare, this bug will cause a panic and crash the program.

Your program will be completely useless for its intended purpose, because you're likely to hit a once-in-a-million error every 20 seconds. Handling such errors, especially when the program will take some time and effort to fix and redeploy, can make the program 99.9999% useful again.
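The 20-second figure follows directly from the two numbers above: 50,000 events per second times a one-in-a-million failure probability gives 0.05 expected failures per second. A minimal sketch of that arithmetic, using a hypothetical helper name that is not part of Robustly:

```go
package main

import "fmt"

// secondsBetweenFailures returns the expected time between failures, in
// seconds, for a stream of eventsPerSecond events in which each event
// fails independently with probability failureProb.
func secondsBetweenFailures(eventsPerSecond, failureProb float64) float64 {
	return 1 / (eventsPerSecond * failureProb)
}

func main() {
	// 50,000 events/s with a once-in-a-million bug.
	fmt.Printf("%.0f seconds between crashes\n", secondsBetweenFailures(50000, 1e-6))
}
```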

Robustly is targeted towards this type of use case. Its design is inspired by the net/http server's code, where each HTTP request is handled in a goroutine that can crash without crashing the entire server.

When Robustly handles a crash, it immediately restarts the offending code. It tracks how often the code crashes, and if the crash rate stays too high for too long, it gives up and crashes the whole program. This way, code that hits a once-in-a-million error is restarted, while code that fails on every attempt does not loop forever.

Using Run

To use Run, wrap the function call that represents the entry point of the code you wish to catch and restart:

robustly.Run(func() { /* your code here */ }, nil)

To use the optional settings of Run, pass Run a pointer to a RunOptions struct.

// RunOptions is a struct to hold the optional arguments to Run.
type RunOptions struct {
    RateLimit  float64       // the rate limit in crashes per second
    Timeout    time.Duration // the timeout (after which Run will stop trying)
    PrintStack bool          // whether to print the panic stacktrace or not
    RetryDelay time.Duration // inject a delay before retrying the run
}

Default options are shown below:

robustly.Run(func() { /* your code here */ }, &robustly.RunOptions{
    RateLimit:  1.0,
    Timeout:    time.Second,
    PrintStack: false,
    RetryDelay: 0 * time.Nanosecond,
})

Using Crash

Robustly also includes Crash(), a way to inject panics into your code at runtime. To use it, select places where you'd like to cause crashes, and add the following line of code:

robustly.Crash()

Configure crash sites with CrashSetup(). Pass it a comma-separated string of crash sites, which are colon-separated file:line:probability specifications. Probability should range between 0 and 1. If you pass the special spec "VERBOSE", it will enable printouts of all crash sites that are located in your code.

The idea is to match the crash sites configured in the setup with those actually present in your code. For example, if you have added crash sites at line 53 of client.go and at line 18 of server.go, and you'd like to crash at both, pass the following spec to CrashSetup():

    client.go:53:.003,server.go:18:.02

That will cause a crash with probability .003 (0.3% of the time) at client.go line 53, and with probability .02 (2% of the time) at server.go line 18.
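The spec format can be illustrated with a small parser. This is only a sketch of the file:line:probability layout described above: crashSite and parseCrashSpec are hypothetical names, the special "VERBOSE" spec is not handled, and Robustly's own parsing may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// crashSite is a hypothetical type holding one parsed spec entry.
type crashSite struct {
	File        string
	Line        int
	Probability float64
}

// parseCrashSpec splits a comma-separated list of file:line:probability
// entries and validates each field.
func parseCrashSpec(spec string) ([]crashSite, error) {
	var sites []crashSite
	for _, entry := range strings.Split(spec, ",") {
		parts := strings.Split(entry, ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("bad crash site %q: want file:line:probability", entry)
		}
		line, err := strconv.Atoi(parts[1])
		if err != nil {
			return nil, fmt.Errorf("bad line number in %q: %w", entry, err)
		}
		prob, err := strconv.ParseFloat(parts[2], 64)
		if err != nil || prob < 0 || prob > 1 {
			return nil, fmt.Errorf("bad probability in %q: want a number between 0 and 1", entry)
		}
		sites = append(sites, crashSite{File: parts[0], Line: line, Probability: prob})
	}
	return sites, nil
}

func main() {
	sites, err := parseCrashSpec("client.go:53:.003,server.go:18:.02")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range sites {
		fmt.Printf("%s line %d: crash probability %g\n", s.File, s.Line, s.Probability)
	}
}
```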

If you are using robustly.Run() to make your code resilient to errors, it is a very good idea to deliberately inject errors and make sure they are indeed handled. You can easily miss a detail such as a potentially crashing function that is called as a goroutine.
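The goroutine detail comes from how Go's recover works: it only catches panics raised on the goroutine running the deferred function, so a goroutine started inside protected code is not covered by the wrapper around its parent. The sketch below uses protect, a hypothetical stand-in for a recovering wrapper (not Robustly's API), to show one way to give a goroutine its own protection.

```go
package main

import (
	"fmt"
	"sync"
)

// protect is a hypothetical stand-in for a recovering wrapper. It runs fn
// and reports whether fn panicked. It cannot see panics raised on other
// goroutines that fn starts.
func protect(fn func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	// Covered: the panic happens on the goroutine that protect runs on.
	fmt.Println(protect(func() { panic("boom") })) // prints "true"

	// Not covered: protect(func() { go worker() }) would return false, and
	// a panic inside worker would still terminate the whole process.
	// Wrapping the body of each goroutine closes that gap.
	var wg sync.WaitGroup
	var caught bool
	wg.Add(1)
	go func() {
		defer wg.Done()
		caught = protect(func() { panic("boom in goroutine") })
	}()
	wg.Wait()
	fmt.Println(caught) // prints "true"
}
```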

Contributing

We only accept pull requests for minor fixes or improvements. This includes:

  • Small bug fixes
  • Typos
  • Documentation or comments

Please open issues to discuss new features. Pull requests for new features will be rejected, so we recommend forking the repository and making changes in your fork for your use case.

License

This program is (c) VividCortex 2013, and is licensed under the MIT license. Please see the LICENSE file.
