Morsing's Blog

9 April 2014

Effective error handling in Go.

Introduction

One of the things that Go gets a lot of criticism for is how errors are handled. While it might seem daunting to have to explicitly inspect every error, there are steps you can take to defend yourself against erroneous error handling.

Indented flow is for errors.

When writing Go code, prefer the form

f, err := os.Open(path)
if err != nil {
    // handle error
}
// do stuff

over

f, err := os.Open(path)
if err == nil {
    // do stuff
}
// handle error

This way, the error-free case will read as a straight line down the page.

Define your errors

One of the first steps to knowing how to handle an error is knowing what the error is. If your package can somehow cause an error, your users could be interested in knowing that you caused it. To do this, you just need to implement the error interface, which can be something as simple as this:

type Error string

func (e Error) Error() string { return string(e) }

Users of your package can now tell if your package caused an error by doing a type assertion

result, err := yourpackage.Foo()
if ype, ok := err.(yourpackage.Error); ok {
    // use ype to handle error
}

This can also be used as a way to expose structured error information to your users.

type ParseError struct {
    File *File
    Err  string // a field cannot share its name with the Error method
}

func (pe *ParseError) Error() string {
    // format error string here
    return pe.Err
}

func ParseFiles(files []*File) error {
    for _, f := range files {
        err := f.parse()
        if err != nil {
            return &ParseError{
                File: f,
                Err:  err.Error(),
            }
        }
    }
    return nil
}

This way, your users can now tell which exact file failed to parse.
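To see the caller's side of this, here is a minimal runnable sketch. The File type, its Name and Data fields and the "empty file" rule are made up for the example; only the shape of ParseError follows the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// File is a hypothetical stand-in for whatever your package parses.
type File struct {
	Name string
	Data string
}

// parse fails on empty input; the rule is invented for this example.
func (f *File) parse() error {
	if f.Data == "" {
		return errors.New("empty file")
	}
	return nil
}

// ParseError records which file failed and why.
type ParseError struct {
	File *File
	Err  string
}

func (pe *ParseError) Error() string {
	return fmt.Sprintf("parse %s: %s", pe.File.Name, pe.Err)
}

// ParseFiles stops at the first file that fails to parse.
func ParseFiles(files []*File) error {
	for _, f := range files {
		if err := f.parse(); err != nil {
			return &ParseError{File: f, Err: err.Error()}
		}
	}
	return nil
}

func main() {
	files := []*File{{Name: "a.txt", Data: "ok"}, {Name: "b.txt"}}
	err := ParseFiles(files)
	// The type assertion gives access to the structured fields.
	if pe, ok := err.(*ParseError); ok {
		fmt.Println("failed file:", pe.File.Name) // failed file: b.txt
	}
}
```

Note that the assertion is against *ParseError, since the method is declared on the pointer type.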

You should be careful about wrapping errors though. When you wrap an error, information can be lost.

var c net.Conn
f, err := DownloadFile(c, path)
switch e := err.(type) {
default:
    // this will get executed if err == nil
case net.Error:
    // close connection, not valid anymore
    c.Close()
    return e
case error:
    // if err is non-nil
    return err
}
// do other things.

If you wrap a net.Error in your own error type, this code will not see that it was the network which failed, and it will reuse the invalid connection.

A good rule of thumb is that if your package uses an outside interface, don't wrap errors generated by calls to them. Your user might care more about their errors than yours.
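The loss is easy to reproduce. In this sketch, timeoutError is a made-up type that satisfies net.Error, and wrapped is a hypothetical package error that keeps only the message. The caller's type assertion succeeds on the original and fails on the wrapper.

```go
package main

import (
	"fmt"
	"net"
)

// timeoutError is a made-up error that satisfies net.Error.
type timeoutError struct{}

func (timeoutError) Error() string   { return "i/o timeout" }
func (timeoutError) Timeout() bool   { return true }
func (timeoutError) Temporary() bool { return true }

// wrapped is a hypothetical package error that stores only the message.
type wrapped struct{ msg string }

func (w *wrapped) Error() string { return "download: " + w.msg }

// isNetError does the same check as the net.Error case in the type switch.
func isNetError(err error) bool {
	_, ok := err.(net.Error)
	return ok
}

func main() {
	var orig error = timeoutError{}
	var wrap error = &wrapped{msg: orig.Error()}
	fmt.Println(isNetError(orig)) // true
	fmt.Println(isNetError(wrap)) // false
}
```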

Errors as state.

Sometimes you might want to hold on to an error, either because you can delay reporting it or because you know you'll report it again soon.

A good example of the first case is the bufio package. When a bufio.Reader encounters an error, it will hold on to the error until the buffer has been emptied. Only then will it report it.
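You can watch this happen with a small experiment. failingReader below is a made-up reader that returns all of its data together with an error on the very first Read. Reading through a bufio.Reader two bytes at a time, the error only shows up once the buffered bytes are gone.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
)

var errBoom = errors.New("boom")

// failingReader hands out all its data together with an error on the
// first Read, and only the error afterwards.
type failingReader struct {
	data []byte
	done bool
}

func (r *failingReader) Read(p []byte) (int, error) {
	if r.done {
		return 0, errBoom
	}
	r.done = true
	return copy(p, r.data), errBoom
}

// drain reads two bytes at a time and records what each call returned.
func drain(r io.Reader) ([]string, error) {
	var chunks []string
	buf := make([]byte, 2)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			chunks = append(chunks, string(buf[:n]))
		}
		if err != nil {
			return chunks, err
		}
	}
}

// demo wires the failing reader behind a bufio.Reader.
func demo() ([]string, error) {
	return drain(bufio.NewReader(&failingReader{data: []byte("hello")}))
}

func main() {
	chunks, err := demo()
	fmt.Println(chunks, err) // [he ll o] boom
}
```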

A good example of the second case is go/loader. When called with parameters that cause it to error, it will hold on to the error since it is likely that it will be called again with the same parameters.
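The general idea can be sketched as a cache that stores the error next to the value. This is not go/loader's actual code; the cache type, its load method and the "empty name" failure are all invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// result remembers the outcome of a load, including a failure.
type result struct {
	value string
	err   error
}

// cache is a hypothetical loader that stores errors next to values.
type cache struct {
	results map[string]result
	loads   int // number of times the expensive load actually ran
}

func (c *cache) load(name string) (string, error) {
	if r, ok := c.results[name]; ok {
		return r.value, r.err // replay the stored outcome
	}
	c.loads++
	var r result
	if name == "" {
		r.err = errors.New("empty name")
	} else {
		r.value = "loaded " + name
	}
	c.results[name] = r
	return r.value, r.err
}

func main() {
	c := &cache{results: make(map[string]result)}
	_, err1 := c.load("")
	_, err2 := c.load("")
	// Same error value both times, and the work ran only once.
	fmt.Println(err1 == err2, c.loads) // true 1
}
```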

Use functions to avoid repetition

If you have a piece of error handling that is repeated, you can make a function out of it.

func handleError(c net.Conn, err error) {
    // repeated error handling
}

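func DoStuff(c net.Conn, path string) error {
    f, err := downloadFile(c, path)
    if err != nil {
        handleError(c, err)
        return err
    }
    // use f

    // g is a new variable: f and err cannot be redeclared with := here
    g, err := doOtherThing(c)
    if err != nil {
        handleError(c, err)
        return err
    }
    // use g
    return nil
}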

An alternative way of writing this is

func handleError(c net.Conn, err error) {
    if err == nil {
        return
    }
    // repeated error handling
}

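// err is a named result so that the deferred closure sees the value
// that is actually returned.
func DoStuff(c net.Conn, path string) (err error) {
    defer func() { handleError(c, err) }()
    f, err := downloadFile(c, path)
    if err != nil {
        return err
    }
    // use f

    g, err := doOtherThing(c)
    if err != nil {
        return err
    }
    // use g
    return nil
}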
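Here is a stripped-down version of the deferred form that you can actually run. The net.Conn is dropped and the failure is driven by a flag, both purely to keep the example self-contained. The handled slice exists only so the demo can show what the handler saw.

```go
package main

import (
	"errors"
	"fmt"
)

var handled []error // records what handleError saw, for the demo

func handleError(err error) {
	if err == nil {
		return
	}
	handled = append(handled, err)
}

// doStuff uses a named result so the deferred closure sees the final error.
func doStuff(fail bool) (err error) {
	defer func() { handleError(err) }()
	if fail {
		return errors.New("download failed")
	}
	return nil
}

func main() {
	err := doStuff(false)
	fmt.Println(err, len(handled))
	err = doStuff(true)
	fmt.Println(err, len(handled))
}
```

The deferred call runs on every return path, which is why handleError has to check for nil itself.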

That's all.

That's all there really is to it.

By Daniel Morsing

8 April 2014

Machine code and garbage collection

So you're building a language.

If you're constructing a programming language, chances are that you want to use machine code somewhere in your implementation. It might be because you want to use C modules, it might be that you're using a JIT, or you might just want to compile ahead-of-time to a binary.

If your language is garbage-collected, there are a number of things you must consider. This is an attempt to mention just some of them.

Some background

But first, let's take a brief look at how garbage collectors work, or more specifically, how one type of garbage collector works. You have your heap with a collection of objects. These objects can have references to other objects. A tracing garbage collector stops the running program at some point during its execution. It will then go through the already known set of objects, usually global variables and objects on the stack. It'll mark these objects as reachable, find all the references to other objects within them and mark them as reachable as well. This is called the mark phase.

Once it has run out of objects to scan, it will take all objects that it didn't reach and collect them. This is called the sweep phase.
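The two phases fit in a few lines if you pretend the heap is a slice of objects. This is a toy sketch of the idea, not how any real collector is laid out; the object type and the collect function are invented for the example.

```go
package main

import "fmt"

// object is a toy heap object holding references to other objects.
type object struct {
	name   string
	refs   []*object
	marked bool
}

// mark flags obj and everything reachable from it.
func mark(obj *object) {
	if obj == nil || obj.marked {
		return
	}
	obj.marked = true
	for _, r := range obj.refs {
		mark(r)
	}
}

// collect runs a mark phase from roots, then sweeps heap.
// It returns the surviving objects and the names of the collected ones.
func collect(heap, roots []*object) (live []*object, freed []string) {
	for _, r := range roots {
		mark(r)
	}
	for _, o := range heap {
		if o.marked {
			o.marked = false // reset for the next cycle
			live = append(live, o)
		} else {
			freed = append(freed, o.name)
		}
	}
	return live, freed
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"} // never referenced
	a.refs = []*object{b}
	b.refs = []*object{a} // cycles are fine, mark checks the flag
	live, freed := collect([]*object{a, b, c}, []*object{a})
	fmt.Println(len(live), freed) // 2 [c]
}
```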

The fact that a garbage collector only collects objects it didn't mark as reachable has some serious implications. It means that if you didn't recognize a reference somewhere, you will end up collecting items that are still live.

The problem

The big problem with machine code and garbage collection is that machine code does not know what a reference is. To the machine code, memory is just a large array of numbers and it has no idea what's a reference and what's not. Since we'll be collecting all objects that we didn't reach, missing a reference means potentially freeing memory that isn't ready to be freed. We'll need to find some way of getting around this problem.

One strategy is to treat all memory as references. This is what's called a conservative garbage collector. By doing this, we can be sure that no matter what, we will not miss a reference.

The downside with this tactic is that you can have spurious references. Somewhere in your data, you could have a string of bytes which looks like a pointer. The garbage collector would then keep the memory around, even though it could be freed. This is how the Boehm garbage collector for C works.
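A spurious reference is easy to show with a simulated stack. In this sketch memory is a slice of words and the heap is a set of allocated addresses. The address range and the allocated map are assumptions of the toy model, and this is not how the Boehm collector is implemented.

```go
package main

import "fmt"

// heapStart and heapEnd bound a pretend heap; the values are arbitrary.
const (
	heapStart uintptr = 0x1000
	heapEnd   uintptr = 0x2000
)

// conservativeRoots treats every word in mem that lands on an allocated
// address as a reference, whether or not it really is one.
func conservativeRoots(mem []uintptr, allocated map[uintptr]bool) []uintptr {
	var found []uintptr
	for _, w := range mem {
		if w >= heapStart && w < heapEnd && allocated[w] {
			found = append(found, w)
		}
	}
	return found
}

func main() {
	allocated := map[uintptr]bool{0x1000: true, 0x1040: true, 0x1080: true}
	stack := []uintptr{
		0x1000, // a real pointer
		42,     // plain integer, outside the heap
		0x1040, // an integer that merely looks like a pointer
	}
	// Both 0x1000 and 0x1040 are kept alive; only 0x1080 can be freed.
	fmt.Println(conservativeRoots(stack, allocated)) // [4096 4160]
}
```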

Even if you implement this scheme, there are still things to watch out for. Since compilers have become very good at optimizing and keeping often referenced variables in registers, you need to make sure that you're looking at the registers as well. This problem manifested itself as a bug in Ruby modules, where it would free memory that was only referenced by register.

Another strategy is to disable garbage collection while your native code is running. Once you're back in your interpreter, the native code would have left you in a state that your interpreter can understand and you can start your garbage collection. If you're implementing a dynamically typed language, you will have type information at hand anyway and you can use this to your advantage by just looking up where the references are.

The disadvantage is that your garbage collection is delayed until a return from your native code. This is fine for something like JITed code where you'll eventually return into your interpreter, but implementing event loops in C is a problem since you never return. This is also a problem if you're running more than one thread in your implementation. Since any one thread executing machine code can stall your collection, multiple threads increase the chance that you'll have to wait even longer before you can garbage collect.

Yet another strategy is to build type information for your machine code that the garbage collector can use. This usually manifests itself as 2 data structures. One is a bitmap of the stack frame, showing the garbage collector where pointers on the stack are for any given instruction. The other is bytecode telling the garbage collector where to find references. Using this combination of data structures, the garbage collector is able to precisely figure out where the references are.
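The stack bitmap half of this can be sketched in a few lines. The frame type and its ptrmap field are invented for the example, with one bit per stack slot. A real implementation would look the bitmap up per instruction rather than store it in the frame.

```go
package main

import "fmt"

// frame is a toy stack frame: raw words plus a bitmap saying which
// slots hold pointers. Bit i set means words[i] is a reference.
type frame struct {
	words  []uintptr
	ptrmap uint64
}

// preciseRoots returns only the words the bitmap marks as pointers.
func preciseRoots(f frame) []uintptr {
	var roots []uintptr
	for i, w := range f.words {
		if f.ptrmap&(uint64(1)<<uint(i)) != 0 {
			roots = append(roots, w)
		}
	}
	return roots
}

func main() {
	f := frame{
		// Slot 1 looks like a heap address but is just an integer.
		words:  []uintptr{0x1000, 0x1040, 7},
		ptrmap: 0x1, // only slot 0 is a pointer
	}
	fmt.Println(preciseRoots(f)) // [4096]
}
```

Unlike the conservative approach, the integer that merely looks like a pointer is ignored.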

Having this data means that you'll have to calculate it somehow. If you're compiling ahead-of-time, this shouldn't be a big problem. You just make sure that you emit the data when handling your types. However, if you're trying to integrate C modules, the normal C compilers will not help you. You'll either have to use conservative garbage collection or implement your own C compiler, just to get this information.

What's next

So far, I've only just scratched the surface. There are many more things to consider like how to make sure that all your threads can be stopped for collection, interrupting your machine code only when the heap is in a consistent state and how to make your compiler generate machine code that won't lose references.

I am by no means an expert on this subject. You shouldn't use any of this advice to actually build a garbage collector. But I do have an appreciation of how much work goes into building one and after this, I hope you do too.

By Daniel Morsing

8 September 2013

The Go netpoller

Introduction

I'm bored again or I have something more important to do, so it's time for another blog post about the Go runtime. This time I'm gonna take a look at how Go handles network I/O.

Blocking

In Go, all I/O is blocking. The Go ecosystem is built around the idea that you write against a blocking interface and then handle concurrency through goroutines and channels rather than callbacks and futures. An example is the HTTP server in the "net/http" package. Whenever it accepts a connection, it will create a new goroutine to handle all the requests that will happen on that connection. This construct means that the request handler can be written in a very straightforward manner. First do this, then do that. Unfortunately, using the blocking I/O provided by the operating system isn't suitable for constructing our own blocking I/O interface.

In my previous post about the Go runtime, I covered how the Go scheduler handles syscalls. To handle a blocking syscall, we need a thread that can be blocked inside the operating system. If we were to build our blocking I/O on top of the OS' blocking I/O, we'd spawn a new thread for every client stuck in a syscall. This becomes really expensive once you have 10,000 client threads, all stuck in a syscall waiting for their I/O operation to succeed.

Go gets around this problem by using the asynchronous interfaces that the OS provides, but blocking the goroutines that are performing I/O.

The netpoller

The part that converts asynchronous I/O into blocking I/O is called the netpoller. It sits in its own thread, receiving events from goroutines wishing to do network I/O. The netpoller uses whichever interface the OS provides to do polling of network sockets. On Linux it uses epoll, on the BSDs and Darwin it uses kqueue, and on Windows it uses I/O completion ports. These interfaces all have in common that they provide user space a way to efficiently poll for the status of network I/O.

Whenever you open or accept a connection in Go, the file descriptor that backs it is set to non-blocking mode. This means that if you try to do I/O on it and the file descriptor isn't ready, it will return an error code saying so. Whenever a goroutine tries to read or write to a connection, the networking code will do the operation until it receives such an error, then call into the netpoller, telling it to notify the goroutine when it is ready to perform I/O again. The goroutine is then scheduled out of the thread it's running on and another goroutine is run in its place.

When the netpoller receives notification from the OS that it can perform I/O on a file descriptor, it will look through its internal data structure, see if there are any goroutines that are blocked on that file and notify them if there are any. The goroutine can then retry the I/O operation that caused it to block and succeed in doing so.

If this is sounding a lot like using the old select and poll Unix system calls to do I/O, it's because it is. But instead of looking up a function pointer and a struct containing a bunch of state variables, the netpoller looks up a goroutine that can be scheduled in. This frees you from managing all that state, rechecking whether you received enough data on the last go around and juggling function pointers like you would do with traditional Unix networking I/O.
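The retry-and-park loop can be imitated in plain Go. This is a toy model, not the runtime's code: conn, tryRead and deliver are made up, a buffered channel stands in for the poller's wakeup, and deliver plays the role of the OS reporting readiness.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errWouldBlock = errors.New("operation would block")

// conn is a toy non-blocking connection. ready stands in for the
// poller's per-descriptor wakeup.
type conn struct {
	mu    sync.Mutex
	buf   []byte
	ready chan struct{}
}

func newConn() *conn { return &conn{ready: make(chan struct{}, 1)} }

// tryRead never blocks: it returns errWouldBlock when no data is buffered.
func (c *conn) tryRead(p []byte) (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.buf) == 0 {
		return 0, errWouldBlock
	}
	n := copy(p, c.buf)
	c.buf = c.buf[n:]
	return n, nil
}

// deliver plays the role of the OS plus poller: data arrives, then
// whoever is parked on this connection is woken.
func (c *conn) deliver(data []byte) {
	c.mu.Lock()
	c.buf = append(c.buf, data...)
	c.mu.Unlock()
	select {
	case c.ready <- struct{}{}:
	default: // a wakeup is already pending
	}
}

// Read looks blocking to the caller: it retries tryRead and parks the
// goroutine on the ready channel in between.
func (c *conn) Read(p []byte) (int, error) {
	for {
		n, err := c.tryRead(p)
		if err != errWouldBlock {
			return n, err
		}
		<-c.ready
	}
}

func main() {
	c := newConn()
	go c.deliver([]byte("ping"))
	buf := make([]byte, 16)
	n, _ := c.Read(buf)
	fmt.Println(string(buf[:n])) // ping
}
```

The caller of Read never sees errWouldBlock, which is the whole point: the state machine lives inside the goroutine's stack rather than in a struct you manage by hand.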

30 June 2013

The Go scheduler

Introduction

One of the big features for Go 1.1 is the new scheduler, contributed by Dmitry Vyukov. The new scheduler has given a dramatic increase in performance for parallel Go programs and with nothing better to do, I figured I'd write something about it.

Most of what's written in this blog post is already described in the original design doc. It's a fairly comprehensive document, but pretty technical.

All you need to know about the new scheduler is in that design document but this post has pictures, so it's clearly superior.

What does the Go runtime need with a scheduler?

But before we look at the new scheduler, we need to understand why it's needed. Why create a userspace scheduler when the operating system can schedule threads for you?

The POSIX thread API is very much a logical extension to the existing Unix process model and as such, threads get a lot of the same controls as processes. Threads have their own signal mask, can be assigned CPU affinity, can be put into cgroups and can be queried for which resources they use. All these controls add overhead for features that are simply not needed for how Go programs use goroutines and they quickly add up when you have 100,000 threads in your program.

Another problem is that the OS can't make informed scheduling decisions, based on the Go model. For example, the Go garbage collector requires that all threads are stopped when running a collection and that memory must be in a consistent state. This involves waiting for running threads to reach a point where we know that the memory is consistent.

When you have many threads scheduled out at random points, chances are that you're going to have to wait for a lot of them to reach a consistent state. The Go scheduler can make the decision of only scheduling at points where it knows that memory is consistent. This means that when we stop for garbage collection, we only have to wait for the threads that are being actively run on a CPU core.

Our Cast of Characters

There are 3 usual models for threading. One is N:1 where several userspace threads are run on one OS thread. This has the advantage of being very quick to context switch but cannot take advantage of multi-core systems. Another is 1:1 where one thread of execution matches one OS thread. It takes advantage of all of the cores on the machine, but context switching is slow because it has to trap through the OS.

Go tries to get the best of both worlds by using a M:N scheduler. It schedules an arbitrary number of goroutines onto an arbitrary number of OS threads. You get quick context switches and you take advantage of all the cores in your system. The main disadvantage of this approach is the complexity it adds to the scheduler.

To accomplish the task of scheduling, the Go Scheduler uses 3 main entities:

The triangle represents an OS thread. It's the thread of execution managed by the OS and works pretty much like your standard POSIX thread. In the runtime code, it's called M for machine.

The circle represents a goroutine. It includes the stack, the instruction pointer and other information important for scheduling goroutines, like any channel it might be blocked on. In the runtime code, it's called a G.

The rectangle represents a context for scheduling. You can look at it as a localized version of the scheduler which runs Go code on a single thread. It's the important part that lets us go from a N:1 scheduler to a M:N scheduler. In the runtime code, it's called P for processor. More on this part in a bit.

Here we see 2 threads (M), each holding a context (P), each running a goroutine (G). In order to run goroutines, a thread must hold a context.

The number of contexts is set on startup to the value of the GOMAXPROCS environment variable or through the runtime function GOMAXPROCS(). Normally this doesn't change during execution of your program. The fact that the number of contexts is fixed means that only GOMAXPROCS threads are running Go code at any point. We can use that to tune the invocation of the Go process to the individual computer, such that a 4 core PC runs Go code on 4 threads.
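From inside a program, the runtime function looks like this. setContexts is a helper name made up for the example; runtime.GOMAXPROCS and runtime.NumCPU are the real calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// setContexts sets GOMAXPROCS to n and reports the value now in effect.
func setContexts(n int) int {
	runtime.GOMAXPROCS(n)
	return runtime.GOMAXPROCS(0) // an argument below 1 only queries
}

func main() {
	fmt.Println("cores:", runtime.NumCPU())
	fmt.Println("contexts:", setContexts(4)) // contexts: 4
}
```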

The greyed out goroutines are not running, but ready to be scheduled. They're arranged in lists called runqueues. Goroutines are added to the end of a runqueue whenever a goroutine executes a go statement. Once a context has run a goroutine until a scheduling point, it pops a goroutine off its runqueue, sets stack and instruction pointer and begins running the goroutine.

To bring down mutex contention, each context has its own local runqueue. A previous version of the Go scheduler only had a global runqueue with a mutex protecting it. Threads were often blocked waiting for the mutex to be unlocked. This got really bad when you had 32 core machines that you wanted to squeeze as much performance out of as possible.

The scheduler keeps on scheduling in this steady state as long as all contexts have goroutines to run. However, there are a couple of scenarios that can change that.

Who you gonna (sys)call?

You might wonder now, why have contexts at all? Can't we just put the runqueues on the threads and get rid of contexts? Not really. The reason we have contexts is so that we can hand them off to other threads if the running thread needs to block for some reason.

An example of when we need to block, is when we call into a syscall. Since a thread cannot both be executing code and be blocked on a syscall, we need to hand off the context so it can keep scheduling.

Here we see a thread giving up its context so that another thread can run it. The scheduler makes sure there are enough threads to run all contexts. M1 in the illustration above might be created just for the purpose of handling this syscall or it could come from a thread cache. The syscalling thread will hold on to the goroutine that made the syscall since it's technically still executing, albeit blocked in the OS.

When the syscall returns, the thread must try and get a context in order to run the returning goroutine. The normal mode of operation is to steal a context from one of the other threads. If it can't steal one, it will put the goroutine on a global runqueue, put itself on the thread cache and go to sleep.

The global runqueue is a runqueue that contexts pull from when they run out of their local runqueue. Contexts also periodically check the global runqueue for goroutines. Otherwise the goroutines on global runqueue could end up never running because of starvation.

This handling of syscalls is why Go programs run with multiple threads, even when GOMAXPROCS is 1. The runtime uses goroutines that call syscalls, leaving threads behind.

Stealing work

Another way that the steady state of the system can change is when a context runs out of goroutines to schedule. This can happen if the amount of work on the contexts' runqueues is unbalanced. This can cause a context to end up exhausting its runqueue while there is still work to be done in the system. To keep running Go code, a context can take goroutines out of the global runqueue but if there are no goroutines in it, it'll have to get them from somewhere else.

That somewhere is the other contexts. When a context runs out, it will try to steal about half of the runqueue from another context. This makes sure there is always work to do on each of the contexts, which in turn makes sure that all threads are working at their maximum capacity.
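The lookup order described here (local runqueue, then global runqueue, then stealing) can be sketched with slices. This is a single-threaded toy with no locking. The context type, steal and next are invented names, and which half gets stolen is an arbitrary choice in this sketch.

```go
package main

import "fmt"

// context is a toy P: just a local runqueue of goroutine IDs.
type context struct {
	runq []int
}

// steal moves about half of victim's runqueue onto p's runqueue and
// reports how many goroutines moved.
func (p *context) steal(victim *context) int {
	n := (len(victim.runq) + 1) / 2
	p.runq = append(p.runq, victim.runq[:n]...)
	victim.runq = victim.runq[n:]
	return n
}

// next pops the next goroutine, trying the local runqueue, then the
// global runqueue, then stealing from the other contexts.
func (p *context) next(global *[]int, others []*context) (int, bool) {
	if len(p.runq) == 0 && len(*global) > 0 {
		p.runq = append(p.runq, (*global)[0])
		*global = (*global)[1:]
	}
	if len(p.runq) == 0 {
		for _, v := range others {
			if p.steal(v) > 0 {
				break
			}
		}
	}
	if len(p.runq) == 0 {
		return 0, false // nothing to run anywhere
	}
	g := p.runq[0]
	p.runq = p.runq[1:]
	return g, true
}

func main() {
	busy := &context{runq: []int{1, 2, 3, 4}}
	idle := &context{}
	var global []int
	g, ok := idle.next(&global, []*context{busy})
	fmt.Println(g, ok, idle.runq, busy.runq) // 1 true [2] [3 4]
}
```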

Where to go?

There are many more details to the scheduler, like cgo threads, the LockOSThread() function and integration with the network poller. These are outside the scope of this post, but still merit study. I might write about these later. There are certainly plenty of interesting constructions to be found in the Go runtime library.

By Daniel Morsing
