X. Load Balancing

Remember when I called a reverse proxy "basically just a load balancer"? Ours doesn't balance any load. Let's fix that!

A Quick Review

We use a Mux to match the incoming request to a Target. The Mux can match incoming requests against the domain, the port, the URI, and more.

Currently, our Target object lets us define a single upstream (backend) server via the AddTarget() method.

What we want is to allow a Target to have multiple upstreams. The Target can decide how to distribute incoming requests amongst those upstream servers (aka Load Balancing).

Watch Your Language

We've been calling an upstream server a bunch of things interchangeably so far (upstream, backend, target). We'll need to firm this language up a bit.

So, our firmed up language:

  1. Mux: Matches an incoming request to a Target
  2. Target: For a matched request, route the request to an available Upstream
  3. Upstreams: A collection of backend servers (each being an "upstream") a Target might send a request to

The part that's new here is that we're allowing a Target to have multiple Upstreams. The Target will be responsible for deciding which Upstream to send a request to.

Refactoring Targets

So we need the ability to send to multiple Upstreams for a matched Target.

Before we do that, let's organize the code a bit more.

I decided the Target struct should be the object responsible for choosing which Upstream to send to. Since we'll be adding logic to our Target struct, let's refactor a bit to put the Target "stuff" into its own file.

We'll add file target.go.

.
├── go.mod
├── go.sum
├── main.go
└── reverseproxy
    ├── listener.go
    ├── reverseproxy.go
    └── target.go

Then we take type Target struct {...} out of reverseproxy.go and plop it into target.go:

// File target.go

package reverseproxy

import (
    "github.com/gorilla/mux"
    "net/url"
)

type Target struct {
    router   *mux.Router
    upstream *url.URL
}

So far, so good. We didn't really change anything yet!

Multiple Upstreams

We're going to add to the Target to accomplish two things:

  1. Allow for multiple upstreams
  2. Be able to load balance amongst available upstreams

First, we'll change the upstream property to be upstreams (plural) and make it a slice of *url.URL. We'll add some other items as well to help with load balancing.

Then we can add a method to the Target struct to help us select an upstream server. We'll hard code a round-robin strategy - no need to abstract different strategies for now.

Here's the updated struct and its shiny, new SelectUpstream() method:

package reverseproxy

import (
    "github.com/gorilla/mux"
    "net/url"
    "sync"
)

type Target struct {
    router       *mux.Router

    // NOTE: New properties here:
    upstreams    []*url.URL
    lastUpstream int
    lock         sync.Mutex
}

// SelectUpstream will load balance amongst available
// targets using a round-robin algorithm
func (t *Target) SelectUpstream() *url.URL {
    count := len(t.upstreams)
    if count == 1 {
        return t.upstreams[0]
    }

    t.lock.Lock()
    defer t.lock.Unlock()

    next := t.lastUpstream + 1
    if next >= count {
        next = 0
    }

    t.lastUpstream = next

    return t.upstreams[next]
}

We added method SelectUpstream(). This will just return an upstream of our choosing. The Target struct now has property upstreams (plural), which replaced upstream (singular).

Our SelectUpstream() method returns the first upstream (*url.URL) if we only defined one. No load balancing in that case!

Otherwise we do some boring logic to ensure we loop through the given upstreams without accidentally panicking with an index out of range error.

We track the last upstream that we sent a request to via lastUpstream, and we use a Mutex to safely increment that lastUpstream value when requests come in concurrently.

You could also use an atomic integer for that, which would be a bit faster. However, this SO answer scared me off of them, even though an atomic might be preferred here.

Not too bad, logic-wise!

Defining the Upstreams

Let's next update the easy thing - we'll change our main.go file to define one or more upstreams when we create a Target.

Instead of just passing a string "http://localhost:8000", we'll pass a slice of strings []string{"http://localhost:8000", ...}:

// Plenty of stuff omitted for brevity

func main() {
    r := &reverseproxy.ReverseProxy{}

    // Handle URI /foo
    a := mux.NewRouter()
    a.Host("fid.dev").Path("/foo")
    // Add a single upstream
    r.AddTarget([]string{"http://localhost:8000"}, a)

    // Handle anything else
    // Add multiple upstreams
    r.AddTarget([]string{
        "http://localhost:8001",
        "http://localhost:8002",
        "http://localhost:8003",
    }, nil)
}

Whereas before we would send AddTarget a string, now we send a slice of strings []string. This way we can define one or more upstream servers.

Also not too bad, logic-wise!
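The updated AddTarget() itself isn't shown here, but the new work it has to do is parsing that slice of strings into the []*url.URL our Target stores. A sketch of just that step (parseUpstreams is a hypothetical helper name; the real method may inline this logic):

```go
package main

import (
	"fmt"
	"net/url"
)

// parseUpstreams converts the string slice passed to AddTarget into
// the []*url.URL slice the Target stores, failing fast on a bad URL.
// (Helper name is mine; AddTarget may do this inline.)
func parseUpstreams(raw []string) ([]*url.URL, error) {
	parsed := make([]*url.URL, 0, len(raw))
	for _, u := range raw {
		p, err := url.Parse(u)
		if err != nil {
			return nil, err
		}
		parsed = append(parsed, p)
	}
	return parsed, nil
}

func main() {
	ups, err := parseUpstreams([]string{
		"http://localhost:8001",
		"http://localhost:8002",
	})
	if err != nil {
		panic(err)
	}
	for _, u := range ups {
		fmt.Println(u.Host)
	}
}
```

Failing at registration time means a typo'd upstream URL surfaces at startup, not on the first proxied request.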

Directing Requests

Now we're finally ready to update our code and actually do the load balancing.

This is a change in reverseproxy.go. Since the Director is responsible for directing where an incoming request is proxied to, it seems like the right place to add our load balancing logic.

We'll update the Director function:

// Director returns a function for use in http.ReverseProxy.Director.
// The function matches the incoming request to a specific target and
// sets the request object to be sent to the matched upstream server.
func (r *ReverseProxy) Director() func(req *http.Request) {
    return func(req *http.Request) {
        for _, t := range r.targets {
            match := &mux.RouteMatch{}
            if t.router.Match(req, match) {
                // Call our new SelectUpstream method
                upstream := t.SelectUpstream()
                targetQuery := upstream.RawQuery

                // Send requests to that selected upstream
                req.URL.Scheme = upstream.Scheme
                req.URL.Host = upstream.Host
                req.URL.Path, req.URL.RawPath = joinURLPath(upstream, req.URL)
                if targetQuery == "" || req.URL.RawQuery == "" {
                    req.URL.RawQuery = targetQuery + req.URL.RawQuery
                } else {
                    req.URL.RawQuery = targetQuery + "&" + req.URL.RawQuery
                }
                if _, ok := req.Header["User-Agent"]; !ok {
                    // explicitly disable User-Agent so it's not set to default value
                    req.Header.Set("User-Agent", "")
                }
                break
            }
        }
    }
}

Instead of referencing t.upstream, we have our Target select an upstream and direct the request to it!

The relevant part:

// Call our new method
upstream := t.SelectUpstream()

// Direct our request to the given upstream
req.URL.Scheme = upstream.Scheme
req.URL.Host = upstream.Host
req.URL.Path, req.URL.RawPath = joinURLPath(upstream, req.URL)

Still not too bad! We're cruising along here.

We Did It!

If we build and run the reverse proxy, we'll see incoming requests bounced between the three backend servers localhost:8001-8003.

The separate backend for requests matching fid.dev/foo continues to work and send to the one backend server localhost:8000.

If we make requests to anything else (while no upstream servers are actually running), the error messages show the load balancing at work:

2022/11/22 18:57:55 http: proxy error: dial tcp [::1]:8002: connect: connection refused
2022/11/22 18:57:57 http: proxy error: dial tcp [::1]:8003: connect: connection refused
2022/11/22 18:57:58 http: proxy error: dial tcp [::1]:8001: connect: connection refused
2022/11/22 18:57:58 http: proxy error: dial tcp [::1]:8002: connect: connection refused
2022/11/22 18:57:58 http: proxy error: dial tcp [::1]:8003: connect: connection refused
... and so on ...

What else are we missing here? Health checks!

Let's see how to check our upstreams' health next. We'll start with "passive" health checks.