V. Single Host Reverse Proxy

It's time to get to more interesting things. The first reverse proxy we'll make is not of our making at all.

It turns out that the stdlib has a decent implementation of a reverse proxy: proxy := httputil.NewSingleHostReverseProxy(backend). As the name implies, it handles only a single backend host. It's not very load-balancer-y, but it is very reverse-proxy-y.

Let's take a quick look at how to use it:

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    // The backend is another HTTP service listening
    // for requests. Our reverse proxy will send (proxy)
    // requests to it.
    backend, err := url.Parse("http://localhost:8000")

    if err != nil {
        log.Fatal(err)
    }

    // The proxy is a Handler - it has a ServeHTTP method
    proxy := httputil.NewSingleHostReverseProxy(backend)

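    // We listen for requests on port 80
    // (ports below 1024 may require elevated privileges)
    srv := http.Server{Addr: ":80", Handler: proxy}

    // ListenAndServe blocks, and always returns a non-nil error
    log.Fatal(srv.ListenAndServe())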
}

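This reverse proxy can work with HTTP/1.1, HTTP/2, TLS, and gRPC traffic, given the right server and Transport setup. As written, though, ListenAndServe serves plain-text HTTP/1.x only; Go's server enables HTTP/2 automatically only when serving TLS (for example via ListenAndServeTLS). There's a bunch of stuff going on in there! You can check out net/http/httputil/reverseproxy.go in the stdlib to see more.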

Here are a few things it handles:

  1. HTTP Trailer headers (needed for gRPC in conjunction with http/2)
  2. Hop-by-hop headers
  3. HTTP streaming
  4. Upgrading HTTP connections (hijacking the TCP connection for WebSocket use)
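To watch the hop-by-hop handling happen, here's a small sketch that uses net/http/httptest instead of real ports. The backend is a throwaway handler that echoes a few request headers back, and X-Custom and X-Keep are made-up header names for the demo. Headers listed in the incoming Connection header, plus standard hop-by-hop headers such as Keep-Alive, should be stripped before the request reaches the backend, while an ordinary header like X-Keep passes through.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// backendSeen proxies one request and reports which headers the backend
// actually received. X-Custom and X-Keep are hypothetical demo headers.
func backendSeen() (string, error) {
	// Throwaway backend that echoes back the headers we care about.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "custom=%q keepalive=%q keep=%q",
			r.Header.Get("X-Custom"), r.Header.Get("Keep-Alive"), r.Header.Get("X-Keep"))
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", err
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Build the incoming request by hand so we control every header.
	req := httptest.NewRequest(http.MethodGet, "http://proxy.local/", nil)
	req.Header.Set("Connection", "X-Custom") // marks X-Custom as hop-by-hop
	req.Header.Set("X-Custom", "dropped")
	req.Header.Set("Keep-Alive", "timeout=5") // always a hop-by-hop header
	req.Header.Set("X-Keep", "forwarded")     // ordinary end-to-end header

	rec := httptest.NewRecorder()
	proxy.ServeHTTP(rec, req)

	body, err := io.ReadAll(rec.Result().Body)
	return string(body), err
}

func main() {
	seen, err := backendSeen()
	if err != nil {
		panic(err)
	}
	fmt.Println(seen)
}
```

Running it should print custom="" keepalive="" keep="forwarded".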

If we run the above code and send requests to http://localhost, it will attempt to proxy any request to localhost:8000. It will return 502 Bad Gateway if you don't have any web server listening on localhost:8000.

Digging In

Let's see what NewSingleHostReverseProxy is doing.

Its main job is to return an instance of httputil.ReverseProxy. The ReverseProxy isn't itself limited to proxying to a single host, but Golang's stdlib only offers this "example" of a quick single-upstream reverse proxy. More on that later.

If we take a look at the httputil.ReverseProxy struct, there are a few interesting fields. Let's look at the two most interesting.

Director

First, the Director function.

type ReverseProxy struct {
    // Director must be a function which modifies
    // the request into a new request to be sent
    // using Transport. Its response is then copied
    // back to the original client unmodified.
    // Director must not access the provided Request
    // after returning.
    Director func(*http.Request)

    // snip
}

The Director function takes the incoming HTTP request (well, a copy of it) and modifies it so that it's ready to be sent to the backend/upstream server. Part of this is setting the request URL's scheme, host, and so on to those of the upstream server, so the request actually reaches it when sent.

The modified request is eventually sent to the backend server, but that's not the Director's responsibility. The Director is only responsible for modifying the incoming request.

For example, if our proxy (running locally) receives a request via curl http://localhost:80, and we've configured a backend/upstream server ("target") of http://localhost:8000, then the Director takes the incoming request copy and sets its URL scheme and host (req.URL.Scheme and req.URL.Host) to http and localhost:8000. Later, the configured Transport uses that request information to connect to the upstream server and send the request.
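Because the Director is just a function field on the proxy, we can call it ourselves to see exactly what it changes, without sending anything over the network. A minimal sketch, assuming the same http://localhost:8000 target and a made-up incoming request for /foo?a=1: the URL's scheme and host are rewritten, while the request's Host field (which becomes the Host header sent upstream) is left as the client sent it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// directed runs the proxy's Director on a sample incoming request and
// returns the rewritten URL and the (untouched) Host field.
func directed() (string, string) {
	target, err := url.Parse("http://localhost:8000")
	if err != nil {
		panic(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Simulate `curl "http://localhost:80/foo?a=1"` hitting our proxy.
	req := httptest.NewRequest(http.MethodGet, "http://localhost:80/foo?a=1", nil)

	// The Director mutates the request in place; nothing is sent anywhere.
	proxy.Director(req)
	return req.URL.String(), req.Host
}

func main() {
	u, host := directed()
	fmt.Println("URL: ", u)
	fmt.Println("Host:", host)
}
```

With the stdlib's default Director this should yield http://localhost:8000/foo?a=1 and localhost:80. If your backend cares about the Host header, you'll need to set req.Host yourself.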

Default Director Function

Looking at the NewSingleHostReverseProxy() function, we see that all it does is define a Director function and return a new ReverseProxy with that Director.

func NewSingleHostReverseProxy(target *url.URL) *ReverseProxy {
    targetQuery := target.RawQuery
    director := func(req *http.Request) {
        req.URL.Scheme = target.Scheme
        req.URL.Host = target.Host
        req.URL.Path, req.URL.RawPath = joinURLPath(target, req.URL)
        if targetQuery == "" || req.URL.RawQuery == "" {
            req.URL.RawQuery = targetQuery + req.URL.RawQuery
        } else {
            req.URL.RawQuery = targetQuery + "&" + req.URL.RawQuery
        }
        if _, ok := req.Header["User-Agent"]; !ok {
            // explicitly disable User-Agent so it's not set to default value
            req.Header.Set("User-Agent", "")
        }
    }
    return &ReverseProxy{Director: director}
}

Pretty simple, if we continue to ignore all the Reverse Proxy code we didn't look at.

Transport

As mentioned, the Director function "directs" the copied request to the correct location - our backend target http://localhost:8000.

The thing that sends the request to the backend (and returns the response) is named Transport:

type ReverseProxy struct {
    // snip

    // The transport used to perform proxy requests.
    // If nil, http.DefaultTransport is used.
    Transport http.RoundTripper

    // snip
}

The Transport is of type http.RoundTripper, which is yet another interface:

type RoundTripper interface {
    // snipped a big ole comment
    RoundTrip(*Request) (*Response, error)
}

Since we didn't define a Transport, http.DefaultTransport is used. Its code is too complex to paste here, since it has a lot of responsibilities. It's interesting to look at, though, and I suggest you do!

The basics of it, however, are that it makes a round trip! It figures out where the request needs to go, sends it, and then gets the response. Sending involves making a TCP connection, handling TLS, and more. RoundTrip returns once the response headers arrive; the body, which may be a long-running stream, is read afterwards through the response's Body.

Round Tripper

The main logic for that is in the Transport's unexported roundTrip() method. The exported RoundTrip() method is kinda/sorta hidden in net/http/roundtrip.go, which is excluded from JS/WASM builds (those get their own RoundTrip() in roundtrip_js.go). That's why the real logic lives in the lowercase roundTrip(); RoundTrip() just calls roundTrip().

The roundTrip() method makes some checks, gets a persistent connection object (a persistConn, defined alongside roundTrip() in net/http/transport.go), and then calls roundTrip() on that connection. The persistent connection is a (possibly reused) connection to the upstream server.

This is actually the more complex logic: concurrently writing the request to the upstream server while also reading a response that may arrive before the full request has even been sent.

We Don't Need to Care Yet

In any case, there's a lot going on in there! We don't really need to care right now, but I think some advanced features might have us digging into the Transport.

Next, let's make our own reverse proxy! We'll start simple, and then add some nifty features on top of what the stdlib provides for us.