Special Processes in OTP

Thomas Fisher

On a recent Elixir project, I needed to test some asynchronous behaviour. Doing so led me to learn about the basics of special processes in OTP.

Our project used Phoenix Channels and needed to keep track of all connected socket processes. We could have used the upcoming Phoenix Presence feature, but we didn’t quite need all of its features. We settled on a simple GenServer process that monitors each connected socket.

Here are the relevant bits of code:

The server allows a calling process (in our case, the connected socket) to subscribe to it or to get a list of all subscribers. It uses Erlang’s process monitors to watch its subscribers and removes a subscriber from its state when one goes down.
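A minimal sketch of such a server follows. It assumes the state is a map from monitor reference to subscriber pid. The module and function names (`SubscriberServer`, `subscribe/1`, `subscribers/1`) are hypothetical and are not the project’s actual code:

```elixir
defmodule SubscriberServer do
  # Hypothetical reconstruction; names are illustrative only.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Subscribes the *calling* process (in the project, a channel socket).
  def subscribe(server), do: GenServer.call(server, :subscribe)

  # Lists the pids of all current subscribers.
  def subscribers(server), do: GenServer.call(server, :subscribers)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:subscribe, {pid, _tag}, state) do
    # Monitor the caller so we are told when it exits.
    ref = Process.monitor(pid)
    {:reply, :ok, Map.put(state, ref, pid)}
  end

  def handle_call(:subscribers, _from, state) do
    {:reply, Map.values(state), state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # A subscriber went down: drop it from the state.
    {:noreply, Map.delete(state, ref)}
  end
end

{:ok, server} = SubscriberServer.start_link()
:ok = SubscriberServer.subscribe(server)
IO.inspect(SubscriberServer.subscribers(server))
```

Keying the state by monitor reference keeps the cleanup clause to a single `Map.delete/2`, because the reference arrives in the `:DOWN` message.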

I set out to test this server. In particular, I wanted to test the cleanup code that runs when a monitored subscriber goes down.

This test doesn’t work consistently – it fails on the last assertion. The problem is due to a race condition. Here’s a diagram of what we expected to happen:

[Diagram: Race-Condition]

Unfortunately, we can’t predict the order in which the server receives the down message and the second “subscribers” message. Monitoring test_child within our test doesn’t help, because we don’t know the order in which monitors will fire, and using Process.sleep/1 is a bit of a hack.

There’s a better option – the sys module. Before we get there, though, let’s take a quick detour to look at how processes work, and what makes an ordinary process different from an OTP-compliant, “special” process.

Common Processes

Let’s review the basic building blocks for processes in Elixir. Using spawn, it’s incredibly easy to start one:
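For instance, the following one-liner starts a process that prints a line and then exits. `spawn/1` returns the new process’s pid immediately, without waiting for the function to finish:

```elixir
# spawn/1 runs the function in a new process and returns its pid at once.
pid = spawn(fn -> IO.puts("Hello from #{inspect(self())}") end)
IO.inspect(pid)
```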

Of course, a process isn’t particularly useful when it doesn’t communicate outside of itself. We can use message passing to accomplish this:
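Here is a small sketch of a round trip. The child waits for a `{:ping, from}` message and replies to whoever sent it. The message shapes are made up for this example:

```elixir
parent = self()

# The child blocks until a {:ping, from} message arrives, replies, and exits.
child =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end)

send(child, {:ping, parent})

# Wait for the reply, giving up after one second.
reply =
  receive do
    :pong -> :pong
  after
    1_000 -> :timeout
  end

IO.inspect(reply)
```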

Often, we’d like a process to run indefinitely, rather than doing a fixed amount of work. We can use tail recursion for that:
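The following sketch shows such a loop, using a hypothetical `Counter` module. Its only state is the argument passed to each recursive call:

```elixir
defmodule Counter do
  # Hypothetical example: state lives in the argument of a
  # tail-recursive receive loop.
  def start(initial \\ 0), do: spawn(fn -> loop(initial) end)

  defp loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      {:get, from} ->
        send(from, {:count, count})
        loop(count)
    end
  end
end

pid = Counter.start()
send(pid, :increment)
send(pid, :increment)
send(pid, {:get, self()})

count =
  receive do
    {:count, n} -> n
  after
    1_000 -> :timeout
  end

IO.puts("count = #{count}")
```

Each clause ends with the recursive call, so the loop runs in constant stack space. Messages from a single sender arrive in the order they were sent, so the reply reflects both increments.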

At this point, we’ve got a functioning server process, but it isn’t particularly robust. Not surprisingly, to get asynchronous programming right we’d still need to handle many details, such as supervision, debugging, and code upgrades.

OTP to the rescue!

Luckily, Erlang provides the amazing Open Telecom Platform (OTP) framework to help us with these types of problems. It builds upon common processes to create sophisticated, robust, generic behaviours, so we can focus on implementing our application logic, rather than the nitty-gritty details of writing a concurrent program.

OTP’s design principles introduce the concept of a special process. A special process can be supervised, debugged, and upgraded in a standard way. GenServer and Supervisor already conform to this specification, but the sys and proc_lib modules let us make our custom process conform to it as well.

Let’s take a look at how we can use these modules to make our process work with OTP:

Getting started

First off, we can’t start a special process by using the built-in functions spawn or spawn_link. OTP provides a module called proc_lib intended for this purpose:

We call :proc_lib.start_link and pass it the module, function, and arguments for the new process’s initial call. Since we’re doing a synchronous start (the parent process waits until we finish initializing), we need to call :proc_lib.init_ack to signal that we’re ready. We’re also calling :sys.debug_options to create something called a debug structure; we’ll come back to this later.
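The sketch below shows that start-up path for a hypothetical `SpecialCounter` process. The loop is still a plain receive loop; only `start_link/1` and `init/2` use `proc_lib` and `sys`:

```elixir
defmodule SpecialCounter do
  # Hypothetical module: only the start-up path is "special" so far.
  def start_link(initial \\ 0) do
    # Spawns a linked process running init/2 and blocks until it
    # calls :proc_lib.init_ack/2; returns whatever init_ack passed.
    :proc_lib.start_link(__MODULE__, :init, [self(), initial])
  end

  # Must be public: proc_lib invokes it as the initial call.
  def init(parent, initial) do
    # Build a debug structure; [] means no debugging is enabled yet.
    debug = :sys.debug_options([])
    # Unblock the caller of start_link/1.
    :proc_lib.init_ack(parent, {:ok, self()})
    loop(parent, debug, initial)
  end

  defp loop(parent, debug, count) do
    receive do
      {:get, from} ->
        send(from, {:count, count})
        loop(parent, debug, count)
    end
  end
end

{:ok, pid} = SpecialCounter.start_link(5)
send(pid, {:get, self()})

receive do
  {:count, n} -> IO.puts("count = #{n}")
end
```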

Let’s adapt the main process loop to handle system messages:

Our loop’s receive now matches two patterns: system messages arrive as {:system, from, request} tuples. We don’t have to interpret these messages, because :sys.handle_system_msg does that for us. Note that this function does not return. Instead, the sys module calls back into our module’s system_continue or system_terminate function, as appropriate.
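The following sketch extends the hypothetical `SpecialCounter` with a system-message clause and the callbacks that `sys` invokes. `system_code_change/4` is the callback `sys` uses during code upgrades. In this sketch it keeps the state unchanged:

```elixir
defmodule SpecialCounter do
  # Hypothetical special process that handles system messages.
  def start_link(initial \\ 0) do
    :proc_lib.start_link(__MODULE__, :init, [self(), initial])
  end

  def init(parent, initial) do
    debug = :sys.debug_options([])
    :proc_lib.init_ack(parent, {:ok, self()})
    loop(parent, debug, initial)
  end

  defp loop(parent, debug, count) do
    receive do
      {:system, from, request} ->
        # Never returns: sys resumes us via system_continue/3
        # or stops us via system_terminate/4.
        :sys.handle_system_msg(request, from, parent, __MODULE__, debug, count)

      :increment ->
        loop(parent, debug, count + 1)

      {:get, from} ->
        send(from, {:count, count})
        loop(parent, debug, count)
    end
  end

  # Called by sys to re-enter the loop after a system message.
  def system_continue(parent, debug, count), do: loop(parent, debug, count)

  # Called by sys when the process is told to terminate.
  def system_terminate(reason, _parent, _debug, _count), do: exit(reason)

  # Called by sys during a code upgrade; the state is kept as-is here.
  def system_code_change(count, _module, _old_vsn, _extra), do: {:ok, count}
end

{:ok, pid} = SpecialCounter.start_link(1)
send(pid, :increment)
# get_state travels through handle_system_msg and back via system_continue/3.
IO.inspect(:sys.get_state(pid))
```

With this in place, calls such as `:sys.get_state/1`, `:sys.suspend/1`, and `:sys.resume/1` work on the counter. Because the module does not export `system_get_state/1`, `:sys.get_state/1` returns the loop state (the count) unchanged.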

Debugging

The sys module exposes a number of functions that let us debug a special process. These functions send system messages to our process and update the debug structure we created earlier. If we’d like to take advantage of this debugging, we have to add a bit more code to our receive:

The :sys.handle_debug function instruments our incoming and outgoing messages, and the configurable debugging structure gives users of our process flexibility in debugging it.

Now, a user of our special process can debug it using the sys module at runtime. Here’s an example iex session:
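The next sketch covers both steps for the hypothetical `SpecialCounter`. First, each receive clause reports `{:in, msg}` or `{:out, msg, to}` events through `:sys.handle_debug/4`, using a `write_debug/3` formatter. The event shapes and the formatter name are choices made for this example. Second, a short script stands in for the iex session and turns tracing on at runtime:

```elixir
defmodule SpecialCounter do
  # Hypothetical special process that reports debug events.
  def start_link(initial \\ 0, debug_opts \\ []) do
    :proc_lib.start_link(__MODULE__, :init, [self(), initial, debug_opts])
  end

  def init(parent, initial, debug_opts) do
    # debug_opts are sys options such as [:trace]; [] disables debugging.
    debug = :sys.debug_options(debug_opts)
    :proc_lib.init_ack(parent, {:ok, self()})
    loop(parent, debug, initial)
  end

  defp loop(parent, debug, count) do
    receive do
      {:system, from, request} ->
        :sys.handle_system_msg(request, from, parent, __MODULE__, debug, count)

      :increment = msg ->
        # Report the incoming message; returns the updated debug structure.
        debug = :sys.handle_debug(debug, &write_debug/3, __MODULE__, {:in, msg})
        loop(parent, debug, count + 1)

      {:get, from} = msg ->
        debug = :sys.handle_debug(debug, &write_debug/3, __MODULE__, {:in, msg})
        reply = {:count, count}
        send(from, reply)
        # Report the outgoing reply as well.
        debug = :sys.handle_debug(debug, &write_debug/3, __MODULE__, {:out, reply, from})
        loop(parent, debug, count)
    end
  end

  # sys calls this to print an event when tracing is on.
  defp write_debug(device, event, name) do
    IO.puts(device, "#{inspect(name)} event = #{inspect(event)}")
  end

  def system_continue(parent, debug, count), do: loop(parent, debug, count)
  def system_terminate(reason, _parent, _debug, _count), do: exit(reason)
  def system_code_change(count, _module, _old_vsn, _extra), do: {:ok, count}
end

{:ok, pid} = SpecialCounter.start_link(0)
# Turn tracing on at runtime; events are printed via write_debug/3.
:ok = :sys.trace(pid, true)
send(pid, :increment)
send(pid, {:get, self()})

receive do
  {:count, n} -> IO.puts("count = #{n}")
end

:ok = :sys.trace(pid, false)
```

When no debugging is enabled, the debug structure is an empty list, and `:sys.handle_debug/4` simply returns it. This keeps the instrumentation cheap in normal operation.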

Now that we’ve come this far, here’s a disclaimer: we built this only to see how special processes work. If we wanted to put our server into production use, we’d be far better off using Elixir’s GenServer module.

Coming full-circle

Let’s revisit our original problem. We were using a GenServer to track subscribed processes. It turns out that GenServers, like all OTP behaviours, follow the special process specification. We can see this in action by using sys to debug it:
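The sketch below uses a hypothetical `SubscriberServer` shaped like the server described at the start: a map from monitor reference to subscriber pid. The GenServer needs no extra code for `sys` to read its state and trace its messages:

```elixir
defmodule SubscriberServer do
  # Hypothetical reconstruction; names are illustrative only.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)
  def subscribe(server), do: GenServer.call(server, :subscribe)
  def subscribers(server), do: GenServer.call(server, :subscribers)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:subscribe, {pid, _tag}, state) do
    ref = Process.monitor(pid)
    {:reply, :ok, Map.put(state, ref, pid)}
  end

  def handle_call(:subscribers, _from, state) do
    {:reply, Map.values(state), state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    {:noreply, Map.delete(state, ref)}
  end
end

{:ok, server} = SubscriberServer.start_link()
:ok = SubscriberServer.subscribe(server)

# Read the GenServer's internal state (the ref => pid map).
state = :sys.get_state(server)
IO.inspect(state)

# Print every message the GenServer receives and sends while tracing.
:ok = :sys.trace(server, true)
SubscriberServer.subscribers(server)
:ok = :sys.trace(server, false)
```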

We can use this debug functionality to fix the race condition in our server test. sys lets us install a custom handler that will fire with every event our GenServer logs. We can use that handler to send a message to our test process, signalling that it’s safe to continue. Here’s our new test case:
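The following sketch shows such a test. It assumes a hypothetical `SubscriberServer` with `subscribe/1` and `subscribers/1`, redefined here so the file runs on its own. The hook passed to `:sys.install/2` receives `(hook_state, event, name)` and returns either a new hook state or `:done`; returning `:done` uninstalls the hook. `ExUnit.start(autorun: false)` and `ExUnit.run/0` are only there so the file runs standalone with `elixir`:

```elixir
# In a Mix project, test_helper.exs calls ExUnit.start() instead.
ExUnit.start(autorun: false)

defmodule SubscriberServer do
  # Hypothetical reconstruction; names are illustrative only.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)
  def subscribe(server), do: GenServer.call(server, :subscribe)
  def subscribers(server), do: GenServer.call(server, :subscribers)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:subscribe, {pid, _tag}, state) do
    ref = Process.monitor(pid)
    {:reply, :ok, Map.put(state, ref, pid)}
  end

  def handle_call(:subscribers, _from, state) do
    {:reply, Map.values(state), state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    {:noreply, Map.delete(state, ref)}
  end
end

defmodule SubscriberServerTest do
  use ExUnit.Case, async: true

  test "removes a subscriber after it goes down" do
    {:ok, server} = SubscriberServer.start_link()
    test_pid = self()

    # Stand-in for a socket: subscribes, then waits to be killed.
    test_child =
      spawn(fn ->
        :ok = SubscriberServer.subscribe(server)
        send(test_pid, :subscribed)
        Process.sleep(:infinity)
      end)

    assert_receive :subscribed
    assert SubscriberServer.subscribers(server) == [test_child]

    # Debug hook: notify the test when the :DOWN for test_child comes in,
    # then return :done so sys removes the hook.
    on_event = fn
      _acc, {:in, {:DOWN, _ref, :process, ^test_child, _reason}}, _name ->
        send(test_pid, :down_received)
        :done

      acc, _event, _name ->
        acc
    end

    :ok = :sys.install(server, {on_event, nil})
    Process.exit(test_child, :kill)

    assert_receive :down_received
    assert SubscriberServer.subscribers(server) == []
  end
end

result = ExUnit.run()
```

In gen_server, the `{:in, msg}` event fires when the server takes the `:DOWN` message off its mailbox, before `handle_info/2` runs. That timing is sufficient: the following `subscribers/1` call is queued behind the `:DOWN` message, so the server answers it only after the cleanup has run.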

Our custom debug function lets us detect when the server receives the down message!

Wrapping up

Elixir is a pleasure to work with – it’s straightforward to learn, productive, and a joy to write – but underneath everything you do lies OTP, which is deeply nuanced. To learn more about it, I recommend reading Designing for Scalability with Erlang/OTP. If you’d like to explore the code from this post, it’s available on GitHub.