Taking Advantage of Multi-Processor Environments in Node.js


Node.js has more than proven itself capable of handling
multiple events concurrently, such as server connections, and all without
exposing us to the complexities of threading. Still, this locks
our apps down to a single process with a single thread of execution
consuming a single event queue. On a machine with a single processor, this
is no big loss; there is only one active process in any case.

But we live in a multi-core world now, and out of the box Node does not take advantage of this,
though it certainly has the ability to.

The “Problem”

To illustrate why this may be a problem for some applications, let’s
turn to a multi-player game system we recently released.

vimtronner is a vim trainer built atop Node.js and
Socket.io that allows multiple players
to remotely connect to a server and compete against each other. More
importantly, it can host many games at the same time. Each game uses
setInterval to update its state and inform all its players of
changes every 100ms.
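In simplified form, each game's loop looks something like this (an illustrative sketch, not the actual vimtronner source):

    # Illustrative sketch: each game re-enqueues an update of its own
    # state every 100ms via setInterval.
    class Game
      constructor: (@name) ->
        @tick = 0

      start: ->
        @interval = setInterval (=> @update()), 100

      update: ->
        # advance the game state, then broadcast it to every connected player
        @tick += 1
        console.log "#{@name}: broadcasting state for tick #{@tick}"

      stop: ->
        clearInterval @interval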

Except that is not entirely true.

As my colleague Erin explained
in his post on the JavaScript Event Loop,
there is only a SINGLE queue of events that our single-threaded process
works its way through. The setTimeout and setInterval functions don't actually
RUN their callbacks at the specified time intervals. They simply ENQUEUE them
at that time, an important distinction.
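You can see the distinction with a tiny experiment: schedule a callback, then keep the single thread busy past its due time (a contrived sketch):

    # The callback below is ENQUEUED after 100ms, but it cannot RUN until
    # the event loop is free, so the busy loop delays it to ~300ms.
    start = Date.now()

    setTimeout (->
      console.log "fired after #{Date.now() - start}ms"
    ), 100

    # Hog the single thread for ~300ms; the queued callback must wait.
    loop
      break if Date.now() - start > 300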

When there are no events in the queue ahead of it, the callback is executed
more or less immediately, so there does not seem to be a problem. This
is effectively the situation when only a single game is running on a server.

But imagine if each game of vimtronner takes 10ms to update and broadcast its state
(which would be generous). When two games are running, it will take 20ms to process
through both games, leaving 80ms before the next updates are re-queued. At three games
this becomes 30ms, at four games 40ms, and so on.

At 10 games we hit our problem point. The time taken to update all
the games matches the time interval before new update callbacks are
enqueued. If just one more game is started, the time before each game next gets updated
will be DELAYED by 10ms beyond the expected 100ms. This worsens as more games
are added: a server running 20 games will take 200ms to update all the games before
it is even able to process the next set of update events. ALL games are slowed to half speed!

And this does not even take into account the other events queued in the
system by players joining and leaving games, asking for game lists, or
simply sending controls over their sockets.

Games don't actually interact with one another, so it makes no sense
at all that they should block each other. Ideally, each game should
have its own event loop and queue. Additionally, we want to
minimize the impact of handling socket events. And on a multi-core
box dedicated to running just the server, we are wasting the very processing
power that would allow us to fulfill those needs.

So how can we take advantage of multi-processor environments to
parallelize these tasks? Node's about page directly supplies
the answer:

You can start new processes via child_process.fork(); these other
processes will be scheduled in parallel. For load balancing incoming
connections across multiple processes use the cluster module.

Let’s take a look at how and when to use these two modules.

Cluster to parallelize the SAME Flow of Execution

We’ll begin with the cluster module. Introduced around version 0.8,
its stated purpose is to handle heavy workloads by launching a cluster of
Node processes. Additionally, these processes can share the same server
ports, making it ideal for web applications.

Using this module is very easy: it revolves around determining whether the
current Node process is the “master”, which can launch “workers” with
calls to cluster.fork(), or one of the many “workers”, which are all expected to carry
out the same work. This is illustrated in the code below.
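(A minimal sketch of such a launcher; the file name and log messages here are assumptions.)

    # cluster_launcher.coffee -- a sketch of the launch code.
    cluster = require 'cluster'
    os = require 'os'

    module.exports = (work) ->
      if cluster.isMaster
        console.log 'Before the fork'
        # fork one worker per CPU reported by the OS
        cluster.fork() for cpu in os.cpus()
        console.log 'After the fork'
      else
        console.log "Worker ##{cluster.worker.id} started"
        work()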

Let’s write a program that calls the above, named
cluster_example.coffee:
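(Another sketch; it assumes the launcher module above.)

    # cluster_example.coffee -- the MAIN example code; every forked worker
    # re-runs this file from the top.
    launchCluster = require './cluster_launcher'

    console.log 'Launching cluster'

    launchCluster ->
      console.log 'Doing the actual work'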

I can then run it on my quad-core MacBook Pro and get the following output:
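(Reconstructed output; the exact interleaving varies from run to run.)

    $ coffee cluster_example.coffee
    Launching cluster
    Before the fork
    After the fork
    Launching cluster
    Worker #1 started
    Doing the actual work
    Launching cluster
    Worker #2 started
    Doing the actual work
    ...
    Launching cluster
    Worker #8 started
    Doing the actual work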

Reading through the lines, we can see 8 workers were launched. But more
importantly, notice the repeated output surrounding the master and worker
declaration lines. The “Before the fork” and “After the fork” came from
the launch code itself, but more interesting is the repeated “Launching
cluster”. This came from the MAIN example code, not the launcher. It tells us that
when we fork a cluster, we are running through the SAME program from the
BEGINNING.

This is what makes the cluster module ideal for parallelizing the
SAME work across many Node processes. The code will go through the same initialization.
You could introduce variation beyond the
differing “master” vs. “worker” behavior if you felt like it, but that
would go against the module's intended purpose.

You can see this in the common example of load balancing connections on a
Node server instance:
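(A sketch along the lines of the canonical example in the Node documentation.)

    # Every worker runs the same server code; the cluster module lets them
    # all listen on one port and spreads connections among them.
    cluster = require 'cluster'
    http = require 'http'
    os = require 'os'

    if cluster.isMaster
      cluster.fork() for cpu in os.cpus()
    else
      server = http.createServer (req, res) ->
        res.writeHead 200
        res.end "Handled by worker ##{cluster.worker.id}\n"
      server.listen 8000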

Each worker process will start up a server and listen to the same port,
a further feature of the cluster module.

Child Process to Fork a DIFFERENT Flow of Execution

Reading how cluster works, you will discover it
sits atop the other module we are interested in:
child_process.

The module supplies a number of methods to coordinate the launching of processes
and the communication between them. While the exec and
spawn methods
allow calling external commands, of interest to us is again the
fork
function. When we call it, we pass the full path to a Node module we wish to run,
as seen in the code below:
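(A minimal sketch; file names and messages are assumptions. Note that fork starts a plain Node process, so the child module is referenced here by its compiled .js path.)

    # fork_launcher.coffee -- a sketch of launch code built on fork.
    { fork } = require 'child_process'

    module.exports = ->
      console.log 'Before the fork'
      # fork takes the full path to a module and runs it in a new process
      child = fork "#{__dirname}/child_example.js"
      console.log 'After the fork'
      child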

As before, let's write an example program that calls this launch code:
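(A sketch with assumed names; the child module is shown as a comment.)

    # fork_example.coffee -- the parent program.
    launchChild = require './fork_launcher'

    console.log 'Launching child'
    launchChild()

    # child_example.coffee (compiled to child_example.js) would simply be:
    #   console.log 'Child started'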

Running this results in the following output:
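(Again reconstructed; message order may vary slightly.)

    $ coffee fork_example.coffee
    Launching child
    Before the fork
    After the fork
    Child started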

Unlike the cluster example, we DON'T see the repetition of the “Launching”
message, or the “Before” and “After” messages surrounding the fork.
Child processes launched this way BEGIN with the referenced module itself. We
don't go through any of the same code as the parent process, unless it is
explicitly required by the called module. Basically, this is the way to go when we want to run
processes independently, with different initialization and concerns.

This does not mean there is no way for the parent and child processes to
coordinate with each other. There are standard mechanisms like piped
streams or external messaging queues. But forked Node processes have an
additional avenue: a built-in Inter-Process Communication channel.
Simple values and objects can be passed through this channel via the send function,
on either the child_process instance (in the parent) or the
process module (in the child).
These objects arrive as 'message' events on the other side.
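A minimal sketch of the round trip (module names are assumptions):

    # parent.coffee -- send a message to the child and listen for replies.
    { fork } = require 'child_process'

    child = fork "#{__dirname}/echo_child.js"

    child.on 'message', (message) ->
      console.log 'parent received:', message
      child.disconnect()            # close the IPC channel so both exit

    child.send command: 'start', width: 80

    # echo_child.coffee (compiled to echo_child.js) -- the child side:
    #   process.on 'message', (message) ->
    #     process.send echo: message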

This is illustrated in the “processified” version of the
vimtronner game we released a month ago. Instead of the
server managing games directly like it used to, it forks a child process
for each game, sends configuration into it, and waits for messages back.
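In outline, the server side works something like this (a condensed, hypothetical sketch, not the actual vimtronner source):

    # Hypothetical sketch: the server forks one process per game, sends it
    # its configuration, and relays game events back to the players.
    { fork } = require 'child_process'

    startGame = (name, io) ->
      game = fork "#{__dirname}/game_process.js"
      game.send command: 'configure', name: name, dimensions: 40
      game.on 'message', (event) ->
        io.to(name).emit 'game', event
      game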

Likewise, a new ‘game_process’ module now wraps a game instance,
responding to events from players sent from the server process and
sending back game events.
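The wrapper might be shaped like this (hypothetical; the real module has far more to it):

    # game_process sketch: wraps one game, driven entirely by IPC.
    # The './game' module and the message shapes are assumptions.
    { Game } = require './game'

    game = null

    process.on 'message', (message) ->
      switch message.command
        when 'configure'
          game = new Game message
          # report every state change back to the server process
          game.on 'state', (state) -> process.send state: state
          game.start()
        when 'input'
          game?.applyInput message.player, message.key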

A final note: in addition to sending an object, send allows the
transmission of handles like TCP servers and sockets between processes. It
is through this mechanism that the cluster functionality was created.
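For example, a parent can open a TCP server and hand the listening handle to a child (a minimal sketch, with assumed file names):

    # The optional second argument to send carries a handle; the child can
    # then accept connections on the very same server.
    { fork } = require 'child_process'
    net = require 'net'

    child = fork "#{__dirname}/handle_child.js"
    server = net.createServer()
    server.listen 8000, ->
      child.send { op: 'server' }, server

    # handle_child.coffee (compiled) -- the child side:
    #   process.on 'message', (message, serverHandle) ->
    #     serverHandle?.on 'connection', (socket) ->
    #       socket.end 'handled by child\n'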

The Downside

There are some issues to keep in mind when taking advantage of the
forking functionality. While Node processes are considered “lightweight”,
they do consume resources when starting up:

These child Nodes are still whole new instances of V8. Assume at least
30ms startup and 10mb memory for each new Node. That is, you cannot
create many thousands of them.

Likewise, while we can certainly run more things in parallel, we are
still ultimately CPU bound. A multi-core processor can only run as many
processes in parallel as it has cores to throw at them.

Finally, when clustering servers, we
must be aware that though the cluster can handle connections to the same
endpoint, each worker is only aware of the connections it handles. So if
two connections come in that are supposed to interact with each other but are
handled by different workers, the interaction can never take place
without the support of other systems like Redis message queues or shared
storage.

tl;dr

  • Use either the child_process or the cluster modules to take
    advantage of multi-processor environments.
  • Use cluster when you want to parallelize the SAME flow of execution
    and server listening.
  • Use child_process when you want DIFFERENT flows of execution
    working together.
  • Take advantage of the built-in Inter-Process Communication channel to pass
    objects between the processes.

Feedback

  Comments: 22


  1. Timely write-up for me, thanks Rudy

    > A final note. In addition to sending an object, the send allows the transmission of handles like TCP servers and sockets between process. It is through this mechanism that the cluster functionality was created.

    Can you expand on this? At what point in the socket setup are you passing it to the child process and what are you passing? If namespacing, does that happen in the child, or in the parent process? I’m also reading that it can be passed, but also happen “by magic” if requested connection info matches.


    • Thanks for the comment and the question!

      When you are using child_process alone, you have to pass the sockets yourself, from the parent process where the server is to the child, via a send. And by “socket” we mean a TCP socket or any system IO handle (file reader, etc.). That is why they can be passed between processes. Socket.io websockets are actually higher-order objects that can’t be passed; I’ve tried this with no luck. :-/

      The only “magic” involved is when you fork via the cluster module. In that case, each worker is written as if it is starting its own server. But when you read the documentation, what Node.js is actually doing behind the scenes is checking whether any worker process has spun up a server listening on the same port and, if one has, simply distributing some of the connection events back to it.


  2. started reading this. then saw it was in coffeescript. waste of time


  3. Thank you for the nice text! It’s a pleasure to read!

    I could also mention that there is a problem with load balancing between workers. In current versions of Node, only a few workers actually come under load, and the others get nearly nothing. It will be solved in Node v0.12 (http://strongloop.com/strongblog/whats-new-in-node-js-v0-12-cluster-round-robin-load-balancing/).

    And maybe I can promote here ;) the nice guys from the Russian search engine company Yandex, who develop a good alternative to cluster which can help to solve this problem. https://github.com/nodules/luster


  4. Fucking coffeescript, jeez


    • I know, right? I mean it’s so easy to read and can easily be pre-compiled into Javascript, it must be “the worst”. And let’s not ignore the fact that my using Coffeescript HAS NOTHING TO DO WITH WHAT I AM TALKING ABOUT.

      Thanks for being a great example of missing the forest for the trees.


  5. You should put up the compiled JS code too, to see the kind of monstrosity writing JS like Jeremy Ashkenas did 4 years ago does to your code.


    • Thanks for showing me the error of my ways in working in a language that is four years old, continually updated, and is part of the Rails stack. Next time I will be sure to write my work up in LiveScript or ClosureScript. ;-)


  6. Rudy is salty about coffeescript comments. Whatever Rudy, I’ve never written coffeescript but can still read your post and tell what is going on, do what you like.

  7. Pieter Michels

    This is one of the best guides on the subject.
    We’ve implemented the technique in an existing game in about an hour.
    Works like a charm, and all of our cores are in use. Finally :)

    Thanks!


  8. Never mind the jokers raving about coffeescript, you helped me out a lot. Could have been written in Sanskrit and I still would have gotten the gist of it.


  9. Please excuse my ignorance with the following question. Would it possibly be a better design to have all child processes handle all games? Child procs could communicate with each other and the clients via inter proc communication and share the work evenly. Then you wouldn’t be spawning a process per game, but a process per CPU.


  10. To all the people complaining about Coffeescript: 1. do yourself a favor, try it! 2. if you can’t or don’t have time, just convert it on http://js2coffee.org/. It will take no more than one second!


  11. Jeez… A lot of Coffeescript haters! Personally I don’t like Coffeescript much, but that doesn’t prevent me from finding this article f-ing great! Really good explanations of the concepts! Thank you. For me the Coffeescript was distracting (at first), but only until I actually understood the concept, and I really feel I can benefit from this article. So +1, like and whatnot from me :)


    • Grrrrrr! That’s….really easy to read…and probably to type… grrr…


      • I wouldn’t use this article to copy-paste from. But that is not its purpose either. Again, I don’t like Coffeescript, but the CONCEPTS described here are great. Besides, it is better to write your code yourself so you actually learn from it instead of just copy/pasting stuff. So take it as a challenge rather than bitching about the language chosen here in this article… I know I do…
