Node.js, Part II: Spelunking in the Code

Mike Perham

In my last post, I gave a quick overview of Node.js and showed you how to install and smoke-test it. Now let’s dive deeper and learn what it provides and how it works.

Google’s V8 JavaScript engine provides the underlying VM for executing JavaScript code. Because V8 is typically embedded in a browser, it has no real notion of filesystems, processes or I/O streams, all of which are fundamental to a server-side application. Node.js fills that gap with core APIs that wrap the standard POSIX functions available on all modern Unixy systems, along with higher-level APIs for HTTP and HTTPS support.
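
To make the POSIX wrapping concrete, here is a minimal sketch assuming a modern Node.js release rather than 0.4.2. Each fs call roughly corresponds to the system call named in its comment. The scratch file name is hypothetical, and the synchronous *Sync variants are used only to keep the example short; they block the thread, which real Node.js code normally avoids.

```javascript
// Thin JavaScript wrappers over POSIX file and process calls.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file in the OS temp directory, used only for this demo.
const file = path.join(os.tmpdir(), `node-posix-demo-${process.pid}.txt`);

const fd = fs.openSync(file, 'w');                   // ~ open(2) with O_WRONLY | O_CREAT | O_TRUNC
const written = fs.writeSync(fd, 'hello from V8\n'); // ~ write(2); returns the number of bytes written
fs.closeSync(fd);                                    // ~ close(2)

const stats = fs.statSync(file);                     // ~ stat(2)
console.log(`${file}: ${stats.size} bytes (wrote ${written})`);

fs.unlinkSync(file);                                 // ~ unlink(2)
console.log(`running as pid ${process.pid}`);        // ~ getpid(2)
```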

Node.js is designed to be an asynchronous system: your application code runs on a single thread, and I/O is not allowed to block that thread. Instead, Node.js uses an event loop, built on the libev C library, to process I/O events as they complete.
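
Here is a small sketch of what “single thread, non-blocking I/O” means in practice, assuming a modern Node.js release. The fs.stat request is handed off and its callback is queued, so the rest of the script always finishes first. The `order` array exists only for illustration.

```javascript
const fs = require('fs');

const order = [];

// Start an asynchronous stat of the current directory; the call returns immediately.
fs.stat(process.cwd(), (err, stats) => {
  if (err) throw err;
  // Runs later, from the event loop, once the result is ready.
  order.push(`callback (isDirectory=${stats.isDirectory()})`);
  console.log(order.join(' -> '));
});

// Runs first: the single JavaScript thread must finish this script
// before the event loop can deliver any I/O callback.
order.push('end of script');
```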

Let’s dig into the Node source to see how it initializes itself. This write-up is current as of Node 0.4.2 (March 14, 2011), but things will obviously change over time.

Process Initialization

Node initializes itself in the node::Start function in node.cc, which performs the housekeeping needed to use V8 and libev in a well-behaved process. A brief walkthrough (line numbers refer to node.cc in 0.4.2):

  • Lines 2240-2275: Parse the command-line arguments, dividing them between Node and V8.
  • Line 2280: Set up signal handlers so Ctrl-C actually terminates the Node.js process.
  • Lines 2290-2330: Configure various libev watchers, including the idle detector, which uses a simple heuristic to decide when it would be best for V8 to run garbage collection.
  • Line 2332: Initialize V8!
  • Lines 2338-2364: Set up debugging support.
  • Lines 2367-2374: Call node::Load to bootstrap the Node.js APIs (see below).
  • Line 2385: The key line: pass control to the libev event loop. This call does not return until the application code decides to exit the process in response to some event, or until SIGINT or SIGTERM is received.
  • Lines 2389-2405: The process is exiting. Execute any “exit” event callbacks in the application code and return.
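
From the application’s side, the signal-handling and exit steps of this walkthrough show up as events on the `process` object. Here is a minimal sketch assuming a modern Node.js release; the log messages are illustrative.

```javascript
// Called while the process shuts down. The event loop will not run again,
// so only synchronous work is safe here.
process.on('exit', (code) => {
  console.log(`exit event, code ${code}`);
});

// Adding a SIGINT listener replaces Node's default Ctrl-C behavior,
// so the handler has to end the process itself.
process.on('SIGINT', () => {
  console.log('caught SIGINT, cleaning up');
  process.exit(130); // 128 + signal number 2, the usual shell convention for SIGINT
});

console.log('script done');
```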

Bootstrap

node::Load performs the actual Node.js API bootstrap in three stages:

  • Stage 1: Using C++, it sets up a process object and fills it with various constants and methods, so your JavaScript code can access “process.argv”, “process.env”, etc. As far as I can tell, the most fundamental part is the process.binding function, which allows JS code to lazily bootstrap the C++ bindings. For instance, fetching the ‘fs’ binding with process.binding('fs') actually causes the code within node_file.cc to expose various POSIX file functions to JavaScript.
  • Stage 2: It executes the JavaScript in src/node.js. This code sets up the various I/O streams (stdin, stdout and stderr) and creates the JavaScript module loading system, which knows how to find the JavaScript files in lib/ when require(module) is called.
  • Stage 3: Node.js executes the user’s code, via the debugger, the REPL or a user-provided script file. Remember that this script doesn’t block, so it loads quickly and control returns to node::Load.
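
Each stage leaves traces a script can inspect. Here is a minimal sketch assuming a modern Node.js release: `process` (stage 1) is ready before any JavaScript runs, while `require()` and the stdio streams come from the JavaScript bootstrap (stage 2). process.binding() itself is an internal API that later releases deprecated, so the sketch sticks to public APIs; note that `require('module').builtinModules` is much newer than 0.4.2.

```javascript
// Stage 1: `process` is populated from C++ before any JavaScript runs.
// argv holds [node binary, script path, ...user args];
// execArgv holds options consumed by node/V8 themselves (e.g. --max-old-space-size=512).
console.log('argv:', process.argv);
console.log('execArgv:', process.execArgv);
console.log('environment variables:', Object.keys(process.env).length);

// Stage 2: require() and the stdio streams come from the JavaScript bootstrap.
const { builtinModules } = require('module');
const isCore = builtinModules.includes('fs'); // 'fs' ships inside the node binary
process.stdout.write(`fs is a core module: ${isCore}\n`);
```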

Once the user’s code has been loaded, it has set up whatever listening sockets, timers, etc. it needs. node::Load then unwinds and we enter the event loop, which waits for I/O or timer events and runs the corresponding callbacks.
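
Here is a minimal sketch of that lifecycle, assuming a modern Node.js release. The script registers a timer and returns, and the event loop keeps the process alive only while an active handle (here the interval) exists. The `log` array and the three-tick limit are illustrative choices.

```javascript
const log = [];
let ticks = 0;

// An active timer is a handle: while it exists the event loop keeps running.
const interval = setInterval(() => {
  ticks += 1;
  log.push(`tick ${ticks}`);
  if (ticks === 3) {
    // Removing the last handle lets the loop run dry, so node exits.
    clearInterval(interval);
    console.log(log.join(' -> '));
  }
}, 10);

// The top-level script finishes before the first tick fires.
log.push('script finished');
```

A listening socket behaves the same way: an http.Server keeps the process alive until server.close() is called.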

Thanks for staying with me this far; I hope you learned something new. Next up, I’ll show you how to configure and run a full-stack application on Node.js using MongoDB and Express!