Notes from Mike Perham

Configuration for Rails, the Right Way

By Mike Perham in Everything Else

I still see people promoting various gems and plugins to handle miscellaneous configuration elements for your application. One little-known secret is that Rails 3 lets you define your own configuration elements trivially.

In this case, I wanted to use the nifty wkhtmltopdf utility to create a PDF. The Homebrew-installed binary worked fine on OSX, but for our production environment on Heroku I had to use a custom binary checked into git. So I created a configuration variable to store where wkhtmltopdf could be found in the current environment.

First, we define a default value for all environments in config/application.rb:

module Configurator
  class Application < Rails::Application
    # By default, let OSX resolve the path to the binary
    config.wkhtmltopdf = "wkhtmltopdf"
  end
end

Then we override the default setting as necessary per environment, e.g. in config/environments/production.rb:

Configurator::Application.configure do
  # Settings specified here will take precedence over those in config/application.rb

  # Point Heroku explicitly to the binary we need to use
  config.wkhtmltopdf = "#{Rails.root}/bin/wkhtmltopdf"
end

Lastly, we access the configuration element in our code:

  cmd = [Configurator::Application.config.wkhtmltopdf, url, tmpfile.path]
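
For context, here's roughly how that command array might be executed. The original post doesn't show the invocation, so this is just a sketch; passing the array elements directly to Kernel#system keeps the shell out of the picture:

  cmd = [Configurator::Application.config.wkhtmltopdf, url, tmpfile.path]
  # Each array element is passed as a separate argument, so no shell escaping is needed
  system(*cmd) or raise "wkhtmltopdf failed for #{url}"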

Yes, that’s it. Just use Rails’s environment support and config to store your own configuration elements. They’re trivial to set, trivial to access and require no third-party gems or custom text files.


Up and Running with Clojure

By Mike Perham in Web

For the last three years or so, Clojure has been a language that I admired from afar: the design of the language is wonderful but I’ve never really used it to build anything and haven’t looked closely at the language in a while. Recently we had a Carbon Five tech showdown between Node.js and Ruby to see which system could pump out “Hello World” as fast and as consistently as possible. Since then we’ve added a Go version that impressed us a lot.

But we’re missing a JVM-based entry, which gives me a great excuse to dive into the Clojure world and learn how things work.

Continue reading …


A Modern Guide to Threads

By Mike Perham in Web

I spoke recently at RubyConf 2011 on some advanced topics in threading. What surprised me was how little experience people had with threads, so I decided to write this post to give a little more background on them. Matz actually recommends not using threads (see below for why), and I think this is a big reason why Rubyists tend not to understand threading.
Continue reading …


Improving Resque’s memory efficiency

By Mike Perham in Web

Resque is a very popular message queueing system for Rails applications. Here’s how I recently improved the memory efficiency of a Carbon Five customer’s Resque processing farm by 68x!

The Problem

This customer has an existing investment in Resque and is a heavy user of a third-party Java API, so they need to run their Resque workers under JRuby. Unfortunately this gets them the worst of all worlds:

  • JRuby does not support fork, so they don’t get the benefit of memory isolation from working in a child process
  • The JVM is relatively demanding in terms of memory
  • Resque is single threaded

To scale to the levels they wanted, they projected that they would need to run hundreds of Resque processes, each consuming 512MB. Insanity! If your problem requires a lot of concurrency to solve, using lots of large processes is a terrible idea.

The Solution


Figure 1 – The improvement is obvious. I’m still trying to figure out how to provision 2.25 machines though.

To fix this I spent a few days modifying Resque to use multiple threads when on JRuby. The changes were relatively straightforward; Resque was already thread safe so no brain surgery was required. There were three major changes:

  1. Modify the main processing method so it spawns N threads, each of which runs the work loop (see the sketch after this list)
  2. Modify the Redis connection to use a connection pool so the connection does not become a point of contention with lots of threads
  3. Modify the signal handling so Resque can shut down gracefully (applications cannot use SIGQUIT in JRuby)
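
Here is a rough sketch of what change 1 looks like in spirit. This is illustrative only, not the actual patched Resque source; names like worker_count, reserve, process and shutdown? are placeholders borrowed from Resque's worker vocabulary:

  # Illustrative sketch of change 1; not the real Resque code.
  # Spawn N threads, each running the familiar reserve/process work loop.
  def work(interval = 5)
    worker_count = (ENV['WORKER_THREADS'] || 10).to_i
    threads = worker_count.times.map do
      Thread.new do
        until shutdown?
          if (job = reserve)        # pull the next job off the Redis queue
            process(job)
          else
            sleep interval          # queue is empty; pause before polling again
          end
        end
      end
    end
    threads.each(&:join)            # block until every worker thread has exited
  end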

Before my changes, each test run required 9 machines running a total of 135 processes, each with 512MB of RAM. Subsequent testing has shown that a single JRuby process with 135 threads and 1GB of RAM performs just as quickly, so they’ve gone from 68GB to 1GB, and from needing several machines to needing just one. Now instead of a large processing farm, they just need a small garden. :-)

You can find my modified Resque project on GitHub.


Think Globally, Stage Locally

By Mike Perham in Ops

Or: how to create and deploy to a staging environment running locally!

Staging: an environment that duplicates production as closely as possible, to find any lingering bugs before you update production. Most of the Rails community develops on OSX but deploys to Linux; this can be fragile since it is easy to forget the Linux-specific environment changes your app needs. At Carbon Five, most of our customers can afford to maintain a dedicated staging environment, but for smaller projects I wanted my own Linux staging environment without the cost of a real slice or EC2 instance.

In this post, I will show you how to create a Linux VM with Vagrant and use Capistrano to deploy to your Vagrant VM. My coworker Jared recently posted a nice intro to Vagrant, a great project by Mitchell Hashimoto that simplifies and automates the use of virtual machines during development. You should read his post first, as I’m not going to cover Chef here; I highly recommend it for automating the provisioning of your VM. Using Chef means your staging and production boxes can be virtually identical, since the exact same recipes build them both.

Let’s assume you have a Rails 3 app for which you want to create a staging VM. We’ll install Vagrant and configure it in the project like so:

  cd myapp
  # NOTE: Make sure you've installed VirtualBox first!
  gem install vagrant
  # Downloads a blank Ubuntu 11.04 64-bit image
  # Or find your own box on http://vagrantbox.es
  vagrant box add ubuntu-1104-server-amd64 http://dl.dropbox.com/u/7490647/talifun-ubuntu-11.04-server-amd64.box
  # Adds a Vagrantfile to your Rails app which talks to the new image
  vagrant init ubuntu-1104-server-amd64
  # Starts the new VM
  vagrant up
  # Adds SSH details to your SSH config so Capistrano can deploy directly to your VM
  vagrant ssh-config >> ~/.ssh/config
  # Logs into your new VM
  vagrant ssh
  # Perform a lot of Chef recipe work
  # ...Left as an exercise to the reader...

This whole process, minus the box download, should take a minute or two. Remember that you will need to do a bunch of Chef work to install the stack your application needs (e.g. Unicorn, JRuby, etc.). Once that is done, let’s work on the Capistrano configuration. In this case, I’m using the capistrano-ext gem to add multiple environment support so we can deploy to either production or our new staging VM:

# config/deploy.rb
set :stages, %w(staging production)
set :default_stage, "staging"
require 'capistrano/ext/multistage'
require "bundler/capistrano"

set :user, 'vagrant'
set :application, "myapp"
set :deploy_to, "/home/#{user}/#{application}"
set :repository,  "git@github.com:acmeco/#{application}"

set :scm, :git
set :branch, "master"
set :deploy_via, :remote_cache
ssh_options[:forward_agent] = true

And in config/deploy/staging.rb:

# 'vagrant' = the hostname of the new VM
role :web, "vagrant"
role :app, "vagrant"
role :db,  "vagrant", :primary => true
set :rails_env, 'staging'

The secret sauce is the vagrant ssh-config command, which configures SSH so it knows how to log into your new Vagrant VM. Now all we need to do is run a simple cap staging deploy and Capistrano will use SSH to connect to the VM and have it pull your latest changes from your GitHub repo.

Once deployed, you can tell Vagrant to forward your application’s port in the VM to a localhost port. In my case, I have Unicorn running on port 5000 in the VM, forwarded to port 8080 on localhost in OSX. With this configuration in my Vagrantfile, I can browse to http://localhost:8080 to hit my Rails app running in the VM.

  config.vm.forward_port "unicorn", 5000, 8080
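
For reference, that directive sits inside the Vagrantfile that vagrant init generated. A minimal sketch, assuming the Vagrant 0.x configuration syntax in use when this was written and the box name added earlier:

  # Vagrantfile
  Vagrant::Config.run do |config|
    config.vm.box = "ubuntu-1104-server-amd64"
    # Forward the VM's Unicorn port (5000) to port 8080 on the OSX host
    config.vm.forward_port "unicorn", 5000, 8080
  end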

Final note: if you have trouble contacting GitHub, make sure you are running ssh-agent to handle key requests from the VM. This will allow the vagrant user in the VM to act as you when contacting GitHub: run ssh-agent && ssh-add on your local machine (NOT in the VM).

Most of this post is typical Vagrant and Capistrano configuration. With just a few simple tricks, we can tie the two together for great victory and hopefully more stability for your site. Good luck!


Asynchronous Processing with girl_friday

By Mike Perham in Web

I want to introduce you to my new gem, girl_friday. The problem: current asynchronous processing tools for Ruby are too inefficient and too complex.

Efficiency

It’s sad to admit, but with Ruby, if you want to process 5 messages at the same time you commonly have to spin up 5 processes, each of which boots your application and loads your code into memory; if each process is 100MB, that’s 500MB, most of which is redundant code.

Threads are the best answer, long term. Threads are hard to get right if you are managing them yourself, but they are simpler to use than Fibers and give us real parallelism in JRuby and the upcoming Rubinius 2.0 release. Ruby 1.9’s threading isn’t quite as good, but it is still useful for typical IO-heavy, server-side systems. girl_friday uses actors for safe and simple concurrency on top of Ruby threads. With actors, we get the benefits of threads with fewer drawbacks! Since we are using multiple threads in a single process, the memory overhead is far less than booting another process.

Complexity

With girl_friday, your queue processing happens in-process with the rest of your application. You don’t need a separate project, deployment, process monitoring and alerts, etc. If your application is running, so is girl_friday.

Usage

You define your queues and how to process incoming messages when your application starts:

    # config/initializers/girl_friday.rb
    EMAIL_QUEUE = GirlFriday::WorkQueue.new(:user_email) do |msg|
      UserMailer.registration_email(msg).deliver
    end

Then just push a message hash with your data onto the queue to be processed:

    EMAIL_QUEUE << { :name => @user.name, :email => @user.email }

Dead simple by design.

Design

Each queue in girl_friday is composed of a supervisor actor and a set of worker actors. The supervisor actor is the only one that manages the internal state of a queue. It receives incoming work and hands it to workers as they become available. If you are interested in the nitty-gritty detail, here’s the WorkQueue class to peruse.
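
To make that shape concrete, here is a toy illustration of the supervisor/worker split. It uses a plain Thread per worker and a Queue as the mailbox instead of real actors, and it is not girl_friday's actual WorkQueue code:

    require 'thread'

    # Toy work queue: a shared mailbox plus a pool of worker threads.
    # Not girl_friday's implementation; just an illustration of the design.
    class ToyWorkQueue
      def initialize(size = 5, &processor)
        @mailbox = Queue.new                # pending messages
        @workers = size.times.map do
          Thread.new do
            while (msg = @mailbox.pop)      # each worker asks for work when it is free
              processor.call(msg)
            end
          end
        end
      end

      def <<(msg)
        @mailbox << msg                     # enqueue; an idle worker will pick it up
      end
    end

    EMAILS = ToyWorkQueue.new(3) { |msg| puts "emailing #{msg[:email]}" }
    EMAILS << { :email => "someone@example.com" }
    sleep 0.1                               # demo only: give the pool a moment before exit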

Advanced Options

girl_friday already has a number of nice options built in:

  • Send worker errors to Hoptoad Notifier or a custom error processor
  • Persist jobs to Redis so a restart does not lose queued jobs
  • Asynchronous callbacks (call a block with the result when the message is processed)
  • Runtime metrics for monitoring
  • Clean shutdown (stop processing new jobs)

See the girl_friday wiki for more specifics about each of these options.

Caveats and the Future

girl_friday supports Ruby 1.9.2, Rubinius 1.2.3, and JRuby 1.6.0 and above. Moving forward, I’d like to see a web UI added to girl_friday, like Resque’s web UI. If you want to help out, please fork the GitHub project and send pull requests!


Concurrency with Actors

By Mike Perham in Web

Programming concurrent code with threads and shared state is hard to get right. Actors are an attempt to give application developers a safer concurrency model. Erlang uses the actor model as the basis for its concurrency, and while Ruby doesn’t have actors built in, they can be layered on top of Ruby threads. In this post, I want to introduce you to actors with some examples.

The core idea behind actors is message passing. Concurrent units of code do not share variables; instead, they send each other messages, and those messages contain a copy of the state to be processed. Since nothing is shared, you don’t need locks and don’t have race conditions. A simple idea, but powerful in practice! Rubinius has an actor API built in; let’s take a look at some examples using that API.
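
As a tiny illustration of that idea in plain Ruby (this is not the Rubinius actor API; a Thread plus a Queue stands in for an actor and its mailbox):

    require 'thread'

    # A Thread with a Queue standing in for an actor and its mailbox.
    mailbox = Queue.new

    counter = Thread.new do
      total = 0                     # state owned exclusively by this "actor"
      while (msg = mailbox.pop)
        break if msg == :done
        total += msg[:amount]       # messages carry copies of data; nothing is shared
      end
      puts "Processed total: #{total}"
    end

    # Other code communicates by sending messages, never by touching `total` directly
    5.times { |i| mailbox << { :amount => i } }
    mailbox << :done
    counter.join
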
Continue reading …


Node.js, Part III: Full Stack Application

By Mike Perham in Web

In my previous posts, I introduced you to Node.js and walked through a bit of its codebase. Now I want to get a simple but non-trivial Node.js application running. My biggest problem with Node.js so far has been the lack of substantial examples: if I see one more Hello World or Echo Server, I’ll flip my lid (side note: I found the same thing for Ruby’s EventMachine, so I created my evented repo). By far the best resource I’ve found for learning Node.js has been DailyJS; highly recommended!

So I’ve spent the last few weeks building Chrono.js, an application metrics server. Chrono is still under development, so it’s really not appropriate for general use, but it makes for a decent example app.
Continue reading …


Node.js Overview

By Mike Perham in Web

I was a Java guy for 10 years and I’ve been a Rubyist for the last 5 years. Over the years, I’ve tried to develop expertise in a particular area of technology that will both pay the bills and make me happy as a programmer while also watching for upcoming changes in the tech world. I often find myself diving into a particular technology just to get my hands dirty and get a feel for its strengths and weaknesses. As my JavaScript skills have always been weak, I’ve decided to deep dive into Node.js to understand what it does well and improve my JavaScript skills at the same time.

For this post, I’m just going to cover the basics; I’ll follow up soon with deeper posts.
Continue reading …