Start Testing Your Migrations. (Right Now)

Migrations are a necessary part of any Rails project. At some point, you need to modify the structure of your database, normalize data that exists within those tables, or refactor the architecture entirely. The migrations for these tasks can vary in complexity from simple “add_column” statements to extreme SQL with nested joins and subselects.

To be blunt, you can fark your data pretty easily by fat-fingering a migration. So how do we avoid this?

In our daily routine as Agile developers, we follow a simple workflow to avoid these situations. You write a test before you touch a single piece of code, then watch it fail, then write code and repeat until your tests pass. So why not start writing tests for your migrations?

Suggested Workflow

I currently use RSpec for testing. We have a default structure in our spec folder for models, controllers, etc. Let’s try to come up with a pattern for migrations that mimics the existing behavior. To make it more straightforward, let’s also use a real-world example: Adding a column to the “users” table called “favorite_color”:

  1. Create your migrations folder for specs (if it doesn’t exist):

    mkdir spec/migrations

  2. Create an empty migration:

    rails g migration AddFavoriteColorToUsers
    # => db/migrate/20110127192508_add_favorite_color_to_users.rb

  3. Get the current migration version (you will use this in your test):

    rails runner 'puts ActiveRecord::Migrator.current_version'
    # => 20110126212851

  4. Create your migration spec:

    touch spec/migrations/add_favorite_color_to_users_spec.rb

    • Load the required files:
              load 'spec/spec_helper.rb'
              load 'db/migrate/20110127192508_add_favorite_color_to_users.rb'
            
    • Setup your spec:
              describe AddFavoriteColorToUsers do
                before do
                  @my_migration_version = '20110127192508'
                  @previous_migration_version = '20110126212851'
                end
                pending "describe up"
                pending "describe down"
              end
            
    • Describe your “up” migration:
              describe ".up" do
                before do
                  ActiveRecord::Migrator.migrate @previous_migration_version
                  puts "Testing up migration for #{@my_migration_version} - resetting to #{ActiveRecord::Migrator.current_version}"
                end
      
                it "adds the 'favorite_color' column to the users table" do
                  expect { 
                    AddFavoriteColorToUsers.up 
                    User.reset_column_information
                  }.to change { User.columns }
                  User.columns.map(&:name).should include("favorite_color")
                end
              end
            
    • Describe your “down” migration:
              describe ".down" do                                                                                                                
                before do
                  ActiveRecord::Migrator.migrate @my_migration_version
                  puts "Testing down migration for #{@my_migration_version} - resetting to #{ActiveRecord::Migrator.current_version}"
                end
      
                it "removes the 'favorite_color' column from the users table" do
                  expect {
                    AddFavoriteColorToUsers.down
                    User.reset_column_information
                  }.to change { User.columns }
                  User.columns.map(&:name).should_not include("favorite_color")
                end
              end
            
  5. Get on TDD with Migrations! (Red,Green,Refactor)

Of course, you’ll need to dig into ActiveRecord internals to know how to write your tests. Then again, there’s nothing wrong with a deeper understanding of the framework we all use every day.

At this point, it may seem pointless to write a spec for this migration. Perhaps you’re right. All we’re doing is adding and removing a column from the database. But migrations aren’t always so straightforward. Sometimes your migrations execute advanced SQL for data normalization. Sometimes, they do things like persisting tertiary relationship ids to local columns, deleting stale records based off of nested selects and multiple joins, or removing duplicated records that have made their way into your production database.

Let’s dig in a little deeper, and see if I can’t convince you to start testing your migrations, right now.

Example Scenario: Delete Stale Records

Let’s take the following scenario and look at a migration that is just dying for a test.

Delete Alerts for Comments on Soft-Deleted Blog Post

Here, we have a blog application written in Rails (how original of me) with the following properties:

  • As an Author I can create a Blog Post
  • As a Reader I can comment on that Blog Post
  • As an Author I receive an Alert when someone comments on my Blog Post
  • As an Administrator I can “Soft Delete” any Blog Post

The bug we find in this application stems from the soft deletion of the blog post and their associated alerts. Users are seeing alerts for comments on soft-deleted blog posts after we introduced this change. We’ve written some new code to ensure that alerts on “deleted” posts are destroyed, but we also need to write a migration to remove the stale alerts from the database. This is made a little more complicated by the fact that the “alerts” table backs a polymorphic STI “Alert” model.

The Test (RSpec and Mocha)

load 'spec/spec_helper.rb'
load 'db/migrate/20110127220314_delete_alerts_for_deleted_posts.rb'


describe DeleteAlertsForDeletedPosts do
  before do
    @my_version = "20110127220314"
    @previous_version = "20110126212851"
    
    Alert.delete_all
    Post.delete_all
  end

  describe ".up" do
    before do
      ActiveRecord::Migrator.migrate @previous_version
    end

    context "given some deleted posts and some active posts" do
      before do
        5.times { Factory(:soft_deleted_post) }
        5.times { Factory(:post) }

        @deleted_posts = Post.where :deleted => true
        @active_posts = Post.where :deleted => false

        @deleted_posts.should have(5).items
        @active_posts.should have(5).items
      end

      context "when there are alerts for all the posts" do
        before do
          @deleted_posts.concat(@active_posts).each do |post|
            Factory :alert, :alertable => post
          end
        end

        it "deletes alerts for the deleted posts" do
          DeleteAlertsForDeletedPosts.up
          Post.where(:deleted => true).map(&:alerts).flatten.should be_empty
        end

        it "does not delete alerts for the active posts" do
          DeleteAlertsForDeletedPosts.up 
          Post.where(:deleted => false).map(&:alerts).flatten.should_not be_empty
        end
      end
    end
  end

  describe ".down" do
    it "does nothing" do
      true
    end
  end

  after(:all) do
    # Return to latest state
    ActiveRecord::Migrator.migrate "db/migrate"
  end
end

The Migration

 class DeleteAlertsForDeletedPosts < ActiveRecord::Migration
  def self.up
    execute <<-EOS
    DELETE FROM alerts
     WHERE alerts.alertable_type = 'Post'
     AND alerts.alertable_id IN (
      SELECT id FROM posts WHERE deleted = 1
     )
    EOS
  end

  def self.down
    puts "Previous migration removed stale alerts, there is nothing to do in this .down"
  end
end

Before I started testing my migrations with specs, I would do this by starting a Rails console and manually running each of these scenarios, then dropping back to the shell to run my migration, then back to the console to check the results. Perhaps you’re OK with doing things that way. To me, I find it wasteful and prone to error. By having the setup and teardown in my test’s “before” and “after” blocks, I can quickly make changes and verify the success or failure of my migration.

Plus, I’ve really grown to like the colors RED and GREEN.

Summary

Others may argue that migrations are not Ruby code and don’t need to be tested. Some will say that migrations are so straightforward that they don’t need to be tested. Both of these are valid points, and perhaps this is a bit of overkill. I am still experimenting with this process and trying to find out where these tests belong. I still have some questions about how to approach utilizing these tests in the full lifecycle of an application. But I’m a perfectionist, and I’ve seen cases where untested migrations cause very unexpected results. To me, migrations that can munge data on a production system should have some level of testing. There should be a flag (RED) that shows us developers that our migration doesn’t do what we expected it to do. I can’t think of a better solution for this than to start testing your migrations. Right now.

Let me know what you think. Comments, questions, suggestions are all appreciated. Keep in mind that I am an opinionated developer and my views do not necessarily always reflect those of Carbon Five.

This entry was posted in Process. Bookmark the permalink.
  • Pingback: Start Testing Your Migrations. (Right Now)

  • Jared Carroll

    Interesting idea. I’ve never tried to TDD my migrations before.

    For migrations that introduce structural changes (add/remove/change columns, create, drop tables, etc.) I always do rake db:migrate and then rake db:rollback manually verifying the changes after each task. For migrations that migrate existing data I take the same approach but with a dump of production data. However both approaches are tedious and would be much quicker and easier to develop by TDD’ing the migration against some test data.

    Would love to see this pulled into a gem, so the rails generator for a migration would create a corresponding spec file. The #teardown of migration specs could get tricky to ensure the database is always reverted to a known state especially when running multiple specs via rake. Maybe use a separate db for migration specs?

  • http://christianbradley.wordpress.com Christian Bradley

    I like the separate database idea… that would solve a lot of the issues with database state that could arise while running these as part of your full rake.

    In addition, it may actually make sense to keep these specs out of source control… Something to use as you’re developing to speed up the process and ensure your migrations have some level of testing, then tossed once you merge into master.

    As far as pulling it all into a gem… I’m all for it. Let’s get on that.

    • Rudy Jahchan

      I think that having a separate database environment or keeping them out of source control are not just possibilities, they’re necessities! Though you should really only use one of them, as they’re only needed to solve the same concern over database state or structure. I would side would a separate “test-migration” environment if we do a gem, vs continually having to create then throw away migration tests from source control.

  • http://henrik.nyh.se Henrik N

    Since they’re kind of throwaway tests that are only meant to guarantee that the migration does what it should until you run it, perhaps something more lightweight would be better.

    Could be useful with a plugin that lets you add a test straight inside the migration file in some minimal syntax, and the test runs automatically before the migration does and aborts the migration if the test fails.

  • Christian Bradley

    @Henrik – Great Suggestion! I am still experimenting with this but have found it more useful for complicated data normalization migrations etc. In this case, it feels pretty comfortable as a spec. However, I could see that spec testing that the “up” migrations are rolled back when they fail as well.

    My issue with simply aborting a migration is that it may run fine in your sandbox, pass through again in QA, then make it to Production and leave the database in an unknown state after encountering unexpected data.

    Another case for testing your migrations: ensure a rollback occurs if it does not achieve the expected results.

  • http://mikeperham.com Mike Perham

    Too heavyweight IMO. I’ve found rake db:migrate:redo to be sufficient for all but a handful of migrations I’ve written over the years.

  • Christian Bradley

    @Mike – I agree to some extent. However – for those “handful of migrations” would it have been helpful to have a test to automate the process, drive development, and ensure success? Take a data normalization migration that is pulling and persisting association ids from a tertiary relation, or deleting stale records, for example. Something with this much impact that lacks a checkpoint deserves a second look (which is what got me on this topic in the first place).

  • Pingback: Tweets that mention Start Testing Your Migrations. (Right Now) | Carbon Five Community -- Topsy.com

  • http://mwilden.blogspot.com/ Mark Wilden

    I tend to regard the database and its migrations as implementation details. I think the success or failure of a migration can and should be determined by model unit tests and integration tests.

    This doesn’t apply to all migrations, of course. But certainly, adding a column to a table should have farther-reaching implications.

  • http://christianbradley.wordpress.com Christian Bradley

    @All – Updated this post with a more detailed example. Feel free to reread.

  • http://mwilden.blogspot.com/ Mark Wilden

    My approach for things like deleting stale posts is usually to elevate the code to a model method. Then I test it “normally.” My thinking is that if I have to do something like this once, then I may have to do it again.

    The next question is whether cleanup like this should be a migration or a Rake task. The fact that there is no “down” to this migration really makes it seem like a task.

    But either way – good article, and food for thought.

  • http://christianbradley.wordpress.com Christian Bradley

    @Mark – Agreed that this could be a rake task. Or, maybe it could also be a migration where the “up” saves a YAML file with the deleted alerts and the “down” reloads them? Interestingly enough, if you google “test rake task”, it seems there is a lot of controversy around how to approach that as well.

    Part of me really likes the concept of moving this type of code to the model level and unit testing, but on the other hand, I feel like it would start to bloat the models and model specs themselves.

    Overall, I’m glad to have this discussion, it seems to be a gray area in the “Rails Way” that I’m not really sure has been well defined.

    Of Note, Testing Rake Tasks:
    http://www.philsergi.com/2009/02/testing-rake-tasks-with-rspec.html

  • http://mwilden.blogspot.com/ Mark Wilden

    My thinking on testing Rake tasks is similar to that on testing controllers – it shouldn’t be necessary because there shouldn’t be very much code in there.

    I know what you mean about bloating models. However, I have no problem at all deleting code I don’t need any more. I can always get it back.

  • http://theadmin.org Eric Davis

    I do the same thing as Mark. Move the code out of the migration and into the model and just test the model like normal. Example: https://gist.github.com/801368

  • http://rtlechow.com R.T. Lechow

    If you’re uncomfortable with Migrations or Active Record, there are guides for that:
    http://guides.rubyonrails.org/migrations.html
    http://guides.rubyonrails.org/active_record_querying.html

    The problem isn’t that “migrations are not Ruby code” (they certainly end in .rb); it’s that this isn’t your code.

  • http://vitobotta.com Vito Botta

    IMHO adding specs for migrations is a waste of time because if feels to me like you are testing the framework and not your code, and also because as others have said I can just migrate/rollback or redo. This has always been more than enough to check that the structure and perhaps the data change as expected, at least from my experience.

    I often “test” migrations with the test database first… e.g.

    rake db:migrate RAILS_ENV=test

    just in case; then I reload the schema if needed.

    rake db:schema:load RAILS_ENV=test

    Then if I am working with small databases, depending on the application, I usually use a rake task to pull the database from production (or with a subset of data – together with the backups I usually script the creation of an export with smaller datasets, e.g. 10000 rows or so if the dbs are large).
    Then I “test” migrations with production data.

    Other considerations:
    -for data changes (not structure), if it’s something I may have to run more than once I’d create a Rake task.
    -since structure/data changes affect the behaviour of models etc, if I am sleeping and forget to check if the db changes as expected following a migration, chances are that the problem will be caught by specs for the models etc.

    So, I may be wrong, but I do think that adding specs for migrations is overkill.

    If there’s something that annoys me about migrations, is that you can’t have transactional migrations with MySQL as you can with PostgresSQL because of the lack of support for transactional database definition changes… that would make things easier in a number of cases.

  • http://smartic.us bryanl

    I don’t think this is a good idea. Like already said before, this is an implementation detail. Test those behaviors.

  • http://blog.theamazingrando.com Paul Sadauskas

    When I wrote dm-migrations for DataMapper a few years ago, I made sure that also writing tests for them was easy: http://blog.theamazingrando.com/specing-migrations-a-tutorial-0


    describe :create_people_table, :type => :migration do

    before do
    run_migration
    end

    it 'should create a people table' do
    repository(:default).should have_table(:people)
    end
    end

  • http://twitter.com/tamouse Tamara Temple

    If your database remains simple, not a lot of linkages, and not a lot of changes, you can probably get away without spec testing migrations. In a complex application, with many changes happening to the database because of shifting requirements and so on, testing the migrations becomes more imperative as the potential for messing up a migration can become quite costly. As complexity grows (of project, of data, of developers, of requirements, etc,etc), practices should change accordingly to keep things from going belly up at just the critical point you can’t have them fail (which they will do, because, well, Murphy).

    • Nicholas

      If it is simple, why not start testing from the start of the project and gain the know-how progressively while it is still easy to do ? In addition, you will get all the written tests in your regression suite for cheap since they are easy to write and they will serve to others as example on how to do it when to going gets rough.

  • greg

    I like the idea of migration testing, but I can not make it working. It seems like migration doesn’t run. Could you check this gist out: https://gist.github.com/4048819 and tell me what am I doing wrong?

  • Pingback: Deployment Zone » Blog Archive » Testing ActiveRecord Migrations

  • Leszek Andrukanis

    Manipulating data in Rails migration is not the best idea. Have you thought about writing task with it and testing this task simply ?

    • Bert Bruynooghe

      Can you explain why it is not a good idea? If manipulating the data is part of keeping your database consistent with the new database structure, it definitely belongs in the migration AFAIK. Moreover, putting this code inside the migration makes it also clear that this code should never change since it is part of the migration process.

      • Robert Fletcher

        The problem is that migrations become brittle if you have code in them that modifies the data. Classes change over time, and migrations don’t so assumptions you make when writing your migrations might not still be true a few months down the line. For example, you might have a migration that loops over users and references the full_name method in your class, but later on down the line you decide to remove that method, at which point your migration will give you a NoMethodError when you try to run it.

        We tried to get around this by having a lib/tasks/oneoffs directory where we put code that needs to be run once to clean up data, and then we would remove the task later. We found that it makes things pretty awkward, though, since often times we wanted to apply database constraints, so we would have to deploy the rake task first, run it, then deploy the migration. Kind of messy.

        We ended up going back to cleaning up the data in the migration. It’s nice because we can add a column, populate the column, and add constraints in the same place. The problem with migrations becoming stale is kind of irrelevant anyway since there’s generally no reason to be running old migrations. If you’re setting up a new machine then you should be using rake db:schema:load, and then you can run recent migrations as necessary to keep it up to date.

        • Bert Bruynooghe

          Your first argument isn’t very valid since the approach we are using until now is to partly redefine the model in the migration itself: it is strictly tailored to the migration you want to execute, and never becomes stale. (Basically, this technique is discussed in http://guides.rubyonrails.org/migrations.html#using-models-in-your-migrations.)

          That being said, the objection in http://stackoverflow.com/questions/750307/how-do-you-change-your-rails-app-data/750829#750829 is a very valid one: your data migrations won’t be happening if a person executes rake db:reset.

          Or is the point that we should not be committing our schema.rb to the code repository to begin with?

          • Robert Fletcher

            Not sure I see how that invalidates my first argument. They talk about making a model within your migration as being a fix for running validations. In my case you’d have to define methods on the model in the migration, so it might mean duplicating code or messy logic. You can also avoid validations by saving with validate: false. Either way, you have to be aware of the ways your code might change over time and code defensively to make sure it won’t break later.

            I’m not sure I get the argument about rake db:reset. When you write your migrations, you typically run them to update the schema.rb file soon after. You should definitely check in your schema.rb file, since that’s the canonical source of your database’s structure. If you check in the changes, the next person to run rake db:reset will get the latest structure. It also runs your seeds file, so any data you want to be auto-populated can go there. I generally also add a rake db:populate task to add mock data for dev mode, which you can make depend on db:reset.