Testing Doesn’t Scale


The Ruby community’s obsession with testing is unrivaled. Over the years, Rubyists have gone from old school TDD using test/unit, to modern BDD with RSpec and finally to comprehensive integration testing, including JavaScript support, via Cucumber. The goal was tests at all layers and to get as close as possible to simulating a real browser.

There’s no question that extensive test suites allow us to rapidly develop and deploy apps with confidence. Unfortunately, as apps grow, so do their test suites. Eventually a once-agile test suite becomes massive enough to slow development, and then it’s hard to find anyone willing to work on the app.

I’ve seen this happen time and time again on apps and I know I’m not alone. An app’s testing strategy has now become a major factor in its long-term success. How can we manage these large test suites? Let’s take a look at some possible solutions that might be able to help us out.

Parallelizing Tests

The first solution most people look into is usually hardware. Splitting a test suite into groups and then running each group in a separate process, or even on a separate machine, will help reduce test times. parallel_tests, testjour and hydra are a few gems that take this approach.
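As a rough illustration of the "split into groups" step, here is a minimal sketch in plain Ruby. It is not how any of the gems above is implemented; the helper name partition_by_runtime and the timing data are made up for the example, and it assumes you have recorded how long each test file took on a previous run.

```ruby
# Hypothetical helper: split test files into N groups of roughly equal
# total runtime, so each worker process finishes at about the same time.
# `timings` maps a file path to its last recorded runtime in seconds.
def partition_by_runtime(timings, group_count)
  groups = Array.new(group_count) { { files: [], total: 0.0 } }

  # Greedy: take the slowest files first and always hand the next file
  # to whichever group currently has the least work.
  timings.sort_by { |_file, seconds| -seconds }.each do |file, seconds|
    lightest = groups.min_by { |group| group[:total] }
    lightest[:files] << file
    lightest[:total] += seconds
  end

  groups
end

# Made-up timings for illustration only.
example_timings = {
  "spec/features/checkout_spec.rb" => 30.0,
  "spec/models/user_spec.rb"       => 12.0,
  "spec/models/order_spec.rb"      => 9.0,
  "spec/controllers/cart_spec.rb"  => 6.0,
  "spec/helpers/date_spec.rb"      => 3.0
}

partition_by_runtime(example_timings, 2).each_with_index do |group, index|
  puts "worker #{index}: #{group[:files].join(', ')} (#{group[:total]}s)"
end
```

Each group would then be handed to its own process or machine. Balancing by runtime rather than by file count matters because a single slow integration file can take as long as dozens of unit test files.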

Parallelizing your tests works; they will run faster. However, I’ve always had issues with setup and configuration, whether it’s a problem with a particular version of Ruby or a tool that doesn’t run on Linux. Consistent test runs have also been a challenge: random, hard-to-debug test failures are often blamed on obscure causes like race conditions. Hardware also isn’t free, so in the end cost will determine how well this works for you.

Run Fewer Tests

This is an interesting approach. Instead of running the full test suite before integrating your changes, you run just a subset of it. A continuous integration server is then responsible for running the full suite.
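One simple way to pick the subset is to map each changed file to the spec that covers it. The sketch below assumes the conventional Rails layout (app/models/user.rb is covered by spec/models/user_spec.rb); the helper names spec_for and specs_to_run are hypothetical, and real projects will have files that don't follow the convention.

```ruby
# Hypothetical mapping from a changed source file to the spec that covers
# it, assuming the common Rails layout. Returns nil when there is no
# obvious match.
def spec_for(changed_path)
  return changed_path if changed_path.end_with?("_spec.rb")

  match = changed_path.match(%r{\Aapp/(.+)\.rb\z})
  match && "spec/#{match[1]}_spec.rb"
end

# Returns the unique spec files worth running for a list of changed paths.
# Paths with no obvious spec (config, views, migrations) are skipped here,
# which is the gap the CI server's full run has to cover.
def specs_to_run(changed_paths)
  changed_paths.map { |path| spec_for(path) }.compact.uniq
end

# Made-up change list for illustration only.
changed = [
  "app/models/user.rb",
  "app/controllers/carts_controller.rb",
  "spec/models/user_spec.rb",
  "config/routes.rb"
]

puts specs_to_run(changed)
```

Note that config/routes.rb maps to nothing, even though a routing change can break many tests. That blind spot is the tradeoff of this approach.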

Running fewer tests is probably the easiest and most common way to deal with a slow test suite. Of course, you can still break the build when your change breaks something outside the subset of tests you ran. Broken builds become more common and you gradually start to lose confidence in your main development line. And your test suite is still slow.

Mock Everything

Mocking is a testing technique where every object is tested in isolation and all external dependencies (e.g. a database) are replaced with stand-ins in the tests. Because the real dependencies are never exercised, end-to-end integration tests are still necessary to verify that the pieces work together.
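To make the idea concrete, here is a minimal hand-rolled mock in plain Ruby, with no mocking library involved. The Signup and MockMailer classes are hypothetical and exist only for this sketch.

```ruby
# Hypothetical class under test. It depends on a mailer, which is passed
# in rather than hard-coded so a test can substitute a stand-in.
class Signup
  def initialize(mailer)
    @mailer = mailer
  end

  def register(email)
    return false unless email.include?("@")

    @mailer.deliver_welcome(email)
    true
  end
end

# Hand-rolled mock: records the calls it receives instead of sending mail.
class MockMailer
  attr_reader :delivered

  def initialize
    @delivered = []
  end

  def deliver_welcome(email)
    @delivered << email
  end
end

mailer = MockMailer.new
signup = Signup.new(mailer)

puts signup.register("ada@example.com") # valid address, mailer is called
puts signup.register("not-an-email")    # rejected before the mailer is touched
puts mailer.delivered.inspect
```

The test is fast because nothing real is sent, but it only proves that Signup calls deliver_welcome. If the real mailer renames that method, this test keeps passing, which is why mocked suites still need integration tests.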

Basically every time I’ve seen mocking used heavily, it has turned out badly. Usually this was due to a large amount of ugly, brittle mocking code. Elegant mocking is an art and everyone seems to have their own style. It can also be a challenge to find a team of developers who are all pro-mocking and understand it. When someone is first exposed to mocking and the resulting test code, their reaction is almost always negative. I tend to feel the same way. Mocking is always a tough sell. In the end you’ll have faster tests, but the resulting test code is complicated and difficult to maintain.

Service-Oriented Design

In Service-Oriented Design with Ruby and Rails author Paul Dix outlines an app architecture using services. Each service is a separate smaller app supporting a single aspect of the main app. The main app then interfaces with these services.

This isn’t a new idea but it feels like a very promising approach. If you took an app with say a 45-minute test suite and broke it into several smaller apps, then each smaller app could potentially have a more focused, faster test suite.

The tradeoff here is complexity, i.e. overengineering. Each developer now has to run several smaller apps just to run the main app. The main app’s test suite is also going to need to mock out the smaller services. I’ve yet to attempt this architecture, mainly because it’s very hard to justify to your teammates that a brand new app needs to be partitioned into several smaller apps. A team is more likely to support this approach as a refactor of a mature, monolithic app. Unfortunately, at that point no one seems to have the energy, time or money for a large-scale, time-consuming refactor.
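Mocking out a service in the main app’s test suite might look something like the sketch below. This is an assumption-laden illustration, not an example from the book: the InventoryClient class, the /items/ path and the JSON shape are all invented. The client takes any callable as its transport, so production code can wrap a real HTTP request while tests pass a lambda with canned responses.

```ruby
require "json"

# Hypothetical client for a separate "inventory" service. The transport is
# any callable that takes a path and returns a JSON string.
class InventoryClient
  def initialize(transport)
    @transport = transport
  end

  def in_stock?(sku)
    payload = JSON.parse(@transport.call("/items/#{sku}"))
    payload.fetch("quantity") > 0
  end
end

# Stand-in for the real service, used only by the main app's tests.
# The canned responses are made up for this example.
def fake_inventory_transport
  canned = {
    "/items/widget" => '{"sku":"widget","quantity":3}',
    "/items/gadget" => '{"sku":"gadget","quantity":0}'
  }
  ->(path) { canned.fetch(path) }
end

client = InventoryClient.new(fake_inventory_transport)
puts client.in_stock?("widget")
puts client.in_stock?("gadget")
```

The main app’s tests never start the inventory service, which keeps them fast. The cost is that the canned responses have to be kept in step with what the real service returns.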

Fewer Features (Simpler Apps)

Most agile-developed apps have a very similar life. Features are written, discussed, estimated, and then put in a release. Releases usually happen every 1-2 weeks, or possibly every month. Add in 5-10 developers and a year’s time and odds are you’ll end up with a codebase that no one wants to touch, mainly because of a disgustingly slow test suite.

Maybe agile gives us a little too much flexibility? As a developer, you try to push back, but there are only so many times you can tell a client “no”; it’s their app you’re building, not yours. It’s even harder if the client has the money to spend and you don’t have any new work in the pipeline. However, if you can convince a client to take a simpler approach, that means fewer features and fewer tests.

Can Testing Scale?

Any of these techniques can alleviate some of the pain of a slow test suite, but sadly they end up being nothing more than stopgap solutions. I’ve tried most of them and I still don’t prefer one over another. Maybe we need to choose what we test more carefully instead of testing everything by default. More tests always sounds great at first, but it doesn’t mean much to a 2-year-old Rails app with a 60-minute test suite.

Don’t get me wrong, I love testing. I enjoy writing test code more than non-test code, but it’s sad to watch a promising young app get bombarded with tests and, within a few months, become completely miserable to work on. Unfortunately there’s no straightforward solution right now. Maybe it’s time to jump ship. Node.js? Scala?


  • Rubyists are obsessed with testing
  • This obsession is costing us our agility
  • There are ways to speed up tests
  • The reality is that scaling tests is hard


  Comments: 37

  1. This is exactly the problem I’ve been having with mocking in my app Lapsus. I find that mocking everything out in a Cocoa app also takes a long time to write. It forces you to think about dependencies between objects, which is a very good thing, but it’s really slowed me down. And my test suite is very slow now. I too don’t see any easy way of speeding up tests.

    Thanks for a great post.

  2. Testing and Ruby / Rails can scale very well, in my experience. The problem is when you write terrible tests, i.e.:

    – Blackbox testing entire swathes of your application.
    – Reproducing your application logic in your tests. I.e., use scopes. If you’re passing in a :conditions hash to a query, you’re likely doing it wrong.

    Trouble also comes from not understanding the tools you depend upon. You should understand what gems live in your Gemfile, and put related gems inside a group.

    Finally, there’s Spork. I’ve gotten Rails test suites to run in just a few seconds with that. Spork fails if you don’t know what your app is doing, or if you’ve buggered up ruby’s load path in new and exciting ways.

    One last thing: Documentation matters. Tests are not your documentation. If methods, etc., are given plain-English context, you can easily tell which tests are repeating each other’s coverage and delete them. The problem that I have with Ruby’s TDD/BDD community is its “documentation are my tests” mantra. It’s stupid. Your tests are your examples.

  3. This sounds more like “software doesn’t scale” to me…

  4. I was skeptical at first, but seriously, Spork is awesome for test suites that start up or run slowly. Definitely look into it.

    • @nick

      Spork was one of my first attempts at speeding up slow tests. It does work.

      One issue I’ve seen repeatedly is when people aren’t aware that Spork is *not* reloading certain classes. This usually led to throwing all kinds of stuff in the Spork.prefork block and eventually people abandoning it entirely.

    • Spork fixes the Ruby bug of loading gems slowly. http://redmine.ruby-lang.org/issues/3924

      It shouldn’t speed up tests once all the gems are loaded.

  5. The integration tests are the slowest part of a Rails app, but they’re needed primarily in ways that type checking would be used elsewhere: to catch nil and to catch a mismatch in interface. If we used a language without nil and with static type analysis perhaps we could lean on fast unit tests more strongly.

    In short: jump ship to Haskell.

  6. In every app I’ve worked on I’ve found that it’s not the testing that doesn’t scale, it’s the poor test implementation that doesn’t scale. Usually this involves ripping out mocks and similar strategies that were supposed to make testing faster but didn’t when actually measured, and implementing strategies that used measurement to properly eliminate or avoid the slow parts.

    There’s no mention of measuring the speed of your tests in your post at all. How do you know the problem is testing and not the way you’re doing testing when you don’t show that you’ve profiled your tests to determine the slow spots and fixed your tests appropriately?

    For some benchmarks I’ll put down rdoc with 1300 tests and 3000 assertions that run in 7s (acceptable at 200 tests/s). RubyGems has 1000 tests and 3000 assertions that run in 60s (very slow at only 16 tests/s). The last Rails app I worked on grew to serving one billion requests per week and the test suite ran in under 60 seconds (down from about two minutes with less coverage when I started).

    Before I started working on any of these projects they had very few tests and the ability to make non-breaking changes in them was impossibly slow. Now they have large tests suites and it’s easy and fast, so I think you’re doing something wrong.

    • @eric,

      I apologize for the lack of stats; this wasn’t the most constructive post. Slow suites for me are any greater than 10 minutes, which seems to be the norm in the Rails world.

      I’m definitely interested in your Rails app with a test suite that runs in less than 60 seconds. What kind of testing strategy did you use? My typical strategy is very similar to the one outlined in the RSpec book, start at the outer layer with Cucumber and then drill down with RSpec to the controllers, helpers, and models. Sometimes there’s controller specs but sometimes I skip them.

      Thanks for the feedback

      • A 10 minute test suite is norm in the Rails world? I haven’t experienced this at all. The majority of my test suites are less than 20 seconds for an average size app. The largest app I have worked on takes around 60 seconds to do the test suite and that has over 100 models. Mix this in with running only focused tests and tools like Spork and you’ve got a usable test suite.

        Perhaps the difference is in the style of testing. See this for how I test: http://railscasts.com/episodes/275-how-i-test

        • @ryan,

          Great screencast. I noticed you weren’t using Cucumber and in the comments said you don’t like “one assertion per test”. I do both of those, which probably costs me some speed.

          Wow 20 seconds on average and under 60 seconds for an app with over 100 models. Even if those times are excluding the Rails startup, due to using Spork, those are some amazingly fast suites. What was the testing strategy you used? And what is your test coverage from ‘rake stats’?

  7. “This sounds more like “software doesn’t scale” to me…”

    What I was thinking.

    From the initial post, it seems that the only problem is the execution time of the tests. But the author showed many ways to solve it.

    Well, it seems that the author discovered that adding a new feature doesn’t cost the same for small and big software. The cost isn’t constant. The cost isn’t even linear. The cost is in fact exponential. Many studies from IBM and others confirm that. Just look on the internet.

    I work at a big company with a very big code base… Productivity is at least 4-5 times slower than on a small project. Of course we could improve. But this is the reality. Because our software is more complex, because our software doesn’t serve the needs of one client, but is used by thousands of companies around the world, all with their specific requirements.

    This makes our product more complex, but it is still more efficient than making a thousand pieces of software, one per client.

    It is still more efficient to work slowly on a big project that solves a thousand clients’ problems than on one piece of software that solves only one client’s problem. And that’s why we have big software anyway.

  8. The point Eric made is very important in my opinion. We have to get more picky on which level we test our apps.
    Starting Capybara and clicking through the app is worth a thousand low-level controller tests. Together with Spork this can make running your tests incredibly fast, but setting up Spork with Capybara and possibly other tools is still quite some work. We need to integrate those tools a little better and then start testing on a relatively high level. Only go down to the lower levels if there is no way of having a functional test do the work.
    Also, during development, run only the tests that you know are affected by the code you’re working on at the moment. Run the whole suite once you are done implementing the feature. Tools like guard can help support this cycle.

  9. Great bold post. I have also felt the pain of slow, large and monolithic test suites. I think that complex problems usually require complex solutions. Scaling is hard and is usually not required at the early stages of an agile project. The problem occurs when the app starts to mature and large scale refactoring takes a backseat to the production of features. Planning for scale is difficult because there is little perceived benefit (from a lean/agile product perspective). That is, until development productivity decreases due to large test suites or the app crashes due to load. The result is a large app with a corresponding large test suite. I think constant refactoring, including changing testing implementations, and knowing when to scale is the key.

  10. I can only assume Eric is a wizard. I’ve worked at 5+ different Rails startups now and have never seen a Rails app whose test suite ran so quickly. 5-10 minutes is normal in my experience. I believe the slow test suite is due to one thing: the rise of FactoryGirl. Factory instantiation time quickly becomes the main performance issue in every test suite I’ve examined. Using fixtures helps the performance but has its own issues.

    • Fairly sure that it’s a matter of activerecord being slow, not factorygirl.

      Especially if you have callbacks that create other records.

    • I’ve seen a tendency to use create with factory_girl when just using build might have suited the test fine. On my last project we forced ourselves to use build unless we absolutely needed to persist to the db for the test. Since we didn’t have to touch the database, our tests were quite fast.

    • Christoph Olszowka

      Yes, creating records on the fly costs a tremendous amount of time, especially if you use shoulda/contest/rspec and the creation happens for each assertion when following “one assert per test”. I built a small gem called transactionata (https://github.com/colszowka/transactionata) that gives you a test_data class method in Rails test/unit tests. It hooks into Rails’ fixture loading and executes the code in your test_data block there. Since rails then wraps your tests in DB transactions after loading fixtures, you’ll be able to use dynamically generated test data but still have them being created only once per Test file. On the test suite for beta.ruby-toolbox.com, this changed test time from ~10 minutes to ~2 minutes alone.

      Of course, you’ll have to more carefully craft your test data than I think it is the norm when using factory_girl.

      Granted, the gem is very simple and should be considered alpha status. It would be lovely if rails provided this functionality (executing a given block of code once before the db transaction starts) itself since there currently is no public hook to plug into the fixture loading process.

      • Christoph Olszowka

        Sorry, I have to correct the numbers I gave per the README entry in the doc: The test suite went from 5 mins to 50 seconds instead of 10 -> 2.

  11. If you build your app to be easily deployed and self-contained (that is a big “if”) then the cost of running concurrent tests shouldn’t be too bad, especially in our lovely cloud-based world. For many folks, the cost of that seems acceptable against the benefits of lots of tests and early feedback, plus you have an app that can be deployed and provisioned with low effort and risk. Win!

    … Also, having a few high-level tests and lots of nippy unit tests keeps things quick. That also means unit tests for JavaScript, so you don’t have hundreds of browser tests.

  12. Christian Bradley

    I’ve been in a lot of discussions around this topic as of late, and I think the main issue is with what we Rubyists are calling “Test Driven Development”.

    We’re taught to *write* (automate) tests first before touching *any* code. This does not leave room for prototyping or manual testing as part of the development cycle.

    I’m finding the need to redefine the “Test” in “Test Driven” development. IRB, console, and SQL shell are all Test tools that do not tether us to implementation.

    At what point you migrate from manual to automated tests is a bit of an art form and requires a high level of situational awareness. Perhaps when you feel the design patterns locking into place, or when the afferent coupling of the module in question reaches a critical level, it’s to lock that component down in test.

    • Nothing in TDD says that you can’t explore with code before starting to write your tests. See http://c2.com/cgi/wiki?SpikeSolution.

      Also, remember that your spelunking in console or from irb aren’t repeatable. And they are also of no use to the next developer.

      As for “We’re taught to *write* (automate) tests first before touching *any* code”, that is wrong as well. We are taught to write the absolute smallest thing possible that can fail. We are then supposed to make that pass. We then repeat that ad infinitum, while refactoring as needed. You can’t really understand the book if all you’ve read is the jacket cover.

  13. Christian Bradley

    Edit: ” it’s *time* to lock that component down in test.”

  14. There is too much emphasis on testing these days, to the detriment of artful beautifully crafted code. Testing is subsuming the task at hand.

  15. “Our test suite is slow”
    “Our test suite is fragile”

    You have to approach your test suite the same way you approach your application. In many cases, people don’t pay enough attention to their test suites. There are only a few guidelines to follow to have sane test suites that run in a decent amount of time.

    1. Be careful what you mock. Mocks are deposits on future pain. Sure they speed up execution, but they aren’t resilient to changes you’ve made in other sections of your code.
    2. Don’t talk to external services unless you absolutely need to.
    3. I bet the box you run your tests on has multiple cores. Why not introduce a test runner that actually takes advantage of them?
    4. Only create just enough setup to actually test your behavior.
    5. Be observant of side effects. If you are using something like factory girl, models will be created and run callbacks. Is that really important to what you are testing?

    There are techniques you can try that will allow you to reduce test times significantly. Don’t let your previous failures or perceived inabilities trick you into thinking it isn’t possible. Testing most definitely scales.

    ps. hopefully this post formats correctly…who doesn’t use something like disqus or intensedebate or facebook comments these days?

  16. “Tests don’t scale, and I don’t know why” would have been a better title for this blog post.

  17. Jonathan Chauncey

    Parallel test execution in Ruby is error prone because certain aspects of Ruby aren’t thread safe. So unless you are using separate processes you will definitely run into issues. Our Selenium tests use the Ruby driver (running on JRuby) and we have had to patch several things to get them to work correctly. Net::HTTP isn’t thread safe, so we monkey patch it to use the Apache client.

    We execute 2400 Selenium tests in parallel in about 15 minutes.

    We have a couple of blog posts about it here – http://www.rallydev.com/engblog/

  18. I once gave a talk entitled “Why Agile will probably fail you” and had Jerry Weinberg in the audience. After a bit of talking, he raised his hand and said “I think you have the title wrong. It should be ‘why you will probably fail agile.'” It really opened my eyes to a better understanding when people talk about “testing doesn’t scale” or “testing slows you down”

    Here’s a screencast of my unit test suite. This is the suite I run during active development: http://www.screencast.com/t/O2LhGoVSG. This is without Spork or anything. Just proper extraction and isolation from third-party frameworks and APIs.

    Maybe, if your test suite is really slow, you should ask yourself about your design issues a la test-driven development.

  19. (sorry, hit the end of the train line, so had to submit)

    I agree with those who are saying that this post really is about “software is hard.” Making a title “testing doesn’t scale” and then ending with the line “scaling tests is hard” sort of feels strange. Yes, learning how to build software well is hard. Yes, learning how to effectively listen to test pain (in this case speed) and change your design to alleviate it is hard. And, yes, learning the appropriate amount of tests at each level and when you run them is hard.

    As for scaling, hiring a bunch of people to manually go through your application and see if anything broke is difficult to scale. I know companies that do that. I’ve worked at a few. I’ve also worked on a team that wrote automated tests instead of relying on the manual testing team for rote, ‘click through the application’ testing. We outpaced the other teams in speed, and we outperformed them in (lack of) post-production issues.

    The bottom line remains that you have to test your application. Whether you do it in an automated fashion or not is up to you. If you don’t have the experience to build a test suite that outpaces a manual test plan, then you always have the choice to practice and get better, or perhaps find someone who knows how and see if you can study under them for a bit.

  20. Great post Jared. You really highlight the state of the world when it comes to testing. I’ve been writing a lot of node.js code lately and it is so refreshing to not have to worry about speed. I can run my full unit test suite in 100 milliseconds. And run full end-to-end integration tests with zombie in under 5 seconds. Sure I still have to worry about mocking out some services, but that’s fun.

    The problem as I see it is that Rails is dog slow. Rubyists like to wave their hands and say that turning off auto-class loading and using pre-forking servers or JRuby on 8x CPU machines with 64GBs of memory and a fat memcaching layer will solve the problem. But in reality it’s just a bandaid. Any framework that forces you to do mental gymnastics just to test it should set off alarm bells.

    Read this article about how LinkedIn went from 15 servers with 15 virtual instances to just 4 servers that can handle double the traffic:

  23. But how can you be confident that a refactor didn’t break something unless you have a full test suite? I think these slow unit tests you’re talking about are a problem with the active record pattern, not testing _in general_; the post should be titled “ActiveRecord doesn’t scale” instead.

    This will all change when Ruby Object Mapper is ready though…

  24. A dual core processor is different from a multi-processor system. In the latter, there are two separate CPUs with their own resources. In the former, resources are shared, and the cores reside on the same chip. A multi-processor system is faster than a system with a dual core processor, and a dual core system is faster than a single-core system, when everything else is equal. If each core is 1.4ghz then GSIII will rule everything with his 5.6ghz, genius.
