Author Archive for christian

More fun with Java Concurrency: BlockingQueue

I’ve written about Java 5 concurrency in the past and I’ve recently had the opportunity to make use of another one of the concurrency constructs: the BlockingQueue.

The Problem

There’s a problem we’ve seen a few times in the last few years. From time to time, applications must import data from external systems and massage it into a useful form. Sometimes these feeds are streamed over the net, while other times they arrive as massive text files. My current project has two such feeds that are imported on a weekly basis; the larger of the two rings in at around 10M entities.

Loading the data is pretty straightforward: entities are parsed from the source, a transient entity object is instantiated and the parsed values plugged in, entities are batched up and then persisted as a batch.

The Collaborators

The Parser is responsible for loading records one at a time from the source.

public interface Parser<T>
{
    boolean hasNextRecord();
    T nextRecord();
    void close();
}
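
To make the contract concrete, here is a minimal sketch of one possible implementation that treats each line of text as a record. LineParser is a hypothetical name (it is not the WombatParser used below), and the interface is repeated only so the sketch compiles on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.NoSuchElementException;

// Same contract as the Parser interface above, repeated so the sketch is self-contained.
interface Parser<T>
{
    boolean hasNextRecord();
    T nextRecord();
    void close();
}

// Hypothetical implementation: one record per line of text.
class LineParser implements Parser<String>
{
    private final BufferedReader reader;
    private String next; // look-ahead record; null once the source is exhausted

    LineParser(Reader source)
    {
        this.reader = new BufferedReader(source);
        advance();
    }

    public boolean hasNextRecord()
    {
        return next != null;
    }

    public String nextRecord()
    {
        if (next == null)
        {
            throw new NoSuchElementException("No more records.");
        }
        String current = next;
        advance();
        return current;
    }

    public void close()
    {
        try
        {
            reader.close();
        }
        catch (IOException e)
        {
            throw new RuntimeException("Failed to close source.", e);
        }
    }

    // Reads one line ahead so hasNextRecord() is a cheap null check.
    private void advance()
    {
        try
        {
            next = reader.readLine();
        }
        catch (IOException e)
        {
            throw new RuntimeException("Failed to read record.", e);
        }
    }
}
```

Reading one record ahead keeps hasNextRecord() free of side effects, so callers can check it as often as they like.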

The Persister is responsible for saving batches of entities.

public interface Persister<T>
{
    void initializeFeed();
    void insertBatch(List<T> entities);
    void finalizeFeed();
}

And the FeedLoader is the glue that pulls it all together: it coordinates parsing records into a batch and then triggers persisting each batch when it’s ready. It’s used like this, where 1000 is the batch size:

Parser<Wombat> parser = new WombatParser(inputStream);
Persister<Wombat> persister = new WombatPersister(dataSource);
FeedStats stats = new FeedLoader<Wombat>(parser, persister, 1000).loadData();

Now that the stage is set and we’ve covered the parts, we can get into what happens behind loadData().

Synchronous Parsing and Persistence

Our first cut of loadData() was a simple synchronous implementation:

public class FeedLoader<T>
{
    private Parser<T> parser;
    private Persister<T> persister;
    private int batchSize;
 
    public FeedLoader(Parser<T> parser, Persister<T> persister, int batchSize)
    {
        Validate.isTrue(batchSize > 0);
        this.parser = parser;
        this.persister = persister;
        this.batchSize = batchSize;
    }
 
    public FeedStats loadData()
    {
        persister.initializeFeed();
        List<T> entities = new ArrayList<T>(batchSize);
 
        while (parser.hasNextRecord())
        {
            entities.add(parser.nextRecord());
            if (entities.size() >= batchSize)
            {
                persister.insertBatch(entities);
                entities.clear();
            }
        }
 
        // Save the stragglers that didn't make it into the last batch.
        if (!entities.isEmpty())
        {
            persister.insertBatch(entities);
        }
 
        parser.close();
        persister.finalizeFeed();
        return new FeedStats(...);
    }
}

This worked well… we were able to process records at a throughput of about 3125 per second. After a bit of research I realized we were spending nearly as much time parsing records as we were persisting them. I also noticed that the load on the machine was pretty low during the import process. While there is a relationship between parsing and persisting, it seemed like there should be an easy way to split the processes across multiple threads while keeping the code simple and readable.

Asynchronous processing with BlockingQueue and ExecutorService

Digging through java.util.concurrent, I came across BlockingQueue which is described as “A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element.” Sounds like a great construct to bridge the gap between our Parser and Persister threads. The parser can add entities to the queue while the persister is pulling them off into batches. Let’s see what it looks like:

public class FeedLoader<T>
{
    private Parser<T> parser;
    private Persister<T> persister;
    private int batchSize;
    // volatile so the persister thread is guaranteed to see the parser thread's write
    private volatile boolean done = false;
 
    public FeedLoader(Parser<T> parser, Persister<T> persister, int batchSize)
    {
        Validate.isTrue(batchSize > 0);
        this.parser = parser;
        this.persister = persister;
        this.batchSize = batchSize;
    }
 
    public FeedStats loadData()
    {
        persister.initializeFeed();
 
        BlockingQueue<T> blockingQueue = new ArrayBlockingQueue<T>(batchSize * 2);
        try
        {
            ExecutorService executorService = Executors.newFixedThreadPool(2);
            // invokeAll() blocks until both tasks have completed
            executorService.invokeAll(
                asList(new ParserTask<T>(parser, blockingQueue),
                       new PersisterTask<T>(persister, blockingQueue)));
            executorService.shutdown();
        }
        catch (InterruptedException e)
        {
            log.error("Failed to load feed.", e);
            throw new RuntimeException("Failed to load feed.", e);
        }
 
        persister.finalizeFeed();
        return new FeedStats(...);
    }
 
    class ParserTask<T> implements Callable<Object>
    {
        Parser<T> parser;
        BlockingQueue<T> queue;
 
        ParserTask(Parser<T> parser, BlockingQueue<T> queue)
        {
            this.parser = parser;
            this.queue = queue;
        }
 
        public Object call()
        {
            while (parser.hasNextRecord())
            {
                try
                {
                    queue.put(parser.nextRecord());
                }
                catch (InterruptedException e)
                {
                    log.error("Failed to load feed.", e);
                    throw new RuntimeException("Failed to load feed.", e);
                }
            }
            parser.close();
            done = true; // Indicates that the parser is done.
            return null;
        }
    }
 
    class PersisterTask<T> implements Callable<Object>
    {
        Persister<T> persister;
        BlockingQueue<T> queue;
 
        PersisterTask(Persister<T> persister, BlockingQueue<T> queue)
        {
            this.persister = persister;
            this.queue = queue;
        }
 
        public Object call()
        {
            List<T> entities = new ArrayList<T>(batchSize);
 
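            // "done" is set to true when the parser is done, at which point
            // all remaining entities will be in the queue.
            while (!done || !queue.isEmpty())
            {
                try
                {
                    // poll() with a timeout rather than take(): take() could block
                    // forever if the parser finishes right after the check above.
                    T entity = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (entity != null)
                    {
                        entities.add(entity);
                    }
                    if (entities.size() >= batchSize)
                    {
                        persister.insertBatch(entities);
                        entities.clear();
                    }
                }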
                }
                catch (InterruptedException e)
                {
                    log.error("Failed to load feed.", e);
                    throw new RuntimeException("Failed to load feed.", e);
                }
            }
            if (!entities.isEmpty())
            {
                persister.insertBatch(entities);
            }
            return null;
        }
    }
}

By allowing the parser and persister to run concurrently on two threads, the feed loaded at 4608 entities per second, up from about 3125, a roughly 47% improvement over the single-threaded version.
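
A common alternative to a shared done flag is an end-of-feed sentinel (a “poison pill”) placed on the queue itself. The sketch below is not the code we shipped; it is a minimal, self-contained illustration that uses plain strings as records and collects batches in a list instead of calling a Persister. PoisonPillSketch and END_OF_FEED are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoisonPillSketch
{
    // Sentinel compared by identity, so it can never be mistaken for a real record.
    private static final String END_OF_FEED = new String("END_OF_FEED");

    // Pushes the records through a bounded queue and returns the batches the consumer built.
    static List<List<String>> load(final List<String> records, final int batchSize)
    {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(batchSize * 2);
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try
        {
            // Producer: stands in for the parser. Error handling is omitted for brevity.
            executor.submit(new Callable<Void>()
            {
                public Void call() throws InterruptedException
                {
                    for (String record : records)
                    {
                        queue.put(record);
                    }
                    queue.put(END_OF_FEED); // tell the consumer there is nothing more to come
                    return null;
                }
            });

            // Consumer: stands in for the persister.
            Future<List<List<String>>> consumer = executor.submit(new Callable<List<List<String>>>()
            {
                public List<List<String>> call() throws InterruptedException
                {
                    List<List<String>> batches = new ArrayList<List<String>>();
                    List<String> batch = new ArrayList<String>(batchSize);
                    while (true)
                    {
                        // Blocking here is safe because the producer finishes with the sentinel.
                        String record = queue.take();
                        if (record == END_OF_FEED)
                        {
                            break;
                        }
                        batch.add(record);
                        if (batch.size() >= batchSize)
                        {
                            batches.add(batch);
                            batch = new ArrayList<String>(batchSize);
                        }
                    }
                    if (!batch.isEmpty())
                    {
                        batches.add(batch); // the stragglers
                    }
                    return batches;
                }
            });
            return consumer.get();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while loading feed.", e);
        }
        catch (ExecutionException e)
        {
            throw new RuntimeException("Failed to load feed.", e.getCause());
        }
        finally
        {
            executor.shutdown();
        }
    }
}
```

Because the end-of-feed signal travels through the queue in order, the consumer never needs to check the queue and a flag separately.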

There are two caveats to the code as written above. First, creating an ExecutorService for each loadData() call isn’t ideal; it’s best to configure one for the application and reuse it, and it must be shut down before the application quits. Second, I’ve skimped on error handling, which is fine only if the Parser and Persister implementations don’t throw exceptions; if the parser fails, done is never set and the persister waits forever.
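
As a rough illustration of both caveats, the sketch below accepts an application-wide ExecutorService instead of creating one, and calls get() on each Future so that an exception thrown inside a task is not silently dropped. TaskFailureSketch and runAll are hypothetical names, and this only helps once every task has terminated; it does not by itself stop one task from waiting on another that has died.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class TaskFailureSketch
{
    // Runs the tasks on the supplied (shared) executor and rethrows the first task failure.
    static void runAll(ExecutorService executor, List<Callable<Object>> tasks)
    {
        try
        {
            // invokeAll() blocks until every task has completed, normally or not.
            for (Future<Object> future : executor.invokeAll(tasks))
            {
                future.get(); // throws ExecutionException if the task threw
            }
        }
        catch (ExecutionException e)
        {
            throw new RuntimeException("Failed to load feed.", e.getCause());
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new RuntimeException("Interrupted while loading feed.", e);
        }
    }
}
```

The caller owns the executor’s lifecycle here, so shutdown() belongs wherever the application itself shuts down.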

Conclusion

The ExecutorService and BlockingQueue provide the tools to make this improvement easy while keeping the code pretty readable. As always, we should be striving for readability, so adding unnecessary concurrency is never a good idea. And your mileage may vary depending on many things, including the hardware, network, data, server load… so do some testing to measure the real improvement in production.

Even if you don’t end up using it, it’s fun to experiment with and learn about concurrency issues. There are scenarios where the smart application of concurrency constructs can yield fantastic benefits. Check out Greg Luck’s recent blog on Ehcache performance for an example.

Updates: added caveats and call to ExecutorService.shutdown(). Fixed a typo in the PersisterTask.

Make the things you do often fast and easy

Many of our projects are ‘greenfield’ and we have the opportunity to do things the way we like. By working on new projects every few months, as opposed to one project over the course of years, we have lots of opportunity to easily tweak and tune the way we do things. Not all of our projects are from scratch though (see Alon’s post about Rewrite or Rescue), so we sometimes end up dealing with years’ worth of history and crufty code. It’s safe to say that each time we roll onto one of these projects, there’s going to be some level of bewilderment regarding what developers deal with on a daily basis.

Maybe it’s because we have a special opportunity to optimize the hell out of our development process, or the fact that we’re all productivity junkies; regardless of the reason, we religiously embrace the tenet “Make the things you do often fast and easy”. It’s almost embarrassing to suggest that others don’t also subscribe to this simple notion, but, brace yourself, many do not. On a project that has history, not everyone has been there for every decision. In fact, many developers are at least relatively new and it’s somewhat customary to have an “it must be this way for a reason” attitude. After all, who would deliberately make something cumbersome without good reason?

When we start working on one of these projects, we dedicate time to do some serious spring cleaning and tackle the things that will cost us the most in terms of pain and productivity. The whole development team gets psyched about where we end up as it’s a significant improvement. Projects with a history usually have a fair bit of low-hanging fruit. Let’s discuss some of the things we see regularly.

Build Systems

Apparently few people like working on build scripts and when they do they have a habit of lowering their standards for quality of work. That’s obviously not literally true, but sometimes it seems that way. We’ve seen a number of beastly build systems that are slow because they’re doing things that aren’t necessary (extraneous jarring, copying, code generating, etc), that are brittle and expensive to maintain, and that are full of dead code and duplicate target definitions… and they’re run many times every single work day. It’s true that most developers may be compiling code from their IDE and thus bypassing the command-line build, but it’s still run on the build server, by ops folks, and even by developers when they’re debugging why something works from the IDE but busts on the build server.

Guidelines for simplifying the build

  • Distill the build process down to the fastest, simplest steps that are necessary.
  • Eliminate duplicate and no-longer-used dependencies; these files are being copied around and bundled for no reason (I’ve seen over 15 megs of unnecessary dependencies before).
  • When a project is split into multiple modules (and it should be if it’s more than a few thousand lines), modules should be built in a consistent fashion using targets that are shared across modules.
  • Look for exceptions. When you see something special happening for a particular file type, file name, or module, ask yourself why. Ask again. Strive to eliminate these special cases when possible, even if they seem trivial.
  • Build a single deployable (or deployables) for all environments by eliminating environment-specific build code and externalizing application configuration. (Use Spring? Check out this post on externalizing configuration with Spring.)
  • Look for unnecessary code generation steps; if generated code changes once a year then check it in and make regenerating a manual step.
  • If you generate code coverage data, make sure that it’s only created when it’s needed (e.g. a nightly build on the build server), not on every build.
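
The guideline about a single deployable can be sketched in a few lines: the same build artifact runs everywhere, and only an external properties file differs per environment. This is an illustration under assumptions, not how any particular project of ours does it; ExternalConfigSketch and the app.config system property are hypothetical names.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class ExternalConfigSketch
{
    // Hypothetical system property that points at the environment-specific file.
    static final String CONFIG_PROPERTY = "app.config";

    // Returns the bundled defaults, overridden by the external file when one is named.
    static Properties load(Properties defaults)
    {
        Properties config = new Properties(defaults);
        String path = System.getProperty(CONFIG_PROPERTY);
        if (path == null)
        {
            return config; // nothing external: run with the defaults
        }
        try (InputStream in = new FileInputStream(path))
        {
            config.load(in);
        }
        catch (IOException e)
        {
            throw new RuntimeException("Cannot read configuration from " + path, e);
        }
        return config;
    }
}
```

With something like this in place, the build no longer needs to know which environment it is building for; the environment is chosen at startup, for example with a -Dapp.config=... argument.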

We use Maven 2 for all of our Java projects. For sure, it has its share of rough edges (most of which are being fixed at a reasonable rate). But it recommends some very sound conventions and doesn’t provide any scripting functionality, so it’s harder to hack it to do anything too unorthodox (please don’t use the antrun plugin except as an incremental step when moving from Ant to Maven). When you play ball by the Maven rules you’ll find your build much simpler and easier to maintain. It’s likely you’ll notice other emergent benefits to boot. For example, once you migrate to Maven you eliminate duplicate build configuration (both your command-line build tool and IDE know how to compile your app; remember the DRY principle). IDEA, Eclipse (via m2eclipse), and NetBeans all support importing from and synchronizing with Maven.

Some people use Buildr or Ant + Ivy, but Buildr doesn’t have the breadth of use and Ant makes it too easy to write nasty, unmaintainable build code. That’s why we use Maven.

Compile > Deploy > Make Changes > Deploy Development Cycle

Possibly more important than a simple and easy build, developers must be able to go through the compile, deploy, make changes, deploy cycle FAST (note that the compile, run tests, make changes, run tests cycle is also very important).

I remember working on an embedded system in 1998: a complex radio communications routing application written in C++ and deployed to custom hardware running the real-time operating system pSOS. The build and deploy cycle took about 30 minutes and there were only 10 hardware instances for 60 engineers; you had to sign up for a time slot on real hardware. It was the epitome of unproductive as far as development environments go (and don’t even ask about debugging!). You’d think such things were entirely in the past, and luckily they mostly are, but not completely. In the last 2 years I’ve seen applications that take 15 minutes to deploy.

It’s a drag when developers have to wait for these things to happen and it can totally destroy one’s rhythm, keeping developers from getting into the zone. What’s worse, it’s completely unnecessary with modern tools.

General Suggestions

  • Don’t drop down to the command-line; compile and deploy from your IDE (the IDE is your friend – master it).
  • Minimize steps for deploying changes to a running app:
    • Your IDE may support building on frame deactivation (IDEA does); check it out.
    • Look into the maven-jetty-plugin if you use Maven.
    • Run in debug mode so that code can be hot-swapped or invest in JavaRebel, which allows all sorts of code changes to take place without redeploying your application.
    • Deploy your application in exploded form; bundling a war or ear incurs unnecessary IO overhead.
  • Host your database either locally on your workstation or on a beefy database server on the same LAN. Remote databases are generally many times slower due to latency, even over fast connections.
  • Use JBoss? Consider migrating to Jetty or Tomcat. If that’s impossible, use the most stripped down profile (minimal, default, or all) that has what you need, or better yet, create a custom one which includes only what you need.
  • Minimize the amount of data needed in the database to run the application. The same goes for running tests: do whatever it takes to run your tests against an empty (or very close to) schema. Check out the Carbon Five DB Migration Project.
  • Don’t skimp on developer hardware. Buying the very fastest CPU isn’t going to be worth it, so aim for one or two models down from the fastest. Buy the fastest hard disk you can since development is generally IO bound (WD Raptor and the new VelociRaptor are awesome, consider SSDs if your coffers run deep). Avoid older CPU architectures (Pentium D), even when the clock speed (GHz) is faster. Lastly, don’t be shy with memory; 4GB isn’t too much for a developer machine.

In addition to these general recommendations, each individual application will have its own specific sources of inefficiency. Many real-world applications depend on services provided by application servers and/or commercial products: message queues (JMS, ActiveMQ, etc), distributed caches (memcached, Coherence, etc), enterprise service buses, job schedulers, workflow engines, etc. It’s important that these services don’t get in the way of developing fast. Some of them can be run in a lightweight development mode. If you need to use one of these potentially heavyweight solutions, invest the time to minimize or eliminate any adverse effects on the development cycle.

What Else?

Some of the best improvements have nothing to do with the technical side of software development. Take a step back and look at what else is happening (or not happening) each day. There may be meetings which can be time boxed, consolidated, or eliminated altogether. Take a look at collaboration between engineers, product managers, testers, support and operations. How long are developers waiting to have requirements-clarifying questions answered? Ask your whole team where they think things can be improved. Ask for feedback on a regular basis and allow it to help drive these improvements.

Survey of other activities that should be fast and easy

  • Running the automated test suite – < 10 minutes
  • Getting build results from continuous integration server – < 10 minutes
  • Pushing a build to staging/acceptance server – One click build and deploy
  • Create a new instance of a minimal database instance – Carbon Five DB Migration Project
  • Recreate production state in development for debugging
  • Story approval/acceptance – Continuous acceptance
  • Meetings – Timebox, Consolidate, Eliminate
  • Configure a new developer machine – Strive for zero configuration

Value Simplicity

Any intelligent fool can make things bigger, more complex and more violent. It takes a touch of genius and a lot of courage to move in the opposite direction. -Albert Einstein

There’s a theme underlying most of the solutions to these problems: simplicity. Complex systems don’t become complex and crufty overnight; they get that way one small step at a time. With each change to a system it’s important to recognize that the change will either add complexity or remove it. Complexity has a cost and it’s not to be taken lightly; make sure the benefit of each decision that adds complexity is worth that cost.

Conclusion

Making the things that people do often fast and easy can pay off geometrically as all developers benefit and regain a little more of their day (and sanity). In the end, it’s not just about shaving off seconds or minutes, though that’s a huge part of it; it’s about creating a development environment that lets the team do what’s really important: write awesome code to solve real problems. When the team dynamics, technical environment, and process are tuned just right, the overall benefit is greater than the sum of its parts.

Where have you seen changes in infrastructure, software, or process that have resulted in a significant productivity bump?

Java Database Migrations

News: v0.9.9-m2 has been released!

A while back, I wrote to introduce the first incarnation of the Carbon Five Database Migration tools, a simple though powerful framework for applying discrete changes to a database and tracking which changes have been applied to a specific database. It was inspired by Rails’ Migration support.

We’ve made a number of changes in the v0.9.1 release. We adopted some of the improvements found in Rails 2.1 as well as feedback from our users. Here’s an overview of what’s changed:

  • New create, drop, and reset goals for maven plugin. Now you can create a new database, drop an existing one, or reset an existing database by dropping it, creating a new one and then migrating it. This is tested with MySQL and PostgreSQL.
  • Each applied migration is tracked in the database schema_version table (instead of just the last one). Also, when it was run and how long it took to run are now saved for each.
  • Validate goal now lists which migrations are pending in addition to whether the database is up to date.
  • Maven artifact ids have changed (migration -> db-migration, maven-migration-plugin -> maven-db-migration-plugin) and there’s been some restructuring in the core framework.
  • Maven plugin is configured a bit differently now; environments have been removed completely since maven supports a better solution out of the box: profiles.
  • Maven plugin now looks for migrations in src/main/db/migrations by default; alternate locations can be specified via the <migrationsPath/> element.
  • We now recommend using timestamps for migration versions instead of the NNN format, though any numerical character sequence will work.
  • Reworked the algorithm for determining which migrations to run to allow for a little more flexibility. Pending migrations aren’t determined by a single version number, they’re determined by comparing what is available to what has already been run. In conjunction with timestamp versions, developers won’t be stepping on each other’s migrations.
  • New and updated google code project and documentation.
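
The reworked selection of pending migrations boils down to a set difference. The sketch below is an illustration of that idea, not the framework’s actual code: PendingMigrationsSketch is a hypothetical name, and it assumes versions are fixed-width timestamp strings so that string order matches chronological order.

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

class PendingMigrationsSketch
{
    // Pending = available but not yet applied, in ascending version order.
    static SortedSet<String> pending(Collection<String> available, Collection<String> applied)
    {
        SortedSet<String> pending = new TreeSet<String>(available);
        pending.removeAll(applied);
        return pending;
    }
}
```

For example, if 20080801120000 and 20080830174515 have been applied and a teammate later commits 20080815093000, that migration is still reported as pending even though a newer one has already run. A single “current version” number could not express that.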

As you can imagine, some of these changes aren’t backwards compatible. While we’re in pre-release (< v1.0) mode, we feel like it's more important to make the fundamental changes to build a solid foundation than to retain complete backwards compatibility. The release notes give some guidelines for upgrading.

Here’s a quick getting started guide for the maven-db-migration-plugin:

Step 1: Configure maven in your project’s pom.xml

...
<build>
   ...
  <plugins>
    <plugin>
      <groupId>com.carbonfive</groupId>
      <artifactId>maven-db-migration-plugin</artifactId>
      <version>RELEASE</version>
      <configuration>
        <url>jdbc:mysql://localhost/myapp_test</url>
        <username>dev</username>
        <password>dev</password>
      </configuration>
      <dependencies>
        <dependency>
          <groupId>mysql</groupId>
          <artifactId>mysql-connector-java</artifactId>
          <version>5.1.6</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
...
<pluginRepositories>
    <pluginRepository>
        <id>c5-public-repository</id>
        <url>http://mvn.carbonfive.com/public</url>
    </pluginRepository>
</pluginRepositories>
...

The <configuration> element configures the database connection (see the reference for more options).
The <dependencies> element specifies the required dependency on our database driver.
The <pluginRepositories> element adds the Carbon Five maven plugin repository.

Step 2: Create a migration script

In src/main/db/migrations, create a script using the format YYYYMMDDHHMMSS.sql (e.g. 20080830174515.sql). Example:

CREATE TABLE users (
  id INT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  password VARCHAR(255),
  enabled BOOLEAN DEFAULT TRUE
);
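
If you’d rather not type the timestamp by hand, a file name in the YYYYMMDDHHMMSS.sql format can be generated with a few lines of Java. This is a convenience sketch, not part of the plugin; MigrationNameSketch is a hypothetical name, and which clock or time zone the team standardizes on is left to you.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class MigrationNameSketch
{
    private static final DateTimeFormatter VERSION_FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Builds a migration file name such as 20080830174515.sql for the given moment.
    static String fileNameFor(LocalDateTime when)
    {
        return VERSION_FORMAT.format(when) + ".sql";
    }

    public static void main(String[] args)
    {
        System.out.println(fileNameFor(LocalDateTime.now()));
    }
}
```

Calling fileNameFor(LocalDateTime.of(2008, 8, 30, 17, 45, 15)) yields 20080830174515.sql, the example name used above.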

Step 3: Create the database

$ mvn db-migration:create

The supplied credentials must have the appropriate privileges, of course. If you’re not using MySQL or PostgreSQL, then do this step manually.

Step 4: Check the status of the database

$ mvn db-migration:validate

Your migration will be listed as pending and the database as not up-to-date.

Step 5: Migrate to the latest version

$ mvn db-migration:migrate

The pending migration will be applied to the database and logged in the schema_version table.

Want to learn more?

Check out the google code project page, the release notes and the sample applications.

In the near future, I’m going to look at supporting MS SQL Server and improving the SQL Script Runner. Thanks for all of the feedback and please keep it coming!

Christian

Database Testing with Spring 2.5 and DBUnit

Note: Version 0.9.1 of c5-test-support has been released.

We’ve been using DBUnit on our Java projects for years and the mechanics of how it’s used have evolved over time. I’ve recently spent some time making it work a little nicer for how we typically write database tests. What I’ve created makes using DBUnit on a project that is already using Spring and the testing support added in Spring 2.5 just a little easier through the application of convention and annotations.

In general, we’ve adopted the convention of loading data off the classpath from a flat dataset file named after the test located next to the test on the classpath. For example (in the maven standard directory structure):

  • src/test/java/com/acme/TripRepositoryTest.java – Java Test Code
  • src/test/resources/com/acme/TripRepositoryTest.xml – DB Unit Data Set for TripRepositoryTest
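
The naming convention is easy to express in code: the default data set location is simply the test class name turned into a classpath resource path. The sketch below illustrates the convention only; DataSetLocationSketch is a hypothetical name and this is not the library’s implementation.

```java
class DataSetLocationSketch
{
    // Maps a test class to the data set that sits next to it on the classpath.
    static String defaultLocation(Class<?> testClass)
    {
        return "classpath:/" + testClass.getName().replace('.', '/') + ".xml";
    }
}
```

Under this convention a test class named com.acme.TripRepositoryTest resolves to classpath:/com/acme/TripRepositoryTest.xml.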

For most tests, the data set is loaded inside the test’s transaction and rolled back when the test completes so that nothing needs to be cleaned up (see Spring’s reference). For other tests — service or integration tests — the data is loaded outside of a transaction and must be cleared out manually. Most projects have a mix of both strategies and both should be easily supported.

When Spring 2.5 came out with its new testing framework, I threw together a custom TestExecutionListener that looks for test methods that are annotated with @DataSet, and when found, loads the data using DB Unit. Here’s a transaction-per-test example:

TripRepositoryImplTest.java – Example transaction-per-test Test Case

@ContextConfiguration(locations = {"classpath:applicationContext.xml"})
public class TripRepositoryImplTest extends AbstractTransactionalDataSetTestCase {
    @Autowired TripRepository repository;
 
    @Test
    @DataSet
    public void forIdShouldFindTrip() throws Exception {
        Trip trip = repository.forId(2);
        assertThat(trip, not(nullValue()));
    }
}

The high-level execution path for this example looks like:

  1. Inject dependencies (DependencyInjectionTestExecutionListener)
  2. Start transaction (TransactionalTestExecutionListener)
  3. Load dbunit data set from TripRepositoryImplTest.xml (DataSetTestExecutionListener) using the setup operation (default is CLEAN_INSERT)
  4. Execute test
  5. Optionally cleanup dbunit data using the tear down operation (default is NONE)
  6. Rollback transaction (TransactionalTestExecutionListener)

Here’s the trimmed down log output for this test:

INFO: Began transaction (1): transaction manager; rollback [true] (TransactionalTestExecutionListener.java:259)
INFO: Loading dataset from location 'classpath:/eg/domain/TripRepositoryImplTest.xml' using operation 'CLEAN_INSERT'. (DataSetTestExecutionListener.java:152)
INFO: Tearing down dataset using operation 'NONE', leaving database connection open. (DataSetTestExecutionListener.java:67)
INFO: Rolled back transaction after test execution for test context (TransactionalTestExecutionListener.java:279)

For this to work in its current incarnation, a single datasource must be available for lookup in the application context. One of the interesting details is what to do with the connection used to load the data. The framework assumes that if it’s a transactional connection it should be left open because whatever started the transaction should do the closing. When it’s non-transactional it’s closed after the dataset is loaded. This convention works well for how I typically write my database tests.

In addition to the @DataSet annotation, we must add the DataSetTestExecutionListener to the set of listeners that are applied to the test class. As in the above example, you can extend AbstractTransactionalDataSetTestCase which does this for you or you can specify the listener using the class-level annotation @TestExecutionListeners (see example). It’s important that the listener is triggered after the TransactionalTestExecutionListener.

If all test methods use the dataset, then the test class (or super class) can be annotated and every test will load the dataset. Also, if a different dataset should be loaded, the name of the resource can be specified in the annotation (e.g. @DataSet("TripRepositoryImplTest-foo.xml") or @DataSet("classpath:/db/trips.xml")). Lastly, the setup and teardown database operations can be overridden (e.g. @DataSet(setupOperation = "INSERT", teardownOperation = "DELETE")).
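
To show how a listener might resolve the annotation for a given test method, here is a rough, self-contained sketch. SketchDataSet is a hypothetical stand-in for the real @DataSet annotation, and the precedence shown (method first, then the class and its superclasses) is an assumption for illustration rather than a description of the library’s code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class DataSetLookupSketch
{
    // Hypothetical stand-in for @DataSet, with the defaults described in this post.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.TYPE})
    @interface SketchDataSet
    {
        String value() default "";
        String setupOperation() default "CLEAN_INSERT";
        String teardownOperation() default "NONE";
    }

    // Assumed precedence: a method-level annotation wins, then the class, then its superclasses.
    static SketchDataSet find(Method testMethod)
    {
        SketchDataSet onMethod = testMethod.getAnnotation(SketchDataSet.class);
        if (onMethod != null)
        {
            return onMethod;
        }
        for (Class<?> c = testMethod.getDeclaringClass(); c != null; c = c.getSuperclass())
        {
            SketchDataSet onClass = c.getAnnotation(SketchDataSet.class);
            if (onClass != null)
            {
                return onClass;
            }
        }
        return null; // no data set requested for this test
    }

    // Example test class: one method overrides the class-level settings, one inherits them.
    @SketchDataSet("classpath:/db/trips.xml")
    static class ExampleTest
    {
        @SketchDataSet(setupOperation = "INSERT", teardownOperation = "DELETE")
        public void overridden() {}

        public void inherited() {}
    }
}
```

An empty value() would signal that the default, convention-based location should be used.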

This functionality is part of the C5 Test Support package and is available in our maven repository. To use it, first add the C5 Public Maven repository to your pom.xml, and then add the necessary dependencies:

pom.xml

<repositories>
    <repository>
        <id>c5-public-repository</id>
        <url>http://mvn.carbonfive.com/public</url>
        <snapshots>
            <updatePolicy>always</updatePolicy>
        </snapshots>
    </repository>
</repositories>
...
<dependencies>
    <dependency>
        <groupId>org.dbunit</groupId>
        <artifactId>dbunit</artifactId>
        <version>2.2.3</version>
        <scope>test</scope>
    </dependency>
 
    <dependency>
        <groupId>com.carbonfive</groupId>
        <artifactId>test-support</artifactId>
        <version>0.6</version>
        <scope>test</scope>
    </dependency>
    ...
</dependencies>

Check out the sample application for details. It’s mavenized and utilizes an in-memory database. Just check it out of subversion, look over the code, and give it a run using your IDE or from the command-line (mvn install). I’d be psyched to hear what you think and of course, welcome comments and suggestions.

Multithreaded Testing

Every now and then you’ll work on something that needs to handle requests from multiple concurrent threads in a special way. I say “special way” because in a web application, everything needs to handle being executed concurrently and there are a slew of techniques used to handle this (prototypes, thread locals, stateless services, etc). Here’s an example of what I mean by “special”…

On my current project, we have a queue of articles that need human-user attention (i.e. editorial moderation). Each article must be doled out to only one moderator and there are multiple instances of the web application servicing requests in the cluster. Imagine tens of thousands of articles per day and a team of moderators churning through them. We can’t rely on Java synchronization because it only works within the JVM instance, not across server instances.

The simplified version of the service interface we’re working on looks like this:

ArticleService.java – Example service interface

public interface ArticleService
{
    Article findNextArticleForModeration();
}

What makes this interesting is that we must ensure that the service doesn’t hand out the same Article to more than one user. This is impossible to assert using a single thread. We’ve all been told that multiple threads and automated testing don’t mix. It’s generally true and should be avoided if at all possible, but in some cases it’s the only way we can truly assert specific behavior. I’ve found a pretty simple way to do this type of testing in a reliable, consistent, and non-disruptive manner. Despite the fact that the technique leverages Java 1.5 built-in concurrency utilities, most of the engineers who have seen it are surprised and weren’t aware that such testing was so easy to implement.

Given the above service interface, here’s a test that will assert that no single article is given out to more than one invoker of the method findNextArticleForModeration(). The scenario we’re simulating is 10 users feverishly moderating a queue of 250 articles as quickly as possible.

ArticleServiceImplTest.java – Test to invoke service concurrently

...
public void findNextArticleForModerationStressTest() throws Exception
{
    final int ARTICLE_COUNT = 250;
    final int THREAD_COUNT = 10;
 
    // Create test data and callable tasks
    //
    Set<Article> testArticles = new HashSet<Article>();
 
    Collection<Callable<Article>> tasks = new ArrayList<Callable<Article>>();
    for (int i = 0; i < ARTICLE_COUNT; i++)
    {
        // Test data
        testArticles.add(new Article());
 
        // Tasks - each task makes exactly one service invocation.
        tasks.add(new Callable<Article>()
        {
            public Article call() throws Exception
            {
                return articleService.findNextArticleForModeration();
            }
        });
    }
    articleService.createArticles(testArticles);
 
    // Execute tasks
    //
    ExecutorService executorService = Executors.newFixedThreadPool(THREAD_COUNT);
    // invokeAll() blocks until all tasks have run...
    List<Future<Article>> futures = executorService.invokeAll(tasks);
    // All tasks are done; release the pool's threads.
    executorService.shutdown();
    assertThat(futures.size(), is(ARTICLE_COUNT));
 
    // Assertions
    //
    Set<Long> articleIds = new HashSet<Long>(ARTICLE_COUNT);
    for (Future<Article> future : futures)
    {
        // get() will throw an exception if an exception was thrown by the service.
        Article article = future.get();
        // Did we get an article?
        assertThat(article, not(nullValue()));
        // Did the service lock the article before returning?
        assertThat(article.isLocked(), is(true));
        // Is the article id unique (see Set.add() javadoc)?
        assertThat(articleIds.add(article.getId()), is(true));
    }
    // Did we get the right number of article ids?
    assertThat(articleIds.size(), is(ARTICLE_COUNT));
}
...

The test starts off by creating 250 test articles to be moderated. It also creates 250 ‘tasks’, each designed to make a single service invocation of findNextArticleForModeration(). The real magic happens in Executors.newFixedThreadPool() and executorService.invokeAll(). The first creates a new ExecutorService backed by a thread pool of the specified size. This is a generic ExecutorService that is designed to churn through tasks using all of the threads in the pool. invokeAll blocks until every task has finished executing. In this test, 10 threads will rip through 250 tasks, each making a single call to our service and capturing the result of that call. Each task execution results in a Future, which is a handle to the results of the task (and more).
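
To see the mechanics in isolation, here is a minimal sketch that swaps the real service for a hypothetical in-memory stand-in: a ConcurrentLinkedQueue of ids. The class name InvokeAllSketch and its method are made up for illustration and aren't part of the example project. It only shows how invokeAll() fans the tasks out over a fixed pool and how Set.add() catches duplicates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch
{
    /** Runs taskCount single-poll tasks on threadCount threads and returns how many distinct ids were handed out. */
    static int distinctIdsHandedOut(int taskCount, int threadCount)
    {
        // Hypothetical stand-in for the real service: a thread-safe queue of ids.
        final Queue<Long> queue = new ConcurrentLinkedQueue<Long>();
        for (long id = 1; id <= taskCount; id++)
        {
            queue.add(id);
        }

        // Each task makes exactly one "service" call.
        List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
        for (int i = 0; i < taskCount; i++)
        {
            tasks.add(new Callable<Long>()
            {
                public Long call()
                {
                    return queue.poll();
                }
            });
        }

        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        try
        {
            Set<Long> ids = new HashSet<Long>();
            // invokeAll() blocks until every task has completed.
            for (Future<Long> future : executor.invokeAll(tasks))
            {
                Long id = future.get();
                // Set.add() returns false when the id was already present.
                if (id == null || !ids.add(id))
                {
                    throw new IllegalStateException("Missing or duplicate id: " + id);
                }
            }
            return ids.size();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        catch (ExecutionException e)
        {
            throw new IllegalStateException(e.getCause());
        }
        finally
        {
            executor.shutdown();
        }
    }
}
```

Calling InvokeAllSketch.distinctIdsHandedOut(250, 10) mirrors the numbers used in the test above.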

Iterating over each resulting future, we make several assertions. The most important one is the last, where we assert that every task is given a unique Article. Thanks to the natural semantics of Set, this is easy to do in an elegant way. Another useful, though unexpected, feature is that if an exception occurs during the task execution, an ExecutionException will be thrown when get() is called on the corresponding Future. If our service fails for some reason, the test will fail because no exceptions are expected.
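
The exception behavior is easy to try out on its own. The sketch below uses a hypothetical task that always fails (the class name and message are invented for illustration) and shows that the original exception comes back as the cause of the ExecutionException thrown by get().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureFailureSketch
{
    /** Submits a task that always fails and returns the message of the exception it threw. */
    static String causeMessageOfFailedTask()
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try
        {
            Future<String> future = executor.submit(new Callable<String>()
            {
                public String call()
                {
                    // Stand-in for a service call that blows up.
                    throw new IllegalStateException("service failed");
                }
            });
            try
            {
                future.get();
                return null; // not reached: the task always fails
            }
            catch (ExecutionException e)
            {
                // The task's own exception is wrapped as the cause.
                return e.getCause().getMessage();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        finally
        {
            executor.shutdown();
        }
    }
}
```

In the real test there is no catch block, so the ExecutionException simply propagates and fails the test.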

This technique makes simulating a multi-threaded environment in a test easy and readable. It’s important to only use this technique when it’s really necessary. The resulting test is more of an integration test than a unit test, and its run time is orders of magnitude longer than a unit test’s, so overuse of the technique will inflate the time it takes to run the tests. After I’ve finished working on the component under test, I reduce the test-data size and thread count to a level where the test still provides value but is no longer a stress test (e.g. 10 articles and 2 threads). The next time the component is being worked on, the developer can crank up the values and run the tests to be confident that the behavior isn’t broken.

The complete source for a working example of this technique is available here. You’ll need Maven (or IntelliJ IDEA 7.x) to build and run the test. By default, the tests run against an in-memory H2Database instance, but if you look at application.properties you’ll see configurations for PostgreSQL and MySQL as well.

Happy testing!

Configuring applications with Spring

If you’ve used Spring before, you’ve almost definitely used a PropertyPlaceholderConfigurer to inject settings from external sources — most likely properties files — into your application context. The most common use cases include JDBC and Hibernate settings, but it’s not that uncommon to also configure Lucene index, temp file, or image cache directories as well. The simplest case looks something like this:

<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="location" value="classpath:application.properties"/>
</bean>
 
<!-- A sample bean that needs some settings. -->
<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="${jdbc.driver}"/>
    <property name="url" value="${jdbc.url}"/>
    <property name="username" value="${jdbc.username}"/>
    <property name="password" value="${jdbc.password}"/>
</bean>

And application.properties might look like this:

jdbc.driver=org.h2.Driver
jdbc.url=jdbc:h2:mem:example
jdbc.username=sa
jdbc.password=

Note that you can achieve the same simple configuration using the new Spring 2.x style schema configuration, but it doesn’t allow for any further customization, so we’re going to use the old style.

<!-- Example of new Spring 2.x style -->
<context:property-placeholder location="classpath:application.properties"/>

This handles the simple case of replacing placeholders (e.g. ${jdbc.url}) with values found in a properties file (e.g. jdbc.url=jdbc:h2:mem:example). In a real-world application, we not only need to collect settings, but also override them in different environments. Many of our applications are deployed in 4 or more environments (developer machine, build server, staging server, and production), each requiring different databases at the very least.

There are a few ways to enable overriding of properties. Let’s take a look at them in turn:

1. Setting the system properties mode to override (default is fallback)

<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE"/>
    <property name="location" value="classpath:application.properties"/>
</bean>

When configured in this mode, any value specified as a system property to the JVM will override any values set in properties files. For example, adding -Djdbc.url=jdbc:h2:mem:cheesewhiz to the JVM arguments would override the value in the file (jdbc:h2:mem:example). On a Java 1.5 or newer platform, Spring will also look for an environment variable called jdbc.url if no system property was found.
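
The difference between the modes boils down to which source is asked first. Here is a rough plain-Java sketch of that decision. It is not Spring’s implementation; the class, enum, and method names are made up, and the two Properties objects are passed in explicitly so the behavior is easy to test.

```java
import java.util.Properties;

class SystemPropertiesModeSketch
{
    /** Illustrative counterparts of the never, fallback and override modes. */
    enum Mode { NEVER, FALLBACK, OVERRIDE }

    static String resolve(String key, Properties fileProps, Properties systemProps, Mode mode)
    {
        String fromFile = fileProps.getProperty(key);
        String fromSystem = systemProps.getProperty(key);
        switch (mode)
        {
            case OVERRIDE:
                // System property wins; the file is only a default.
                return fromSystem != null ? fromSystem : fromFile;
            case FALLBACK:
                // File wins; the system property is only consulted when the file has no value.
                return fromFile != null ? fromFile : fromSystem;
            default:
                // NEVER: system properties are ignored entirely.
                return fromFile;
        }
    }
}
```

With jdbc.url set in both places, OVERRIDE yields the -D value while FALLBACK yields the value from the file.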

2. Specifying an optional properties file

<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="ignoreResourceNotFound" value="true"/>
    <property name="locations">
        <list>
            <value>classpath:application.properties</value>
            <value>classpath:local.properties</value>
        </list>
    </property>
</bean>

When ignoreResourceNotFound is set to true, Spring will ignore resources that don’t exist. You can imagine application.properties, containing all of the default settings, versioned in your SCM system. Developers have the option of creating a properties file called local.properties to override any settings that differ in their environment. This file should be unversioned and ignored by your SCM system. This works because properties are loaded in order and replace previous values.
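
The "later files win" behavior is the same thing you get from loading several sources into one java.util.Properties object. The sketch below illustrates that with in-memory strings standing in for the files; a null entry plays the role of a missing optional file. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class OrderedPropertiesSketch
{
    /**
     * Loads each source in turn into a single Properties object, so values from
     * later sources replace values from earlier ones. A null source is skipped,
     * much like a missing file when ignoreResourceNotFound is true.
     */
    static Properties loadInOrder(String... sources)
    {
        Properties merged = new Properties();
        for (String source : sources)
        {
            if (source == null)
            {
                continue;
            }
            try
            {
                merged.load(new StringReader(source));
            }
            catch (IOException e)
            {
                throw new IllegalStateException(e);
            }
        }
        return merged;
    }
}
```

Passing the contents of application.properties first and local.properties second gives the developer’s values the last word.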

3. Web Application overrides

In a web application environment, Spring also supports specifying values in web.xml as context params or in your application server specific meta-data as servlet attributes. For example, if you’re using Tomcat you can specify one or more parameter elements in your context.xml, and Spring can inject those values into placeholders.

<bean class="org.springframework.web.context.support.ServletContextPropertyPlaceholderConfigurer">
    <property name="location" value="classpath:application.properties"/>
</bean>

The ServletContextPropertyPlaceholderConfigurer conveniently works in non servlet environments by falling back to the behavior of a PropertyPlaceholderConfigurer. This is great when running unit tests.

4. Combining techniques

There’s no reason why these techniques can’t be combined. Technique #1 is great for overriding a few values while #2 is better for overriding many. #3 just expands the field of view when Spring goes to resolve placeholders. When combined, system properties override those in files. When using technique #3, there are some settings available for adjusting the override behavior (see contextOverride). Test the resolution order when combining to ensure it’s behaving as expected.

Optional External Properties

There’s another use case that applies to some projects. Often in non-developer environments, system admins want to keep properties for the environment outside of the deployable archive or the application server, and they don’t want to deal with keeping those files in a Tomcat context file; they prefer a simple properties file. They also don’t want to have to place the file in a hard-coded location (e.g. /var/acmeapp/application.properties), or they may keep configuration for multiple servers in the same network directory, each file named after the server. With a little trickery, it’s easy to support an optional external properties file that isn’t in a hard-coded location. The location of the file is passed as a single system property to the JVM, for example: -Dconfig=file:///var/acmeapp/server1.properties. Here’s the configuration to make it happen:

<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="ignoreUnresolvablePlaceholders" value="true"/>
</bean>
 
<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="ignoreResourceNotFound" value="true"/>
    <property name="location" value="${config}"/>
</bean>

The first definition enables basic property resolution through system properties (in fallback mode). The second bean loads the resource from the location resolved from the system property -Dconfig. All Spring resource URLs are supported, making this very flexible.
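
Conceptually, the first configurer just rewrites ${config} in the second bean’s location before that bean is created. Here is a simplified sketch of that kind of substitution in plain Java. It is an illustration, not Spring’s code: the class name is made up, nested placeholders aren’t handled, and unresolved placeholders are left untouched, roughly what ignoreUnresolvablePlaceholders allows.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch
{
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} that has a value in props; unknown placeholders stay as they are. */
    static String resolve(String text, Properties props)
    {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find())
        {
            String value = props.getProperty(matcher.group(1));
            String replacement = value != null ? value : matcher.group(0);
            // quoteReplacement() keeps '$' and '\' in the value from being treated as regex references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling PlaceholderSketch.resolve("${config}", System.getProperties()) would produce the file location when the JVM was started with -Dconfig=..., and leave the text unchanged otherwise.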

Putting it all together

Here’s a configuration that does more than most people would need, but allows for ultimate flexibility:

<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="ignoreUnresolvablePlaceholders" value="true"/>
</bean>
 
<bean class="org.springframework.web.context.support.ServletContextPropertyPlaceholderConfigurer">
    <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE"/>
    <property name="searchContextAttributes" value="true"/>
    <property name="contextOverride" value="true"/>
    <property name="ignoreResourceNotFound" value="true"/>
    <property name="locations">
        <list>
            <value>classpath:application.properties</value>
            <value>classpath:local.properties</value>
            <value>${config}</value>
        </list>
    </property>
</bean>

Every placeholder goes through the following resolution process. Once a value is found it’s set and the next placeholder is resolved:

  1. (optional) Property value specified as a system or environment property; useful for overriding specific placeholders (e.g. -Djdbc.host=devdb / -Djdbc.username=carbon5)
  2. (optional) Context parameters located in web.xml or context attributes specified in application server meta-data (e.g. a Tomcat context.xml).
  3. (optional) Properties file located by the system/environment variable called “config”; useful for externalizing configuration. All URL types are supported (e.g. -Dconfig=file:///c:/hmc.properties).
  4. (optional) Properties file identified by classpath:local.properties; useful for specific developer overrides.
  5. (required) Properties file identified by classpath:application.properties, which contains default settings for our application.

Best Practices

  • Deploy the same exact artifact (e.g. war, ear, etc) across all environments by externalizing configuration. This may seem daunting, but the emergent benefits are huge in terms of simplicity.
  • Only make things that can safely change across environments configurable. Also, only things that need to be configurable should be configurable; it’s easy to go overboard.
  • Configure the minimal properties search path that meets your requirements.
  • When looking for properties files in the project tree, use classpath resources whenever possible. This makes finding those files easy, consistent, and insensitive to the working-dir, which is great when running tests from your IDE and command line.
  • Aim for a zero-configuration check-out, build, run-tests cycle for the environment where it happens most: development.

What other interesting configuration scenarios have you seen?

Injecting Spring 2.5 beans into Stripes Actions

I’ve recently been playing around with Stripes, a lightweight, well-designed, simple Java web MVC framework. I haven’t had the pleasure of working with it on a production application yet, but hope to sometime soon. Meanwhile, I’ve been tinkering on a pet project.

As you may know, we often use Spring for lifecycle management of our services and dependency injection (among other uses). In a Stripes + Spring application, you can imagine Spring-managed services being used by Stripes Actions (i.e. controllers). Every incoming HTTP request results in a new instance of a Stripes Action, thus the newly created actions must have their dependencies injected for every request.

Stripes ships with a SpringInterceptor that supports annotating fields and methods on your actions with @SpringBean. While this works fine, I was pretty interested in being able to use Spring 2.5’s annotations for marking what should be injected on my actions, so I created a Spring25Interceptor (see code below).

The Spring25Interceptor is configured the same way as the out-of-the-box SpringInterceptor. In your web.xml:

web.xml

...
<filter>
    <display-name>Stripes Filter</display-name>
    <filter-name>StripesFilter</filter-name>
    <filter-class>net.sourceforge.stripes.controller.StripesFilter</filter-class>
    <init-param>
        <param-name>Interceptor.Classes</param-name>
        <param-value>
            net.sourceforge.stripes.integration.spring.Spring25Interceptor
        </param-value>
    </init-param>
    ...
</filter>
...

Since it uses the same annotations that Spring 2.5 supports (@Autowired, @Resource, and @Qualifier), annotating your Stripes Actions is easy and should look pretty familiar (see Juergen’s blog for a comprehensive overview). Here are a few hypothetical examples:

LandingActionBean.java

public class LandingActionBean extends AbstractActionBean
{
    // Autowire by type (looks for a bean in the application context of type ServiceA)
    @Autowired ServiceA serviceA;
 
    // Autowire by name (looks for a bean with the name 'serviceB')
    @Resource ServiceB serviceB;
 
    // Autowire by name (looks for a bean with the name 'serviceC')
    @Autowired @Qualifier("serviceC") ServiceC serviceCee;
 
    // Method injection examples (all of the above can be applied to methods as well)
 
    @Autowired
    public void setServiceA(ServiceA a) { this.serviceA = a; }
 
    @Autowired
    public void setServices(ServiceA a, ServiceB b, ServiceC c) {
        this.serviceA = a;
        this.serviceB = b;
        this.serviceCee = c;
    }
 
    @DefaultHandler
    public Resolution execute()
    {
        return new ForwardResolution("/landing.jsp");
    }
}

While it may be considered a subtle improvement, I really like the fact that with this new interceptor, my application has a consistent syntax for dependency injection across the tiers. Additionally, the semantics of the Spring annotations are also consistent and shared throughout. An added bonus is that the new Interceptor is significantly smaller than the existing one.

Here’s the Interceptor code that does the dependency injection into actions:

Spring25Interceptor.java

import net.sourceforge.stripes.action.*;
import net.sourceforge.stripes.controller.*;
import net.sourceforge.stripes.util.*;
import org.springframework.beans.factory.config.*;
import org.springframework.context.*;
import org.springframework.util.*;
import org.springframework.web.context.support.*;
import javax.servlet.*;
 
@Intercepts(LifecycleStage.ActionBeanResolution)
public class Spring25Interceptor implements Interceptor
{
    private static final Log log = Log.getInstance(Spring25Interceptor.class);
 
    public Resolution intercept(ExecutionContext context) throws Exception
    {
        Resolution resolution = context.proceed();
        log.debug("Running Spring dependency injection for instance of ", context.getActionBean().getClass().getSimpleName());
        ServletContext servletContext = StripesFilter.getConfiguration().getServletContext();
        ApplicationContext applicationContext = WebApplicationContextUtils.getWebApplicationContext(servletContext);
        AutowireCapableBeanFactory beanFactory = applicationContext.getAutowireCapableBeanFactory();
        beanFactory.autowireBeanProperties(context.getActionBean(), AutowireCapableBeanFactory.AUTOWIRE_NO, false);
        beanFactory.initializeBean(context.getActionBean(), StringUtils.uncapitalize(context.getActionBean().getClass().getSimpleName()));
        return resolution;
    }
}
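
If you’re curious what annotation-driven injection looks like underneath, here is a toy version in plain Java. To be clear, this is not how Spring or the interceptor above does it: the @Inject annotation, the bean map, and the class names are all invented for this sketch, and it only handles fields matched by exact type.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class FieldInjectionSketch
{
    /** Hypothetical marker annotation, standing in for something like @Autowired. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Inject {}

    static class GreetingService
    {
        String greet() { return "hello"; }
    }

    /** Plays the role of an action that is created fresh for each request. */
    static class LandingAction
    {
        @Inject GreetingService greetingService;
    }

    /** Sets every @Inject field on target to the bean registered for the field's exact type. */
    static void inject(Object target, Map<Class<?>, Object> beansByType)
    {
        for (Field field : target.getClass().getDeclaredFields())
        {
            if (!field.isAnnotationPresent(Inject.class))
            {
                continue;
            }
            Object bean = beansByType.get(field.getType());
            if (bean == null)
            {
                throw new IllegalStateException("No bean of type " + field.getType().getName());
            }
            try
            {
                field.setAccessible(true);
                field.set(target, bean);
            }
            catch (IllegalAccessException e)
            {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Creates a new action, injects it, and uses the injected service. */
    static String demo()
    {
        Map<Class<?>, Object> beans = new HashMap<Class<?>, Object>();
        beans.put(GreetingService.class, new GreetingService());
        LandingAction action = new LandingAction();
        inject(action, beans);
        return action.greetingService.greet();
    }
}
```

The interceptor delegates all of this to Spring’s bean factory, which also covers by-name lookup, qualifiers, and method injection.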

I’ve created an enhancement request for this feature, though the comment thread is a little bit all over the place. You can find the latest version of the code and Javadoc as an attachment on the issue.

Introducing Java DB Migrations

UPDATE: A new version of the Java DB Migrations framework has been released; check this post for details and the project documentation.

Here at Carbon Five we have the luxury of working on many projects, so anything we can do to make things easier pays off many times over across new projects. One of the things that we have to deal with on every project is maintaining a database schema over time. We’ve had a manual process of capturing changes in incremental db patch scripts for a while, but it was error prone and sometimes neglected. We’ve been doing more Ruby on Rails work and found Rails Migrations easy to work with and a real time saver. We wanted something that would make our lives easier when working on Java projects in the same way Migrations improve Rails development. With that goal in mind, Alon and I collaborated on a simple Java database migration framework.

During development, it’s a big deal because each engineer has two instances of the database, one for unit tests and another for running the application. We need an easy way to create a new, up-to-date database and update existing databases. Once a project has launched, it’s a big deal because we need a way to migrate a database teeming with important production data to the latest version without losing critical information.

High Level Requirements

  • Initiate a migration from the command-line as a Maven plugin
  • Programmatically migrate a database during application startup
  • Convention over Configuration
  • Initially support migrations written in SQL

At a high level, the migration process looks like this:

  1. Query the database (table db_version) to find the current version.
  2. Determine the latest database schema version available.
  3. If the database is out of date, run each migration in order in its own transaction, updating the db_version for each migration.
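
The three steps above can be sketched in a few lines of Java. This is only an outline of the algorithm, not the framework’s code: versions are plain integers, the class and method names are hypothetical, and "running" a migration is reduced to appending to a log where a real implementation would execute the script and update db_version inside one transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

class MigrationPlanSketch
{
    /** Returns the available versions newer than currentVersion, in ascending order. */
    static List<Integer> pendingVersions(int currentVersion, SortedSet<Integer> available)
    {
        // tailSet() is inclusive, so start one past the current version.
        return new ArrayList<Integer>(available.tailSet(currentVersion + 1));
    }

    /** Applies each pending migration in order and returns the resulting version. */
    static int migrate(int currentVersion, SortedSet<Integer> available, List<Integer> appliedLog)
    {
        int version = currentVersion;
        for (Integer next : pendingVersions(currentVersion, available))
        {
            // Placeholder for: begin transaction, run script, update db_version, commit.
            appliedLog.add(next);
            version = next;
        }
        return version;
    }
}
```

A database that is already at the latest version produces an empty plan, so migrating it is a no-op.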

We’ve identified two usage patterns. The first is more akin to the Rails Migration model in that you explicitly migrate the database via the command line. The second is automatic migration when an application starts up, before Hibernate initializes or any other data access takes place. I’ll discuss each in turn.

Migrating using Maven

This functionality is easy to enable in a mavenized project. First you add the Carbon Five public plugin repository:

pom.xml

...
<pluginRepositories>
    <pluginRepository>
        <id>c5-public-repository</id>
        <url>http://mvn.carbonfive.com/public</url>
    </pluginRepository>
</pluginRepositories>
...

And then you configure the migration plugin:

pom.xml

...
<plugin>
    <groupId>com.carbonfive</groupId>
    <artifactId>maven-migration-plugin</artifactId>
    <version>0.9-SNAPSHOT</version>
    <configuration>
        <defaultEnvironment>test</defaultEnvironment>
        <environments>
            <environment>
                <name>default</name>
                <driver>com.mysql.jdbc.Driver</driver>
                <username>dev</username>
                <password>dev</password>
            </environment>
            <environment>
                <name>test</name>
                <url>jdbc:mysql://localhost/myapp_test</url>
            </environment>
            <environment>
                <name>development</name>
                <url>jdbc:mysql://localhost/myapp_development</url>
            </environment>
        </environments>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.5</version>
        </dependency>
    </dependencies>
</plugin>
...

You’ll notice that we’ve got 2 environments configured. You can have as many as you need and you can specify which you want to migrate on the command line. If none are specified the default environment will be migrated. In this example we’re specifying the dependency on our JDBC driver so that the plugin has access to the code it needs to connect to the database.

Lastly, you drop your migration scripts into the src/main/resources/db/migrations directory, naming them using the pattern NNN_description.sql, where NNN is three digits indicating the script sequence. Some examples might be:

  • 001_create_users_table.sql
  • 002_add_default_users.sql
  • 003_add_lastvisit_column.sql

The description is optional and isn’t used for anything; it’s just there so that other developers can get an idea of what a script does without having to open it.
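
As an illustration of the naming convention, here is one way the sequence number could be pulled out of a script name. The regular expression and class name are assumptions made for this sketch; the framework’s actual matching rules may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MigrationFileNameSketch
{
    // Three digits, an optional "_description", then ".sql".
    private static final Pattern NAME = Pattern.compile("(\\d{3})(?:_(.*))?\\.sql");

    /** Returns the sequence number encoded in a migration file name. */
    static int versionOf(String fileName)
    {
        Matcher matcher = NAME.matcher(fileName);
        if (!matcher.matches())
        {
            throw new IllegalArgumentException("Not a migration script name: " + fileName);
        }
        return Integer.parseInt(matcher.group(1));
    }
}
```

Sorting scripts by this number, rather than by file name, keeps the order independent of whatever description follows the digits.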

From the command line, you can run the migration plugin like this:

$ mvn migration:migrate

Note that the database must exist for the migrations to take place as we do not create missing databases (yet).

I’ve created a simple, complete sample that shows off this functionality. It’s on the C5 public subversion repository here. Check it out and then read the readme.txt at the top of the project.

Migrating from your Application

The other usage scenario is to auto-migrate during application startup. At the core of the framework, there’s an interface called MigrationManager which has two implementations: DataSourceMigrationManager and DriverManagerMigrationManager. Migration happens right after a datasource (of the javax.sql variety) is created.

Migrating from your application is as easy as instantiating one of these early in the startup cycle and invoking the migrate() method, something like this:

MigrationManager migrationManager = new
    DriverManagerMigrationManager("com.mysql.jdbc.Driver",
    "jdbc:mysql://localhost/myapp_test", "dev", "dev");
migrationManager.migrate();

Of course this needs to happen before anything else in the application uses the database; we want the database to be updated completely before it’s used.

Spring is part of our standard development stack on our Java projects, and it’s easy to enforce these dependencies in Spring configuration. First we define a data source for the application:

<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="org.h2.Driver"/>
    <property name="url" value="jdbc:h2:file:~/.h2/migration_sample2_test"/>
    <property name="username" value="dev"/>
    <property name="password" value="dev"/>
</bean>

And now we declare our MigrationManager instance. Note the ‘init-method’ attribute:

<bean id="migrationManager"
    class="com.carbonfive.db.migration.DataSourceMigrationManager"
    init-method="migrate">
    <constructor-arg ref="dataSource"/>
</bean>

And then you define something that’s going to use the defined DataSource. Note the ‘depends-on’ attribute:

<bean id="userService"
    class="com.carbonfive.migration.sample2.UserService"
    depends-on="migrationManager">
    <constructor-arg ref="dataSource"/>
</bean>

This is obviously a little contrived for the sake of example, but you get the point. In a typical application the thing that would depend on the datasource is a Hibernate SessionFactory.

You can visit the source code for this example on the C5 public subversion repository at https://svn.carbonfive.com/public/christian/migration-sample2/trunk.

Best Practices

Here are a few of the things we’ve learned along the way:

  • Start using migrations early. Definitely start by the time there’s more than one person on a project. I usually start off letting Hibernate generate my schema while I’m experimenting with things, but as soon as I’m really working on features I’ll switch over to migrations.
  • All database changes are captured as a new migration.
  • Migration scripts must not be changed once anyone has run them; make further changes in a new migration.

Source Code Access

The Carbon Five db-support project which contains all of this migration goodness is available on the C5 public subversion repository at https://svn.carbonfive.com/public/carbonfive/db-support/trunk. It’s a maven project and should compile and pass its tests out of the box. I encourage you to look through the code and check out the tests.

Future

If you look through the code you’ll see some of what’s in store for this project. We’ve got initial support for writing migrations in Groovy and JRuby and we’re thinking about adding Java support as well. We’re looking for feedback to drive the future direction of the project, so feel free to write us and let us know what you think.

Versioning your IDEA module meta-data (.iml)

On one of our larger client projects, we’ve been tackling an array of improvements to the development environment. This is largely motivated by the fact that it’s a big project with lots of modules and engineers, and time lost dealing with environment issues is expensive and frustrating. Everyone wants to be productive without having to wrestle with their tools or understand the ins and outs of things on the periphery, like build systems and IDE configuration.

We migrated to Maven 2 for builds partially because it’s convention driven, so when questions of how we should do something come up we can default to Maven best practices. Another reason is that it enabled us to stick to the Don’t Repeat Yourself (DRY) principle. We didn’t want to duplicate build meta-data by maintaining both command-line configuration (Maven pom.xmls) and IntelliJ IDEA configuration. Maven deals with this pretty well via the maven-idea-plugin.

The downside to running “mvn idea:module” every time there’s a pom change is that it eats at least a few minutes with all of the checks for sources and javadocs. To make matters worse, we always had to run it from the master directory so that intra-module dependencies were modeled correctly as IDEA module dependencies, not jar dependencies. Each developer would go through this routine when there was a change because we weren’t versioning our IDEA meta-data in subversion. Adding up all of the engineers’ time, this process would consume over half an hour in total.

I wanted to change the process so that we still used the maven-idea-plugin to generate the meta-data (staying true to DRY), but then commit the generated files to subversion so that only one person would have to go through the pain of re-generating (presumably whoever made the changes to the poms) and the rest of the team could continue without much disruption. The problem is that each generated .iml file contains absolute paths which are specific to the machine that generated it.

After a bit of sleuthing, I discovered an IDEA option called Path Variables which is available in the general settings section. There isn’t much to tell you what it does, so it’s easily overlooked. You can specify one or more variables and IDEA will automatically replace file paths in project (.ipr) and module files (.iml) with the provided variables. Everyone on the development team added a Path Variable called “M2_REPO” which points to their local maven repository (~/.m2/repository).

[Screenshot: adding a Path Variable in IDEA’s settings]

After running “mvn idea:module” we just open the project in IDEA and it will do the replacing behind the scenes. Now the module meta-data can be versioned and shared.
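
To make the replacement concrete, here is a tiny sketch of the kind of rewrite involved. It assumes the $NAME$ notation for path variables in the saved files, and the class, method, and sample paths are made up; IDEA does all of this for you, so the code is purely illustrative.

```java
class PathVariableSketch
{
    /** Rewrites a path under variableValue to use $variableName$ instead; other paths are returned unchanged. */
    static String relativize(String path, String variableName, String variableValue)
    {
        if (path.startsWith(variableValue))
        {
            return "$" + variableName + "$" + path.substring(variableValue.length());
        }
        return path;
    }
}
```

Because every developer maps M2_REPO to their own local repository, the rewritten path means the same thing on every machine.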

It’s a small feature, but it’s helping us streamline everyone’s day-to-day.

IDEA 7 includes native Maven support which is getting better with each point release. The process described herein works great with the native plugin as well.

DBUnit 2.2, Spring, and Testing

You may have noticed that DBUnit changed its connection closing behavior in v2.2. We noticed it when tests deriving from our custom DatabaseTestCase implementation (which uses DBUnit) started failing. Initially we just reverted back to version 2.1. Since then I’ve discovered what the problem was and found a workaround, and while I was at it, I created some nice utilities to be used in tests which need data fixtures.

Problem

In v2.2 DBUnit introduces some new abstractions, one of which is the IDatabaseTester. Whether it’s by design or by accident, the default implementation closes connections after executing its operations (more info). The end result is that in our unit tests, the database connection is closed before the test can run.

Solution

I’ve created two simple util classes for loading test data into a database without closing the connection:

/** A helper for loading data sets into unit tests. */
public class DatabaseUtils {
    public static void loadDataSet(Class clazz, final DataSource dataSource) throws Exception {
        IDataSet dataSet = new FlatXmlDataSet(TestUtils.datasetInputStream(clazz));
        IDatabaseTester tester = new ExistingConnectionDatabaseTester(dataSource);
        tester.setDataSet(dataSet);
        tester.onSetup();
    }
}
/** A special DatabaseTester that doesn’t close the connection when it’s done. */
public class ExistingConnectionDatabaseTester extends AbstractDatabaseTester {
    private DataSource dataSource;
 
    public ExistingConnectionDatabaseTester(DataSource dataSource) {
        super();
        this.dataSource = dataSource;
    }
 
    public IDatabaseConnection getConnection() throws Exception {
        return new DatabaseConnection(DataSourceUtils.getConnection(dataSource));
    }
 
    public void closeConnection(IDatabaseConnection connection) throws Exception {
        // Don't close that connection!
    }
}

The first of these depends on another test-related utility, TestUtils.java (see below), which converts a class to a path. The second depends on Spring’s DataSourceUtils.

/**
 * A set of utilities that generate paths from classnames.  This is useful when
 * making reference to resources used by test cases, when the resources are
 * located in a directory path matching the class' package.
 */
public class TestUtils {
 
    public static final String TEST_PREFIX = "src/test/resources/"; // Maven2 default
 
    public static String pathString(Class clazz) {
        return pathString(TEST_PREFIX, clazz, "");
    }
 
    public static String pathString(Class clazz, String resource) {
        return pathString(TEST_PREFIX, clazz, resource);
    }
 
    public static String pathString(String prefix, Class clazz, String resource) {
        prefix = (prefix != null ? prefix : "");
 
        StringBuilder sb = new StringBuilder(); // local, so no synchronization is needed
        sb.append(prefix);
 
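        // Skip the separator for an empty prefix so the result stays a relative path.
        if (prefix.length() > 0 && !prefix.endsWith("/")) {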
            sb.append("/");
        }
 
        sb.append(ClassUtils.classPackageAsResourcePath(clazz));
        sb.append("/");
        sb.append(resource);
 
        return sb.toString();
    }
 
    public static InputStream pathInputStream(Class clazz, String resource)
        throws FileNotFoundException {
        return pathInputStream(TEST_PREFIX, clazz, resource);
    }
 
    public static InputStream pathInputStream(String prefix, Class clazz, String resource)
        throws FileNotFoundException {
        return new FileInputStream(pathString(prefix, clazz, resource));
    }
 
    public static InputStream datasetInputStream(Class clazz) throws FileNotFoundException {
        return pathInputStream(TEST_PREFIX, clazz, ClassUtils.getShortName(clazz) + ".xml");
    }
}

These three classes give us everything we need to load data into our test database using DBUnit. TestUtils can also be used to load other test resources stored alongside the data sets, such as images, documents and CSVs.
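To make the path convention concrete, here is a minimal Spring-free sketch of what datasetInputStream() resolves to. DataSetPathSketch and dataSetPath are hypothetical names; the sketch assumes a top-level class in a named package and replaces the Spring ClassUtils calls with plain string handling, so it approximates TestUtils rather than reproducing it.

```java
public class DataSetPathSketch {

    static final String TEST_PREFIX = "src/test/resources/"; // same Maven2 default as TestUtils

    /** Builds the data set path for a top-level class in a named package. */
    static String dataSetPath(Class<?> clazz) {
        String name = clazz.getName(); // e.g. "java.util.ArrayList"
        int lastDot = name.lastIndexOf('.');
        // Turn the package name into a directory path: "java.util" becomes "java/util/".
        String packagePath = lastDot < 0
                ? ""
                : name.substring(0, lastDot).replace('.', '/') + "/";
        return TEST_PREFIX + packagePath + clazz.getSimpleName() + ".xml";
    }

    public static void main(String[] args) {
        // prints "src/test/resources/java/util/ArrayList.xml"
        System.out.println(dataSetPath(java.util.ArrayList.class));
    }
}
```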

Here’s an example of how it can be used in a DAO test based on Spring’s test classes.

public class ArtistHibernateDaoTest extends AbstractAnnotationAwareTransactionalTests {
 
    private ArtistHibernateDao artistDao;
 
    public void setArtistDao(ArtistHibernateDao artistDao) {
        this.artistDao = artistDao;
    }
 
    protected String[] getConfigLocations() {
        return new String[]{"applicationContext-database.xml", "applicationContext-hibernate.xml"};
    }
 
    protected void onSetUpInTransaction() throws Exception {
        DatabaseUtils.loadDataSet(getClass(), getJdbcTemplate().getDataSource());
    }
 
    public void testFindAllNames() {
        List<String> names = artistDao.findAllNames();
        assertEquals(3, names.size());
        assertTrue(names.contains("The Cure"));
        assertTrue(names.contains("Depeche Mode"));
        assertTrue(names.contains("New Order"));
    }
}

onSetUpInTransaction() loads the data set before each test using the convention (package path)/(ClassName).xml (e.g. com.c5.PizzaTest loads the file com/c5/PizzaTest.xml under src/test/resources). Because the data is loaded in the same transaction as the test, and Spring rolls that transaction back when the test completes, no clean-up is necessary. These tests run fast!
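The data set file itself isn’t shown above. In DBUnit’s flat XML format each element under the root is one row: the element name is the table and the attributes are the columns. The sketch below assumes a hypothetical ARTIST table with ID and NAME columns (the real schema isn’t given here) and uses the JDK’s XML parser, not DBUnit, just to show how such a file breaks down into rows.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FlatDataSetSketch {

    // Hypothetical contents of ArtistHibernateDaoTest.xml; table and column names are assumed.
    static final String DATASET =
          "<dataset>"
        + "<ARTIST ID=\"1\" NAME=\"The Cure\"/>"
        + "<ARTIST ID=\"2\" NAME=\"Depeche Mode\"/>"
        + "<ARTIST ID=\"3\" NAME=\"New Order\"/>"
        + "</dataset>";

    /** Returns one "TABLE:NAME" entry per row element in the data set. */
    static List<String> rows(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> rows = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element) {
                Element row = (Element) node;
                // Element name = table, attribute = column value.
                rows.add(row.getTagName() + ":" + row.getAttribute("NAME"));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // prints "[ARTIST:The Cure, ARTIST:Depeche Mode, ARTIST:New Order]"
        System.out.println(rows(DATASET));
    }
}
```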

We’ve been talking about moving away from our custom DatabaseTestCase class hierarchy. The above functionality in conjunction with Spring’s test hierarchy could be a good start.

On a somewhat related note, there is a new unit testing framework called Unitils which does this sort of thing and much more. I haven’t used it but it sounds interesting.

Christian