Author Archive for christian

Solid State Disks

I’m a bit of a hardware geek. I tend to keep up on what’s new and neat. I’m a bit conservative when it actually comes to buying the latest and greatest, but that doesn’t stop me from following the trends.

I’ve been eyeing solid state disks for the last year or so, speculating that they could have a huge impact on developer productivity once all of the major gotchas were worked out. I specifically thought that pairing a decent laptop with an SSD would be awesome, since laptops generally have significantly slower hard disks than workstations. They’re also a bit slower across the board (CPU, memory bus, etc.), so I thought the IO boost from an SSD might even things out.

In December I decided to end the speculation, so I bought an 80G Intel x25m (Gen 2) SSD for my 15″ Unibody MacBook Pro.

Database Migrations: Fail when database is out of date

The latest release of the Carbon Five Database Migration maven plugin supports a new goal: check. The check goal fails the build if the database isn’t up to date. That is, if there are pending migrations the plugin produces a clear message explaining that the database is out of date and lists the pending migrations. Run mvn test and see something like this:

[INFO] ------------------------------------------------------------------------
[INFO] Building Gearlist - Data Access
[INFO]    task-segment: [test]
[INFO] ------------------------------------------------------------------------
[INFO] [db-migration:check {execution: default}]
[INFO] Checking jdbc:mysql://localhost/gearlist_test using migrations at src/main/db/migrations/.
[INFO] Loaded JDBC driver: com.mysql.jdbc.Driver
[WARNING] There are 2 pending migrations: 
 
    20100116010256_audit_tracking.sql
    20100121052539_add_daily_reports.sql
 
    Execute db-migration:migrate to apply pending migrations.
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] There are 2 pending migrations, migrate your db and try again.
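The idea behind the check is simple enough to sketch in a few lines of Java. To be clear, this is not the plugin’s actual implementation: the class and method names (MigrationCheck, pendingMigrations, check) and the first filename in main() are made up for illustration, and I’m assuming applied migrations are tracked by filename.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the plugin's code: a migration is "pending" when
// it exists on disk but has not been recorded as applied to the database.
class MigrationCheck
{
    static List<String> pendingMigrations(List<String> available, Set<String> applied)
    {
        List<String> pending = new ArrayList<String>();
        for (String migration : available)
        {
            if (!applied.contains(migration)) pending.add(migration);
        }
        // Timestamp-prefixed filenames sort into the order they should run.
        Collections.sort(pending);
        return pending;
    }

    // Fails (as the build would) when anything is pending.
    static void check(List<String> available, Set<String> applied)
    {
        List<String> pending = pendingMigrations(available, applied);
        if (!pending.isEmpty())
        {
            throw new IllegalStateException("There are " + pending.size()
                + " pending migrations: " + pending);
        }
    }

    public static void main(String[] args)
    {
        List<String> available = Arrays.asList(
            "20100110000000_create_schema.sql", // made-up example file
            "20100116010256_audit_tracking.sql",
            "20100121052539_add_daily_reports.sql");
        Set<String> applied = new HashSet<String>(
            Arrays.asList("20100110000000_create_schema.sql"));
        System.out.println(pendingMigrations(available, applied));
    }
}
```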

It’s very easy to include the check goal in your build lifecycle if you’re already using the db-migration-maven-plugin.

...
<build>
    <plugins>
        <plugin>
            <groupId>com.carbonfive.db-support</groupId>
            <artifactId>db-migration-maven-plugin</artifactId>
            <version>0.9.9-m2</version>
            <executions>
                <execution>
                    <phase>validate</phase>
                    <goals>
                        <goal>check</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <url>jdbc:mysql://${db.host}/${db.name}</url>
                <username>${db.username}</username>
                <password>${db.password}</password>
            </configuration>
        </plugin>
    </plugins>
</build>

Check out the project home for additional documentation and details. There’s also a simple, complete example application showing off this configuration.

Enjoy!
Christian

C5 Test Support new addition: FunctionalTestRunner

We’re always looking for new ways to test our applications and we’ve been trying a few new things on our projects. One of the recent additions is a JUnit test runner designed to help make writing and running functional tests easier. In Javaland, we use Selenium and/or HtmlUnit for our functional tests. These are the tests that run against a deployed application over the wire using a real or simulated browser. Most of our functional tests work the application in the same way a real user would, testing sequences of realistic activity and often touching a number of pages. Since our functional tests use either a real browser or a simulated one, JavaScript is executed and assertions are made on the results. This gives us greater confidence that our app is really working, end to end.

Here’s the high-level flow that the functional test runner provides:

  1. Load fixture data from a DBUnit dataset.
  2. Download and install the application server (if necessary).
  3. Start the application server (using Cargo).
  4. Deploy the application, waiting until it’s completely started.
  5. Run one or more functional tests (using your preferred testing framework – Selenium, HtmlUnit, etc.)…
    If a test dirties the database in a manner that must be reset, the test class can be marked with the @DirtiesDatabase annotation. This will reload the database fixture and optionally restart the application.
  6. Shut down the application server.


Web Application Testing @ SDForum October 6th

Updated: Added links to presentation and source code.

I’ll be down in Palo Alto speaking about Automated Web Application Testing Tuesday October 6th. If you’re interested in getting a peek at the typical Carbon Five Java web architecture along with a variety of strategies and techniques for testing, c’mon down and join us. The session will be primarily code-driven. I’ll implement a few features along with unit, integration and functional tests and show off some of the techniques and custom tools that help keep things simple and easy during development. Some of the topics include:

  • Brief overview of our typical web architecture and tools stack
  • Differences between Unit, Integration and Functional Tests
  • Dealing with the database (schema and fixtures)
  • Where does test coverage pay off the most?
  • In-browser and out-of-browser functional testing
  • Carbon Five best practices, custom tools and techniques
  • And more…

You can find the gritty details on the SDForum site.

Doors at 6:30 and the show begins at 7:00. Hope to see you there!

Here are the artifacts from the presentation:

Continuous Integration and Build Promotion

We have a build server and we practice continuous integration on all of our projects. In fact, it’s pretty much the first thing we set up after version control. We’re feedback junkies. It became especially apparent while working on a client project last year where we used their development infrastructure. They had a build server running, but the problem was that it took too long to get feedback. I was lost and jonesing. They had a monolithic build that took about 45 minutes to run its course and give me the affirmation I was seeking.

While setting up my current project, I decided to take a different approach when configuring our build server and continuous integration. I had a few goals in mind:

  • Super fast feedback
  • “Promoted” builds get pushed to the acceptance server
  • Minimize issues with the build infrastructure

I split our build into three plans:

Continuous Integration Build Plan

This resets the unit test database (using c5-db-migrations), compiles the project, runs all of the unit tests, and if there are no errors or failures, produces a war. The war is installed locally using Maven so that it’s accessible to other processes in a known location. This build is very fast and is triggered on every subversion commit. The command used for this build plan is:

mvn db-migration:reset clean install

Functional Tests Build Plan

This resets the functional test database, deploys the war built during the continuous integration build to tomcat using Cargo, runs all of the functional tests, and shuts down tomcat. This build is triggered on every successful continuous integration build (i.e. as a dependent build). A very short script performs the work of this build:

mvn -Pdev db-migration:reset
cd functional-tests; mvn -Pdev clean test-compile cargo:start surefire:test cargo:stop

And here’s the cargo-maven-plugin configuration:

<plugins>
...
    <plugin>
        <groupId>org.codehaus.cargo</groupId>
        <artifactId>cargo-maven2-plugin</artifactId>
        <version>1.0-beta-2</version>
        <configuration>
            <wait>false</wait>
            <configuration>
                <deployables>
                    <deployable>
                        <groupId>com.acme</groupId>
                        <artifactId>acme-web</artifactId>
                        <type>war</type>
                        <properties>
                            <context>acme-web</context>
                        </properties>
                    </deployable>
                </deployables>
            </configuration>
            <container>
                <containerId>tomcat6x</containerId>
                <zipUrlInstaller>
                    <url>http://www.apache.org/dist/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip</url>
                </zipUrlInstaller>
            </container>
        </configuration>
    </plugin>
</plugins>

Deploy to Acceptance Build Plan

This step doesn’t build anything or run any tests, but is a little more complicated than the others because it’s interacting with another machine: our dedicated project acceptance server. We scp the war (built during the continuous integration phase) to our acceptance server. Next, we shut down tomcat and clean a few things up (logs, old webapp, and work). Then we migrate the database (no reset because we care about the data). Last, we bring tomcat back up with the new war. This build is triggered on every successful functional test build. Here’s the script that’s run by the build server:

ARTIFACT_NAME=./ROOT.war
 
# SCP Application and Scripts
WAR=`find ~/.m2/repository/com/acme/acme-web -name "*-*.war" | sort | tail -1`
echo "Copying $WAR to acceptance server."
scp $WAR acme@acme-acceptance:$ARTIFACT_NAME
scp ./bin/*_as.sh acme@acme-acceptance:.
 
# Shutdown and Clean Tomcat
ssh acme@acme-acceptance sh ./shutdown_as.sh
 
# DB Migration
mvn -Pdev db-migration:migrate -Djdbc.host=acme-acceptance
 
# Install Application
ssh acme@acme-acceptance mv $ARTIFACT_NAME ./apache-tomcat/webapps/
 
# Startup
ssh acme@acme-acceptance sh ./startup_as.sh

Conclusion

This has worked out really well for the project and we get feedback very quickly.

You may wonder why we broke functional tests into their own plan. I find that functional tests can be a little less stable than unit tests (especially if selenium is involved), and they run much more slowly. I’ve seen cases where flaky functional tests caused a team to start ignoring build results because it was usually a problem with the infrastructure, not with the code. So, the decision was somewhat defensive and in retrospect, probably unnecessary.

We’ve spent less time maintaining our build plans than in the past as well. At some point, our build server MySQL instance crapped out and even though all of the databases were deleted, our builds all ran successfully when the build server came back up because they start with a database reset, which creates the target database and migrates it to the latest schema.

A single war is promoted as it passes a greater level of testing, and is eventually deployed to the acceptance server if all of the tests pass. While we’re saving a little time by not rebuilding the archive for each plan, that’s not the only thing I like about it. It just feels a little more right and it completely eliminates the chance that something about the artifact changes as it makes its way through the pipelines. The same war can make its way from the first CI build all the way to production. This is possible because we include default configuration for the application which matches our development environments, and then provide a mechanism for externalizing application configuration for the one-off environments.
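For the curious, here’s roughly what “defaults in the war, overrides outside of it” can look like. This is a minimal sketch and not necessarily how our applications do it: the AppConfig class, the app.config system property, and the property names and values are all assumptions made up for the example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch: every name and value here is made up for illustration.
class AppConfig
{
    // Defaults that match the development environment. In a real war these
    // would more likely be read from a properties file on the classpath.
    static Properties defaults()
    {
        Properties defaults = new Properties();
        defaults.setProperty("db.host", "localhost");
        defaults.setProperty("db.name", "acme_dev");
        return defaults;
    }

    // Layers an optional external file over the defaults. A null path means
    // "no overrides", which is the development case.
    static Properties load(String externalPath) throws IOException
    {
        Properties config = new Properties(defaults());
        if (externalPath != null)
        {
            InputStream in = new FileInputStream(externalPath);
            try { config.load(in); } finally { in.close(); }
        }
        return config;
    }

    public static void main(String[] args) throws IOException
    {
        // e.g. java -Dapp.config=/etc/acme/acceptance.properties ...
        Properties config = load(System.getProperty("app.config"));
        System.out.println("db.host = " + config.getProperty("db.host"));
    }
}
```

Anything not mentioned in the external file falls through to the bundled default, so the one-off environments only need to list what differs.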

I think most modern build server software provides everything you need to do something like this, as it’s rather straightforward. However, for those who are curious, we have been using Bamboo for the last year or two and recently installed TeamCity so that we can give it a proper try. Both are great products, and if you’re on a smallish team, TeamCity is completely free (and is superior to the open source alternatives, IMHO).

How are you using your build server?

Stripes: A Successful First Project

We’re wrapping up a project that I’ve been leading since September and I’ve been reflecting on some of my decisions. Some of this reflection might be interesting to other developers. There are a few things on my mind, but I’ll start off with my decision to use Stripes as our MVC instead of our usual, Spring MVC.

Background

I’ve never been completely satisfied with Spring MVC (note that it’s pretty hard to win me over completely). We know it well and we’ve had many successful projects while using it. We’ve also used many of the new features that came along with Spring 2.5 (@Controller, more convention over configuration, etc), but in the end I still wasn’t loving it.

I came across Stripes over a year ago, and noted that it had a small but fairly vibrant and excited community. The project’s goals definitely resonated with me:

  • Make developing web applications in Java easy
  • Provide simple yet powerful solutions to common problems
  • Make the Stripes ramp up time for a new developer less than 30 minutes
  • Make it really easy to extend Stripes, without making you configure every last thing

So I decided to give it a try on a real project. Switching from something we know inside and out to something that none of us had production experience with was arguably risky, so we decided to give it a try for a week with the intention that we’d go back to Spring MVC if anything took too long or felt awkward. Thankfully, that never happened.

Some of this article may read a bit like a Stripes versus Spring MVC comparison. That’s not really my intention, but it’s somewhat inevitable as much of my experience has been with Spring MVC. This isn’t intended to be a Stripes tutorial (there are great ones out there), so the code snippets and technical details will be sparse.

Controller Lifecycle, Binding, and the Model

Stripes controllers are called “Actions” or “ActionBeans” and each incoming HTTP request is routed to one primary Action (like Spring MVC). Stripes creates a new Action instance for each incoming request; Spring Controllers are singletons in comparison. Stripes binds parameters into the fields on the Action where Spring MVC passes them as method parameters. The Stripes Action is not only the “Controller” in the MVC, but it also serves as the root of the “Model” as well. The Action is made available to the View and all properties with getters can be queried using JSP-EL. Spring’s model is separate, necessitated by the Singleton nature of the Controller.

Let’s look at a simple example:

// URI and embedded parameters defined using CleanURLs
@UrlBinding("/status/{orderId}/{$event}")
public class OrderStatusAction extends AbstractActionBean
{
    // Spring managed service to be dependency injected (see "Worth Mentioning" below)
    @Autowired OrderService orderService;
 
    // Required incoming parameter bound in the URI with {orderId}
    @Validate(required = true, minvalue = 1) long orderId;
 
    // OrderStatus to be accessible from the view for rendering
    OrderStatus orderStatus;
 
    // public setter tells stripes to allow binding
    public void setOrderId(long orderId) { this.orderId = orderId; }
 
    // public getter tells stripes to allow access from the view
    public OrderStatus getOrderStatus() { return orderStatus; }
 
    public Resolution view()
    {
        orderStatus = orderService.getOrderStatus(orderId);
        if (orderStatus == null) return new ErrorResolution(404);
        return new ForwardResolution("/WEB-INF/jsp/order-status.jsp");
    }
}

In our view we can access the Action / Model:

...
<jsp:useBean id="actionBean" scope="request" type="eg.OrderStatusAction"/>
...
<li>Order Number: ${actionBean.orderStatus.order.id}</li>
<li>Status: ${actionBean.orderStatus.status}</li>
<li>Tracking Number: ${actionBean.orderStatus.trackingNumber}</li>
...

This example binds to a long, which is pretty simple. Stripes can bind into graphs of objects, instantiating them along the way if necessary. Collections are fully supported as well. We haven’t yet found an example of something we couldn’t bind into right out of the box.

While I was resistant at first to the Stripes lifecycle and combination of Controller and Model, I soon warmed up to it and now I find it quite natural, a bit better from a code readability standpoint, and more aesthetic. It’s just the right amount of abstraction and encapsulation to make for speedy development while being easy to maintain. I really like the fact that new instances of actions are created for each request, because the alternative is to pass all of your state into a handler method, which can easily lead to hard-to-read code, especially with the annotations required to describe which request parameter maps to which method parameter.

If you can’t bind directly into your value objects and entities, the Action gives you a great place to bind into first, allowing you to manually instantiate your domain objects plugging in values from the Action. This is quite useful when your domain model isn’t direct-binding friendly because of invariant enforcing, immutable value objects, and other practices encouraged by Domain Driven Design.
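Here’s a small sketch of that bind-first-then-build idea. It’s plain Java with no Stripes types involved, and both classes (Money and TransferForm, which stands in for an Action) are hypothetical.

```java
// Plain-Java illustration: no Stripes types are used, and every class here
// is hypothetical.
final class Money
{
    private final long cents;
    private final String currency;

    Money(long cents, String currency)
    {
        // The invariants live in the constructor, which is why a framework
        // can't populate this object through setters.
        if (cents < 0) throw new IllegalArgumentException("amount must not be negative");
        if (currency == null || currency.length() != 3)
            throw new IllegalArgumentException("currency must be a 3-letter code");
        this.cents = cents;
        this.currency = currency;
    }

    long getCents() { return cents; }
    String getCurrency() { return currency; }
}

// Stands in for an Action: simple mutable fields that are easy to bind into.
class TransferForm
{
    private long amountInCents;
    private String currency;

    public void setAmountInCents(long amountInCents) { this.amountInCents = amountInCents; }
    public void setCurrency(String currency) { this.currency = currency; }

    // Called from the handler once binding is done.
    Money toMoney() { return new Money(amountInCents, currency); }

    public static void main(String[] args)
    {
        TransferForm form = new TransferForm();
        form.setAmountInCents(1250);
        form.setCurrency("USD");
        Money money = form.toMoney();
        System.out.println(money.getCents() + " " + money.getCurrency());
    }
}
```

The framework only ever touches the setters; the domain object is built in one step, so its constructor gets to enforce the rules.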

We keep our Actions simple and lightweight, deferring all non-display logic to transactional, spring-managed services. I’ve seen examples where Actions are directly interacting with the database, a pattern I discourage.

If you want to know more about how Stripes works, check out the references section at the end of this article.

Generating URIs in Views

I can’t tell you how many times I’ve run into regressions after making changes to URIs where a page would link to a controller at the wrong URI. With Stripes, your URIs are defined once and only once, so when you change where an Action lives, pages will link to it correctly at its new location. Stripes tags take a beanclass argument so that they can determine the correct URI at runtime rather than hard-coding it in the view.

<stripes:link beanclass="eg.OrderStatusAction" event="view">
  <stripes:param name="orderId" value="65432"/>
  View Order Status
</stripes:link>

Renders: <a href="http://example.com/status/65432/view">View Order Status</a>

The <stripes:url …/> and <stripes:form …/> tags work the same way. To round it out, Actions can forward or redirect to other Actions without embedding URIs:

return new ForwardResolution(OrderHistoryAction.class, "view");
return new RedirectResolution(OrderHistoryAction.class, "view");

The net result is that there’s a single definition of each URI in our system and it lives on the Action which handles that URI, realizing the Don’t Repeat Yourself (DRY) principle. We’ve been able to change our URIs easily without fear of breaking views, which has been helpful as the project grows.

Configuration

Stripes only needs a few lines of configuration in your web.xml. That’s it (really). Stripes was built with convention over configuration in mind from day one. Actions and Extensions (Converters, Formatters, Interceptors, etc) are auto-discovered via classpath scanning. We never found ourselves needing to configure something differently than how it was out of the box.

While you can configure Spring MVC to be convention based (it’s okay to chuckle at this too), it’s not that way out of the box. Perhaps Spring 3.0 will change this, but I have the sense that no matter what, there will always be some evidence of the fact that Spring MVC’s internals allow a wide range of configuration.

Converters and Formatters

In other frameworks, these concepts are often conflated into a single class. Stripes converters do one thing: convert from strings to objects (e.g. phone numbers, zip codes, etc). Whenever you need to turn an incoming request parameter into something more than a string, the converter is there to help.

Formatters work the other way, formatting objects into something that looks right on the screen as text. Formatters can support multiple format types, so that you can support displaying objects differently when necessary (e.g. phone with extension, zip code with a plus 4, etc).

These helpers are simple to write and test. All of the stripes tags will use them if they’re present, so it’s easy to affect how something (phone numbers, for example) is displayed across the entire application.
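To make the split concrete, here’s a plain-Java sketch of a converter and formatter pair for phone numbers. These classes don’t implement the Stripes converter and formatter interfaces (a real one would), and the class names, the ten-digit rule and the "full" format type are all assumptions for the sake of the example.

```java
// Plain-Java illustration of the split; these classes do not implement the
// Stripes interfaces, and all names are hypothetical.
final class PhoneNumber
{
    final String digits;      // ten digits, no punctuation
    final String extension;   // may be null

    PhoneNumber(String digits, String extension)
    {
        this.digits = digits;
        this.extension = extension;
    }
}

// String -> object: the only job of a converter.
class PhoneNumberConverter
{
    PhoneNumber convert(String input)
    {
        // Anything after an "x" is treated as the extension.
        String[] parts = input.split("x", 2);
        String digits = parts[0].replaceAll("[^0-9]", "");
        if (digits.length() != 10)
            throw new IllegalArgumentException("expected 10 digits: " + input);
        String extension = parts.length > 1 ? parts[1].trim() : null;
        return new PhoneNumber(digits, extension);
    }
}

// Object -> string, with more than one format type.
class PhoneNumberFormatter
{
    String format(PhoneNumber phone, String formatType)
    {
        String base = "(" + phone.digits.substring(0, 3) + ") "
            + phone.digits.substring(3, 6) + "-" + phone.digits.substring(6);
        if ("full".equals(formatType) && phone.extension != null)
            return base + " x" + phone.extension;
        return base;
    }

    public static void main(String[] args)
    {
        PhoneNumber phone = new PhoneNumberConverter().convert("415-555-0100 x42");
        PhoneNumberFormatter formatter = new PhoneNumberFormatter();
        System.out.println(formatter.format(phone, "short"));
        System.out.println(formatter.format(phone, "full"));
    }
}
```

Because each class does exactly one direction, each can be tested with nothing more than a string in and an object out (or the reverse).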

Testing

There’s no reference to the servlet API in your Action classes, so it’s easy to write tests against the Java code within. Tests fall into two categories: very lightweight unit tests which only test what’s happening in your handler and slightly more heavy-weight tests which involve more of the stack (but not the servlet container).

The outline for a unit test goes something like this:

  1. Instantiate your Action
  2. Inject service stubs/mocks
  3. Use the public setters to specify values to necessary fields
  4. Invoke the handler
  5. Assert on the Resolution and the state of the Action (optionally your Stub)

Note that all of our Action tests fall into this category, even though you can test more of the stack (URL binding and validation).
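The outline above might look something like this. So that the sketch compiles on its own, the Stripes and application types are replaced by tiny stand-ins with the same names, the Action is a trimmed copy of the earlier OrderStatusAction, and plain checks are used in place of a test framework; none of this is the real Stripes API.

```java
// Stand-ins so the sketch compiles on its own. In a real project Resolution,
// ForwardResolution and ErrorResolution come from Stripes, and OrderService
// and OrderStatus from the application.
interface Resolution {}
class ForwardResolution implements Resolution
{
    final String path;
    ForwardResolution(String path) { this.path = path; }
}
class ErrorResolution implements Resolution
{
    final int code;
    ErrorResolution(int code) { this.code = code; }
}
class OrderStatus
{
    final String status;
    OrderStatus(String status) { this.status = status; }
}
interface OrderService { OrderStatus getOrderStatus(long orderId); }

// Trimmed copy of the Action from the earlier example.
class OrderStatusAction
{
    OrderService orderService;
    long orderId;
    OrderStatus orderStatus;

    public void setOrderId(long orderId) { this.orderId = orderId; }
    public OrderStatus getOrderStatus() { return orderStatus; }

    public Resolution view()
    {
        orderStatus = orderService.getOrderStatus(orderId);
        if (orderStatus == null) return new ErrorResolution(404);
        return new ForwardResolution("/WEB-INF/jsp/order-status.jsp");
    }
}

class OrderStatusActionTest
{
    static Resolution viewOrder(long orderId)
    {
        OrderStatusAction action = new OrderStatusAction();   // 1. instantiate
        action.orderService = new OrderService()              // 2. inject a stub
        {
            public OrderStatus getOrderStatus(long id)
            {
                return id == 65432 ? new OrderStatus("SHIPPED") : null;
            }
        };
        action.setOrderId(orderId);                           // 3. set fields
        return action.view();                                 // 4. invoke the handler
    }

    public static void main(String[] args)
    {
        // 5. assert on the Resolution
        if (!(viewOrder(65432) instanceof ForwardResolution)) throw new AssertionError();
        if (!(viewOrder(1) instanceof ErrorResolution)) throw new AssertionError();
    }
}
```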

What you can’t do — I haven’t seen any Java MVC provide this though — is write tests against the rendered markup of your views (a la Rails) without bringing up the servlet container.

Read more about testing with Stripes here.

Documentation and Community

The Stripes documentation is definitely not very complete or polished. Some of the documentation is out of date or non-existent (e.g. CleanURLs). The same goes for the examples. There is a decent book from Pragmatic Programmers press however, which I recommend if you’re interested in Stripes.

The good news is that despite all of this, it’s easy enough to find or figure out what you need without too much fuss. Piecing together examples, tutorials, blogs, documentation, etc. ultimately gives you what you need. The Stripes source is small enough that you can easily rummage through it to see how things work. It’s not as configurable as Spring MVC so the code is less abstract and a little easier to grok (though it’s not as elegant).

There is an active mailing list where the developers and other users help out with questions.

Conclusion

I think working with Stripes is a lot of fun and that we made the right decision to use it. I’d go as far as saying that we were at least as productive with it as we would have been with Spring MVC, and it’s likely we were more productive. I’d say that the authors have largely delivered on their goals. One of our front-end developers quickly dove right into building Actions without much help from other developers. While we had the occasional “Huh?!” moment while trying to figure out why something wasn’t working, they were few and far between.

In comparing it to Spring MVC, I think there’s a simplicity and elegance to Stripes that comes from it being just an MVC and it not having the same legacy as Spring. While Spring MVC has certainly evolved, it’s brought some of its crufty parts along with it. I’ll be keeping an eye on Spring MVC to see what’s in store with 3.0, and I hope to be proven wrong.

There’s no doubt that Stripes is in a niche as compared to many of the other web frameworks. The community is much smaller, and the development cycle much longer (last release was August 2008 and the one before that May 2007), which sometimes makes me wonder what Stripes’ future holds.

I’d say that one downside to Stripes, or any other framework that has a single backing Action per URI, is that there isn’t a great story for dealing with pages that aggregate a number of features when those features also show up on other pages. The problem is that we can’t rely on the Action to provide all of the reference data, so we have to rely on other mechanisms for fetching it (filters, interceptors, tags, etc). It’s not an issue with Stripes specifically, but with all frameworks that take the same approach.

For projects where this single primary controller per URI limitation isn’t a problem, I would definitely use Stripes again and I think it’s a framework Java developers should look into if they aren’t completely happy with whatever they’re using.

Also Worth Mentioning…

There are lots of other neat features in Stripes too: validation annotations and helper methods, stripes layout, flash scope, wizard forms, encrypted parameters, a JavaScriptResolution for serializing Java objects to JavaScript, etc. Open up the Stripes jar and start looking around.

Our application is using Spring as an IoC container for everything behind our Actions; to get handles to our Spring-managed services we use a simple Stripes Interceptor which injects dependencies into Actions.

When using CleanURLs, you’ll want to use the DynamicMappingFilter, though there’s not much mention of it in the documentation. CleanURLs in Stripes 1.5.1 should be even more flexible (see STS-617).

We used Spring Security on our application and found that Stripes and Spring Security play nicely together.

Spring 3.0 will include a few new features that are similar to features I really like in Stripes, including RESTy URLs, and tags for generating the URIs to controllers. The downside is that it’s still using the singleton model, which equates to controller handler methods with potentially lots of annotated parameters.

References

Updates: added reference to recently published Spring 3.0 MVC blog.

More fun with Java Concurrency: BlockingQueue

I’ve written about Java 5 concurrency in the past and I’ve recently had the opportunity to make use of another one of the concurrency constructs: the BlockingQueue.

The Problem

There’s a problem we’ve seen a few times in the last few years. From time to time, applications must import data from external systems and massage it into a form that is useful. Sometimes these feeds are streamed over the net while other times they’re in the form of massive text files. My current project has two such feeds that are imported on a weekly basis; the larger of the two rings in at around 10M entities.

Loading the data is pretty straightforward: entities are parsed from the source, a transient entity object is instantiated and the parsed values plugged in, entities are batched up and then persisted as a batch.

The Collaborators

The Parser is responsible for loading records one at a time from the source.

public interface Parser<T>
{
    boolean hasNextRecord();
    T nextRecord();
    void close();
}

The Persister is responsible for saving batches of entities.

public interface Persister<T>
{
    void initializeFeed();
    void insertBatch(List<T> entities);
    void finalizeFeed();
}

And the FeedLoader is the glue that pulls it all together: it coordinates parsing records into batches and persisting each batch when it’s ready. It’s used like this, where 1000 is the batch size:

Parser<Wombat> parser = new WombatParser(inputStream);
Persister<Wombat> persister = new WombatPersister(dataSource);
FeedStats stats = new FeedLoader(parser, persister, 1000).loadData();

Now that the stage is set and we’ve covered the parts, we can get into what happens behind loadData().

Synchronous Parsing and Persistence

Our first cut of loadData() was a simple synchronous implementation:

public class FeedLoader<T>
{
    private Parser<T> parser;
    private Persister<T> persister;
    private int batchSize;
 
    public FeedLoader(Parser<T> parser, Persister<T> persister, int batchSize)
    {
        Validate.isTrue(batchSize > 0);
        this.parser = parser;
        this.persister = persister;
        this.batchSize = batchSize;
    }
 
    public FeedStats loadData()
    {
        persister.initializeFeed();
        List<T> entities = new ArrayList<T>(batchSize);
 
        while (parser.hasNextRecord())
        {
            entities.add(parser.nextRecord());
            if (entities.size() >= batchSize)
            {
                persister.insertBatch(entities);
                entities.clear();
            }
        }
 
        // Save the stragglers that didn't make it into the last batch.
        if (!entities.isEmpty())
        {
            persister.insertBatch(entities);
        }
 
        parser.close();
        persister.finalizeFeed();
        return new FeedStats(...);
    }
}

This worked well… we were able to process records at a throughput of about 3125 per second. After a bit of research I realized we were spending nearly as much time parsing records as we were persisting them. I also noticed that the load on the machine was pretty low during the import process. While there is a relationship between parsing and persisting, it seemed like there should be an easy way to split the processes across multiple threads while keeping the code simple and readable.

Asynchronous processing with BlockingQueue and ExecutorService

Digging through java.util.concurrent, I came across BlockingQueue which is described as “A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element.” Sounds like a great construct to bridge the gap between our Parser and Persister threads. The parser can add entities to the queue while the persister is pulling them off into batches. Let’s see what it looks like:

public class FeedLoader<T>
{
    private Parser<T> parser;
    private Persister<T> persister;
    private int batchSize;
    private volatile boolean done = false; // written by the parser thread, read by the persister thread
 
    public FeedLoader(Parser<T> parser, Persister<T> persister, int batchSize)
    {
        Validate.isTrue(batchSize > 0);
        this.parser = parser;
        this.persister = persister;
        this.batchSize = batchSize;
    }
 
    public FeedStats loadData()
    {
        persister.initializeFeed();
 
        BlockingQueue<T> blockingQueue = new ArrayBlockingQueue<T>(batchSize * 2);
        try
        {
            ExecutorService executorService = Executors.newFixedThreadPool(2);
            // invokeAll() blocks until both tasks have completed
            executorService.invokeAll(
                asList(new ParserTask<T>(parser, blockingQueue),
                       new PersisterTask<T>(persister, blockingQueue)));
            executorService.shutdown();
        }
        catch (InterruptedException e)
        {
            log.error("Failed to load feed.", e);
            throw new RuntimeException("Failed to load feed.", e);
        }
 
        persister.finalizeFeed();
        return new FeedStats(...);
    }
 
    class ParserTask<T> implements Callable<Object>
    {
        Parser<T> parser;
        BlockingQueue<T> queue;
 
        ParserTask(Parser<T> parser, BlockingQueue<T> queue)
        {
            this.parser = parser;
            this.queue = queue;
        }
 
        public Object call()
        {
            while (parser.hasNextRecord())
            {
                try
                {
                    queue.put(parser.nextRecord());
                }
                catch (InterruptedException e)
                {
                    log.error("Failed to load feed.", e);
                    throw new RuntimeException("Failed to load feed.", e);
                }
            }
            parser.close();
            done = true; // Indicates that the parser is done.
            return null;
        }
    }
 
    class PersisterTask<T> implements Callable<Object>
    {
        Persister<T> persister;
        BlockingQueue<T> queue;
 
        PersisterTask(Persister<T> persister, BlockingQueue<T> queue)
        {
            this.persister = persister;
            this.queue = queue;
        }
 
        public Object call()
        {
            List<T> entities = new ArrayList<T>(batchSize);
 
            // "done" is set to false when the parser is done, at which point
            // all remaining entities will be in the queue.
            while (!done || !queue.isEmpty())
            {
                try
                {
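                    // poll() with a timeout instead of take(): if the parser finishes
                    // while the queue is empty, take() would block forever.
                    T entity = queue.poll(100, java.util.concurrent.TimeUnit.MILLISECONDS);
                    if (entity == null)
                    {
                        continue;
                    }
                    entities.add(entity);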
                    if (entities.size() >= batchSize)
                    {
                        persister.insertBatch(entities);
                        entities.clear();
                    }
                }
                catch (InterruptedException e)
                {
                    log.error("Failed to load feed.", e);
                    throw new RuntimeException("Failed to load feed.", e);
                }
            }
            if (!entities.isEmpty())
            {
                persister.insertBatch(entities);
            }
            return null;
        }
    }
}

By allowing the parser and persister to run concurrently using two threads, the feed loaded with a throughput of 4608 entities per second, nearly a 50% improvement over the single-threaded version.

There are two caveats to the code as written above. First, creating an ExecutorService for each loadData() call isn’t ideal; it’s best to configure one for the application and reuse it, and it must be shut down before the application quits. Second, I’ve skimped on error handling, which is fine only if the Parser and Persister implementations don’t throw exceptions.
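To make those caveats concrete, here is a minimal, self-contained sketch rather than the feed loader itself. It assumes plain String records, a hypothetical END_OF_FEED marker object in place of the shared "done" flag, and an executor that is created once by the caller and passed in. Calling Future.get() on each result is what surfaces an exception thrown inside either task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedExecutorSketch {
    // Hypothetical end-of-feed marker, compared by identity rather than equals().
    static final String END_OF_FEED = new String("END_OF_FEED");

    /** Pushes records through a bounded queue and returns the batches the consumer saw. */
    static List<List<String>> load(ExecutorService executor, List<String> records, int batchSize) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10);
        List<List<String>> batches = new ArrayList<List<String>>();

        Callable<Void> parserTask = () -> {
            for (String record : records) {
                queue.put(record);
            }
            queue.put(END_OF_FEED); // tells the consumer that nothing else is coming
            return null;
        };

        Callable<Void> persisterTask = () -> {
            List<String> batch = new ArrayList<String>(batchSize);
            for (String record = queue.take(); record != END_OF_FEED; record = queue.take()) {
                batch.add(record);
                if (batch.size() >= batchSize) {
                    batches.add(new ArrayList<String>(batch)); // stand-in for insertBatch()
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                batches.add(batch);
            }
            return null;
        };

        try {
            // The executor needs two free threads so both tasks can run at the same time.
            for (Future<Void> result : executor.invokeAll(Arrays.asList(parserTask, persisterTask))) {
                result.get(); // surfaces an exception thrown inside either task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Failed to load feed.", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("Failed to load feed.", e.getCause());
        }
        return batches;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2); // created once, reused
        try {
            System.out.println(load(executor, Arrays.asList("a", "b", "c", "d", "e"), 2));
            System.out.println(load(executor, Arrays.asList("f"), 2));
        } finally {
            executor.shutdown(); // without this the pool's threads keep the JVM alive
        }
    }
}
```

This still isn’t complete error handling: if the consuming task dies while the queue is full, the producing task stays blocked in put(), so a real implementation would probably want timeouts or cancellation as well.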

Conclusion

The ExecutorService and BlockingQueue provide the tools to make this improvement easy while keeping the code pretty readable. As always, we should be striving for readability, so adding unnecessary concurrency is never a good idea. And your mileage may vary depending on many things, including the hardware, network, data, server load… so do some testing to measure the real improvement in production.

Even if you don’t end up using it, it’s fun to experiment with and learn about concurrency issues. There are scenarios where the smart application of concurrency constructs can yield fantastic benefits. Check out Greg Luck’s recent blog on Ehcache performance for an example.

Updates: added caveats and call to ExecutorService.shutdown(). Fixed a typo in the PersisterTask.

Make the things you do often fast and easy

Many of our projects are ‘greenfield’ and we have the opportunity to do things the way we like. By working on new projects every few months, as opposed to one project over the course of years, we have lots of opportunity to easily tweak and tune the way we do things. Not all of our projects are from scratch though (see Alon’s post about Rewrite or Rescue), so we sometimes end up dealing with years worth of history and crufty code. It’s safe to say that each time we roll onto one of these projects, there’s going to be some level of bewilderment regarding what developers deal with on a daily basis.

Maybe it’s because we have a special opportunity to optimize the hell out of our development process, or the fact that we’re all productivity junkies; regardless of the reason, we religiously embrace the tenet “Make the things you do often fast and easy”. It’s almost embarrassing to suggest that others don’t also subscribe to this simple notion, but, brace yourself, many do not. On a project that has history, not everyone has been there for every decision. In fact, many developers are at least relatively new and it’s somewhat customary to have an “it must be this way for a reason” attitude. After all, who would deliberately make something cumbersome without good reason?

When we start working on one of these projects, we dedicate time to do some serious spring cleaning and tackle the things that will cost us the most in terms of pain and productivity. The whole development team gets psyched about where we end up as it’s a significant improvement. Projects with a history usually have a fair bit of low-hanging fruit. Let’s discuss some of the things we see regularly.

Build Systems

Apparently few people like working on build scripts, and when they do they have a habit of lowering their standards for quality of work. That’s obviously not literally true, but sometimes it seems that way. We’ve seen a number of beastly build systems that are slow because they’re doing things that aren’t necessary (extraneous jarring, copying, code generating, etc), that are brittle and expensive to maintain, and that are full of dead code and duplicate target definitions… and they’re run many times every single work day. It’s true that most developers may be compiling code from their IDE and thus bypassing the command-line build, but it’s still run on the build server, by ops folks, and even by developers when they’re debugging why something works from the IDE but busts on the build server.

Guidelines for simplifying the build

  • Distill the build process down to the fastest, simplest steps that are necessary.
  • Eliminate duplicate and no-longer-used dependencies; these files are being copied around and bundled for no reason (I’ve seen over 15 megs of unnecessary dependencies before).
  • When a project is split into multiple modules (and it should be if it’s more than a few thousand lines), modules should be built in a consistent fashion using targets that are shared across modules.
  • Look for exceptions. When you see something special happening for a particular file type, file name, or module, ask yourself why. Ask again. Strive to eliminate these special cases when possible, even if they seem trivial.
  • Build a single deployable (or deployables) for all environments by eliminating environment-specific build code and externalizing application configuration. (Use Spring? Check out this post on externalizing configuration with Spring.)
  • Look for unnecessary code generation steps; if generated code changes once a year then check it in and make regenerating a manual step.
  • If you generate code coverage data, make sure that it’s only created when it’s needed (e.g. a nightly build on the build server), not on every build.

We use Maven 2 for all of our Java projects. For sure, it has its share of rough edges (most of which are being fixed at a reasonable rate). But it recommends some very sound conventions and doesn’t provide any scripting functionality, so it’s harder to hack it to do anything too unorthodox (please don’t use the antrun plugin except as an incremental step when moving from Ant to Maven). When you play ball by the Maven rules you’ll find your build much simpler and easier to maintain. It’s likely you’ll notice other emergent benefits to boot. For example, once you migrate to Maven you eliminate duplicate build configuration (both your command line build tool and IDE know how to compile your app; remember the DRY principle). IDEA, Eclipse (via m2eclipse), and NetBeans all support importing from and synchronizing with Maven.

Some people use Buildr or Ant + Ivy, but Buildr doesn’t have the breadth of use and Ant makes it too easy to write nasty, unmaintainable build code. That’s why we use Maven.

Compile > Deploy > Make Changes > Deploy Development Cycle

Possibly even more important than a simple and easy build is being able to go through the compile, deploy, make changes, deploy cycle FAST (note that the compile, run tests, make changes, run tests cycle is also very important).

I remember working on an embedded system in 1998: a complex radio communications routing application written in C++ and deployed to custom hardware running the real-time operating system PSOS. The build and deploy cycle took about 30 minutes and there were only 10 hardware instances for 60 engineers; you had to sign up for a time slot on real hardware. It was the epitome of unproductive as far as development environments go (and don’t even ask about debugging!). You’d think such things were entirely in the past, and luckily they mostly are, but not completely. In the last 2 years I’ve seen applications that take 15 minutes to deploy.

It’s a drag when developers have to wait for these things to happen and it can totally destroy one’s rhythm, keeping developers from getting into the zone. What’s worse, it’s completely unnecessary with modern tools.

General Suggestions

  • Don’t drop down to the command-line; compile and deploy from your IDE (the IDE is your friend – master it).
  • Minimize steps for deploying changes to a running app:
    • Your IDE may support building on frame deactivation (IDEA does); check it out.
    • Look into the maven-jetty-plugin if you use Maven.
    • Run in debug mode so that code can be hot-swapped or invest in JavaRebel, which allows all sorts of code changes to take place without redeploying your application.
    • Deploy your application in exploded form; bundling a war or ear incurs unnecessary IO overhead.
  • Host your database either locally on your workstation or on a beefy database server on the same LAN. Remote databases are generally many times slower due to latency, even over fast connections.
  • Use JBoss? Consider migrating to Jetty or Tomcat. If that’s impossible, use the most stripped down profile (minimal, default, or all) that has what you need, or better yet, create a custom one which includes only what you need.
  • Minimize the amount of data needed in the database to run the application. The same goes for running tests: do whatever it takes to run your tests against an empty (or very close to) schema. Check out the Carbon Five DB Migration Project.
  • Don’t skimp on developer hardware. Buying the very fastest CPU isn’t going to be worth it, so aim for one or two models down from the fastest. Buy the fastest hard disk you can since development is generally IO bound (WD Raptor and the new VelociRaptor are awesome, consider SSDs if your coffers run deep). Avoid older CPU architectures (Pentium D), even when the clock speed (GHz) is faster. Lastly, don’t be shy with memory; 4GB isn’t too much for a developer machine.

In addition to these general recommendations, each individual application will have its own specific sources of inefficiency. Many real world applications depend on services provided by application servers and/or commercial products: message queues (JMS, ActiveMQ, etc), distributed caches (memcached, coherence, etc), enterprise service buses, job schedulers, work flow engines, etc. It’s important that these services don’t get in the way of developing fast. Some of them can be run in a light-weight development mode. If you need to use one of these potentially heavyweight solutions, invest the time to minimize or eliminate any adverse effects to the development cycle.

What Else?

Some of the best improvements have nothing to do with the technical side of software development. Take a step back and look at what else is happening (or not happening) each day. There may be meetings which can be time boxed, consolidated, or eliminated altogether. Take a look at collaboration between engineers, product managers, testers, support and operations. How long are developers waiting to have requirements-clarifying questions answered? Ask your whole team where they think things can be improved. Ask for feedback on a regular basis and allow it to help drive these improvements.

Survey of other activities that should be fast and easy

  • Running the automated test suite – < 10 minutes
  • Getting build results from continuous integration server – < 10 minutes
  • Pushing a build to staging/acceptance server – One click build and deploy
  • Create a new instance of a minimal database instance – Carbon Five DB Migration Project
  • Recreate production state in development for debugging
  • Story approval/acceptance – Continuous acceptance
  • Meetings – Timebox, Consolidate, Eliminate
  • Configure a new developer machine – Strive for zero configuration

Value Simplicity

Any intelligent fool can make things bigger, more complex and more violent. It takes a touch of genius and a lot of courage to move in the opposite direction. -Albert Einstein

There’s a theme underlying most of the solutions to these problems: simplicity. Complex systems don’t become complex and crufty overnight; they get that way one small step at a time. With each change to a system it’s important to recognize that the change will either add complexity or remove it. Complexity has a cost and it’s not to be taken lightly; make sure the benefit of each decision that adds complexity is worth that cost.

Conclusion

Making the things that people do often fast and easy can pay off geometrically as all developers benefit and regain a little more of their day (and sanity). In the end, it’s not just about shaving off seconds or minutes, though that’s a huge part of it; it’s about creating a development environment that lets the team do what’s really important: write awesome code to solve real problems. When the team dynamics, technical environment, and process are tuned just right, the overall benefit is greater than the sum of its parts.

Where have you seen changes in infrastructure, software, or process that have resulted in a significant productivity bump?

Java Database Migrations

News: v0.9.9-m2 has been released!

A while back, I wrote to introduce the first incarnation of the Carbon Five Database Migration tools, a simple though powerful framework for applying discrete changes to a database and tracking which changes have been applied to a specific database. It was inspired by Rails’ Migration support.

We’ve made a number of changes in the v0.9.1 release. We adopted some of the improvements found in Rails 2.1 as well as feedback from our users. Here’s an overview of what’s changed:

  • New create, drop, and reset goals for maven plugin. Now you can create a new database, drop an existing one, or reset an existing database by dropping it, creating a new one and then migrating it. This is tested with MySQL and PostgreSQL.
  • Each applied migration is tracked in the database schema_version table (instead of just the last one). Also, when it was run and how long it took to run are now saved for each.
  • Validate goal now lists which migrations are pending in addition to whether the database is up to date.
  • Maven artifact ids have changed (migration -> db-migration, maven-migration-plugin -> maven-db-migration-plugin) and there’s been some restructuring in the core framework.
  • Maven plugin is configured a bit differently now; environments have been removed completely since maven supports a better solution out of the box: profiles.
  • Maven plugin now looks for migrations in src/main/db/migrations by default; alternate locations can be specified via the <migrationsPath/> element.
  • We now recommend using timestamps for migration versions instead of the NNN format, though any numerical character sequence will work.
  • Reworked the algorithm for determining which migrations to run to allow for a little more flexibility. Pending migrations aren’t determined by a single version number, they’re determined by comparing what is available to what has already been run. In conjunction with timestamp versions, developers won’t be stepping on each other’s migrations.
  • New and updated google code project and documentation.
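The reworked pending-migration rule above can be sketched roughly as a set difference. This is an illustrative sketch rather than the plugin’s actual code: the class and method names are hypothetical, and it assumes versions are fixed-width timestamp strings so that plain string ordering matches chronological ordering.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

class PendingMigrationsSketch {
    /** Pending = everything available minus everything already applied, oldest first. */
    static SortedSet<String> pending(Collection<String> available, Collection<String> applied) {
        SortedSet<String> result = new TreeSet<String>(available);
        result.removeAll(applied);
        return result;
    }

    public static void main(String[] args) {
        // A teammate's migration from 1 September lands after the 3 September
        // migration has already been applied to this database.
        Collection<String> available =
            Arrays.asList("20080830174515", "20080901120000", "20080903093000");
        Collection<String> applied =
            Arrays.asList("20080830174515", "20080903093000");

        System.out.println(pending(available, applied)); // prints [20080901120000]
    }
}
```

Comparing against a single “current version” number would have skipped the 1 September migration here, which is the situation the per-migration tracking is meant to avoid.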

As you can imagine, some of these changes aren’t backwards compatible. While we’re in pre-release (< v1.0) mode, we feel like it's more important to make the fundamental changes to build a solid foundation than to retain complete backwards compatibility. The release notes give some guidelines for upgrading.

Here’s a quick getting started guide for the maven-db-migration-plugin:

Step 1: Configure maven in your project’s pom.xml

...
<build>
  ...
  <plugins>
    <plugin>
      <groupId>com.carbonfive</groupId>
      <artifactId>maven-db-migration-plugin</artifactId>
      <version>RELEASE</version>
      <configuration>
        <url>jdbc:mysql://localhost/myapp_test</url>
        <username>dev</username>
        <password>dev</password>
      </configuration>
      <dependencies>
        <dependency>
          <groupId>mysql</groupId>
          <artifactId>mysql-connector-java</artifactId>
          <version>5.1.6</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
...
<pluginRepositories>
    <pluginRepository>
        <id>c5-public-repository</id>
        <url>http://mvn.carbonfive.com/public</url>
    </pluginRepository>
</pluginRepositories>
...

The <url>, <username> and <password> elements inside <configuration> configure the database connection (see the reference for more options).
The <dependencies> element inside the plugin specifies the required dependency on our database driver.
The <pluginRepositories> element adds the Carbon Five maven plugin repository.

Step 2: Create a migration script

In src/main/db/migrations, create a script using the format YYYYMMDDHHMMSS.sql (e.g. 20080830174515.sql). Example:

CREATE TABLE users (
  id INT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  password VARCHAR(255),
  enabled BOOLEAN DEFAULT TRUE
);
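If you’d rather not type the timestamp by hand, a few lines of Java can produce a file name in the YYYYMMDDHHMMSS.sql format. This is only a convenience sketch; the class and method names are hypothetical and nothing like it ships with the plugin as far as this post describes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class MigrationFileNameSketch {
    // Four-digit year, then two digits each for month, day, hour, minute and second.
    private static final DateTimeFormatter VERSION = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Builds a migration file name from the moment the migration was created. */
    static String fileNameFor(LocalDateTime createdAt) {
        return VERSION.format(createdAt) + ".sql";
    }

    public static void main(String[] args) {
        // prints 20080830174515.sql
        System.out.println(fileNameFor(LocalDateTime.of(2008, 8, 30, 17, 45, 15)));
        // A name for a migration created right now:
        System.out.println(fileNameFor(LocalDateTime.now()));
    }
}
```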

Step 3: Create the database

$ mvn db-migration:create

The supplied credentials must have the appropriate privileges, of course. If you’re not using MySQL or PostgreSQL, then do this step manually.

Step 4: Check the status of the database

$ mvn db-migration:validate

Your migration will be listed as pending and the database as not up-to-date.

Step 5: Migrate to the latest version

$ mvn db-migration:migrate

The pending migration will be applied to the database and logged in the schema_version table.

Want to learn more?

Check out the google code project page, the release notes and the sample applications.

In the near future, I’m going to look at supporting MS SQL Server and improving the SQL Script Runner. Thanks for all of the feedback and please keep it coming!

Christian

Database Testing with Spring 2.5 and DBUnit

Note: Version 0.9.1 of c5-test-support has been released.

We’ve been using DBUnit on our Java projects for years and the mechanics of how it’s used have evolved over time. I’ve recently spent some time making it work a little nicer for how we typically write database tests. What I’ve created makes using DBUnit on a project that is already using Spring and the testing support added in Spring 2.5 just a little easier through the application of convention and annotations.

In general, we’ve adopted the convention of loading data from a flat dataset file that is named after the test and sits next to it on the classpath. For example (in the maven standard directory structure):

  • src/test/java/com/acme/TripRepositoryTest.java – Java Test Code
  • src/test/resources/com/acme/TripRepositoryTest.xml – DB Unit Data Set for TripRepositoryTest
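The naming convention boils down to turning the test’s fully qualified class name into a classpath location. Here’s a minimal sketch of that mapping; the class and method names are hypothetical and this isn’t the listener’s actual code, just the idea behind it.

```java
class DataSetLocationSketch {
    /** Maps a fully qualified test class name to the dataset that sits next to it. */
    static String defaultLocation(String testClassName) {
        return "classpath:/" + testClassName.replace('.', '/') + ".xml";
    }

    /** Convenience overload for when the test class itself is at hand. */
    static String defaultLocation(Class<?> testClass) {
        return defaultLocation(testClass.getName());
    }

    public static void main(String[] args) {
        // prints classpath:/com/acme/TripRepositoryTest.xml
        System.out.println(defaultLocation("com.acme.TripRepositoryTest"));
    }
}
```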

For most tests, the data set is loaded inside the test’s transaction and rolled back when the test completes so that nothing needs to be cleaned up (see Spring’s reference). For other tests — service or integration tests — the data is loaded outside of a transaction and must be cleared out manually. Most projects have a mix of both strategies and both should be easily supported.

When Spring 2.5 came out with its new testing framework, I threw together a custom TestExecutionListener that looks for test methods that are annotated with @DataSet, and when found, loads the data using DB Unit. Here’s a transaction-per-test example:

TripRepositoryImplTest.java – Example transaction-per-test Test Case

@ContextConfiguration(locations = {"classpath:applicationContext.xml"})
public class TripRepositoryImplTest extends AbstractTransactionalDataSetTestCase {
    @Autowired TripRepository repository;
 
    @Test
    @DataSet
    public void forIdShouldFindTrip() throws Exception {
        Trip trip = repository.forId(2);
        assertThat(trip, not(nullValue()));
    }
}

The high-level execution path for this example looks like:

  1. Inject dependencies (DependencyInjectionTestExecutionListener)
  2. Start transaction (TransactionalTestExecutionListener)
  3. Load dbunit data set from TripRepositoryImplTest.xml (DataSetTestExecutionListener) using the setup operation (default is CLEAN_INSERT)
  4. Execute test
  5. Optionally cleanup dbunit data using the tear down operation (default is NONE)
  6. Rollback transaction (TransactionalTestExecutionListener)

Here’s the trimmed down log output for this test:

INFO: Began transaction (1): transaction manager; rollback [true] (TransactionalTestExecutionListener.java:259)
INFO: Loading dataset from location 'classpath:/eg/domain/TripRepositoryImplTest.xml' using operation 'CLEAN_INSERT'. (DataSetTestExecutionListener.java:152)
INFO: Tearing down dataset using operation 'NONE', leaving database connection open. (DataSetTestExecutionListener.java:67)
INFO: Rolled back transaction after test execution for test context (TransactionalTestExecutionListener.java:279)

For this to work in its current incarnation, a single datasource must be available for lookup in the application context. One of the interesting details is what to do with the connection used to load the data. The framework assumes that if it’s a transactional connection it should be left open because whatever started the transaction should do the closing. When it’s non-transactional it’s closed after the dataset is loaded. This convention works well for how I typically write my database tests.

In addition to the @DataSet annotation, we must add the DataSetTestExecutionListener to the set of listeners that are applied to the test class. As in the above example, you can extend AbstractTransactionalDataSetTestCase which does this for you or you can specify the listener using the class-level annotation @TestExecutionListeners (see example). It’s important that the listener is triggered after the TransactionalTestExecutionListener.

If all test methods use the dataset, then the test class (or super class) can be annotated and every test will load the dataset. Also, if a different dataset should be loaded, the name of the resource can be specified in the annotation (e.g. @DataSet("TripRepositoryImplTest-foo.xml") or @DataSet("classpath:/db/trips.xml")). Lastly, the setup and teardown database operations can be overridden (e.g. @DataSet(setupOperation = "INSERT", teardownOperation = "DELETE")).
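To show how the class-level and method-level forms could fit together, here’s a self-contained sketch using a stand-in annotation. It is not the c5-test-support source: the attribute names and defaults follow what’s described in this post, while the @Inherited marker, the resolve() helper and the rule that a method-level annotation wins over a class-level one are my assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class DataSetAnnotationSketch {
    /** Hypothetical stand-in for @DataSet; attribute names follow the post. */
    @Inherited
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface DataSet {
        String value() default "";                      // empty means "use the convention"
        String setupOperation() default "CLEAN_INSERT";
        String teardownOperation() default "NONE";
    }

    /** Assumed lookup rule: the method's annotation wins, then the class's, else null. */
    static DataSet resolve(Class<?> testClass, String methodName) {
        try {
            Method method = testClass.getDeclaredMethod(methodName);
            DataSet onMethod = method.getAnnotation(DataSet.class);
            return onMethod != null ? onMethod : testClass.getAnnotation(DataSet.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such test method: " + methodName, e);
        }
    }

    @DataSet("classpath:/db/trips.xml")
    static class ExampleTest {
        public void usesClassLevelDataSet() {}

        @DataSet(setupOperation = "INSERT", teardownOperation = "DELETE")
        public void overridesOperations() {}
    }

    public static void main(String[] args) {
        DataSet fromClass = resolve(ExampleTest.class, "usesClassLevelDataSet");
        System.out.println(fromClass.value() + " " + fromClass.setupOperation());

        DataSet fromMethod = resolve(ExampleTest.class, "overridesOperations");
        System.out.println(fromMethod.setupOperation() + " " + fromMethod.teardownOperation());
    }
}
```

Running it prints "classpath:/db/trips.xml CLEAN_INSERT" for the method that relies on the class-level annotation and "INSERT DELETE" for the one that overrides the operations.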

This functionality is part of the C5 Test Support package and is available in our maven repository. To use it, first add the C5 Public Maven repository to your pom.xml, and then add the necessary dependencies:

pom.xml

<repositories>
    <repository>
        <id>c5-public-repository</id>
        <url>http://mvn.carbonfive.com/public</url>
        <snapshots>
            <updatePolicy>always</updatePolicy>
        </snapshots>
    </repository>
</repositories>
...
<dependencies>
    <dependency>
        <groupId>org.dbunit</groupId>
        <artifactId>dbunit</artifactId>
        <version>2.2.3</version>
        <scope>test</scope>
    </dependency>
 
    <dependency>
        <groupId>com.carbonfive</groupId>
        <artifactId>test-support</artifactId>
        <version>0.6</version>
        <scope>test</scope>
    </dependency>
    ...
</dependencies>

Check out the sample application for details. It’s mavenized and utilizes an in-memory database. Just check it out of subversion, look over the code, and give it a run using your IDE or from the command-line (mvn install). I’d be psyched to hear what you think and of course, welcome comments and suggestions.

Resources: