Our client Viscape was cited in a USA Today article on travel to DC for the upcoming inauguration as showing a 25x increase in rental listings. Viscape provides vacation rental listings and travel destination planning for property owners and travelers who want to use the wisdom of crowds to maximize their vacation experience.
The San Francisco Museum of Modern Art released the redesign of www.sfmoma.org on Monday. This redesign has been in the works for a couple of years and is the result of a lot of hard work by the folks at SFMOMA, Hot Studio and of course Carbon Five. Congratulations to all involved.
SFMOMA.org is the latest site to be released using SmileMaker, our favorite web CMS.
Read more about this effort on the SFMOMA site.
TechCrunch is spreading the word about our client The Mechanical Zoo and their social search product Aardvark. I’m sure they’re helping the network grow. I certainly got a number of requests for invites to the beta as soon as the post was up.
We worked closely with TMZ to bootstrap the development of Aardvark. They are a fun and talented team working on a great product.
Of course, we like the comment by one reader:
Technically it works surprisingly well for a new app in beta - they must have superior techies working on the details, not just marketing and sales guys - I am impressed so far and they are only at the beginning of the zoo alphabet (aa)!!!
We’re spending a couple days next week with developers at one of our clients to practice agile and more specifically test driven development. In preparation I’ve been exchanging emails with one of their leads on what he is hoping they will get out of this time. He’s a great developer who has recently been exposed to agile practices through his collaboration with Carbon Five. I think his comments are insightful and widely relevant so with his permission I am sharing them with you here.
<snip>
[What we want to accomplish is:] 1. Elucidate what it means to be agile, how it changes the way we think about problems, and show how that thinking is actually practical
…
* WRT #1, a very important and powerful thing I’ve learned from the last few months is to think about the measure of progress as user/customer “visible” artifacts. This really flies in the face of how many of us traditionally think about “progress”; typically progress is this piece of infrastructure or that feature that is required for some future thingy. Concentrating on “user”-visible artifacts gives concrete reward, ability to adapt to problems early, and doesn’t call into existence things that aren’t strictly necessary. It also requires trust in your ability to refactor (ruby is great at that!) and solve problems in the future (we’re all reasonably smart).
Typical objections include:
- “but I’m working on something that isn’t user-visible” — we should address the notion of “user” to be stakeholder, developer, etc. and “feature” to be a concrete artifact
- “but all these pieces of infrastructure are required for user-visible feature X!” — you can still decompose the feature into smaller, if not completely useful, pieces. Some of these pieces imply various pieces of infrastructure. By building your infrastructure gradually you always have something you can use *now*, and reduce the risk of ending up with something that’s not quite right.
- “but then where do I put things like design discussion, generating a test plan, and so on?” — some ahead-of-time design discussion is useful and necessary, but specific design decisions are *part* of the implementation of the feature. Testing is *not* a separate step; it’s the definition of what it means for something to be complete. Agile tends to wrap all these things into discrete micro-waterfall units.
- “but I’m building this experimental thing that doesn’t directly contribute to ‘user’ experience” — perhaps there’s another way to look at it, and perhaps it might make sense to think about what specific problem you’re trying to solve. If you can state the problem, you can state a story that implies that it is solved. If not, what are you really doing?
* “Embrace change” is also *really* important. In an environment like ours it’s not useful to get upset if requirements change. Agile methodologies have changing requirements as their primary assumption. Hence the “useful right now” and evolutionary process. [Interesting to note that we built the automated [system] like this, one story at a time, and came in _ahead_ of schedule]
* “Trust yourself” is important too; assume you can refactor and adjust later if you made the wrong decision. This is hard to do, but easier with practice.
* “Communication and collaboration” — In person is way more efficient, and these methodologies tend to favor it. If at an impasse, just start and see where you end up. That’s what we did. A common concern here is that it might be hard to coordinate several teams into a single release, but solutions exist.
* WRT agile practices, something I’ve learned to do in the last few months that is valuable is timeboxing. Totally underrated!
One further thought:
* We can’t afford to *not* take an agile approach (rapid small value-adding changes towards a goal) because 90% of the time when you’re a startup like us you probably won’t end up with (or even want to end up with) what you originally thought you wanted.
</snip>
We’re using jQuery for one of our current projects. Today I found myself in an IE situation that could be solved by using the Prototype library’s absolutize method. I couldn’t find any equivalent implementation that I liked in jQuery, so I went ahead and ported absolutize from Prototype to jQuery.
It can be used in the standard jQuery way, like so:
$('some-selector').absolutize()
Here’s the code:
jQuery.fn.absolutize = function() {
  return this.each(function() {
    var element = jQuery(this);

    // Nothing to do if the element is already absolutely positioned.
    if (element.css('position') == 'absolute') {
      return element;
    }

    // Capture the current position and size before changing the positioning mode.
    var offsets = element.offset();
    var top = offsets.top;
    var left = offsets.left;
    var width = element[0].clientWidth;
    var height = element[0].clientHeight;

    // Remember the original values, as the Prototype version does.
    element._originalLeft = left - parseFloat(element.css("left") || 0);
    element._originalTop = top - parseFloat(element.css("top") || 0);
    element._originalWidth = element.css("width");
    element._originalHeight = element.css("height");

    // Pin the element where it currently sits.
    element.css("position", "absolute");
    element.css("top", top + 'px');
    element.css("left", left + 'px');
    element.css("width", width + 'px');
    element.css("height", height + 'px');
    return element;
  });
}
Many of our projects are ‘greenfield’ and we have the opportunity to do things the way we like. By working on new projects every few months, as opposed to one project over the course of years, we have lots of opportunity to easily tweak and tune the way we do things. Not all of our projects are from scratch though (see Alon’s post about Rewrite or Rescue), so we sometimes end up dealing with years worth of history and crufty code. It’s safe to say that each time we roll onto one of these projects, there’s going to be some level of bewilderment regarding what developers deal with on a daily basis.
Maybe it’s because we have a special opportunity to optimize the hell out of our development process, or the fact that we’re all productivity junkies; regardless of the reason, we religiously embrace the tenet “Make the things you do often fast and easy”. It’s almost embarrassing to suggest that others don’t also subscribe to this simple notion, but, brace yourself, many do not. On a project that has history, not everyone has been there for every decision. In fact, many developers are at least relatively new and it’s somewhat customary to have an “it must be this way for a reason” attitude. After all, who would deliberately make something cumbersome without good reason?
When we start working on one of these projects, we dedicate time to do some serious spring cleaning and tackle the things that will cost us the most in terms of pain and productivity. The whole development team gets psyched about where we end up as it’s a significant improvement. Projects with a history usually have a fair bit of low-hanging fruit. Let’s discuss some of the things we see regularly.
Build Systems
Apparently few people like working on build scripts, and when they do they have a habit of lowering their standards for quality of work. That’s obviously not literally true, but sometimes it seems that way. We’ve seen a number of beastly build systems that are slow because they’re doing things that aren’t necessary (extraneous JAR packaging, copying, code generation, etc), that are brittle and expensive to maintain, and that are full of dead code and duplicate target definitions… and they’re run many times every single work day. It’s true that most developers may be compiling code from their IDE and thus bypassing the command-line build, but it’s still run on the build server, by ops folks, and even by developers when they’re debugging why something works from the IDE but busts on the build server.
Guidelines for simplifying the build
- Distill the build process down to the fastest, simplest steps that are necessary.
- Eliminate duplicate and no-longer used dependencies; these files are being copied around and bundled for no reason (I’ve seen over 15 megs of unnecessary dependencies before).
- When a project is split into multiple modules (and it should be if it’s more than a few thousand lines), modules should be built in a consistent fashion using targets that are shared across modules.
- Look for exceptions. When you see something special happening for a particular file type, file name, or module, ask yourself why. Ask again. Strive to eliminate these special cases when possible, even if they seem trivial.
- Build a single deployable (or deployables) for all environments by eliminating environment-specific build code and externalizing application configuration. (Use Spring? Check out this post on externalizing configuration with Spring.)
- Look for unnecessary code generation steps; if generated code changes once a year then check it in and make regenerating a manual step.
- If you generate code coverage data, make sure that it’s only created when it’s needed (e.g. a nightly build on the build server), not on every build.
So, we use Maven 2 for all of our Java projects. For sure, it has its share of rough edges (most of which are being fixed at a reasonable rate). But it recommends some very sound conventions and doesn’t provide any scripting functionality, so it’s harder to hack it to do anything too unorthodox (please don’t use the antrun plugin unless as an incremental step when moving from Ant to Maven). When you play ball by the Maven rules you’ll find your build much simpler and easier to maintain. It’s likely you’ll notice other emergent benefits to boot. For example, once you migrate to Maven you eliminate duplicate build configuration (both your command line build tool and IDE know how to compile your app; remember the DRY principle). IDEA, Eclipse (via m2eclipse), and NetBeans all support importing from and synchronizing with Maven.
Some people use Buildr or Ant + Ivy, but Buildr doesn’t have the breadth of use and Ant makes it too easy to write nasty, unmaintainable build code. That’s why we use Maven.
Compile > Deploy > Make Changes > Deploy Development Cycle
Possibly more important than a simple and easy build is a fast development cycle: developers must be able to go through the compile, deploy, make changes, deploy cycle FAST (note that the compile, run tests, make changes, run tests cycle is also very important).
I remember working on an embedded system in 1998: a complex radio communications routing application written in C++ and deployed to custom hardware running the real-time operating system PSOS. The build and deploy cycle took about 30 minutes and there were only 10 hardware instances for 60 engineers; you had to sign up for a time slot on real hardware. It was the epitome of unproductive as far as development environments go (and don’t even ask about debugging!). You’d think such things were completely in the past, and luckily they mostly are, but not entirely. In the last 2 years I’ve seen applications that take 15 minutes to deploy.
It’s a drag when developers have to wait for these things to happen and it can totally destroy one’s rhythm, keeping developers from getting into the zone. What’s worse, it’s completely unnecessary with modern tools.
General Suggestions
- Don’t drop down to the command-line; compile and deploy from your IDE (the IDE is your friend - master it).
- Minimize steps for deploying changes to a running app:
- Your IDE may support building on frame deactivation (IDEA does); check it out.
- Look into the maven-jetty-plugin if you use Maven.
- Run in debug mode so that code can be hot-swapped or invest in JavaRebel, which allows all sorts of code changes to take place without redeploying your application.
- Deploy your application in exploded form; bundling a war or ear incurs unnecessary IO overhead.
- Host your database either locally on your workstation or on a beefy database server on the same LAN. Remote databases are generally many times slower due to latency, even over fast connections.
- Use JBoss? Consider migrating to Jetty or Tomcat. If that’s impossible, use the most stripped down profile (minimal, default, or all) that has what you need, or better yet, create a custom one which includes only what you need.
- Minimize the amount of data needed in the database to run the application. The same goes for running tests: do whatever it takes to run your tests against an empty (or very close to) schema. Check out the Carbon Five DB Migration Project.
- Don’t skimp on developer hardware. Buying the very fastest CPU isn’t going to be worth it, so aim for one or two models down from the fastest. Buy the fastest hard disk you can since development is generally IO bound (WD Raptor and the new VelociRaptor are awesome, consider SSDs if your coffers run deep). Avoid older CPU architectures (Pentium D), even when the clock speed (GHz) is faster. Lastly, don’t be shy with memory; 4GB isn’t too much for a developer machine.
In addition to these general recommendations, each individual application will have its own specific sources of inefficiency. Many real world applications depend on services provided by application servers and/or commercial products: message queues (JMS, ActiveMQ, etc), distributed caches (memcached, coherence, etc), enterprise service buses, job schedulers, work flow engines, etc. It’s important that these services don’t get in the way of developing fast. Some of them can be run in a light-weight development mode. If you need to use one of these potentially heavyweight solutions, invest the time to minimize or eliminate any adverse effects to the development cycle.
What Else?
Some of the best improvements have nothing to do with the technical side of software development. Take a step back and look at what else is happening (or not happening) each day. There may be meetings which can be time boxed, consolidated, or eliminated altogether. Take a look at collaboration between engineers, product managers, testers, support and operations. How long are developers waiting to have requirements-clarifying questions answered? Ask your whole team where they think things can be improved. Ask for feedback on a regular basis and allow it to help drive these improvements.
Survey of other activities that should be fast and easy
- Running the automated test suite - < 10 minutes
- Getting build results from continuous integration server - < 10 minutes
- Pushing a build to staging/acceptance server - One click build and deploy
- Create a new instance of a minimal database instance - Carbon Five DB Migration Project
- Recreate production state in development for debugging
- Story approval/acceptance - Continuous acceptance
- Meetings - Timebox, Consolidate, Eliminate
- Configure a new developer machine - Strive for zero configuration
Value Simplicity
Any intelligent fool can make things bigger, more complex and more violent. It takes a touch of genius and a lot of courage to move in the opposite direction. -Albert Einstein
There’s a theme underlying most of the solutions to these problems: simplicity. Complex systems don’t become complex and crufty overnight, they get that way one small step at a time. With each change to a system it’s important to recognize that the change will either add complexity or remove it. Complexity has a cost and it’s not to be taken lightly; make sure the benefit to each of the decisions that add complexity is worth that cost.
Conclusion
Making the things that people do often fast and easy can pay off geometrically as all developers benefit and regain a little more of their day (and sanity). In the end, it’s not just about shaving off seconds or minutes, though that’s a huge part of it; it’s about creating a development environment that lets the team do what’s really important: write awesome code to solve real problems. When the team dynamics, technical environment, and process are tuned just right, the overall benefit is greater than the sum of its parts.
Where have you seen changes in infrastructure, software, or process that have resulted in a significant productivity bump?
Java has been the tool of choice for enterprise application development for many organizations for over 10 years. We are seeing more and more applications that are aging and suffering for it. We have worked with several recent clients to breathe new life into these legacy applications so they can evolve and grow with the business they support.
This is the first of a series of articles about improving the quality of existing software and the processes that produce it.
Why Not Rewrite
We’ve all encountered software that we would rather just rewrite than try to fix. The problem may be poor design, poor implementation, lack of tests, bugs or any of a host of other software ills. We see ahead of us the pain of working with a poor quality system and the risks to schedule and quality of end product it creates.
At Carbon Five, we have participated in many successful projects to rewrite applications for our clients. Sometimes starting with a clean slate and all the lessons learned from a previous effort is exactly the right approach. Often it is not.
In our experience the greatest obstacle to a rewrite effort is that the existing system is the only accurate record of the requirements for the system. Applications evolve over years to meet the changing needs of an organization and its users. Many small decisions are made and captured in the functioning application only. Rewriting to reproduce the features of an existing system can be a very difficult effort to define and scope.
Usually the troubled application will continue to be used while a rewrite effort is underway. Often support, maintenance and feature development will need to continue on that application in the meantime. A rewrite will compete for development resources and will be chasing new feature development on the existing application.
Not everything in the troubled application is rotten. There are often good pieces, or at least components that are reliable and well understood. It would be great to not have to rewrite them.
How can you be sure that a rewritten application will be so much better than the one it is replacing? All too often, the source of poor software quality is poor process and practices. Unless you fix those problems it is not worth embarking on a new software development effort.
How to Rescue
Sometimes a decision to rewrite is made because rescuing an application seems too daunting. Where do you start? How long will it take? Here are some high level thoughts from our experiences with our customers. We will go into more detail on specifics in later articles.
Set Short Term Goals
Be sure to set demonstrable short term goals with meaningful names. When your job is to make the system “better” your job is never done. You need discrete goals with names that you can finish, demonstrate, release and feel good about before moving on. Easily communicated goals help to show management that the effort is progressing successfully and help your team build confidence in an often daunting task.
On one of our projects we set a goal of having a unit test in place for a component in our service layer. “Service Test” would be a fine name for that goal. In a system with no testing support, this was a big task that once completed opened the door for much easier testing.
Don’t Let “Refactor” Become a Dirty Word
A lot of the work you will do when rescuing an application can be characterized as refactoring:
Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.
Martin Fowler, http://www.refactoring.com
Don’t let “refactor” become a dirty word. It’s too easy to say, “Yesterday I worked on refactoring our persistence layer. Today I’m going to keep working on that.” Again, come up with a name or description for the work you are doing that enables others to understand what you are doing and for you to describe progress through that task. Refactoring is one activity of many that you will engage in during this process. Avoid using “refactor” in the names of your short term goals.
Introduce Testing
A common attribute of many struggling applications is a conspicuous lack of automated testing and of the design and infrastructure to support it. We have found that the many changes required to support and write automated unit and integration tests are exactly the changes an application needs to get healthy. They include:
- Automating and streamlining build systems
- Decoupling application components
- Minimizing dependencies on application server features
- Building test data sets
- Continuous integration
- Building a team interest in quality
Each of these items is challenging to implement. You can’t do them all at once yet they are dependent on each other. Fortunately there are good open tools and practices available that provide a helpful bootstrap including Maven for convention-based builds, EasyMock to test code with lots of dependencies, Spring for component management outside of an application server, and CruiseControl for continuous integration (not to ignore the excellent almost-free options like TeamCity and Bamboo).
Introducing testing to a legacy application with no support for it is a daunting challenge and slow going at first. We’ve had good success in the past by first cleaning up the build and adding the ability to run tests, then finding a place in the application to break apart components and wedge a first test in. With a running test, you have something to automate so you can get continuous integration running. You also have a lot more insight into the architectural changes that will both improve the health of the application and make it easier to add more tests.
We have a lot more to say about this topic and specific experiences to share in coming articles.
A Culture of Quality
Another common attribute of a struggling application is that the team maintaining and developing the application is in continuous firefighting mode. They’ve lost hope of making fundamental improvements to their system and on a daily basis are fixing emergency bugs and hacking in features with their fingers crossed that they are not breaking something else.
Teams working in this mode collaborate poorly with product management. Product management feels that the developers are resistant to new features. Getting releases out is painful and unpredictable because defect rates are high.
These teams also collaborate poorly with each other. Developers tend to become isolated from each other as sole maintainers of one application component or another. They do not talk about how to make things better and become resigned to living with the status quo.
To be successful in a rescue mission or even a rewrite, you have to turn the firefighting culture into one where developers value quality and work for it daily. They should be excited to make things better and be engaging each other with ideas and practices to get there.
Again, this can be a very difficult effort. Sometimes it requires dramatic measures. In our experience this includes:
- Changing a workspace to remove barriers to casual conversation
- Relocating developers to the same physical location
- Hiring new blood and firing those resistant to change
- Pair programming
- Book clubs and study groups
These changes are often the most painful to make. As a consulting company Carbon Five can advocate for these changes, teach pair programming and run a study group, but the real changes have to come from within. We have seen the most success in making these changes when there is a commitment to improve from the business and champions of this effort in management.
News: v0.9.3 has been released!
A while back, I wrote to introduce the first incarnation of the Carbon Five Database Migration tools, a simple though powerful framework for applying discrete changes to a database and tracking which changes have been applied to a specific database. It was inspired by Rails’ Migration support.
We’ve made a number of changes in the v0.9.1 release. We adopted some of the improvements found in Rails 2.1 as well as feedback from our users. Here’s an overview of what’s changed:
- New create, drop, and reset goals for maven plugin. Now you can create a new database, drop an existing one, or reset an existing database by dropping it, creating a new one and then migrating it. This is tested with MySQL and PostgreSQL.
- Each applied migration is tracked in the database schema_version table (instead of just the last one). Also, when it was run and how long it took to run are now saved for each.
- Validate goal now lists which migrations are pending in addition to whether the database is up to date.
- Maven artifact ids have changed (migration -> db-migration, maven-migration-plugin -> maven-db-migration-plugin) and there’s been some restructuring in the core framework.
- Maven plugin is configured a bit differently now; environments have been removed completely since maven supports a better solution out of the box: profiles.
- Maven plugin now looks for migrations in src/main/db/migrations by default; alternate locations can be specified via the <migrationsPath/> element.
- We now recommend using timestamps for migration versions instead of the NNN format, though any numerical character sequence will work.
- Reworked the algorithm for determining which migrations to run to allow for a little more flexibility. Pending migrations aren’t determined by a single version number, they’re determined by comparing what is available to what has already been run. In conjunction with timestamp versions, developers won’t be stepping on each other’s migrations.
- New and updated google code project and documentation.
As you can imagine, some of these changes aren’t backwards compatible. While we’re in pre-release (< v1.0) mode, we feel like it's more important to make the fundamental changes to build a solid foundation than to retain complete backwards compatibility. The release notes give some guidelines for upgrading.
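The reworked rule for choosing which migrations to run, described in the list above, can be illustrated with a small sketch. This is not the framework's actual code: the class name PendingMigrationsSketch, the method pending, and the use of Long for versions are assumptions made for illustration. It only shows the idea of comparing what is available to what has already been applied, rather than comparing against a single "current version" number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; names and types are illustrative, not the C5 API.
final class PendingMigrationsSketch {

    // Returns every available version that has not been applied, in ascending order.
    static List<Long> pending(Collection<Long> available, Collection<Long> applied) {
        TreeSet<Long> result = new TreeSet<>(available); // sorts versions numerically
        result.removeAll(applied);                       // set difference: available minus applied
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<Long> available = Arrays.asList(20080830174515L, 20080901090000L, 20080902120000L);
        // A teammate's later migration was already applied; ours has an earlier timestamp.
        List<Long> applied = Arrays.asList(20080830174515L, 20080902120000L);
        System.out.println(pending(available, applied)); // prints [20080901090000]
    }
}
```

With a single version number, the 20080901090000 migration would be skipped because a later one has already run. Comparing the two sets picks it up, which is why timestamp versions keep developers from stepping on each other.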
Here’s a quick getting started guide for the maven-db-migration-plugin:
Step 1: Configure maven in your project’s pom.xml
21  ...
22  <build>
23    ...
24    <plugin>
25      <groupId>com.carbonfive</groupId>
26      <artifactId>maven-db-migration-plugin</artifactId>
27      <version>RELEASE</version>
28      <configuration>
29        <url>jdbc:mysql://localhost/myapp_test</url>
30        <username>dev</username>
31        <password>dev</password>
32      </configuration>
33      <dependencies>
34        <dependency>
35          <groupId>mysql</groupId>
36          <artifactId>mysql-connector-java</artifactId>
37          <version>5.1.6</version>
38        </dependency>
39      </dependencies>
40    </plugin>
41  </build>
42  ...
43  <pluginRepositories>
44    <pluginRepository>
45      <id>c5-public-repository</id>
46      <url>http://mvn.carbonfive.com/public</url>
47    </pluginRepository>
48  </pluginRepositories>
49  ...
Lines 29-31 configure the database connection (see the reference for more options).
Lines 35-39 specify the required dependency on our database driver.
Lines 43-48 add the Carbon Five maven plugin repository.
Step 2: Create a migration script
In src/main/db/migrations, create a script using the format YYYYMMDDHHMMSS.sql (e.g. 20080830174515.sql). Example:
CREATE TABLE users (
  id INT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  password VARCHAR(255),
  enabled BOOLEAN DEFAULT TRUE
);
Step 3: Create the database
$ mvn db-migration:create
The supplied credentials must have the appropriate privileges, of course. If you’re not using MySQL or PostgreSQL, then do this step manually.
Step 4: Check the status of the database
$ mvn db-migration:validate
Your migration will be listed as pending and the database as not up-to-date.
Step 5: Migrate to the latest version
$ mvn db-migration:migrate
The pending migration will be applied to the database and logged in the schema_version table.
Want to learn more?
Check out the google code project page, the release notes and the sample applications.
In the near future, I’m going to look at supporting MS SQL Server and improving the SQL Script Runner. Thanks for all of the feedback and please keep it coming!
Christian
We’ve been using DB Unit on our Java projects for years and the mechanics of how it’s used have evolved over time. I’ve recently spent some time making it work a little nicer for how we typically write database tests. What I’ve created makes using DB Unit on a project that is already using Spring and the testing support added in Spring 2.5 just a little easier through the application of convention and annotations.
In general, we’ve adopted the convention of loading data off the classpath from a flat dataset file named after the test located next to the test on the classpath. For example (in the maven standard directory structure):
- src/test/java/com/acme/TripRepositoryTest.java - Java Test Code
- src/test/resources/com/acme/TripRepositoryTest.xml - DB Unit Data Set for TripRepositoryTest
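The naming convention above amounts to a simple mapping from a test class to a classpath resource. Here is a minimal sketch of that mapping, assuming a hypothetical helper named defaultLocation. It is not the code used by the listener; the classpath:/ prefix simply mirrors the location format that appears in the listener's log output.

```java
// Hypothetical sketch of the "data set named after the test" convention.
final class DataSetLocationSketch {

    // Maps a test class to the classpath location of its same-named data set,
    // e.g. com.acme.TripRepositoryTest -> classpath:/com/acme/TripRepositoryTest.xml
    static String defaultLocation(Class<?> testClass) {
        return "classpath:/" + testClass.getName().replace('.', '/') + ".xml";
    }

    public static void main(String[] args) {
        // java.lang.String stands in for a test class here.
        System.out.println(defaultLocation(String.class)); // prints classpath:/java/lang/String.xml
    }
}
```

Because src/test/java and src/test/resources are merged onto the same test classpath by Maven, the XML file ends up next to the compiled test class, and no per-test configuration is needed.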
For most tests, the data set is loaded inside the test’s transaction and rolled back when the test completes so that nothing needs to be cleaned up (see Spring’s reference). For other tests — service or integration tests — the data is loaded outside of a transaction and must be cleared out manually. Most projects have a mix of both strategies and both should be easily supported.
When Spring 2.5 came out with its new testing framework, I threw together a custom TestExecutionListener that looks for test methods that are annotated with @DataSet, and when found, loads the data using DB Unit. Here’s a transaction-per-test example:
TripRepositoryImplTest.java - Example transaction-per-test Test Case
@ContextConfiguration(locations = {"classpath:applicationContext.xml"})
public class TripRepositoryImplTest extends AbstractTransactionalDataSetTestCase {

    @Autowired
    TripRepository repository;

    @Test
    @DataSet
    public void forIdShouldFindTrip() throws Exception {
        Trip trip = repository.forId(2);
        assertThat(trip, not(nullValue()));
    }
}
The high-level execution path for this example looks like:
- Inject dependencies (DependencyInjectionTestExecutionListener)
- Start transaction (TransactionalTestExecutionListener)
- Load dbunit data set from TripRepositoryImplTest.xml (DataSetTestExecutionListener) using the setup operation (default is CLEAN_INSERT)
- Execute test
- Optionally cleanup dbunit data using the tear down operation (default is NONE)
- Rollback transaction (TransactionalTestExecutionListener)
Here’s the trimmed down log output for this test:
INFO: Began transaction (1): transaction manager; rollback [true] (TransactionalTestExecutionListener.java:259)
INFO: Loading dataset from location 'classpath:/eg/domain/TripRepositoryImplTest.xml' using operation 'CLEAN_INSERT'. (DataSetTestExecutionListener.java:152)
INFO: Tearing down dataset using operation 'NONE', leaving database connection open. (DataSetTestExecutionListener.java:67)
INFO: Rolled back transaction after test execution for test context (TransactionalTestExecutionListener.java:279)
For this to work in its current incarnation, a single datasource must be available for lookup in the application context. One of the interesting details is what to do with the connection used to load the data. The framework assumes that if it’s a transactional connection it should be left open because whatever started the transaction should do the closing. When it’s non-transactional it’s closed after the dataset is loaded. This convention works well for how I typically write my database tests.
In addition to the @DataSet annotation, we must add the DataSetTestExecutionListener to the set of listeners that are applied to the test class. As in the above example, you can extend AbstractTransactionalDataSetTestCase which does this for you or you can specify the listener using the class-level annotation @TestExecutionListeners (see example). It’s important that the listener is triggered after the TransactionalTestExecutionListener.
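To see why the ordering matters, here is a minimal, self-contained sketch. It does not use Spring at all: the Listener interface and the step names are hypothetical stand-ins, and it assumes (as the execution path above suggests) that "before" callbacks run in registration order while "after" callbacks run in reverse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListenerOrderSketch {

    // Hypothetical stand-in for a test execution listener; not Spring's interface.
    interface Listener {
        void before(List<String> log);
        void after(List<String> log);
    }

    // Builds a listener that simply records the step it would perform.
    static Listener listener(final String beforeStep, final String afterStep) {
        return new Listener() {
            public void before(List<String> log) { log.add(beforeStep); }
            public void after(List<String> log) { log.add(afterStep); }
        };
    }

    // Assumption: "before" runs in registration order, "after" in reverse order.
    static List<String> run(List<Listener> listeners) {
        List<String> log = new ArrayList<String>();
        for (Listener l : listeners) {
            l.before(log);
        }
        log.add("execute test");
        for (int i = listeners.size() - 1; i >= 0; i--) {
            listeners.get(i).after(log);
        }
        return log;
    }

    // Transactional listener registered first, data set listener second.
    static List<String> transactionalThenDataSet() {
        return run(Arrays.asList(
                listener("start transaction", "roll back transaction"),
                listener("load data set", "tear down data set")));
    }

    public static void main(String[] args) {
        for (String step : transactionalThenDataSet()) {
            System.out.println(step);
        }
    }
}
```

With the data set listener registered second, the load happens inside the transaction and the rollback undoes it. Swap the two and the data would be loaded before the transaction starts, so the rollback would not clean it up.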
If all test methods use the dataset, the test class (or a superclass) can be annotated instead, and every test will load the dataset. If a different dataset should be loaded, the name of the resource can be specified in the annotation (e.g. @DataSet("TripRepositoryImplTest-foo.xml") or @DataSet("classpath:/db/trips.xml")). Lastly, the setup and teardown database operations can be overridden (e.g. @DataSet(setupOperation = "INSERT", teardownOperation = "DELETE")).
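The lookup rules above can be sketched in plain Java. This is not the C5 Test Support source: the @DataSet annotation below is a hypothetical stand-in whose attribute names are copied from the examples above, and the way relative names are resolved against the test class’s package is an assumption based on the location shown in the log output.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class DataSetResolutionSketch {

    // A method-level annotation wins; otherwise fall back to the class (or superclass).
    static DataSet find(Class<?> testClass, String methodName) {
        try {
            Method method = testClass.getMethod(methodName);
            DataSet onMethod = method.getAnnotation(DataSet.class);
            return onMethod != null ? onMethod : testClass.getAnnotation(DataSet.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No public method " + methodName, e);
        }
    }

    // Assumed rule: explicit classpath locations are used as-is; anything else is
    // resolved next to the test class, defaulting to <TestClassName>.xml.
    static String location(Class<?> testClass, DataSet dataSet) {
        String value = dataSet.value();
        if (value.startsWith("classpath:")) {
            return value;
        }
        String name = testClass.getName();
        int lastDot = name.lastIndexOf('.');
        String packagePath = lastDot < 0 ? "" : name.substring(0, lastDot).replace('.', '/') + "/";
        String fileName = value.isEmpty() ? testClass.getSimpleName() + ".xml" : value;
        return "classpath:/" + packagePath + fileName;
    }

    public static void main(String[] args) {
        String[] methods = {"usesClassDefault", "usesExplicitLocation", "overridesOperations"};
        for (String m : methods) {
            DataSet ds = find(ExampleTripTest.class, m);
            System.out.println(m + ": " + location(ExampleTripTest.class, ds)
                    + " setup=" + ds.setupOperation() + " teardown=" + ds.teardownOperation());
        }
    }
}

// Hypothetical stand-in for the real annotation; defaults mirror the ones described above.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@Inherited
@interface DataSet {
    String value() default "";
    String setupOperation() default "CLEAN_INSERT";
    String teardownOperation() default "NONE";
}

// Example test class: annotated at class level, with two method-level overrides.
@DataSet
class ExampleTripTest {
    public void usesClassDefault() { }

    @DataSet("classpath:/db/trips.xml")
    public void usesExplicitLocation() { }

    @DataSet(setupOperation = "INSERT", teardownOperation = "DELETE")
    public void overridesOperations() { }
}
```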
This functionality is part of the C5 Test Support package and is available in our maven repository. To use it, first add the C5 Public Maven repository to your pom.xml, and then add the necessary dependencies:
pom.xml
    <repositories>
      <repository>
        <id>c5-public-repository</id>
        <url>http://mvn.carbonfive.com/public</url>
        <snapshots>
          <updatePolicy>always</updatePolicy>
        </snapshots>
      </repository>
    </repositories>
    ...
    <dependencies>
      <dependency>
        <groupId>org.dbunit</groupId>
        <artifactId>dbunit</artifactId>
        <version>2.2.3</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>com.carbonfive</groupId>
        <artifactId>test-support</artifactId>
        <version>0.6</version>
        <scope>test</scope>
      </dependency>
      ...
    </dependencies>
Check out the sample application for details. It’s mavenized and utilizes an in-memory database. Just check it out of subversion, look over the code, and give it a run using your IDE or from the command line (mvn install). I’d be psyched to hear what you think and, of course, I welcome comments and suggestions.
Resources:
- C5 Test Support Source: http://svn.carbonfive.com/public/carbonfive/test-support/trunk
- C5 Test Support Maven Home: http://mvn.carbonfive.com/public/com/carbonfive/test-support/0.6/
- Sample Application: http://svn.carbonfive.com/public/christian/spring-dbunit-test-extension/trunk
Every now and then you’ll work on something that needs to handle requests from multiple concurrent threads in a special way. I say “special way” because in a web application, everything needs to handle being executed concurrently, and there is a slew of techniques for doing so (prototypes, thread locals, stateless services, etc.). Here’s an example of what I mean by “special”…
On my current project, we have a queue of articles that need human-user attention (i.e. editorial moderation). Each article must be doled out to only one moderator, and there are multiple instances of the web application servicing requests in the cluster. Imagine tens of thousands of articles per day and a team of moderators churning through them. We can’t rely on Java synchronization because it only works within a single JVM instance, not across server instances.
The simplified version of the service interface we’re working on looks like this:
ArticleService.java - Example service interface
    public interface ArticleService {
        Article findNextArticleForModeration();
    }
What makes this interesting is that we must ensure that the service doesn’t hand out the same Article to more than one user. This is impossible to assert using a single thread. We’ve all been told that multiple threads and automated testing don’t mix. It’s generally true and should be avoided if at all possible, but in some cases it’s the only way we can truly assert specific behavior. I’ve found a pretty simple way to do this type of testing in a reliable, consistent, and non-disruptive manner. Although the technique uses only the concurrency utilities built into Java 1.5, most of the engineers who have seen it are surprised that such testing is so easy to implement.
Given the above service interface, here’s a test that will assert that no single article is given out to more than one invoker of the method findNextArticleForModeration(). The scenario we’re simulating is 10 users feverishly moderating a queue of 250 articles as quickly as possible.
ArticleServiceImplTest.java - Test to invoke service concurrently
    ...
    public void findNextArticleForModerationStressTest() throws Exception {
        final int ARTICLE_COUNT = 250;
        final int THREAD_COUNT = 10;

        // Create test data and callable tasks
        //
        Set<Article> testArticles = new HashSet<Article>();
        Collection<Callable<Article>> tasks = new ArrayList<Callable<Article>>();
        for (int i = 0; i < ARTICLE_COUNT; i++) {
            // Test data
            testArticles.add(new Article());
            // Tasks - each task makes exactly one service invocation.
            tasks.add(new Callable<Article>() {
                public Article call() throws Exception {
                    return articleService.findNextArticleForModeration();
                }
            });
        }
        articleService.createArticles(testArticles);

        // Execute tasks
        //
        ExecutorService executorService = Executors.newFixedThreadPool(THREAD_COUNT);
        // invokeAll() blocks until all tasks have run...
        List<Future<Article>> futures = executorService.invokeAll(tasks);
        assertThat(futures.size(), is(ARTICLE_COUNT));

        // Assertions
        //
        Set<Long> articleIds = new HashSet<Long>(ARTICLE_COUNT);
        for (Future<Article> future : futures) {
            // get() will throw an exception if an exception was thrown by the service.
            Article article = future.get();
            // Did we get an article?
            assertThat(article, not(nullValue()));
            // Did the service lock the article before returning?
            assertThat(article.isLocked(), is(true));
            // Is the article id unique (see Set.add() javadoc)?
            assertThat(articleIds.add(article.getId()), is(true));
        }
        // Did we get the right number of article ids?
        assertThat(articleIds.size(), is(ARTICLE_COUNT));
    }
    ...
The test starts off by creating 250 test articles to be moderated. It also creates 250 ‘tasks’, each designed to make a single service invocation of findNextArticleForModeration(). The real magic happens in Executors.newFixedThreadPool() and executorService.invokeAll(). The first creates a new ExecutorService backed by a thread pool of the specified size. This is a generic ExecutorService that is designed to churn through tasks using all of the threads in the pool. invokeAll blocks until every task has finished executing. In this test, 10 threads will rip through 250 tasks, each making a single call to our service and capturing the result of that call. Each task execution results in a Future, which is a handle to the results of the task (and more).
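If you want to try the mechanics without a database or the real service, here is a stripped-down sketch of the same pattern. The InMemoryQueue class is a hypothetical stand-in for the article service (it hands out ids from a thread-safe queue), and the code uses Java 8 syntax for brevity, so treat it as an illustration rather than the project’s actual test.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentHandoutSketch {

    // Hypothetical in-memory stand-in for the service under test.
    static final class InMemoryQueue {
        private final Queue<Long> pending = new ConcurrentLinkedQueue<>();

        InMemoryQueue(int count) {
            for (long id = 1; id <= count; id++) {
                pending.add(id);
            }
        }

        // poll() is atomic, so each id is handed out at most once.
        Long next() {
            return pending.poll();
        }
    }

    // Runs one task per article on a fixed pool and returns the distinct ids handed out.
    static Set<Long> handOutAll(int articleCount, int threadCount) {
        final InMemoryQueue queue = new InMemoryQueue(articleCount);
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int i = 0; i < articleCount; i++) {
            tasks.add(queue::next); // each task makes exactly one call
        }

        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            Set<Long> seen = new HashSet<>();
            // invokeAll() blocks until every task has completed.
            for (Future<Long> future : pool.invokeAll(tasks)) {
                Long id = future.get();
                // Set.add() returns false for a duplicate.
                if (id == null || !seen.add(id)) {
                    throw new AssertionError("missing or duplicate id: " + id);
                }
            }
            return seen;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println("unique ids handed out: " + handOutAll(250, 10).size());
    }
}
```

Note the shutdown() call in the finally block: a fixed thread pool keeps its worker threads alive until it is shut down, which is worth doing once the test has its results.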
Iterating over each resulting future, we make several assertions. The most important one is the last, where we assert that every task is given a unique Article. Thanks to the natural semantics of Set, this is easy to do in an elegant way. Another useful, though unexpected, feature is that if an exception occurs during the task execution, an ExecutionException will be thrown when get() is called on the corresponding Future. If our service fails for some reason, the test will fail because no exceptions are expected.
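The exception behavior is easy to verify in isolation. The following sketch (class and method names are made up for illustration) submits a single task that always throws and shows that the failure only surfaces, wrapped in an ExecutionException, when get() is called on the Future.

```java
import java.util.Collections;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureFailureSketch {

    // Runs one always-failing task and returns the cause that get() surfaces.
    static Throwable causeOfFailure() {
        Callable<String> failingTask = () -> {
            throw new IllegalStateException("simulated service failure");
        };

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // invokeAll() itself returns normally even though the task threw.
            Future<String> future =
                    pool.invokeAll(Collections.singletonList(failingTask)).get(0);
            try {
                future.get();
                return null; // not reached for a failing task
            } catch (ExecutionException e) {
                // The original exception is available as the cause.
                return e.getCause();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("get() surfaced: " + causeOfFailure());
    }
}
```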
This technique makes simulating a multi-threaded environment in a test easy and readable. It’s important to use it only when it’s really necessary. The resulting test is more of an integration test than a unit test, and its run time is orders of magnitude longer than a unit test’s, so overusing the technique will inflate the time it takes to run the tests. After I’ve finished working on the component under test, I reduce the test-data size and thread count to a level at which the test still provides value but is no longer a stress test (e.g. 10 articles and 2 threads). The next time the component is being worked on, the developer can crank up the values and run the tests to be confident that the behavior isn’t broken.
The complete source for a working example of this technique is available here. You’ll need Maven (or IntelliJ IDEA 7.x) to build and run the test. By default, the tests run against an in-memory H2 database instance, but if you look at application.properties you’ll see configurations for PostgreSQL and MySQL as well.
Happy testing!

