Composing Data Pipelines: (Mostly) Stateless Web Applications in Clojure

Erin Swenson-Healey · December 14th, 2014

I describe building an application in a functional style as the act of composing many smaller, context-free functions into pipelines of data transformations which map from system inputs to outputs.

Compojure (a Clojure web application library similar to Bottle and Sinatra) represents an HTTP request as a simple, immutable hash map, which is transformed through a pipeline of middleware functions into a new hash map representing an HTTP response. The code of the resulting application resembles one in which we’re applying a mapping function to values in a static collection rather than one in which we’re working with HTTP requests. By favoring expressions over statements, and immutable data and functions over mutable objects and methods, our web application will share many of the positive characteristics of its static sequence-mapping cousins: simplicity (we can think about each transformation step independently of the others) and robustness (our functions are small, and their behavior is easy to reason about and test).

In this article, we’ll see how we can easily extend the provided Compojure pipeline with our own functional transformations, mapping requests to relational database queries and back to responses. We’ll be building a headless, JSON-speaking web service that allows clients to HTTP POST to “/notes” a JSON-encoded object with a “text” property (a string) representing a note to be created in the database. The client should receive their note back, JSON-encoded, with a new “id” property (an integer) added.

ECHOING CLIENT INPUT

We’ll start with the following Compojure application:
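The original listing is not reproduced here, so what follows is a minimal sketch of such an echo application rather than the exact code. It assumes Compojure is on the classpath; the namespace name notes.core and the plain-text Content-Type header are illustrative choices.

```clojure
(ns notes.core
  (:require [compojure.core :refer [defroutes POST]]
            [compojure.route :as route]))

;; Maps a request hash map to a response hash map, echoing the body back.
(defn post-note-handler [request]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   ;; The raw request :body arrives as an InputStream, so read it into a string.
   :body    (slurp (:body request))})

(defroutes app
  (POST "/notes" request (post-note-handler request))
  (route/not-found "Not Found"))
```

Because app is just a function from a request map to a response map, it can be called directly at the REPL with a hand-built request, without starting a server.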

If we run the app and issue an HTTP POST to the “/notes” path, we’ll get back exactly what we sent.

What we’ve done is create an application that simply maps system inputs to outputs.

ADDING MIDDLEWARE

To be able to work with JSON (both incoming and outgoing), we need to plug the wrap-json-response and wrap-json-body middleware functions into our application’s request-handling pipeline:
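A sketch of that wiring, assuming the ring-json library (namespace ring.middleware.json) and Compojure are available. The {:keywords? true} option, which asks wrap-json-body for keyword keys, is a choice made here for convenience and may not match the original listing.

```clojure
(ns notes.core
  (:require [compojure.core :refer [defroutes POST]]
            [ring.middleware.json :refer [wrap-json-body wrap-json-response]]))

(defn post-note-handler [request]
  ;; With wrap-json-body in place, :body is already a Clojure map.
  {:status 200
   :body   (:body request)})

;; Our original request handler, now named router.
(defroutes router
  (POST "/notes" request (post-note-handler request)))

(def app
  (-> router
      (wrap-json-body {:keywords? true}) ; parse a JSON request body into a map
      wrap-json-response))               ; serialize a map/vector :body to JSON
```

Note that wrap-json-body only parses requests whose Content-Type header marks them as JSON, so clients need to send application/json.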

The functions wrap-json-response and wrap-json-body are higher-order functions that accept our original request handler (now named router) and return new functions: wrap-json-body deserializes JSON from the incoming request body, and wrap-json-response serializes the outgoing response body to JSON. Our application pipeline now looks a little more interesting.
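To make the higher-order shape concrete, here is a hand-rolled pair of middleware functions with the same structure. This is not how ring-json is implemented: wrap-edn-body and wrap-edn-response are hypothetical names, and EDN from the standard library stands in for JSON so that the sketch has no dependencies.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical middleware: transforms the request before the handler sees it.
(defn wrap-edn-body [handler]
  (fn [request]
    (handler (update request :body edn/read-string))))

;; Hypothetical middleware: transforms the response after the handler returns.
(defn wrap-edn-response [handler]
  (fn [request]
    (update (handler request) :body pr-str)))

;; A stand-in for our router: echoes the (already parsed) body.
(defn router [request]
  {:status 200 :body (:body request)})

(def app
  (-> router
      wrap-edn-body
      wrap-edn-response))

(app {:body "{:text \"hi\"}"})
;; => {:status 200, :body "{:text \"hi\"}"}
```

Each wrapper takes a handler and returns a new handler, so the pipeline grows by composition while router itself stays unchanged.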

ADDING VALIDATION

Adding validation rules to our system doesn’t mean we have to break out of the pipeline-of-functions paradigm. Taking inspiration from Haskell’s Either type, we can represent the result of a function that could fail as a two-element vector. A failure has nil in the first position and the error message in the second; a success has a non-nil value in the first position. For instance:
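A sketch of two validators in this style. The function names come from the summary at the end of this article; the error messages and the 140-character limit are assumptions made for illustration.

```clojure
(require '[clojure.string :as string])

;; Success: [note nil]. Failure: [nil "error message"].
(defn ensure-text-presence [note]
  (let [text (:text note)]
    (if (and (string? text) (not (string/blank? text)))
      [note nil]
      [nil "A note must have a non-empty text property"])))

(defn ensure-text-length [note]
  (if (<= (count (:text note)) 140) ; 140 is an assumed limit
    [note nil]
    [nil "A note's text must be 140 characters or fewer"]))

(ensure-text-presence {:text "buy milk"})
;; => [{:text "buy milk"} nil]

(ensure-text-presence {})
;; => [nil "A note must have a non-empty text property"]
```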

Composing validators is straightforward:
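One way to compose two such validators by hand is to destructure the first result and only continue when there is no error. The validators below are compact stand-ins with assumed rules and messages.

```clojure
(require '[clojure.string :as string])

(defn ensure-text-presence [note]
  (if (string/blank? (:text note))
    [nil "text is required"]
    [note nil]))

(defn ensure-text-length [note]
  (if (> (count (:text note)) 140) ; assumed limit
    [nil "text is too long"]
    [note nil]))

;; Runs the validators in order, stopping at the first failure.
(defn validate-note [note]
  (let [[valid-note err] (ensure-text-presence note)]
    (if err
      [nil err]
      (ensure-text-length valid-note))))

(validate-note {:text "buy milk"})
;; => [{:text "buy milk"} nil]

(validate-note {:text ""})
;; => [nil "text is required"]
```

This works, but every additional validator adds another level of nesting, which is what the macro in the next step removes.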

…and can be simplified further using Adam Bard’s err->> macro:
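The sketch below follows the macro as I understand it from Adam Bard’s post; treat the exact definition as an approximation and consult his post for the original. The validators are again compact stand-ins with assumed rules.

```clojure
(require '[clojure.string :as string])

;; Applies f to the value only if no error has occurred so far.
(defn apply-or-error [f [value err]]
  (if (nil? err)
    (f value)
    [nil err]))

;; Threads a value through failable functions, short-circuiting on error.
(defmacro err->> [value & fns]
  (let [steps (for [f fns] `(apply-or-error ~f))]
    `(->> [~value nil]
          ~@steps)))

(defn ensure-text-presence [note]
  (if (string/blank? (:text note))
    [nil "text is required"]
    [note nil]))

(defn ensure-text-length [note]
  (if (> (count (:text note)) 140) ; assumed limit
    [nil "text is too long"]
    [note nil]))

(defn validate-note [note]
  (err->> note
          ensure-text-presence
          ensure-text-length))
```

Adding a third rule is now one more line in validate-note rather than another nested conditional.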

The resulting validator can be plugged into our application pipeline’s post-note-handler function, along with a convenience function that represents our response hash map as an HTTP 200 response:
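A sketch of that handler. The helper names ok and bad-request come from the summary at the end of this article; the shape of the error body and the single stand-in validation rule are assumptions.

```clojure
(require '[clojure.string :as string])

;; Convenience functions that build response hash maps.
(defn ok [body]
  {:status 200 :body body})

(defn bad-request [message]
  {:status 400 :body {:error message}})

;; Compact stand-in for the composed validator.
(defn validate-note [note]
  (if (string/blank? (:text note))
    [nil "text is required"]
    [note nil]))

(defn post-note-handler [request]
  (let [[note err] (validate-note (:body request))]
    (if err
      (bad-request err)
      (ok note))))

(post-note-handler {:body {:text "buy milk"}})
;; => {:status 200, :body {:text "buy milk"}}

(post-note-handler {:body {}})
;; => {:status 400, :body {:error "text is required"}}
```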

Our application’s request-handling pipeline now includes our validator – but the thread of execution hasn’t gotten any harder to follow.

CONNECTING TO A DATABASE

So far, our application maintains no state across HTTP requests; once a request has been echoed back to the API client, it is lost. We’ll make it more useful by saving the requested note to a database and returning to the API client their original note plus the unique identifier returned from our database at insertion time. We’ll see that, in spite of introducing a big hunk of mutable state to our system, our application will be able to retain its original data-in, data-out flow.

To connect to our database we’ll be using Taylor Lapeyre’s oj, a thin wrapper over the Clojure JDBC library that fits nicely into our application’s pipeline. I’ll create a function create-note that maps a string of text to the execution of an insert-query and (ultimately) the newly-generated id. We’ll follow the pattern established by our validator, representing the result of a failable operation using a vector:
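The oj calls from the original listing are not reproduced here, and I would rather not guess at oj’s API. The sketch below shows only the shape of create-note: the insert function is passed in as a parameter, and insert-note! is a hypothetical in-memory stand-in for the real insert query.

```clojure
;; In-memory stand-in for a notes table; this is NOT oj's API.
(def db (atom {:next-id 1 :notes {}}))

;; Hypothetical insert function: stores the text and returns the new id.
(defn insert-note! [text]
  (let [state (swap! db (fn [{:keys [next-id] :as s}]
                          (-> s
                              (assoc-in [:notes next-id] text)
                              (update :next-id inc))))]
    (dec (:next-id state))))

;; Wraps a failable insert in the same [value error] convention
;; used by the validators.
(defn create-note [insert! text]
  (try
    [(insert! text) nil]
    (catch Exception e
      [nil (str "Could not save note: " (.getMessage e))])))

(create-note insert-note! "buy milk")
;; => [1 nil]

(create-note (fn [_] (throw (Exception. "connection refused"))) "buy milk")
;; => [nil "Could not save note: connection refused"]
```

Passing the insert function as an argument keeps create-note easy to test; in the real application it would be closed over a function that executes the insert query against the database.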

We can now plug the final mapping function into our pipeline:
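A sketch of the complete handler under the same assumptions as before: validate-note is a compact stand-in, create-note uses an in-memory counter in place of the real database, and the response body shapes are illustrative. The helper name internal-error comes from the summary below.

```clojure
(require '[clojure.string :as string])

(defn ok [body] {:status 200 :body body})
(defn bad-request [message] {:status 400 :body {:error message}})
(defn internal-error [message] {:status 500 :body {:error message}})

(defn validate-note [note]
  (if (string/blank? (:text note))
    [nil "text is required"]
    [note nil]))

;; Stand-in for the database-backed create-note: returns [id nil] on
;; success or [nil message] on failure.
(def last-id (atom 0))

(defn create-note [text]
  (try
    [(swap! last-id inc) nil]
    (catch Exception e
      [nil (.getMessage e)])))

;; request -> validate -> insert -> response
(defn post-note-handler [request]
  (let [[note err] (validate-note (:body request))]
    (if err
      (bad-request err)
      (let [[id err] (create-note (:text note))]
        (if err
          (internal-error err)
          (ok (assoc note :id id)))))))
```

A validation failure maps to a 400, a database failure to a 500, and a success returns the client’s note with its new id.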

…and visualize it in our diagram:

SUMMARY

In the end, we’ve built an easy-to-understand pipeline of functions that maps an inbound HTTP request to a call to our database and back to an HTTP response. Adding new features to our application is a simple matter of plugging into the transformation pipeline. The flow of data through our application is easy to understand; it goes in only one direction (the result of function a becomes the input to function b, whose result becomes function c’s input, et cetera). By relying on immutable data and (mostly) pure functions, we’ve been able to create a system whose subcomponents (ok, bad-request, internal-error, validate-note, ensure-text-presence, ensure-text-length, and so forth) are easy to test and whose behavior is easy to reason about.

NOTES

It must be noted that I’ve completely stolen this validation approach from Adam Bard. For those of you interested in Control.Monad.Error-style validation in Clojure, do check out his excellent blog post ‘“Good Enough” error handling in Clojure’.
For the source to a working version of this application and installation procedures, look here.