
Explorations in Go: A dupe checker in Go and Ruby

Jon Cooper · October 4th, 2011

I've recently started exploring the new(ish) programming language "Go". Go is intended to be a systems programming language and offers speed and a low-level API, along with some sweet features and the beginnings of a great standard library.

At Carbon Five we do most of our work in Ruby, JavaScript, Objective-C, and lately, node.js. I've really been enjoying Objective-C, but realistically half the value is in the standard library, which is not public, and that precludes its use in any kind of server environment to which we're likely to deploy (i.e. non-Mac). I also spent some time this year contributing C code to an open source project and remembering why I don't program in C, given the choice.

I'm exploring Go as a possible solution for times when I want to get close-to-the-metal and really control what's going on, but without having to reinvent the wheel. Example: having to write your own collection frameworks. Blehft.

I wrote a dupe checker in Ruby for a project recently and thought I'd write a Go port as an experiment. (In case you're curious about the motivation: tons of files get moved from a tree to a flat namespace, leading to collisions -- and confusion.)

I've written walkthroughs of the Ruby and Go versions below - please check them out.

The code

Source code is here: https://github.com/joncooper/go/tree/master/dupe

Argument parsing

Ruby

Longer than the Ruby idiom, for sure, but still a breeze compared to, say, doing it in straight C.

The 'flag' package that I've used here gets us some niceties, too, such as the ability to print a help message:
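As a rough illustration of this style of parsing (not the original listing), here is a minimal sketch using the standard 'flag' package. The flag names -root and -v, the options struct, and the parseArgs helper are all hypothetical and chosen only for the example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command-line settings.
// The field and flag names are illustrative assumptions.
type options struct {
	root    string
	verbose bool
}

// parseArgs parses args (without the program name) into options.
// Using a dedicated FlagSet keeps the function easy to test.
func parseArgs(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("dupe", flag.ContinueOnError)
	fs.StringVar(&opts.root, "root", ".", "directory to scan for duplicates")
	fs.BoolVar(&opts.verbose, "v", false, "print each file as it is visited")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return opts, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		// The FlagSet has already printed the error and usage text.
		os.Exit(2)
	}
	fmt.Printf("scanning %s (verbose=%t)\n", opts.root, opts.verbose)
}
```

Running a program like this with -h prints a generated usage message that lists each flag with its default value and description.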

Filesystem traversal

Ruby

Reading Ruby code is a pleasure. Except for one mildly gnarly (yet transparent to Rubyists) bit, this is pretty darn self-documenting.

Even if you ignore the fact that Go syntax looks a bit wonky compared to mainstream languages, this is still a bit confusing. It is, however, quite concise compared to a C version, again because of the kick-ass Go standard library.

I use Go's filepath.Walk in place of Ruby's Find.find. The function signature looks like this:

To use this function, we need something that implements the filepath.Visitor interface:

This interface defines two functions, which, if implemented, mean that the implementor implements the Visitor interface. Say that ten times fast. Or maybe don't. The point is that in Go, you don't make a claim that any given type implements an interface. You just do so, and then the compiler's static type checker can perform safety (and sanity) checks at compile-time.
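To make the "you just do so" idea concrete, here is a small sketch of implicit interface satisfaction. The Visitor interface below is a local stand-in defined only for this example; its method names follow the description above, but the simplified signatures and the nameCounter type are assumptions.

```go
package main

import "fmt"

// Visitor is a local stand-in for the interface described in the text.
// The method signatures are simplified assumptions.
type Visitor interface {
	VisitDir(path string) bool
	VisitFile(path string)
}

// nameCounter counts how often each file path is visited.
// Note that its declaration never mentions Visitor by name.
type nameCounter struct {
	seen map[string]int
}

func (c *nameCounter) VisitDir(path string) bool { return true }
func (c *nameCounter) VisitFile(path string)     { c.seen[path]++ }

// Compile-time check: this line fails to build if *nameCounter
// ever stops satisfying Visitor.
var _ Visitor = (*nameCounter)(nil)

// visitAll feeds a fixed list of paths to any Visitor.
func visitAll(v Visitor, files []string) {
	for _, f := range files {
		v.VisitFile(f)
	}
}

func main() {
	c := &nameCounter{seen: make(map[string]int)}
	visitAll(c, []string{"a.txt", "b.txt", "a.txt"})
	fmt.Println(c.seen["a.txt"], c.seen["b.txt"]) // prints "2 1"
}
```

If one of the two methods were removed from nameCounter, both the assertion line and the call to visitAll would be rejected by the compiler.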

Aside aside, the func VisitDir simply returns a boolean letting the caller know whether or not to traverse into the directory in question, and the func VisitFile does something with a file found during the traversal.

http://golang.org/pkg/path/filepath/#Walk

MD5 hashing files

Ruby

Easy.

Pretty nice for a language aimed at systems programming.

One could of course do this in C by linking an external library. (You could also just exec 'md5sum'). The point is that MD5 is part of the Go standard library, so you don't have to.

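Take note also of the optional initialization statement that can precede the condition in an if. Scoping the error variable to the if keeps the check next to the call that produced it, which helps tidy up the typical POSIX C monkey business like: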

(Note: this code is unrelated to the Go MD5OfFile(), it's just an example of how error handling is done in C.)

Printing results

Ruby

Two aspects of Ruby that I really like are on display here: string interpolation and functional operations on collections.

Interpolating variables or code into strings with "#{}" is an example of the "make common tasks easy" mentality that makes programming in Ruby so nice.

I am of two minds regarding the common use of functional operations on collections in Ruby, for example:

For me, this code is extremely legible, and expresses my intent clearly. "Choose only the elements of this collection where the fullpaths array is >= length 2."

On the other hand, it is clearly less performant than directly iterating over the collection, and if you do a lot of this kind of thing on big collections, performance starts to drag.

I don't love the way that this looks compared to Ruby, but it is in the same ballpark in terms of expressivity. And it is vastly less horrible than doing this in C, even if you had a good implementation of a map and an iterator to work with, which you probably wouldn't.

Three quick notes: 1. the 'range' operator is an iterator; 2. assigning to _ means "throw this away"; and 3. applying the range operator to an array returns (index, value).
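The three notes above can be seen together in a short sketch. The duplicates helper and the sample data are hypothetical; the map is assumed to go from some key (here a file name) to the list of full paths that share it.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicates returns only the entries that have two or more full paths.
func duplicates(byKey map[string][]string) map[string][]string {
	dupes := make(map[string][]string)
	for key, paths := range byKey { // range over a map yields (key, value)
		if len(paths) >= 2 {
			dupes[key] = paths
		}
	}
	return dupes
}

func main() {
	byKey := map[string][]string{
		"logo.png":  {"/site/a/logo.png", "/site/b/logo.png"},
		"index.htm": {"/site/index.htm"},
	}
	dupes := duplicates(byKey)

	// Map iteration order is randomized, so sort the keys for stable output.
	keys := make([]string, 0, len(dupes))
	for key := range dupes {
		keys = append(keys, key)
	}
	sort.Strings(keys)

	for _, key := range keys { // _ discards the slice index
		fmt.Printf("%s appears %d times:\n", key, len(dupes[key]))
		for i, path := range dupes[key] { // range over a slice yields (index, value)
			fmt.Printf("  %d. %s\n", i+1, path)
		}
	}
}
```

The explicit loop in duplicates does the same job as a Ruby select with a length check, at the cost of a few more lines.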

Benchmarking

I've applied a super craptastic benchmarking technique here: I just run each version three times.

On my home directory (49k-ish files):

Ruby

Conclusion

I'm definitely going to be writing more Go in the future. The syntax has grown on me and I like the standard library and language features quite a bit. I'm interested in doing a quick port of some UNIX C stuff in order to see how much of an improvement it makes in terms of intelligibility and maintainability; that's actually a much fairer comparison than a Ruby app.
