Don’t Fear the Rebase: Git Garbage Collection and You

Chris Svenningsen

At Carbon Five, we work with developers of all experience levels. One source of fear and uncertainty I’ve seen at all levels is Git, the primary source control system used by our teams. Fear of losing work due to mishandled merge resolution, resetting branches or interactive rebasing is keeping developers from using some of the more powerful aspects of the tool. I believe this fear to be unwarranted and hope to show that it’s much harder to truly lose code in Git than most would guess. Here’s the high-level overview of what you need to know:

  1. If you want to keep any bit of code, commit or stash it
  2. By default, anything you commit will be accessible for at least 30 days and anything you stash for two weeks
  3. Merging and rebasing are always safe

Commit To A Git-filled Lifestyle

I imagine you read the phrase, “if you want to keep any bit of code, commit or stash it” and thought, “Of course I understand this – it’s the simplest concept of source control! Do you think I’ve been creating commit-less Git repos?”

No, I do not. The meaning of this point runs a little deeper. As soon as you commit or stash any code, Git stores it as an object and assigns it a SHA-1 hash based on the included changes and some headers (much more info here). As long as you can get your hands on the SHA for a particular commit or stash, you can access those changes at any point before Git’s garbage collection sweeps it up. By default, reflog entries for commits that are “unreachable” (not part of any current branch, including commits that were later amended, reset or changed during a rebase) expire after 30 days (gc.reflogExpireUnreachable), and unreachable “loose objects” that nothing refers to any longer are pruned once they are more than two weeks old (gc.pruneExpire; stashes you have dropped or cleared fall under this designation). More details about this configuration can be found in the docs for git-gc.
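
To check which expiry windows apply on your own machine, you can query the relevant settings. This is a minimal sketch: show_setting is a hypothetical helper name (not a Git command), and the fallback values are Git's documented defaults, printed only when nothing is configured.

```shell
# show_setting is a hypothetical helper: it prints a gc-related setting,
# falling back to Git's documented default when the setting is not configured.
show_setting() {
  if value=$(git config --get "$1" 2>/dev/null); then
    printf '%s = %s\n' "$1" "$value"
  else
    printf '%s = %s (default)\n' "$1" "$2"
  fi
}

show_setting gc.reflogExpire "90 days"             # reflog entries for reachable commits
show_setting gc.reflogExpireUnreachable "30 days"  # reflog entries for unreachable commits
show_setting gc.pruneExpire "2 weeks ago"          # grace period for unreachable loose objects
```

If you want a longer safety net, each of these can be raised with git config, for example in your global configuration.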

Hide and Seek


Knowing your commits and stashes are available is only part of the equation. You’ll need a way to find your misplaced code, or, more specifically, the related SHAs. This can be handled on a case-by-case basis.

First, let’s say you’ve just committed some very important work:

$ git show
9259359 - (HEAD, master) Add the work. (62 seconds ago) <Chris Svenningsen>
diff --git a/my-work.rb b/my-work.rb
new file mode 100644
index 0000000..edfb476
--- /dev/null
+++ b/my-work.rb
@@ -0,0 +1,6 @@
+a = 1
+b = 2
+
+c = a + b

and then you realize you left out something, so you make some changes and amend that commit:


# do coding stuff
$ git commit -a --amend

$ git show
3c53098 - (HEAD, master) Add the work. (44 seconds ago) <Chris Svenningsen>
diff --git a/my-work.rb b/my-work.rb
new file mode 100644
index 0000000..115f4e9
--- /dev/null
+++ b/my-work.rb
@@ -0,0 +1,7 @@
+a = 1
+b = 3
+d = 5
+
+c = a + b + d

Unfortunately, while making that change, you also overrode the value of b. Nice job, now we’ll never know the original value! Unless, of course, we had some way to find the original SHA.

Enter git reflog. The reflog records every time a reference (HEAD, a branch name, a remote-tracking branch) changed in your local repository. Think of it as a timeline of your actions. Let’s run it and see what we come up with:

$ git reflog
3c53098 HEAD@{0}: commit (amend): Add the work.
9259359 HEAD@{1}: commit (initial): Add the work.

Here you can see both of our commits with their respective SHAs. There’s even some helpful notation to show that the second was an amendment to the first. At this point, you can run any Git command against the original SHA. If you wanted to see the original value, you could run:

$ git show 9259359
9259359 - Add the work. (21 minutes ago) <Chris Svenningsen>
diff --git a/my-work.rb b/my-work.rb
new file mode 100644
index 0000000..edfb476
--- /dev/null
+++ b/my-work.rb
@@ -0,0 +1,6 @@
+a = 1
+b = 2
+
+c = a + b
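
The same recovery can be reproduced end to end in a throwaway repository. The sketch below is an illustration under a few assumptions: it works in a temporary directory created with mktemp, uses a placeholder identity, and relies on the reflog being enabled (the default for non-bare repositories). It recreates the amend and reads the original value of b back through HEAD@{1}.

```shell
set -e

# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false              # avoid signing prompts in the demo

# Original commit: b = 2.
printf 'a = 1\nb = 2\n\nc = a + b\n' > my-work.rb
git add my-work.rb
git commit -q -m "Add the work."

# Amend it, accidentally changing b.
printf 'a = 1\nb = 3\nd = 5\n\nc = a + b + d\n' > my-work.rb
git commit -q -a --amend -m "Add the work."

# HEAD@{1} is where HEAD pointed before the amend.
original=$(git rev-parse 'HEAD@{1}')
recovered_b=$(git show "$original:my-work.rb" | grep '^b = ')
echo "$recovered_b"
```

Using the `<sha>:<path>` form of git show prints the file as it existed in that commit, which is handy when you only need one value back rather than the whole commit.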

The reflog is also very helpful for recovering from a mishandled rebase. If we branched a change off of that amended commit:

$ git show feature/add-good-stuff
5abf05d - Doing good stuff. (8 minutes ago) <Chris Svenningsen>
diff --git a/my-work.rb b/my-work.rb
index 115f4e9..11e4312 100644
--- a/my-work.rb
+++ b/my-work.rb
@@ -1,7 +1,8 @@
a = 1
b = 3
d = 5
+e = 10

-c = a + b + d
+c = a + b + d + e

and there was some work done on master in the meantime:

$ git show master
be2a3bd - (master) Add some vars. (9 minutes ago) <Chris Svenningsen>
diff --git a/my-work.rb b/my-work.rb
index 115f4e9..7bc75ca 100644
--- a/my-work.rb
+++ b/my-work.rb
@@ -1,7 +1,9 @@
a = 1
b = 3
d = 5
+e = 7
+f = 9

-c = a + b + d
+c = a + b + d + e + f

When we go to rebase, we’re going to have some conflicts to deal with, as both the new commit on master and our branch touched the e and c variables. While resolving those conflicts, we might make a mistake and set the wrong value for e:

$ git checkout feature/add-good-stuff

$ git rebase master
# resolve some conflicts, then run git rebase --continue

$ git show
d36b3ca - (HEAD, feature/add-good-stuff) Doing good stuff. (3 minutes ago) <Chris Svenningsen>
diff --git a/my-work.rb b/my-work.rb
index 7bc75ca..55030b2 100644
--- a/my-work.rb
+++ b/my-work.rb
@@ -1,5 +1,5 @@
a = 1
b = 3
d = 5
e = 7
f = 9

-c = a + b + d + e
+c = a + b + d + e + f

Oh no! What happened to our version of variable e? Fear not, reflog will save us:

$ git reflog
d36b3ca HEAD@{0}: rebase finished: returning to refs/heads/feature/add-good-stuff
d36b3ca HEAD@{1}: rebase: Doing good stuff.
be2a3bd HEAD@{2}: rebase: checkout master
5abf05d HEAD@{3}: checkout: moving from master to feature/add-good-stuff

Here you can see, from the top, the post-rebase SHA for our branch, two SHAs for the steps of the rebase process, and the SHA from just prior to the rebase. From here, we can reset --hard our branch to that last SHA before the rebase:

$ git reset --hard 5abf05d
$ git show
5abf05d - (HEAD, feature/add-good-stuff) Doing good stuff. (15 minutes ago) <Chris Svenningsen>
diff --git a/my-work.rb b/my-work.rb
index 115f4e9..11e4312 100644
--- a/my-work.rb
+++ b/my-work.rb
@@ -1,7 +1,8 @@
a = 1
b = 3
d = 5
+e = 10

-c = a + b + d
+c = a + b + d + e

This leaves us free to retry our rebase, as if nothing ever changed.
Checking the reflog one last time now shows that reset as the last action taken:

$ git reflog
5abf05d HEAD@{0}: reset: moving to 5abf05d
d36b3ca HEAD@{1}: rebase finished: returning to refs/heads/feature/add-good-stuff
d36b3ca HEAD@{2}: rebase: Doing good stuff.
be2a3bd HEAD@{3}: rebase: checkout master
5abf05d HEAD@{4}: checkout: moving from master to feature/add-good-stuff

Reversing it would be as easy as resetting to the previous SHA. This reinforces the main takeaway: none of these actions are written in stone.
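
The undo-a-rebase flow can also be scripted. This is a simplified sketch, not a replay of the example above: it assumes a scratch repository, a placeholder identity, hypothetical file names, and a rebase that applies without conflicts. It looks up the pre-rebase commit through the branch's own reflog (feature/add-good-stuff@{1}) rather than by reading SHAs off the screen.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false
base=$(git symbolic-ref --short HEAD)        # master or main, depending on your Git setup

echo "a = 1" > my-work.rb
git add my-work.rb
git commit -q -m "Add the work."

# Feature work on its own branch.
git checkout -q -b feature/add-good-stuff
echo "e = 10" > feature.rb                   # hypothetical file name
git add feature.rb
git commit -q -m "Doing good stuff."

# Meanwhile, the base branch moves on.
git checkout -q "$base"
echo "f = 9" > other.rb                      # hypothetical file name
git add other.rb
git commit -q -m "Add some vars."

# Rebase the feature branch, remembering where it was.
git checkout -q feature/add-good-stuff
before=$(git rev-parse HEAD)
git rebase -q "$base"
after=$(git rev-parse HEAD)

# The branch reflog still knows the pre-rebase tip.
pre_rebase=$(git rev-parse 'feature/add-good-stuff@{1}')
git reset -q --hard "$pre_rebase"
restored=$(git rev-parse HEAD)
echo "before=$before after=$after restored=$restored"
```

Keep in mind that reset --hard also throws away uncommitted changes in the working tree, so commit or stash anything you care about first.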

Digging Deeper

reflog covers you for recent changes, but what if the amendment or rebase happened a couple of weeks ago and you didn’t catch it? Yes, you could browse through the reflog and check any place it might have happened. If you want to filter out just the misplaced commits, though, you can use git fsck. Git’s fsck tool “validates the connectivity” of the objects Git is tracking. Try it out: go to a repo you’ve been using and run git fsck --no-reflogs. You should see something like:

dangling commit d29d216e7ec8a8a9cef54ad81d053c03713d5182
dangling blob 171ee9651602fd62b5117208153b54419f3a7105
dangling blob 761eece30a6e7d94abca851bd918c46d7e0098f2
dangling blob 921ece9a0d2835c12029b2073c1ff9439b72d134
dangling commit 991e85a6706c3526edd4fa5a74e1125f4f38291e
dangling commit fc1e4461e1eed8fef1dec536e826702b43ede28c
dangling commit 151f8f9dd00776d6793c255a7091ce31f3328610
dangling blob 281fec312a37a93786a8bc57076df3788a158a19
dangling commit 321f103d9a1f91ef70eb1883d6df3c0fbf06c3de

Each of those commit lines represents a commit that became unreachable due to rebasing, amending or some other history modification. The --no-reflogs flag tells fsck not to count reflog entries as references, which ensures we’ll also see commits that are still listed in a reflog. Knowing this, you can filter out the SHAs and run them through a command to get more information:

# get basic info about commits
$ git fsck --no-reflogs | grep commit | cut -d ' ' -f3 | xargs git --no-pager log --no-walk
2b55656 - Something great (2 weeks ago) <Chris Svenningsen>
ef353ce - Real good work (2 weeks ago) <Chris Svenningsen>
91968ab - Not so great (2 weeks ago) <Chris Svenningsen>

# get files changed for commits
$ git fsck --no-reflogs | grep commit | cut -d ' ' -f3 | xargs git --no-pager show --name-status
2b55656 - Something great (2 weeks ago) <Chris Svenningsen>
MM app/assets/javascripts/thing.js
MM spec/javascripts/thing_spec.js

ef353ce - Real good work (2 weeks ago) <Chris Svenningsen>
A db/migrate/20150512172255_add_yet_another_model.rb
M db/structure.sql

# find commits that updated config/routes.rb (xargs -J is BSD/macOS-specific)
$ git fsck --no-reflogs | grep commit | cut -d ' ' -f3 | xargs -J @ git log --no-walk --name-status @ -- config/routes.rb
025a7ad - wip (3 months ago) <Chris Svenningsen>
M config/routes.rb
40df0c4 - wip (3 months ago) <Chris Svenningsen>
M config/routes.rb
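
To see the fsck approach work from start to finish, here is a minimal sketch in a scratch repository (placeholder identity and a hypothetical file name, notes.txt). It orphans a commit with an amend and then finds it again. It swaps the grep and cut pair for awk so that only lines of the exact form "dangling commit <sha>" are picked up.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo "one" > notes.txt                       # hypothetical file name
git add notes.txt
git commit -q -m "First draft"
lost=$(git rev-parse HEAD)

# Amending replaces the commit; the old one is now only in the reflog.
echo "two" > notes.txt
git commit -q -a --amend -m "Second draft"

# --no-reflogs makes fsck report commits that only the reflog still remembers.
found=$(git fsck --no-reflogs 2>/dev/null |
  awk '$1 == "dangling" && $2 == "commit" { print $3 }')

# Summarize whatever was found (a single SHA in this demo).
git --no-pager log --no-walk --format='%h %s' $found
```

Matching on the first two fields avoids accidentally picking up other fsck lines that happen to contain the word "commit".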

Also contained within are any changes you’ve git stashed in the last two weeks. You can filter out the lost stashes by grepping for the phrase “WIP on”, which is the start of Git’s default stash commit message:

$ echo 'stuff' >> Gemfile

$ git stash

$ git stash clear

$ git fsck --no-reflogs | grep commit | cut -d ' ' -f3 | xargs git log --no-walk --grep="WIP on"
Checking object directories: 100% (256/256), done.
Checking objects: 100% (6428/6428), done.
64f0d01 - WIP on develop: 3e94afe Stuff and things (1 minute ago) <Chris Svenningsen>
a1fa63d - WIP on feature/show-another-thing-90718956: 68ac57d A big thing (4 weeks ago) <Chris Svenningsen>

$ git show 64f0d01
64f0d01 - WIP on develop: 3e94afe Stuff and things (1 minute ago) <Chris Svenningsen>
diff --cc Gemfile
index 54aa7a5,54aa7a5..b265484
--- a/Gemfile
+++ b/Gemfile
@@@ -77,3 -77,3 +77,4 @@@ group :development d
gem 'rb-fsevent'
gem 'growl'
end
++stuff
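
Once you have the SHA of a cleared stash, git stash apply accepts it directly, since it takes any commit that looks like a stash. The sketch below walks through that round trip in a scratch repository; the identity and the Gemfile contents are placeholders chosen for illustration.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo "base" > Gemfile                        # stand-in contents
git add Gemfile
git commit -q -m "Add Gemfile"

# Stash a change, then throw the stash list away.
echo "stuff" >> Gemfile
git stash push -q
git stash clear

# Find the orphaned stash commit by its default "WIP on" message.
stash_sha=$(git fsck --no-reflogs 2>/dev/null |
  awk '$1 == "dangling" && $2 == "commit" { print $3 }' |
  xargs git log --no-walk --format=%H --grep='WIP on')

# Re-apply the stash to the working tree using its SHA.
git stash apply "$stash_sha" > /dev/null
tail -n 1 Gemfile
```

This only works while the stash commit still exists, so it is worth doing soon after you notice the stash is gone.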

Into the Future, Unafraid

Hopefully, if you’ve ever been nervous to run a particularly gnarly Git command, you will feel empowered by this information. Take some time to explore the nooks and crannies of Git’s extensive command list, knowing that you’re unlikely to do any great harm. I don’t expect you’re actually going to need any of the techniques in this post more than a few times a year, but knowing they’re available can have a positive impact on your Git usage on a daily basis.