Andrew Yurisich a collection of things

Don't Link that Line Number!

Today I saw a co-worker take a link from a specific line number in our team's github repository and place it in the comments section of a user story. This is very helpful for those who need to see a quick reference to the code that could be causing a defect, but it's wrong. Here's what you need to do instead.

First, the original setup had a link that looked like this.

https://github.com/Droogans/.emacs.d/blob/mac/init.el#L135-L138

A snippet from a current github file.

Looks great, right? Here's that same link a few weeks later.

A snippet from an out of date github file.

The problem with linking to line numbers is that if someone adds new code above line 135, your link will still point at lines 135 through 138, whether or not the code now sitting there makes any sense. The link is anchored to line numbers in the file, not to the code itself.

Here's what you do instead. First, highlight the code that you want to link to, just like in the examples above. Then, tap the "y" key. The URL changes to a permalink: the branch name is swapped for the SHA of the commit you're currently viewing.

That's it! You've now anchored the code in question to an immutable reference in github's history of your project. No matter what happens, this commit will remain unique forever, and you won't have to worry about having to work out what that old reference used to point to.
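If you'd rather build that kind of link from a terminal, here's a minimal sketch. The function name github_permalink and its argument order are my own invention for illustration; the only thing taken from github is the URL shape, where a commit SHA sits in the spot the branch name used to occupy.

```shell
# Hypothetical helper: print a GitHub link pinned to a commit instead of a branch.
# Arguments: owner/repo, commit SHA, file path, first line, last line.
github_permalink() {
  local repo="$1" sha="$2" path="$3" first="$4" last="$5"
  printf 'https://github.com/%s/blob/%s/%s#L%s-L%s\n' \
    "$repo" "$sha" "$path" "$first" "$last"
}
```

From inside a clone you could call it as github_permalink Droogans/.emacs.d "$(git rev-parse HEAD)" init.el 135 138, assuming the commit you're on has been pushed.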

Use your Editor for Pull Request Reviews

When reviewing pull requests, I've noticed it's way too easy to scan through the green and red lines, compare them quickly in your mind, and make hasty decisions about what's changed in the code.

Here's a secret: when code changes, it changes in more places than what's shown in the diff. It sounds strange, but it's true. You look at the diff of a two line change, scan the green line, the red line, and think, "This is doing the same thing, just a little differently. And it fixes a bug. Looks good to me". You merge it, and later that week, end up looking at the file on an unrelated task while following a stack trace. Suddenly, you see the method in the context of the file: the usage of this other class' method is borderline abusive. What's it doing here? It wasn't meant to be used in these types of situations! Who did this? You run git blame, and trace it back to a pull request that you merged just a few days ago.

If this hasn't happened to you, it will eventually, once your team and codebase grow large enough that no one person can have mastery over all of it. So in an effort to reduce the silo effect, you may participate in code reviews. The most common way to do this is in the web interface provided by github.

A sample of github's web based diff view.

Although convenient, it lacks several things:

  1. Your keyboard is useless in this view.
  2. Your color scheme is probably not the same.
  3. Your font is likely different than what you're used to.
  4. The context of the surrounding code is weak at best.

And most importantly, reading diffs is hard. I imagine a developer spends at best a tiny fraction of their time looking at code through the lens of a diff file. The overwhelming majority of the time is spent reading code in the context of wielding it, via an editor of some kind. Looking at code in an editor forces you to stop reading the code and start comprehending it instead.

Reading diffs is like shopping for clothes online. You can get a good idea of what you think it's like, but until it arrives in a more tangible form, you won't know how it fits and feels overall. So from now on, consider adding an extra 30 seconds to your code review process, and fetch that branch, check it out, and take it for a spin. I guarantee you'll notice a difference in your review process right away. You don't write your code using the hilariously inadequate github web interface, so why should you settle for reading it in a similarly crippled fashion?
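Here's roughly what those 30 seconds can look like. This is a sketch, not a prescription: the function name review_pr and the pr-<number> branch naming are made up for this example, and it assumes the remote is hosted on github, which publishes every pull request under refs/pull/<number>/head.

```shell
# Hypothetical helper: fetch a pull request into a local branch and check it out.
# Arguments: pull request number, remote name (defaults to origin).
review_pr() {
  local pr_number="$1" remote="${2:-origin}"
  # "pull/N/head" resolves to refs/pull/N/head on the remote.
  git fetch "$remote" "pull/${pr_number}/head:pr-${pr_number}" &&
    git checkout "pr-${pr_number}"
}
```

Running review_pr 42 leaves you on a local branch named pr-42, ready to open in your editor. If the pull request lives on the upstream project rather than your fork, pass that remote's name as the second argument.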

I would go so far as to say that github should expose an option for me to completely disable diff previews for pull requests, and their respective commits, on my team. Instead, it would only tell you what files changed, and where. It's up to you to see what those changes were.

There should only be one way to look at code: in the same circumstances that it was written in, and in the same environment it will be interacted with from now on.

Git Off My Lawn

After working through a tricky situation in an intern's git repository (her co-worker had literally squashed everything on master into one commit), I was asked how I did it. This led to some passing remarks from others in the conversation that almost everything I was showing her involved "bad, destructive commands", and I immediately agreed. My git workflow is centered around many commands that online resources will tell you, quite bluntly, to never use. But many of those commands offer the most flexibility, provided you follow a couple of precautionary steps first.

Step One: Fork Every Project You Ever Work On.

This is just a good habit to get into. If I could, I would fork this blog, but since I'm both the owner and the only contributor, I can't. Once you've forked a repository, clone your fork (it becomes origin), then add the original project as a second remote named upstream.

$> git clone projectYouForked && cd projectYouForked
...
$> git remote show origin
* remote origin
  Fetch URL: git@github.com:Droogans/projectYouForked.git
  Push  URL: git@github.com:Droogans/projectYouForked.git
  HEAD branch: master
...
$> git remote add upstream git@github.com:Org/projectYouForked.git

Once you have this set up, you can use this wonderful git update command, which requires that you have both an origin and upstream remote set up.

$> git update --help

git update is aliased to

git fetch --all --prune;
git cleanup;
git pull --rebase upstream master;
git push origin master;

What this command does is:

  1. Grab all new updates from every remote repository.
  2. Prune remote-tracking branches that have been deleted on the remote end.
  3. Delete any local branches that have been merged into master.
  4. Pull any new upstream changes underneath our existing ones (a rebase).
  5. Push our updated master to our fork.

Make sure you're on master when you run this, or you may inadvertently trigger a rebase on your current branch.

The command git cleanup is aliased to:

git branch --merged | grep -v '\* master' | xargs -n 1 git branch -d

This deletes the branches that have already been merged into master.
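If you want to set up both aliases yourself, here's one way it could be done. Treat it as a sketch: the wrapper function install_update_aliases is only there for illustration, and the grep pattern assumes git branch marks your current branch as "* master", which is what it prints when master is checked out.

```shell
# Hypothetical installer for the cleanup and update aliases described above.
# It writes to your global git config (usually ~/.gitconfig).
install_update_aliases() {
  # A leading "!" tells git to run the alias body through the shell.
  git config --global alias.cleanup \
    '!git branch --merged | grep -v "\* master" | xargs -n 1 git branch -d'
  git config --global alias.update \
    '!git fetch --all --prune; git cleanup; git pull --rebase upstream master; git push origin master'
}
```

After running install_update_aliases once, git cleanup and git update are available in every repository on the machine. You can inspect what was stored with git config --global alias.update.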

At this point, maintaining a fork becomes as easy as checking out master and running git update periodically. I actually have some projects with four remotes, and this works just fine in those instances, too.

Step Two: Make Branches for Everything.

This is a far less controversial suggestion than rewriting history. But, when combined with step one, you are pretty much assured that the number of people relying on your history to be stable is limited to just you. I keep a pretty open policy with how I manage my teammates' remotes and their branches: your remote, your branch, your history. Obviously, anyone who is going to alter the deeper history they inherited from the master branch is being very inconsiderate of the rest of the team.

Step Three: Rebase, Amend, and More

In the above example, we had an unusual situation. The intern I was helping had a pull request open against a master branch that, earlier that day, had been squashed into one commit. Since this was a new project with fewer than twenty commits, I was understanding of this move, but not amused. Her pull request was no longer mergeable. Github was telling her to check out her branch, update it against master, and push it back to get it mergeable again.

Her branch had about six commits on it, and another eight beneath them, which came from an outdated version of the master branch. Those eight commits were now one. So first things first, I immediately saved what she had, and got her master branch in sync with github's version of master.

$> git commit -am "New Feature WIP."
$> git checkout master
$> git reset --hard origin/master

Many people see git reset --hard and panic, and for good reason. It throws away any uncommitted changes to tracked files and moves your branch to wherever you point it, so you can lose work if you haven't committed it first. But in this case, I wanted my local master to match everything in the master branch on github, regardless of what was in my local copy.
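If a command like that still makes you nervous, one cheap safety net is to leave a bookmark behind before you run it. This is my own sketch rather than part of the story above: the name backup_branch and the backup/ prefix are arbitrary, and it only protects work that has already been committed.

```shell
# Hypothetical helper: create a timestamped branch pointing at the current commit.
# It does not switch branches and does not save uncommitted changes.
backup_branch() {
  local current
  current=$(git rev-parse --abbrev-ref HEAD) || return 1
  git branch "backup/${current}-$(date +%Y%m%d%H%M%S)"
}
```

Run backup_branch right before a reset or a rebase. If things go sideways, the old commits are still reachable from the backup branch, and you can delete it with git branch -D once you're happy.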

$> git checkout feature-branch
$> git rebase master

When I ran this, I discovered that there were a lot of small commits on her feature-branch, for trivial fixes that were created while discovering what was going to work, and what wasn't. This sort of pattern leads to great commit discipline, but awful rebasing. A rebase replays each commit in turn, so the same conflicts tend to resurface again and again on the way to the most recent commit, which is frustrating.

An easy way around this is to squash everything you've worked on into one commit. That way, rebasing only has to run through one set of conflicts.

$> git rebase -i HEAD~8
pick 0c55f9e added the main file.
pick 5cf9d40 new helper file added.
pick 1be507c fixed unit test.
pick 5b1edae fix complaints from lint checker
pick 7661672 update readme
pick 93648a7 add new method
pick 67f1dfb fix new method, ready for pull request
pick fc2f6e5 New Feature WIP.

I wanted to keep the first commit, but make its commit message sound better. Then, I'd need to move all of those tiny fixes into it, and lastly, keep the "New Feature WIP" commit separate for later.

$> git rebase -i HEAD~8
reword 0c55f9e added the main file.
squash 5cf9d40 new helper file added.
squash 1be507c fixed unit test.
squash 5b1edae fix complaints from lint checker
squash 7661672 update readme
squash 93648a7 add new method
squash 67f1dfb fix new method, ready for pull request
pick fc2f6e5 New Feature WIP.

The end result is a history that now looks like this:

$> git lg -4
fc2f6e5 - New Feature WIP. (2 minutes ago) <Intern>
4fa0854 - Main Feature. (2 minutes ago) <Intern>
bcdb780 - Squash master. (7 hours ago) <Co-worker>
91feb23 - Initial Commit. (4 days ago) <Co-worker>
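A quick aside on git lg, since it isn't a built-in command. The real alias isn't shown in this post, so the following is only a guess at a format string that produces output shaped like the listing above; the function name git_lg is a stand-in.

```shell
# Hypothetical stand-in for the "git lg" alias used in this post.
# %h = short hash, %s = subject, %cr = relative commit date, %an = author name.
git_lg() {
  git log --pretty=format:'%h - %s (%cr) <%an>' "$@"
}
```

Any extra arguments are handed straight to git log, so git_lg -4 limits the output to four commits.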

Next, I needed to get that "New Feature WIP" commit on a separate branch, where it belonged:

$> git reset HEAD^
$> git status -sb
## feature-branch
 M next_big_thing.py

A big mental hurdle that the intern had to get over was that, by creating a new branch here, I lose absolutely nothing in the history we'd been working towards building (or rebuilding) the entire time.

$> git checkout -b new-feature
$> git commit -am "New Feature WIP."
$> git lg -4
d710b6d - New Feature WIP. (1 minute ago) <Intern>
4fa0854 - Main Feature. (4 minutes ago) <Intern>
bcdb780 - Squash master. (7 hours ago) <Co-worker>
91feb23 - Initial Commit. (4 days ago) <Co-worker>

And then we go to the old branch.

$> git checkout feature-branch
$> git lg -3
4fa0854 - Main Feature. (4 minutes ago) <Intern>
bcdb780 - Squash master. (7 hours ago) <Co-worker>
91feb23 - Initial Commit. (4 days ago) <Co-worker>

I now only have three commits in this branch: the initial commit, the squashed master commit, and the clean, ready to merge version of the feature. Checking out the new-feature branch will have all those same commits, plus the one new commit I had made for her just now. Very clean!

The next step is to get this up to github, where it can be reviewed and merged by her co-worker.

$> git push origin -f

The -f flag is the dreaded force push, possibly the most famous of all git operations in programming circles. It says, "my version of this branch is now the official version of this branch, for everyone". Obviously, doing this to the master branch is a really easy way to get a bad performance review at work, but in our own fork, in our own branch, there should be exactly zero other people depending on your version of history as a source of truth. Finally, we can have our co-worker merge this branch into master. Afterwards, we continue working like nothing happened.
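If you want a little more protection than a bare -f, git also has --force-with-lease, which refuses to overwrite the remote branch if it has moved since you last fetched it. Here's a sketch of a wrapper around it. The function name force_push_branch and the hard-coded refusal to touch master are my own choices for this example, not something from the story above.

```shell
# Hypothetical helper: force push the current branch to origin, but never master.
force_push_branch() {
  local branch
  branch=$(git rev-parse --abbrev-ref HEAD) || return 1
  if [ "$branch" = "master" ]; then
    echo "refusing to force push master" >&2
    return 1
  fi
  # Fails if origin's copy of the branch is not where we last saw it.
  git push --force-with-lease origin "$branch"
}
```

On your own fork and your own branch the lease should always succeed. When it doesn't, someone else has pushed to that branch, and that's exactly the moment you want to stop and look.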

$> git checkout master
$> git update
Fetching origin
Fetching upstream
...
From github.com:Org/projectYouForked.git
 x [deleted]  (none)     -> upstream/co-workers-other-feature
 x [deleted]  feature-branch -> origin/feature-branch
From github.com:Org/projectYouForked
 * branch     master     -> FETCH_HEAD
First, rewinding head to replay your work on top of it...
Fast-forwarded master to be64c9f71dcb7dea22e8828bd93de6b9daeb1d6c.
...
To git@github.com:Coworker/projectYouForked.git
   5c009ee..be64c9f  master -> master

Now, we simply jump back into the new-feature branch that we set up earlier, and get back to work.

$> git checkout new-feature
$> git rebase master

We are now completely up to date, even though the foundation of the entire project was destroyed without our consent.

Why You Should at Least Know this Stuff

I will admit, this is a really dangerous workflow, and not many people would be comfortable using it. That's fine. But if all you know is how to merge the new master into your current branch, you'd never have made it out of this situation, since merging would never resolve the disparate histories of your branch and the new master branch. The pull request would never have become mergeable.

You'd have to clone the repo again in a new directory, and copy-paste your work from one set of files to the other. Or, perhaps you know how to create a patch file, and you did it that way. Maybe you know how to reset --hard, and you instead created a feature-retry branch, using git cherry-pick to grab each of the six commits you made in the first iteration of the feature-branch. All of these ways are slower, more error prone, and realistically, not an option in a larger project where dozens (or hundreds) of commits separate you and the new master branch. Most options outside of rewriting history are harder than just using the options git provides you.

Test Driving Javascript Libraries in the Browser

Last week, one of my co-workers showed me a really nice trick you can use if you find yourself needing to validate some snippet that features a javascript library.

Here, I was about to test whether or not I could sort the results of a map call in lodash by going through the typical ceremony of using require in the node interpreter environment.

Testing javascript functionality through node.

Fortunately, said co-worker was working with me when he saw me doing this, and stopped me.

Next time you're tempted to drop down into an interactive environment to test out an idea like this, instead, just go to that library's live demo or documentation website. In this case, it was http://lodash.com/docs. Next, open a command-line console in the browser's developer tools. Using Chrome on a Mac, that would be Cmd+Option+J, or Ctrl+Shift+J for Windows/Linux. For Firefox, replace J with I.

From there, you're good to go! Just start using the library, as it's already been brought into the page via the demo anyhow.

Getting instant access to the Lodash library on the lodash site.

Colorizing Output from the cat Command

I have a special function called pcat, which stands for "pretty catenate", saved in my dotfiles repository on github. I really wish this style of cat would be the default on modern machines.

Here it is, sans link:

function pcat() {
  # -g guesses the lexer from the file; less -R lets the color codes through
  pygmentize -f terminal256 -O style=native -g "$1" | less -R;
}
alias cat=pcat

I've omitted the part where you get pip installed on your machine first, and install the pygments module for colorizing output based on a detected language. This is required.
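Since the function falls over on any machine without pygments, here's a slightly more defensive sketch. The fallback behavior is my own assumption about what you'd want: it only colorizes when pygmentize is installed and the output is going to a terminal, and otherwise behaves like plain cat so pipes and scripts keep working.

```shell
# Variant of pcat that degrades to plain cat when it can't (or shouldn't) colorize.
pcat() {
  if [ -t 1 ] && command -v pygmentize >/dev/null 2>&1; then
    # Interactive terminal with pygments available: colorize and page.
    pygmentize -f terminal256 -O style=native -g "$1" | less -R
  else
    # "command cat" skips any alias or function named cat.
    command cat "$1"
  fi
}
```

With this version, something like pcat notes.txt | grep TODO still gets the raw file contents instead of escape codes.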

Here's an example of what the output looks like on my machine:

$> cat src/openstack/identity.clj
Pretty cat output for a clojure file

Even though it seems like overkill (you could just as easily open the file in your editor), it's small, unobtrusive, and strangely, not the default.