Protecting Sensitive Information in Public Git Repositories

On the (much too early) bus to work this morning, I was reading my Twitter feed and saw an interesting question from Rob Hawkes:

How do you handle config files in your apps when you use Git? I keep accidentally adding config files with sensitive data to Git. :(
— Robin Hawkes (@robhawkes) October 5, 2011

Rob's Twitter followers made all kinds of recommendations and Rob eventually decided it was a solved problem, declaring

Best method I've found so far is creating a temporary config file and keeping that in git, then .gitignoreing the real one.

and then

Thanks for the config file tips! In the end I went with a config.example.js file stored in git and a config.js file that is ignored.

For those following along at home, they mean the same thing.

As Rob was probably intending, this can be used for deploying an app on your personal server, or for a sample app on a PaaS like Google App Engine or Heroku. When testing such an app, the ability to have a native environment locally is a huge convenience, but remembering which private keys need to be hidden is a headache, and that step is sometimes neglected completely. It shouldn't be, because git never forgets!

Anyone who has used git for any substantial amount of time probably came up with this hack on first thought. (This is no insult to Rob, just the inevitability of the pattern.) But by the time Rob posted his solution, I had moved on from it and come up with a solution that I think does the trick a bit more thoroughly. I envisioned a solution which assumes people who check out my code will want to keep their config in a specified path that is already in the repo; of course, I also wanted to share this with the interwebs.

Anyhow, this is quick and dirty. First, create config.js and _config.js in the root directory of your git repository (the same directory that .git/ lives in). I intend config.js to be the local copy with my actual passwords and keys and _config.js to hold the master contents that actually show up in the public repo. For example, the contents of config.js are:

var SECRET = 'Nnkrndkmn978489MDkjw';

and the contents of _config.js are:

var SECRET = 'SECRET';

Since I don't want a duplicate in my repo, I put a rule in my .gitignore file to ignore _config.js. (For those unfamiliar, this can be done just by including _config.js on its own line in the .gitignore file.) After doing so, I set up two git hooks, a pre-commit and post-commit hook.
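If you would rather script the ignore rule than edit .gitignore by hand, something like the following would do it. This is only a sketch: ensure_ignored is a hypothetical helper, not part of the hooks in this post, and it only looks for an exact line match rather than understanding gitignore patterns.

```python
import os


def ensure_ignored(gitignore_path, pattern):
    """Append `pattern` on its own line unless it is already listed.

    Returns True if the file was modified. Hypothetical helper; it does
    an exact line comparison and does not interpret gitignore globs.
    """
    lines = []
    if os.path.exists(gitignore_path):
        with open(gitignore_path) as fh:
            lines = fh.read().splitlines()

    if pattern in (line.strip() for line in lines):
        return False

    lines.append(pattern)
    with open(gitignore_path, 'w') as fh:
        fh.write('\n'.join(lines) + '\n')
    return True
```

Calling ensure_ignored('.gitignore', '_config.js') from the repository root would add the rule once and do nothing on later runs.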

To install the hooks, just add the files pre-commit and post-commit to the .git/hooks/ subdirectory in your repo. They are nearly identical files, with a one-line difference. Both files simply swap the contents of config.js and _config.js, while pre-commit also adds config.js to the changelist. First I'll give you the contents of pre-commit, and then explain why it's cool/safe:

#!/usr/bin/env python

import os

# The hook lives in <project_root>/.git/hooks/, so back up two directories.
hooks_dir = os.path.dirname(os.path.abspath(__file__))
relative_dir = os.path.join(hooks_dir, '../..')
project_root = os.path.abspath(relative_dir)

git_included_config = os.path.join(project_root, 'config.js')
confidential_config = os.path.join(project_root, '_config.js')

# Plain 'r' is used because the 'U' mode flag was removed in Python 3.11.
with open(git_included_config, 'r') as fh:
  git_included_contents = fh.read()

with open(confidential_config, 'r') as fh:
  confidential_contents = fh.read()

# Swap the contents of the two files.
with open(git_included_config, 'w') as fh:
  fh.write(confidential_contents)

with open(confidential_config, 'w') as fh:
  fh.write(git_included_contents)

# Stage the swapped config.js so it is what gets committed.
os.system('git add %s' % git_included_config)

Also note the contents of post-commit are exactly the same, except without the final statement:

os.system('git add %s' % git_included_config)

So what is happening in this file?

1. Uses the Python os module to determine the absolute path to the project's root directory: it takes the absolute path of the hook file, backs up two directories, and finds the absolute path of the result.
2. Determines the two files which need to swap contents.
3. Loads the contents into string variables and then writes them to the opposite files.
4. (Only in pre-commit) Adds the included file to the changelist before the commit occurs.

Step 4 is actually the secret sauce. It puts cleaned, non-sensitive data into the checked in config.js file and then updates the changelist before making a commit, to ensure only the non-sensitive data goes in. Though you could do this yourself by making an initial commit with clean data and then never git adding the file with your actual data, these hooks prevent an accident and allow you to update your local _config.js file with more fields as your config spec changes.
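For readers who want to reuse the pattern, the path lookup and the swap described above could be pulled into two small functions. This is only a sketch: project_root_from_hook and swap_contents are hypothetical names, not part of the original hooks, and the code assumes both config files exist and are plain text.

```python
import os


def project_root_from_hook(hook_path):
    """Return the repo root, assuming the hook sits in <root>/.git/hooks/."""
    hooks_dir = os.path.dirname(os.path.abspath(hook_path))
    return os.path.abspath(os.path.join(hooks_dir, '..', '..'))


def swap_contents(path_a, path_b):
    """Exchange the contents of two text files."""
    # Read both files fully before writing, so neither is lost.
    with open(path_a) as fh:
        contents_a = fh.read()
    with open(path_b) as fh:
        contents_b = fh.read()

    with open(path_a, 'w') as fh:
        fh.write(contents_b)
    with open(path_b, 'w') as fh:
        fh.write(contents_a)
```

Calling swap_contents twice on the same pair of files restores the originals, which is the property the pre-commit and post-commit pair relies on.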

But wait bossylobster, you say, what if one of the hooks doesn't occur? You are right! As pre-commit stands above, if the changelist is empty we have problems. Since the pre-commit hook changes config.js to the same value it has in HEAD, git will tell us either "nothing to commit" or "no changes added to commit". In this case, the commit will exit and the post-commit hook will never occur. THIS IS VERY BAD, since the contents of config.js and _config.js will be switched but not switched back. So, to account for this, we need to append the following code to the end of pre-commit:

import sys

# Use the full command name; 'git st' only works if that alias is defined.
with os.popen('git status') as fh:
  git_status = fh.read()

if ('nothing to commit' in git_status or
    'no changes added to commit' in git_status or
    'nothing added to commit' in git_status):
  msg = ('# From pre-commit hook: No commit necessary, '
         'sensitive config unchanged. #')
  hash_head = '#' * len(msg)
  sys.stdout.write('%s\n%s\n%s\n\n' % (hash_head, msg, hash_head))

  # Undo the swap, since post-commit will not run to do it for us.
  with open(git_included_config, 'w') as fh:
    fh.write(git_included_contents)

  with open(confidential_config, 'w') as fh:
    fh.write(confidential_contents)

  # A non-zero exit status makes git abort the commit.
  sys.exit(1)
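The check above matches on git's human-readable messages, which can vary with the git version or locale. One possible alternative, which is not what the hooks in this post do, is to rely on the exit status of git diff --cached --quiet instead. In the sketch below, has_staged_changes and its run parameter are hypothetical names; run exists only so the function can be exercised without a real repository.

```python
import subprocess


def has_staged_changes(run=subprocess.call):
    """Return True if the index differs from HEAD.

    `git diff --cached --quiet` exits with status 1 when there are staged
    differences and 0 when there are none. Any other status (for example,
    a git error) is treated as "no staged changes", so a caller that
    aborts on False errs on the side of not committing.
    """
    return run(['git', 'diff', '--cached', '--quiet']) == 1
```

In pre-commit this would be called after the git add step, restoring both files and exiting with a non-zero status whenever it returns False.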

For final versions see the pre-commit and post-commit files. Thanks again to Rob Hawkes for the idea/work break over lunch!
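Installing the two hooks amounts to copying them into .git/hooks/ and making them executable, since git skips hook files that lack the executable bit. A minimal sketch of that step follows; install_hooks is a hypothetical helper, and it assumes the hook scripts sit in some source_dir of your choosing and that .git/hooks/ already exists.

```python
import os
import shutil
import stat


def install_hooks(source_dir, repo_root, names=('pre-commit', 'post-commit')):
    """Copy hook scripts into <repo_root>/.git/hooks and mark them executable."""
    hooks_dir = os.path.join(repo_root, '.git', 'hooks')
    installed = []
    for name in names:
        destination = os.path.join(hooks_dir, name)
        shutil.copyfile(os.path.join(source_dir, name), destination)
        # Add execute permission for user, group, and others.
        mode = os.stat(destination).st_mode
        os.chmod(destination, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        installed.append(destination)
    return installed
```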

Update 1:

One of Rob's followers, Paul King, found and tweeted a very different alternative that is also pretty cool. Check out the post he found by Rob Wilkerson.

Update 2:

I swapped out a screen shot of the tweet for a CSS-ified version, inspired by and based on a design used on Mashable.

Update 3:

Some change in git causes empty commits to be allowed. I either didn't notice this before or it just showed up in git. So I added sys.exit(1) to force the pre-commit script to fail when nothing is changed and added a check for the phrase nothing added to commit as well.

Bossy Lobster

A blog by Danny Hermes; musing on tech, mathematics, etc.
