15 September 2011

Making a "lite" git repository

I've kept my research in a git repository for more than a year and a half. Because most of my computational experiments are small (i.e., they run on a single processor rather than a supercluster), I've stored a number of results in the repository itself rather than in a separate directory. Well, 18 months and many obsolete and regenerated results later, the repository is rather bloated. Since I'd prefer to edit only my thesis, figures, and related files on my new MacBook Air, I chose to create a new "thesis" repository.

To preserve the history of the modifications I'd already made, git filter-branch is the way to go. (Stack Overflow was helpful for working out the details.) The only sticking point was getting garbage collection (git gc) to actually remove the large HDF5 files. The old commits stay reachable through the backup refs that filter-branch leaves in .git/refs/original and through the reflog, so I had to remove the former and expire the latter.

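#!/bin/bash
set -e  # stop at the first failure so the rm below never runs in the wrong directory

# Clone with real copies of the objects instead of hard links to the original
git clone --no-hardlinks /Users/seth/_research thesis
cd thesis
# Rewrite every branch, dropping the unwanted paths from each commit's index
git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch \
   research_* _testproblems* _python* meeting_notes* *.h5 *.silo \
   .gitignore .gitxattr_json anisotropic_ss ideas_include technotes" \
   --prune-empty -- --all
# Drop everything that still points at the old history, then collect garbage
git remote rm origin
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now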
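
To check that the cleanup really worked, something like the following sketch may help. It is not part of the script above: the function names backup_refs and largest_blobs are made up for illustration, and it assumes a reasonably recent git (one that supports `git -C` and `%(rest)` in `cat-file --batch-check`). The first function lists any backup refs that filter-branch left behind; the second lists the largest blobs still reachable from any ref, so a stray .h5 file would show up at the top.

```shell
#!/bin/bash

# Print any backup refs left under refs/original (hypothetical helper).
# Empty output means filter-branch's backups are gone.
backup_refs() {
    local repo="${1:-.}"
    git -C "$repo" for-each-ref --format='%(refname)' refs/original
}

# Print "<size-in-bytes> <path>" for the N largest blobs reachable from
# any ref, largest first (hypothetical helper; N defaults to 10).
largest_blobs() {
    local repo="${1:-.}" count="${2:-10}"
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "$count"
}
```

Running `backup_refs thesis` and `largest_blobs thesis 5` after the script finishes, followed by `git count-objects -vH` inside the repository, should give a quick picture of what is still taking up space.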

Anyhow, I now have a nice, clean, 20 MB repository suitable for putting on an ultraportable and for later uploading to GitHub/elsewhere to serve mankind.
