I've had my research in a git repository for more than a year and a half. Because most of my computational experiments are small (i.e., single processor rather than supercluster), I've stored a number of results in the repository rather than in a separate directory. Well, 18 months and many obsolete and regenerated results later, my repository is rather bloated. Preferring to only modify my thesis, figures, and related files on my new MacBook Air, I chose to create a new "thesis" repository.
To preserve the history of modifications I'd already made, git filter-branch is the way to go. (Stack Overflow was helpful in finding out the details.) The only sticking point was getting garbage collection (git gc) to actually remove the large HDF5 files. That required removing .git/refs/original and expiring the reflog first, since both still reference the pre-filter commits and keep their objects alive.
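#!/bin/bash
# Make an independent copy (no hardlinked objects) of the original repository
git clone --no-hardlinks /Users/seth/_research thesis
cd thesis
# Rewrite every commit on every ref, dropping unwanted paths from the index
git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch \
  research_* _testproblems* _python* meeting_notes* *.h5 *.silo \
  .gitignore .gitxattr_json anisotropic_ss ideas_include technotes" \
  --prune-empty -- --all
# Drop everything that still points at the old history
git remote rm origin
rm -rf .git/refs/original/
git reflog expire --expire=0 --all
# Now the unreferenced objects can actually be pruned
git gc --prune=now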
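To double-check that the filtering really took, something like the following could be run inside the new repository. This is a minimal sketch and not part of my original script: the function names history_paths and count_history_matches are made up for illustration, and it assumes paths contain no newlines.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original script).

# Print every path of every tree and blob reachable from any ref,
# one per line, de-duplicated. Commit lines carry no path and are
# suppressed by `cut -s`; the root tree's empty path is dropped by sed.
history_paths() {
    git rev-list --objects --all | cut -s -d' ' -f2- | sed '/^$/d' | sort -u
}

# Print how many reachable paths match an extended regex, e.g. '\.h5$'.
# `|| true` keeps the exit status at 0 even when grep finds no match.
count_history_matches() {
    history_paths | grep -c -E "$1" || true
}
```

After the script above has finished, count_history_matches '\.h5$' should print 0 if no HDF5 file is left anywhere in the rewritten history, and git count-objects -vH reports the packed size of the repository on disk.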
Anyhow, I now have a nice, clean, 20MB repository suitable for putting on an ultraportable and for later uploading to GitHub/elsewhere to serve mankind.