Friday, April 11, 2008

Why git rocks...

Over the past weeks I have started to really like git, but not because of all the cool branching and merging stuff or because it is a distributed SCM, nope, for a much different reason: Simplicity.

Every now and then I start a new toy project and more often than not, it is just not worth the time to set up a proper SVN repository for it. Sure, it is just a few lines in the shell, but you don't want that for 'hello world', and more importantly you don't want to do it after you already have a working directory, since moving all the files is a hassle and ensuring that everything actually lands in the repository even more so.

With git it is different: it is just one line, "git init", and you are done with creating the repository and moving your working directory 'into' it. After that it is just "git add", "git commit" and so on, as you would do with SVN or basically any other SCM tool.
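The whole workflow, sketched with made-up file and directory names (and assuming a git user identity is already configured), looks like this:

```shell
# 'toyproject' and 'hello.py' are hypothetical names for illustration.
mkdir toyproject && cd toyproject
echo 'print("hello world")' > hello.py

git init                        # turns the current directory into a repository
git add hello.py                # stage the file
git commit -m "initial commit"  # record it in the history
git log --oneline               # show the history: a single commit
```

No server, no repository URL, no moving files around: the repository lives in the .git directory right next to your work.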

Now, is git "perfect"? Nope, some things that irritate me are the lack of directory tracking and the handling of binary files. The lack of directory tracking isn't a big deal, but the binary file handling really gets me worried about what happens when you publish a git repository. When you delete or update binary files regularly you don't want a user to download all versions of a file, you want them to download just the latest one. But with git that isn't possible, since a 'git clone' copies the whole repository with all the history. Not a big deal for text, which can be compressed and diffed well, but with binaries? That could likely cause trouble. This becomes even more problematic when your repository contains not only the binaries needed to run a program, but also source files that might be much larger than the files needed to run it. A short test shows that this easily turns a 50KB download into a 20MB download (a game texture drawn in multiple layers at 1024x1024, resized to 256x256 for in-game use). Increasing the download size 400-fold isn't fun.

Some of this could of course be handled by using multiple repositories instead of a single big one, but even with a split repository you would still have the issue of the user downloading multiple versions of the same file. Maybe special filters that aid compression would help; at least for images that might be doable in theory in some cases. But overall I think the best or even only solution would be a form of lazy cloning from a remote repository: only copy what is actually needed for a working directory tree, not everything.
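git does offer a rough approximation in the form of shallow clones, which truncate the history to the latest commits rather than lazily fetching individual files; a sketch (the URL is a placeholder):

```shell
# Fetch only the most recent commit instead of the full history.
# git://example.com/project.git is a hypothetical URL, substitute your own.
git clone --depth 1 git://example.com/project.git
```

A shallow clone still copies every file in the latest tree and comes with restrictions on later fetching, so it is a workaround for the download size, not the lazy clone described above.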

But that said, for small or text-only projects git just rocks. It is so simple to use that even "hello world" is already enough of a project to use it on. git makes version control so cheap that there really isn't much reason left not to use it.


Plouj said...

Have you tried using git-repack to reduce the space used by the history of the binary files?

Grumbel said...

That won't help, since binary files and their diffs just don't get small enough. The only real solution would be a lazy-clone command.