Wednesday, August 17, 2011

Converting from SVN to Git

I just converted Galapix from SVN to Git, so here is a quick overview of the process.

The first thing to do is make a local copy of the repository, as the conversion can take quite a bit of time and may require a few restarts to get every detail right. A copy of the repository can be made with svnsync:

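svnadmin create /tmp/your_svn_repository
# svnsync has to set revision properties on the mirror, which needs a pre-revprop-change hook that allows it
printf '#!/bin/sh\nexit 0\n' > /tmp/your_svn_repository/hooks/pre-revprop-change
chmod +x /tmp/your_svn_repository/hooks/pre-revprop-change
svnsync init file:///tmp/your_svn_repository http://www.googlecode/your_projects_svn_dir
svnsync sync file:///tmp/your_svn_repository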

This creates a local repository and copies all the content from the remote repository (here Google Code) into it. Once that is done, one can start creating the Git repository from it. For that there is "git svn", which not only allows converting a repository, but also committing from the Git repository back into the SVN repository if desired. We are not going to use that here, though: this is meant as a one-way conversion from SVN to Git without a way back.

So the next step is:

git svn clone file:///tmp/your_svn_repository/ your_project.git \
--trunk trunk/galapix/ \
--tags tags/ \
--branches branches/ \
--no-metadata \
-A /tmp/authors.txt

The --trunk, --tags and --branches options do the obvious thing: they tell "git svn" where to look for your trunk, branches and tags, as these don't have fixed locations in SVN. The authors.txt file passed with -A is a simple text file mapping SVN account names to Git-style names; it has the form:

grumbel = Ingo Ruhnke <>

The left half is the SVN account name and the right half is the Git name and e-mail address.
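
Since "git svn" aborts when it meets an SVN account that isn't listed in the authors file, it can help to generate a skeleton of the file first. What follows is only a minimal sketch, assuming a POSIX shell and the usual "rNNN | account | date" lines that "svn log -q" prints; the function name svn_authors_skeleton and the example.invalid addresses are made up for illustration and have to be replaced with the real names and e-mails.

```shell
# Hypothetical helper: reads `svn log -q` output on stdin and prints one
# authors-file line per distinct SVN account. The name and e-mail on the
# right-hand side are placeholders that still have to be edited by hand.
svn_authors_skeleton() {
  # revision lines look like "r123 | grumbel | 2011-08-17 ..."; field 2 is the account
  awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
    sort -u |
    awk '{ printf "%s = %s <%s@example.invalid>\n", $0, $0, $0 }'
}
```

It would be used along these lines, with /tmp/authors.txt edited by hand afterwards:

svn log -q file:///tmp/your_svn_repository | svn_authors_skeleton > /tmp/authors.txt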

The --no-metadata flag strips out the metadata (the "git-svn-id:" line) that "git svn" would normally append to every commit message so that Git commits can be traced back to their SVN origin. This might be useful if documentation or bug reports reference older SVN revisions, but otherwise it doesn't seem to be needed, so we strip it here.

The next step is a bit weird. After "git svn clone" you have a fully functioning Git repository of your SVN content, but something is still off: "git svn" turns all your SVN tags into remote branches rather than tags, and all your SVN branches into remote branches as well, not local ones. I am not quite sure why that happens. Part of the reason seems to be that SVN tags don't have to be constant, while Git tags are, but I am not really sure why they end up as remote rather than local branches.

Anyway, converting the remote branches into proper tags and local branches isn't that difficult, just a little ugly. To see everything "git svn" has produced, use:

git branch -a -v

The interesting part is the remote branches, which can be listed with:

git branch -r -v

Another way to inspect the state of the repository is:

git show-ref

Converting the remote tag branches into real Git tags is a simple matter of something like this:

git tag galapix-0.1.0 tags/galapix-0.1.0
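
Doing that by hand for every tag is where the for-loop comes in. Here is a rough sketch of what such a loop could look like, not a polished tool: it assumes the refs/remotes/tags/ layout shown above, which is what "git svn" produces without a prefix (since Git 2.0 the default prefix is "origin/", giving refs/remotes/origin/tags/ instead, so the patterns would need adjusting). The hypothetical plan_tag_conversion only prints the commands instead of running them, so the list can be checked first.

```shell
# Hypothetical helper (dry run): reads `git show-ref` output ("<sha1> <ref>")
# on stdin and prints the commands that would turn each SVN tag, which
# "git svn" left as refs/remotes/tags/<name>, into a real Git tag and then
# drop the remote branch. Nothing is executed here.
plan_tag_conversion() {
  while read -r _sha ref; do
    case "$ref" in
      refs/remotes/tags/*@*) ;;  # older @<number> copies: left for manual review
      refs/remotes/tags/*)
        name=${ref#refs/remotes/tags/}
        printf 'git tag %s %s\n' "$name" "$ref"
        printf 'git branch -d -r tags/%s\n' "$name"
        ;;
    esac
  done
}
```

Something like git show-ref | plan_tag_conversion > convert-tags.sh, followed by a look at the file and then sh convert-tags.sh, would apply it. The full ref names are deliberate: once the tag galapix-0.1.0 exists, the short name tags/galapix-0.1.0 also matches refs/tags/galapix-0.1.0.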

Converting the remaining remote branches into local branches can be done with:

git branch local_branch_name remote_branch_name
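
The same kind of loop works for the branches. Again this is only a sketch under the same assumption of no "git svn" prefix (with the "origin/" default of newer Git it would create local branches named origin/..., which is not what you want). plan_branch_conversion is a made-up name; it skips the tags, the remote trunk and the @<number> copies, which are dealt with separately, and it only prints commands.

```shell
# Hypothetical helper (dry run): reads `git show-ref` output on stdin and
# prints the commands that would turn each remaining SVN branch
# (refs/remotes/<name>) into a local branch and drop the remote one.
# Skipped: tags, the remote trunk (master already has it) and @<number> copies.
plan_branch_conversion() {
  while read -r _sha ref; do
    case "$ref" in
      refs/remotes/tags/*|refs/remotes/trunk|refs/remotes/*@*) ;;
      refs/remotes/*)
        name=${ref#refs/remotes/}
        printf 'git branch %s %s\n' "$name" "$ref"
        printf 'git branch -d -r %s\n' "$name"
        ;;
    esac
  done
}
```

As with the tags, piping git show-ref into it and reviewing the output before running it keeps the whole step reversible until the last moment.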

Some branches and tags might exist multiple times, once as tags/galapix-0.1.0 and once as something like tags/galapix-0.1.0@723. I assume that is the result of SVN "accidents", i.e. deleting a tag or branch and then recreating it, or otherwise breaking the clean continuity of the repository. The @<number> branches and tags seem to be the older ones, so if you know what you are doing, you can probably just delete them. There might also be a branch called trunk, which you can just delete, as it should be identical to the master branch you already have.
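
Finding those leftovers can be scripted the same way. A minimal sketch, assuming the refs/remotes/ layout shown above: the hypothetical plan_leftover_cleanup prints a delete command for every @<number> copy and for the remote trunk, and deliberately runs nothing, since whether they are really safe to drop is a judgment call.

```shell
# Hypothetical helper (dry run): reads `git show-ref` output on stdin and
# prints delete commands for the older @<number> copies of branches/tags
# and for the remote trunk. Review the list before running any of it.
plan_leftover_cleanup() {
  while read -r _sha ref; do
    case "$ref" in
      refs/remotes/*@[0-9]*|refs/remotes/trunk)
        printf 'git branch -d -r %s\n' "${ref#refs/remotes/}"
        ;;
    esac
  done
}
```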

Deleting the old, now converted branches is a simple matter of:

git branch -d -r tags/galapix-0.1.0

The final step is cleaning up the remains of "git svn": in .git/config there are sections called [svn-remote] and [svn] that can be deleted, and there is a subdirectory .git/svn/ which is no longer needed either.
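
Instead of editing .git/config by hand, "git config --remove-section" can drop both sections. A minimal sketch, assuming the default svn-remote name "svn" (which "git svn" uses unless told otherwise) and that you will never run "git svn fetch" in this repository again; strip_git_svn_metadata is a hypothetical name, and it takes the path of the .git directory.

```shell
# Hypothetical helper: removes the "git svn" bookkeeping after a one-way
# conversion. $1 is the path to the repository's .git directory.
strip_git_svn_metadata() {
  gitdir=$1
  # refuse to run with an empty or wrong path, since rm -rf follows
  [ -d "$gitdir" ] || return 1
  # [svn-remote "svn"] holds the SVN URL and the trunk/branches/tags mapping
  git config --file "$gitdir/config" --remove-section svn-remote.svn 2>/dev/null || true
  # [svn] holds settings such as authorsfile; it may not exist
  git config --file "$gitdir/config" --remove-section svn 2>/dev/null || true
  # revision maps and other state used only by "git svn"
  rm -rf "$gitdir/svn"
}
```

For the clone above that would be strip_git_svn_metadata your_project.git/.git.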

If that last section sounded a bit messy, it's because it is. I couldn't find any definitive documentation on how to do any of this the proper way; it all boiled down to some manual for-loops and grep'ing to translate the remote branches into local branches and tags.

Another issue I haven't really looked into is how the whole process copes with less clean SVN repositories, i.e. repositories where the content of trunk/ might have been moved around to, say, trunk/{subproject}, or where other accidents might have happened with the branches/ and tags/ directories.

I am also still a little clueless as to why "git svn" creates everything as remote branches rather than regular local branches. It's highly likely an artifact of "git svn" allowing commits back into the SVN repository, but I have no idea why there isn't an easy way to disable that.


qubodup said...

There seem to be tools like svn2git around. Have you given them a test?

Grumbel said...