Sunday, June 13, 2010

IMDb to SQLite

Over the last few days I played around with importing IMDb data into SQLite, which, thanks to IMDb offering its data as plain-text files, is not that hard. Software for this already exists in the form of IMDbPy, but I decided to roll my own in Ruby for practice.
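To give an idea of why the plain-text format makes this manageable, here is a minimal parsing sketch in Python (not the Ruby code of the importer). It assumes each data line of a list file such as `genres.list` holds a title and a value separated by one or more tab characters; header and separator lines without a tab are simply skipped. The function name `parse_genre_lines` is hypothetical.

```python
import re

# Assumed line layout: "<title><one or more tabs><genre>",
# e.g. "Terminator 2: Judgment Day (1991)\t\t\tSci-Fi".
LINE_RE = re.compile(r"^(?P<title>[^\t]+)\t+(?P<genre>[^\t]+)$")


def parse_genre_lines(lines):
    """Yield (title, genre) pairs; lines that don't match the layout are skipped."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            yield m.group("title").strip(), m.group("genre").strip()


if __name__ == "__main__":
    sample = [
        "8: THE GENRES LIST\n",  # header line, no tab: skipped
        "==================\n",  # separator line, no tab: skipped
        "Terminator 2: Judgment Day (1991)\t\t\tSci-Fi\n",
    ]
    print(list(parse_genre_lines(sample)))
```

The generator yields pairs one at a time, so a multi-megabyte list file never has to be held in memory before it is written to the database.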

The good news is that it is quite fun to have IMDb data in an SQL database, as it allows far more flexible queries than the website offers. For example, you can list all Sci-Fi movies featuring an actor who also appeared in Terminator 2. That makes it a great way to discover movies that would otherwise be hard to find.
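A query of that kind might look like the following sketch. It assumes a hypothetical three-table schema (`movies`, `genres`, `roles`), not necessarily the one the importer produces, and fills an in-memory database with placeholder rows ("Movie X", "Actor A", and so on) purely to exercise the query; only the Terminator 2 title comes from the text above.

```python
import sqlite3

# Hypothetical schema, chosen for illustration only:
#   movies(id, title), genres(movie_id, genre), roles(actor, movie_id)
SCHEMA = """
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
CREATE TABLE genres (movie_id INTEGER REFERENCES movies(id), genre TEXT NOT NULL);
CREATE TABLE roles  (actor TEXT NOT NULL, movie_id INTEGER REFERENCES movies(id));
"""

# Sci-Fi movies sharing at least one actor with the reference movie.
# roles is joined to itself: r1 = cast of the reference movie,
# r2 = other movies by those same actors.
SHARED_CAST_SCIFI = """
SELECT DISTINCT m.title
FROM movies AS ref
JOIN roles  AS r1 ON r1.movie_id = ref.id
JOIN roles  AS r2 ON r2.actor = r1.actor AND r2.movie_id <> ref.id
JOIN movies AS m  ON m.id = r2.movie_id
JOIN genres AS g  ON g.movie_id = m.id AND g.genre = 'Sci-Fi'
WHERE ref.title = ?
ORDER BY m.title
"""


def scifi_with_shared_actors(conn, title):
    """Return titles of Sci-Fi movies with an actor who also appears in `title`."""
    return [row[0] for row in conn.execute(SHARED_CAST_SCIFI, (title,))]


def build_demo_db():
    """In-memory database filled with placeholder rows to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO movies (id, title) VALUES (?, ?)", [
        (1, "Terminator 2: Judgment Day (1991)"),
        (2, "Movie X (2001)"),  # placeholder, Sci-Fi, shares Actor A
        (3, "Movie Y (2005)"),  # placeholder, Drama, shares Actor A
        (4, "Movie Z (2010)"),  # placeholder, Sci-Fi, unrelated cast
    ])
    conn.executemany("INSERT INTO genres VALUES (?, ?)",
                     [(2, "Sci-Fi"), (3, "Drama"), (4, "Sci-Fi")])
    conn.executemany("INSERT INTO roles VALUES (?, ?)",
                     [("Actor A", 1), ("Actor A", 2), ("Actor A", 3), ("Actor B", 4)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(scifi_with_shared_actors(conn, "Terminator 2: Judgment Day (1991)"))
```

Running it prints `['Movie X (2001)']`. On a full import, a query like this presumably depends on indices over `roles(actor)`, `roles(movie_id)` and `genres(movie_id)`; without them every join has to scan the whole table.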

The bad news, however, is that I found SQLite to be incredibly slow for the task. The basic import of just a subset of the data already takes around two hours, and generating the SQL indices takes another eight hours.

The code is a bit of an unfinished mess and might not see much further development, but for those interested, it is available via:

git clone http://pingus.seul.org/~grumbel/imdb.git



1 comment:

Paul said...

Did you batch up your inserts into transactions of (say) 1000 or so? If you do that, you should find that SQLite is actually quite fast at importing data.
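Paul's suggestion can be sketched as follows, in Python with the standard `sqlite3` module rather than the Ruby used for the importer. Rows are grouped into batches (1000 by default, following the comment) and each batch is written inside a single transaction, so SQLite commits, and syncs to disk, once per batch instead of once per row. The helper name `insert_in_batches` and the `movies` table in the usage part are hypothetical.

```python
import itertools
import sqlite3


def insert_in_batches(conn, sql, rows, batch_size=1000):
    """Run `sql` for every row in `rows`, one transaction per batch.

    Returns the number of rows inserted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    total = 0
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        # The INSERTs open an implicit transaction; leaving the `with`
        # block commits it (or rolls it back if an exception occurs).
        with conn:
            conn.executemany(sql, batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
    n = insert_in_batches(conn, "INSERT INTO movies (title) VALUES (?)",
                          ((f"Movie {i}",) for i in range(2500)))
    print(n, conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0])
```

Because `rows` can be any iterable, it can be fed directly from a line parser without loading a whole list file first. Larger batches mean fewer commits, at the cost of more work lost if one batch fails.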