This article was published yesterday. It shows a way to extract data about a film from IMDB and put it into a local database. Actually, it doesn’t even do that. It produces SQL that you can then run to insert the data.
It’s all rather nasty stuff and indicative of the fact that most people using Perl are still using idioms that would have made us shudder ten years ago.
There were a few things that I didn’t like about the code. The use of curl
to grab data from the web site, the indirect object syntax when creating the XML::Simple object and, in particular, the huge amount of repetitive code used to create the SQL statements.
So I did what anyone would do in my situation. I rewrote it. Here’s my version.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
#!/usr/bin/perl use strict; use warnings; use XML::Simple; use LWP::Simple; @ARGV or die “Please provide movie title in quotes\n”; my $movie = shift; $movie =~ s/\s/+/g; my $movieData = get “http://www.imdbapi.com/?r=XML&t=$movie”; my $data = XMLin( $movieData ); my @fields = qw[released rating director genre writer runtime plot id title votes poster year rated actors]; my %film = %{$data->{movie}}; foreach (@fields) { $film{$_} =~ s/’/\\’/g; } my $tstamp = time(); my $sql = ‘INSERT INTO movie_collection (‘; $sql .= join ‘, ‘, @fields; $sql .= ‘) VALUES (‘; $sql .= join ‘, ‘, map { qq[‘$film{$_}’] } @fields; $sql .= “,'” . time . “‘);\n”; print $sql; |
I haven’t actually changed that much. I’ve tidied up a bit. Switched to using LWP::Simple, removed some unnecessary escaping, things like that. I have made two pretty big changes. I’ve got rid of all of the nasty variables containing data about a film. A film is a single object and therefore should be stored in a single variable. And, happily enough, the $data you get back from XMLin
contains a hash that does the trick perfectly.
The second change I made was to rejig the way that the SQL is created. By using an array that contains the names of all of the columns in the table, I can generate the SQL programmatically without all of that repetitive code. I’ve even made the SQL a little safer by explicitly listing the columns that we are inserting data into (this has the side effect of no longer needing to insert a NULL into the id column).
Of course, this would just be a first step. The whole idea of generating SQL to run against the database is ugly. You’d really want to use DBIx::Class (or, at the very least, DBI) to insert the data directly into the database. And why mess around with raw XML when you can us something like IMDB::Film to do it?
At that point in my thought process I had an epiphany. You don’t need the database at all. The IMDB data changes all the time. Why take a local copy? Why not just use the web service directly with IMDB::Film (or perhaps WebService::IMDB – I haven’t used either of them so I have no strong opinions on this).
In general, I think that the original code was too complicated. Which made it hard to maintain. My version is better (but I am, of course, biased) but it can be made even better by using more from CPAN.
CPAN is Perl’s killer app. If you’re not using CPAN then you’re not using half the power of Perl.
What do you think? How would you write this program?
Update: A few people have mentioned the fact that I’m directly interpolating random data into my SQL statements – which is generally seen as a bad thing as it opens the door to SQL injection attacks. In my defence, I’d like to make a couple of points.
Firstly, the data I’m using isn’t just any old data. It’s data that is returned from the IMDB API. So it would be hard to use this for a malicious attack on the system (at least until Hollywood makes a film about the life of Bobby Tables).
Secondly, I am cleaning the data before using it. I’m escaping any single quotes in the input data. I think that removes the possibility of attack. I could be wrong though, if that’s the case, please let me know what I’m missing.
But, in general, I agree that this approach is dangerous. This is one of the major advantages of using DBI. By using bound parameters in your SQL statements you can remove possibility of SQL injection attacks.
Update 2: You can, of course, rely on Zefram to point out the issues here. His comment is well worth reading.
Other people (on IRC) raised the potential of other Unicode characters that databases treat as quote characters but that aren’t covered by my substitution.
Leave a Reply to ZeframCancel reply