O'Reilly

PHP IMDB Scraper

By on  

It's been quite a while since I've written a PHP grabber and the itch finally got to me. This time the victim is the International Movie Database, otherwise known as IMDB. IMDB has info on every movie ever made (or so it seems). Their HTML source code is easy to parse so this one was a piece of cake.

The PHP

//url
$url = 'http://www.imdb.com/title/tt0367882/';

//get the page content
$imdb_content = get_data($url);

//parse for product name
$name = get_match('/<title>(.*)<\/title>/isU',$imdb_content);
$director = strip_tags(get_match('/<h5[^>]*>Director:<\/h5>(.*)<\/div>/isU',$imdb_content));
$plot = get_match('/<h5[^>]*>Plot:<\/h5>(.*)<\/div>/isU',$imdb_content);
$release_date = get_match('/<h5[^>]*>Release Date:<\/h5>(.*)<\/div>/isU',$imdb_content);
$mpaa = get_match('/<a href="\/mpaa">MPAA<\/a>:<\/h5>(.*)<\/div>/isU',$imdb_content);
$run_time = get_match('/Runtime:<\/h5>(.*)<\/div>/isU',$imdb_content);

//build content
$content.= '<h2>Film</h2><p>'.$name.'</p>';
$content.= '<h2>Director</h2><p>'.$director.'</p>';
$content.= '<h2>Plot</h2><p>'.substr($plot,0,strpos($plot,'<a')).'</p>';
$content.= '<h2>Release Date</h2><p>'.substr($release_date,0,strpos($release_date,'<a')).'</p>';
$content.= '<h2>MPAA</h2><p>'.$mpaa.'</p>';
$content.= '<h2>Run Time</h2><p>'.$run_time.'</p>';
$content.= '<h2>Full Details</h2><p><a href="'.$url.'" rel="nofollow">'.$url.'</a></p>';

echo $content;

//gets the match content
function get_match($regex,$content)
{
	preg_match($regex,$content,$matches);
	return $matches[1];
}

//gets the data from a URL
function get_data($url)
{
	$ch = curl_init();
	$timeout = 5;
	curl_setopt($ch,CURLOPT_URL,$url);
	curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
	curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
	$data = curl_exec($ch);
	curl_close($ch);
	return $data;
}

As with my other grabbers, the trick is always in the regular expressions. Note that the most important part of the URL is the string after "/title/". That string uniquely identifies the movie.

O'Reilly Velocity Conference
Save 20% with discount code AFF20

Recent Features

  • Create a CSS Cube

    CSS cubes really showcase what CSS has become over the years, evolving from simple color and dimension directives to a language capable of creating deep, creative visuals.  Add animation and you've got something really neat.  Unfortunately each CSS cube tutorial I've read is a bit...

  • Create a CSS Flipping Animation

    CSS animations are a lot of fun; the beauty of them is that through many simple properties, you can create anything from an elegant fade in to a WTF-Pixar-would-be-proud effect. One CSS effect somewhere in between is the CSS flip effect, whereby there's...

Incredible Demos

Recently on David Walsh Blog

  • Open Files from Command Line on OS X

    I'm as much of a fan of application UIs as anyone else but I'm finding myself working more and more from the command line lately.  Much of that is becoming obsessed with media manipulation but I'm forcing myself to use less UIs so that I...

  • Get Stock Quotes From Command Line

    When I conned my way into my first professional programming gig, I didn't really think much about money -- just that I was getting my foot in the door.  But as my career has gone on, I've been more aware of money, investing, and retirement.  I've recently...

  • Geolocation API

    One interesting aspect of web development is geolocation; where is your user viewing your website from? You can base your language locale on that data or show certain products in your store based on the user's location. Let's examine how you can...

  • Create an Image Preview from a Video

    Visuals are everything when it comes to media.  When I'm trying to decide whether to watch a video on Netflix, it would be awesome to see a trailer of some kind, but alas that isn't available.  When I'm looking to download a video on my computer,...

  • New:  Webdesigner News!

    A new and exciting website has recently been launched for web designers and developers. You likely spend hours every morning browsing through hundreds of posts on your RSS feeds, hoping to stumble across relevant stories. Webdesigner News was built to provide web designers and developers with...