PHP IMDB Scraper

By  on  

It's been quite a while since I've written a PHP grabber and the itch finally got to me. This time the victim is the International Movie Database, otherwise known as IMDB. IMDB has info on every movie ever made (or so it seems). Their HTML source code is easy to parse so this one was a piece of cake.


$url = '';

//get the page content
$imdb_content = get_data($url);

//parse for product name
$name = get_match('/<title>(.*)<\/title>/isU',$imdb_content);
$director = strip_tags(get_match('/<h5[^>]*>Director:<\/h5>(.*)<\/div>/isU',$imdb_content));
$plot = get_match('/<h5[^>]*>Plot:<\/h5>(.*)<\/div>/isU',$imdb_content);
$release_date = get_match('/<h5[^>]*>Release Date:<\/h5>(.*)<\/div>/isU',$imdb_content);
$mpaa = get_match('/<a href="\/mpaa">MPAA<\/a>:<\/h5>(.*)<\/div>/isU',$imdb_content);
$run_time = get_match('/Runtime:<\/h5>(.*)<\/div>/isU',$imdb_content);

//build content
$content.= '<h2>Film</h2><p>'.$name.'</p>';
$content.= '<h2>Director</h2><p>'.$director.'</p>';
$content.= '<h2>Plot</h2><p>'.substr($plot,0,strpos($plot,'<a')).'</p>';
$content.= '<h2>Release Date</h2><p>'.substr($release_date,0,strpos($release_date,'<a')).'</p>';
$content.= '<h2>MPAA</h2><p>'.$mpaa.'</p>';
$content.= '<h2>Run Time</h2><p>'.$run_time.'</p>';
$content.= '<h2>Full Details</h2><p><a href="'.$url.'" rel="nofollow">'.$url.'</a></p>';

echo $content;

//gets the match content
function get_match($regex,$content)
	return $matches[1];

//gets the data from a URL
function get_data($url)
	$ch = curl_init();
	$timeout = 5;
	$data = curl_exec($ch);
	return $data;

As with my other grabbers, the trick is always in the regular expressions. Note that the most important part of the URL is the string after "/title/". That string uniquely identifies the movie.

Recent Features

  • By
    Create Namespaced Classes with MooTools

    MooTools has always gotten a bit of grief for not inherently using and standardizing namespaced-based JavaScript classes like the Dojo Toolkit does.  Many developers create their classes as globals which is generally frowned up.  I mostly disagree with that stance, but each to their own.  In any event...

  • By
    LightFace:  Facebook Lightbox for MooTools

    One of the web components I've always loved has been Facebook's modal dialog.  This "lightbox" isn't like others:  no dark overlay, no obnoxious animating to size, and it doesn't try to do "too much."  With Facebook's dialog in mind, I've created LightFace:  a Facebook lightbox...

Incredible Demos

  • By
    MooTools Wall Plugin

    One of the more impressive MooTools plugins to hit the Forge recently was The Wall by Marco Dell'Anna.  The Wall creates an endless grid of elements which can be grabbed and dragged, fading in elements as they are encountered.  Let me show...

  • By
    Create a Photo Stack Effect with Pure CSS Animations or MooTools

    My favorite technological piece of Google Plus is its image upload and display handling.  You can drag the images from your OS right into a browser's DIV element, the images upload right before your eyes, and the albums page displays a sexy photo deck animation...