IMDB Grabber Class by Fabian Beiner

One of my readers, Fabian Beiner, took my PHP-based IMDB Grabber script a step further and turned it into a useful class. The class can also grab more information than my script did. Check it out!

The PHP

<?php
/**
 * @author Fabian Beiner <fabianDOTbeinerATgmailDOTcom>
 * @version 2.1alpha
 *
 * @comment Original idea by David Walsh <davidATdavidwalshDOTname>, thanks! Your blog rocks ;)
 *          I did this script in the middle of the night while being ill, no guarantee for anything!
 *
 * @license http://creativecommons.org/licenses/by-sa/3.0/de/deed.en_US
 *          Creative Commons Attribution-Share Alike 3.0 Germany
 *
 * Yay, after two days IMDB changed their layout... Great! :( Also added a fallback if cURL is missing.
 */

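class IMDB {
    // Properties are declared up front; PHP 8.2+ deprecates creating them on the fly.
    public $gotCurl, $url = '', $movie, $director, $url_director, $plot, $release_date,
           $mpaa, $run_time, $rating, $votes, $country, $url_country;

    function __construct($url) {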
        $this->gotCurl      = extension_loaded('curl');
        $imdb_content       = $this->imdbHandler($url);
        $this->movie        = trim($this->getMatch('|<title>(.*) \((.*)\)</title>|Uis', $imdb_content));
        $this->director     = trim($this->getMatch('|<h5>Director:</h5><a href="(.*)">(.*)</a><br/>|Uis', $imdb_content, 2));
        $this->url_director = trim($this->getMatch('|<h5>Director:</h5><a href="(.*)">(.*)</a><br/>|Uis', $imdb_content));
        $this->plot         = trim($this->getMatch('|<h5>Plot:</h5>(.*) <a|Uis', $imdb_content));
        $this->release_date = trim($this->getMatch('|<h5>Release Date:</h5> (.*) \((.*)\) <a|Uis', $imdb_content));
        $this->mpaa         = trim($this->getMatch('|<h5><a href="/mpaa">MPAA</a>:</h5> (.*)</div>|Uis', $imdb_content));
        $this->run_time     = trim($this->getMatch('|Runtime:</h5>(.*) (.*)</div>|Uis',$imdb_content));
        $this->rating       = trim($this->getMatch('|<div class="meta"><b>(.*)</b>|Uis', $imdb_content));
        $this->votes        = trim($this->getMatch('|  <a href="ratings" class="tn15more">(.*) votes</a>|Uis', $imdb_content));
        $this->country      = trim($this->getMatch('|<h5>Country:</h5><a href="(.*)">(.*)</a></div>|Uis', $imdb_content, 2));
        $this->url_country  = trim($this->getMatch('|<h5>Country:</h5><a href="(.*)">(.*)</a></div>|Uis', $imdb_content));
    }

    // Downloads a page with cURL when available, otherwise with file_get_contents()
    // (the fallback needs allow_url_fopen enabled). A failed request yields ''.
    function fetchPage($url) {
        if ($this->gotCurl) {
            $ch = curl_init();
            curl_setopt_array($ch, array(CURLOPT_URL            => $url,
                                         CURLOPT_HEADER         => false,
                                         CURLOPT_RETURNTRANSFER => true,
                                         CURLOPT_TIMEOUT        => 10
                                        )
                              );
            $data = curl_exec($ch);
            curl_close($ch);
        } else {
            $data = file_get_contents($url);
        }
        return (string)$data;
    }

    function imdbHandler($input) {
        if (!preg_match('|^https?://|i', $input)) {
            // Not a URL, so treat the input as a title and search IMDB for it.
            // urlencode() turns spaces into '+' and also escapes '&', '?' and friends.
            $data = $this->fetchPage('http://us.imdb.com/find?s=all&q='.urlencode($input).'&x=0&y=0');
            // ([^"]+) stops at the closing quote of href, and [^>]* skips any extra
            // attributes (such as onclick="...") instead of gluing them onto the link.
            $foundMatch = $this->getMatch('|<p style="margin:0 0 0.5em 0;"><b>Media from <a href="([^"]+)"[^>]*>(.*)</a> ((.*))</b></p>|Uis', $data);
            if (!$foundMatch) {
                $this->url = '';
                return '';
            }
            $this->url = 'http://www.imdb.com'.$foundMatch;
        } else {
            $this->url = $input;
        }
        return str_replace("\n", '', $this->fetchPage($this->url));
    }

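    // Returns capture group $index of the first match, or '' if the pattern did not match
    // (indexing $matches blindly causes "Undefined offset" notices when IMDB changes its markup).
    function getMatch($regex, $content, $index=1) {
        if (preg_match($regex, (string)$content, $matches) && isset($matches[(int)$index])) {
            return $matches[(int)$index];
        }
        return '';
    }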

    function showOutput() {
        if ($this->url) {
            // Initialise $content here; appending to an undefined variable raises a notice.
            $content = '<h2>Film</h2><p>'.$this->movie.'</p>';
            $content.= '<h2>Director</h2><p><a href="http://www.imdb.com'.$this->url_director.'">'.$this->director.'</a></p>';
            $content.= '<h2>Plot</h2><p>'.$this->plot.'</p>';
            $content.= '<h2>Release Date</h2><p>'.$this->release_date.'</p>';
            $content.= '<h2>MPAA</h2><p>'.$this->mpaa.'</p>';
            $content.= '<h2>Run Time</h2><p>'.$this->run_time.' minutes</p>';
            $content.= '<h2>Full Details</h2><p><a href="'.$this->url.'">'.$this->url.'</a></p>';
            $content.= '<h2>Rating</h2><p>'.$this->rating.'</p>';
            $content.= '<h2>Votes</h2><p>'.$this->votes.' votes</p>';
            $content.= '<h2>Country</h2><p><a href="http://www.imdb.com'.$this->url_country.'">'.$this->country.'</a></p>';
            echo $content;
        } else {
            echo 'Sorry, nothing found! :(';
        }
    }
}

// Examples :) showOutput() echoes the HTML itself, so there is no need to print() its return value.
$imdb = new IMDB('http://www.imdb.com/title/tt0367882/');
$imdb->showOutput();
echo '<hr>';
$imdb = new IMDB('Cruel Intentions');
$imdb->showOutput();
echo '<hr>';
$imdb = new IMDB('I guess this movie name does not exist');
$imdb->showOutput();
?>
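When the constructor gets a movie title instead of a URL, imdbHandler() sends it to IMDB's search page. A minimal sketch of building that search URL with PHP's built-in http_build_query(), which encodes spaces as `+` and also escapes characters such as `&` that would otherwise break the query. The helper name imdbSearchUrl() is made up for illustration, and the us.imdb.com/find endpoint with its s/q/x/y parameters is simply copied from the class; IMDB may well have changed it since.

```php
<?php
// Hypothetical helper: build the IMDB search URL for a movie title.
// http_build_query() encodes spaces as '+' (RFC 1738) and escapes '&', '?', etc.
// The endpoint and parameters are taken from the class above, not from any IMDB docs.
function imdbSearchUrl($title) {
    $params = array('s' => 'all', 'q' => $title, 'x' => 0, 'y' => 0);
    return 'http://us.imdb.com/find?' . http_build_query($params, '', '&');
}

echo imdbSearchUrl('Cruel Intentions'), "\n";
// http://us.imdb.com/find?s=all&q=Cruel+Intentions&x=0&y=0
```

Passing '&' as the separator explicitly keeps the result independent of the arg_separator.output setting in php.ini.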

Thanks to Fabian for sharing! Enjoy!

Discussion

  1. Hey Dave,

    Have you seen this DOM scraping/parser class? Very handy for scraping any site!

    http://simplehtmldom.sourceforge.net/

    Has an example for scraping Slashdot and Google!

    Paul

  2. Fabian Beiner

    I’m not Dave, but I thought about using this class for my version of this script, too. In the end, though, there was no reason to use a class almost 1,000 lines long for a script of fewer than 100 lines. ;)

  3. He he, yeah, fair point!!

    My post was more of a heads up and something that might be handy to have in the arsenal for future projects!!

  4. Jonas

    Strange, I only see a white page when using this script.
    It only shows something when I comment out the examples and put echo "blabla";
    at the end of the script. It seems that as soon as I call new IMDB('..');
    it stops working…
    Does anyone know why?

  5. Fabian Beiner

    Maybe put error_reporting(E_ALL); at the very top of my PHP file and see if there are any errors. Maybe you don’t have the cURL extension installed, and depending on your setup there won’t be any error messages. Did this help?
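The advice above can be turned into a small diagnostic script. A sketch, assuming you run it on the same server as the class: it switches on full error reporting (so a missing extension produces a message instead of a white page) and reports how pages could be downloaded. downloadMethod() is a hypothetical name; extension_loaded('curl') is the same check the class itself uses.

```php
<?php
// Show every error while debugging instead of a blank page.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: report which way this server can download IMDB pages.
function downloadMethod() {
    if (extension_loaded('curl')) {
        return 'curl';                 // the class prefers cURL when it is loaded
    }
    if (ini_get('allow_url_fopen')) {
        return 'file_get_contents';    // URL fallback when cURL is missing
    }
    return null;                       // neither works: every request will fail
}

var_dump(downloadMethod());
```

On shared hosting where you cannot install cURL, a result of 'file_get_contents' means a fallback is still possible; null means you need to ask the host to enable one of the two.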

  6. Fabian Beiner

    Maybe you’re missing the CURL-package?

  7. jonas

    Oh dude, that’s it, cURL was (is) missing…
    Any way to use it without installing it? (Shared hosting with no power…)

  8. Fabian Beiner

    Yes, it’s quite easy. I’ve changed my script a little and pasted it here. It’s using file_get_contents now instead of cURL. I hope this works for you :-)

  10. Fabian Beiner

    IMDB changed their layout a little bit. I’ve sent an updated script to David; in the meantime, anyone interested can get it here: http://mypaste.ja-s.de/1597. I also added a simple fallback if cURL is missing (especially for Jonas ;)).

  11. Ben

    Very nice, I’m going to integrate this into my site :)

  12. Fabian Beiner

    If something doesn’t work anymore, tell us here. :)

  13. Your wonderful PHP class isn’t getting any information about the director; they probably changed something in the layout. I set up a workaround by inserting that information manually, but it would be great to have it fixed. Thank you in advance!

  14. Hey Guido, you finally gave me a reason to update the class. Check out the new (but untested :P) version here: http://fingerkribbeln.de/src/imdb.phps.

  15. Mattias

    Nice work, but with some URLs I don’t get any result at all, for example http://us.imdb.com/Title?0118883 and http://imdb.com/title/tt1060277/

    Is it because there is no www in them?

  16. Mattias

    I tried the first one again with the www URL and now it works! (http://www.imdb.com/title/tt0118883/) Is there some way we can convert all URLs to that format?
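Since the www form works, one approach is to pull the numeric title id out of whatever IMDB URL you have and rebuild the URL in that form before passing it to the class. A sketch under that assumption: normalizeImdbUrl() is a hypothetical helper, and it only recognizes the two URL shapes quoted in this thread (/Title?0118883 and /title/tt1060277/).

```php
<?php
// Hypothetical helper: rewrite the URL forms seen in this thread
// (us.imdb.com/Title?0118883, imdb.com/title/tt1060277/) to the
// www.imdb.com/title/tt.../ form that works with the class.
function normalizeImdbUrl($url) {
    if (!preg_match('#imdb\.com/(?:title/tt|Title\?)(\d+)#i', $url, $m)) {
        return null;    // not an IMDB title URL we recognize
    }
    return 'http://www.imdb.com/title/tt' . $m[1] . '/';
}

echo normalizeImdbUrl('http://us.imdb.com/Title?0118883'), "\n";
// http://www.imdb.com/title/tt0118883/
```

The result can then be handed to the constructor, e.g. new IMDB(normalizeImdbUrl($url)), after checking that it is not null.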

  17. jmmv

    Works fine here :-)

    What about the ability to grab the film poster too?

  18. Hey Mattias, hey jmmv! Thanks for your feedback. Please give http://fingerkribbeln.de/src/imdb.phps a try. Cheers!

  19. Mattias

    Really nice, works like a charm now!

    Is it possible to make a function that grabs the title from a string like Crank.High.Voltage.2009.DVDRip.XviD-BeStDivX, when we just want ‘crank high voltage’? I think we could do this with regular expressions, but I’m totally green on that subject :(
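One way to do this with a regular expression, sketched under a stated assumption: the release name puts a four-digit year (19xx or 20xx) right after the title, and everything from the year onward is release metadata. releaseNameToTitle() is a hypothetical helper, not part of the class, and titles that themselves contain such a number (say, a film with 2049 in its name) would be cut too early.

```php
<?php
// Hypothetical helper: turn a scene-style release name into a searchable title.
// Assumption: the title is followed by a four-digit year (19xx or 20xx) and
// everything from that year onward is release metadata.
function releaseNameToTitle($release) {
    // Dots and underscores stand in for spaces in these names.
    $name = str_replace(array('.', '_'), ' ', $release);
    // Drop the first year preceded by a space, and everything after it.
    $name = preg_replace('/\s(?:19|20)\d{2}\b.*$/', '', $name);
    return strtolower(trim($name));
}

echo releaseNameToTitle('Crank.High.Voltage.2009.DVDRip.XviD-BeStDivX'), "\n";
// crank high voltage
```

The cleaned string can then be passed to new IMDB(...) as a title search.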

  20. Fabian Beiner

    Of course, it’s quite easy actually. But I don’t see any reason to support that.

  21. Mattias

    I understand that.
    I made a similar method for grabbing the poster yesterday, but sometimes it doesn’t display the image; it seems to be a problem with direct-linking it. Have you also noticed this?

  22. Well, no idea. I neither use nor heavily test this class… ;-)

  23. jmmv

    hm… stopped working with the new version :(

  24. jmmv

    I see this error now:

    Warning: curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in httpdocs/imdb.php on line 48

    (with the version before today it was working)

  25. I have again changed a lot of stuff, please give it a try: http://fingerkribbeln.de/src/imdb.phps

  26. MP

    Got problem displaying the posters, IMDB must have changed something :(

  27. I don’t have the latest source with me, but I guess I’ll put it on GitHub or somewhere in the next few days and check again whether everything is working. Also let me know about anything missing… ;)

  28. MP

    I don’t know if I got the latest source from three posts up, but the one I got doesn’t display the posters in Firefox. It seems to work fine in Opera and Chrome, though!

  29. MP

    Sorry to repeat myself, but with the latest version I don’t see the poster in any browser when I try to print it :(

  30. Oh, I just tried it. Looks like IMDB only allows you to view the image if you visited IMDB before (to prevent hotlinking). Maybe I’ll find a solution… Maybe not.

  31. Okay, uploaded the latest version to GitHub again. The poster stuff is working too, at least my tests here are fine: http://github.com/FabianBeiner/PHP-IMDB-Grabber :)

  32. MP

    Nice work man!

    Is it possible to get the cast of actors displayed in some nice way?

  33. Basically yes, but I wouldn’t spam all actors, maybe only the first 3-5? May I ask where you’re using my script? Just curious. :)

  34. MP

    I’m trying to build a nice and easy catalog of my DVD collection. I know there is software like DVD Profiler, but I think it is bloated and ugly.

    But I find it hard trying to use cURL myself :(

  35. MP

    Hi, I haven’t given up yet :)

    Found this nice script http://www.edmondscommerce.co.uk/blog/php/php-save-images-using-curl/ that can save the poster image to the hard drive; works like a charm. It also speeds up page loading if you have many posts you want to display at once.

    The only problem is that the URL may look like this: http://ia.media-imdb.com/images/M/MV5BMTEwOTE1NzUzODNeQTJeQWpwZ15BbWU3MDczODQ3NDE@._V1._SX95_SY140_.jpg

    and when it gets saved, the file name is without http://ia.media-imdb.com/images/M/

    If you then want to link to the image you just saved on the hard drive, how do you solve that?

  36. MP

    Ahh ok it’s in the basename($i) in the example, my bad

  37. I’ve updated the repository at GitHub (http://github.com/FabianBeiner/PHP-IMDB-Grabber) and added a small image-caching function. Give it a try! :)
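Caching the poster on your own server is also a way around the hotlinking block mentioned earlier in the thread: the image is downloaded once server-side and the page links to the local copy. A minimal sketch of that idea, not necessarily how the GitHub version does it; cachedPoster() is a hypothetical helper, and downloading a remote URL this way assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: keep a local copy of a poster so the page links to your
// own server instead of hotlinking IMDB's image host.
function cachedPoster($remoteUrl, $localPath) {
    if (!is_file($localPath)) {
        // file_get_contents() accepts URLs when allow_url_fopen is enabled.
        $data = @file_get_contents($remoteUrl);
        if ($data === false || $data === '') {
            return null;           // download failed: nothing to show
        }
        file_put_contents($localPath, $data, LOCK_EX);
    }
    return $localPath;             // already cached, or just saved
}
```

Once a file is cached, later calls return the local path without touching IMDB at all, which is also where the page-speed gain MP mentioned comes from.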

  38. eddai

    Hi Fabian & Walsh,
    your class was working until yesterday. I use your class to echo movies’ user ratings only,
    but since yesterday it shows blanks. Has IMDB changed its layout? Or something else?
    I don’t know PHP well, so I can’t check it.
    I’d appreciate it if you could look at it.
    Thanks

  39. Yes, IMDB changed their layout, again. I’ve updated my class at GitHub, give it a try: http://github.com/FabianBeiner/PHP-IMDB-Grabber :-)

  40. MP

    Seems like some data won’t be grabbed :(

  41. IMDB is annoying. Anyway, the class is updated at GitHub.

  42. MP

    Love your work, but it seems like the ‘tagline’ sometimes also gets the ‘plot’; try the movie Valkyrie, for example.

    Any hint on how to grab genre and actors? :)

  43. Tagline: Many saw evil. They dared to stop it.
    Plot: Based on actual events, a plot to assassinate Hitler is unfurled during the height of WWII.

    I don’t see any problem with plot and tagline getting mixed up. Genre wouldn’t be that hard; the cast, on the other hand, is. And currently I don’t have the time to update the class, but feel free to try yourself.

  44. Nevermind, Github is updated.

  45. KIM

    Hi Fabian ,

    Nice Work Man !!!

    It would be nice if you updated the class to handle the cast (actors) list as an array

    to get every actor’s name, poster link and character name

    thanks in advance

  46. Hey guys, I finally got the time to update the class (now at version 4.0)… ;-)

    Check it out at GitHub: http://github.com/FabianBeiner/PHP-IMDB-Grabber (and please start to use the issue tracker there!)

    Basically, I’ve added the cast thingy and updated most of the functions (for example, it returns all countries now, not only the first one). I’ve also renamed a few functions (for example, getDirectorUrl to getDirectorAsUrl), so make *SURE* you check the example!

    Have fun.

  47. Oh, and KIM: I just get the real names of the actors, not their images nor their name in the movie. I don’t see a reason to spam this out… ;)

  48. KIM

    Hmmm, will give it a try!

    Thanks for the update! It’s really hard to get all the actors into an array, but not so hard for you as the developer of the class.

    Nice work, and I will keep you informed about any changes at IMDB ;)

    Regards

  49. KIM

    Hello Fabian ,

    Found a strange bug; please take any Sci-Fi title,

    for example: http://www.imdb.com/title/tt0499549/

    Genre: Action | Adventure | Sci-Fi

    The word (Sci-Fi) is the problem!

    It doesn’t matter whether it’s in the middle or at the end.

    There is no problem with the filter!

    const IMDB_GENRE = '#(\w+)#Ui';

    It grabs everything except the word (Sci-Fi)!

    Do you know why? :)

    Thanks

  50. Fixed in newest version.
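The reason is that `\w` only matches letters, digits and the underscore, so the hyphen in "Sci-Fi" ends the match and the genre comes out as "Sci" and "Fi" (or not at all, if the surrounding pattern expects a delimiter right after the word). Adding the hyphen to a character class keeps it whole. A standalone sketch with a hypothetical splitGenres() helper; how the class actually applies IMDB_GENRE isn't shown in the thread, and the U modifier is left out here because an ungreedy `\w+` with nothing after it would match one character at a time.

```php
<?php
// \w matches letters, digits and underscore only, so "Sci-Fi" splits into "Sci"
// and "Fi". [\w-] adds the hyphen (literal at the end of the class).
function splitGenres($genreLine) {
    preg_match_all('#[\w-]+#', $genreLine, $m);
    return $m[0];
}

echo implode(', ', splitGenres('Action | Adventure | Sci-Fi')), "\n";
// Action, Adventure, Sci-Fi
```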

  51. KIM

    Hi Fabian !

    Please send me your email; I want to thank you by sending you a donation to keep this project updated!

    Thanks again :)

  52. Howdy,

    since I got some enhancements of Vladimir Kovacevic, I’ve just added them and updated the script once again. Latest version again at GitHub: http://github.com/FabianBeiner/PHP-IMDB-Grabber

    My e-mail address is in the header, if you really want to do this. :) Thanks then. And tell me, by mail or here, where do you use this script? :) I’m so curious! ;)

  53. KIM

    I just Did !

    Please Check your Paypal ACC .

    Thanks :D

  54. Andreas

    Thank you for this great PHP class! I love it!

    And I think I found a small bug. With certain movies, the getDirector() function returns not only the directors but also the rest of the IMDb page, which is quite a big chunk of data. This only happens with a few movies, for instance “Monsters, Inc.”. I have tried repeatedly and it is consistent, though I only have one setup to test with, so I cannot be sure that my system is not to blame. I’m fairly new to PHP, so I couldn’t see what, if anything, is wrong with the code.

    Keep up the good work! Thanks again!

  55. You’re right, Andreas. Thanks for telling! Fixed version is found at http://github.com/FabianBeiner/PHP-IMDB-Grabber :-)

  56. Fabian
    I saw that for you & yours friends the script work fine. But not for me. Got the following
    Parse error: syntax error, unexpected T_STRING, expecting T_OLD_FUNCTION or T_FUNCTION or T_VAR or ‘}’ in /imdb.class.php on line 18.
    Would appreciate if you can help me. thanks
    Carlos

  57. Are you using PHP4?

  58. @Fabian Beiner:
    Yes Fabian, 4.4.9.
    Thanks for your prompt reply

  59. This class doesn’t work with PHP4, and it never will. Since August 2008 (or so) PHP4 isn’t even supported by PHP itself anymore, so it’s seriously time for an update (or another hoster). Sorry…
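If you want a readable message instead of the parse error, the version check has to live in the script that loads the class: a parse error in the class file happens before any code in that file can run. A minimal sketch; the helper name meetsMinimumPhp() and the 5.0.0 threshold are assumptions (the thread only says PHP 4 is unsupported, and later versions of the class may need a newer PHP), and imdb.class.php is the file name from Carlos's error message.

```php
<?php
// Written without type declarations on purpose: this file has to parse on the
// old PHP version too, or the friendly message below never gets shown.
function meetsMinimumPhp($minimum) {
    return version_compare(PHP_VERSION, $minimum, '>=');
}

// Put this check in the script that loads the class, not in the class file.
if (!meetsMinimumPhp('5.0.0')) {
    exit('This IMDB class needs PHP 5 or newer; this server runs PHP ' . PHP_VERSION);
}
// require 'imdb.class.php';
```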

  60. @Fabian Beiner:
    Thanks, should go to another hoster.
    Rgds

  61. Andreas

    Thank you Fabian for the quick fix!
    Seems to be working great now!

  62. Andreas

    Just another small note. Sometimes different movies get the same poster.

    I’ve been importing a number of movies into a database, and sometimes the posters get mixed up so that different movies use the same .jpg (e.g. “Crazy in Alabama” and “Leap Year” do that for me). Maybe a newly imported poster file should get a different name if the file already exists, rather than just returning the one that’s already there. Adding a couple of random numbers to the end would also do the trick, I suppose.

    Again, if I knew how to fix it myself I’d do it and give you the code, but I failed to get it right and couldn’t even see how this problem could occur.

    Thanks again for a nice imdb grabber!

  63. I can’t reproduce that. The images are named after the IMDB id which is unique anyway.

  64. Andreas

    I had another look at the code. The poster jpegs do not seem to be named after the IMDB id but rather by stripping everything but numbers from the name of the poster jpeg file at IMDB, which sometimes resulted in my problem of different movies using the same poster.

    I fixed this by replacing basename($sUrl)) with basename($this->_sUrl)) on the line in saveImage() that creates the $sFilename variable. Like so:

    $sFilename = getcwd() . '/posters/' . preg_replace("#[^0-9]#", "", basename($this->_sUrl)) . '.jpg';
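Andreas's fix works because the file name is derived from the title URL rather than from the poster's own file name. A slightly more explicit version of the same idea, sketched with a hypothetical posterPath() helper: it extracts the full IMDB id (including the tt prefix) and refuses to produce a name when the URL is not a /title/ page, instead of silently producing an empty or shared file name.

```php
<?php
// Hypothetical helper: name the local poster after the IMDB title id
// (tt0367882 and so on), which is unique per title, rather than after
// the digits in the poster's own file name, which can repeat.
function posterPath($titleUrl, $dir) {
    if (!preg_match('#/title/(tt\d+)#i', $titleUrl, $m)) {
        return null;    // not a /title/ URL, so there is no id to use
    }
    return rtrim($dir, '/') . '/' . strtolower($m[1]) . '.jpg';
}

echo posterPath('http://www.imdb.com/title/tt0367882/', getcwd() . '/posters'), "\n";
```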
  65. MP

    Hi guys, I’m trying to save the information to my own database for the first time. How do I remove the “See more” link from the tagline? It seems to break my SQL query.

  66. MP

    OK, I think I got it covered with an easy fix :)

    I saved the tagline and plot in variables and split them at ‘<’, because I don’t want the “See more” link after the tagline, and sometimes there is a link after the plot that says “Summary” or something like it.

    $fixedTagline = explode('<', $tagline);
    $fixedPlot    = explode('<', $plot);

    echo $fixedTagline[0]; does the trick
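The same trick as a small reusable function that also trims the leftover whitespace and copes with text that has no tag at all; textBeforeFirstTag() is a hypothetical name. Separately, if quotes in the text are what break the SQL query, binding the values with a prepared statement (for example PDO::prepare with placeholders) is the usual fix rather than editing the text.

```php
<?php
// Hypothetical helper: keep only the text before the first HTML tag, dropping
// trailing links such as "See more" that follow a tagline or plot.
function textBeforeFirstTag($html) {
    $pos = strpos($html, '<');
    $text = ($pos === false) ? $html : substr($html, 0, $pos);
    return trim($text);
}

echo textBeforeFirstTag('Many saw evil. They dared to stop it. <a href="#">See more</a>'), "\n";
// Many saw evil. They dared to stop it.
```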

  67. MP

    Hi, anyone still reading here? :) Image function does not work again, seems like IMDB have renamed the div or something.

  68. Fabian Beiner’s latest code at GitHub works fantastically, except that it throws an error when you try to view a series page. Try searching for “last airbender”: it redirects to the series page rather than the movie page, and then it gives an error when trying to get the cast. Nice work, thanks a lot!

  69. hxcracks

    @Fabian Beiner: The code works great. I have one question: it gets .jpeg, but can it also get .gif?

  70. Nope, actually I’ve never seen a .gif as poster, so it only handles .jpg at the moment. Please add an issue at GitHub (http://github.com/FabianBeiner/PHP-IMDB-Grabber/issues) for *ANYTHING* you want.

  71. hxcracks

    @Fabian Beiner: Warning: imagecreatefromjpeg() [function.imagecreatefromjpeg]: ‘http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif’ is not a valid JPEG file in G:\xampp\htdocs\imdb.class.php on line 125

  72. hxcracks

    Basically, it happens if IMDB doesn’t have a poster on their web page.
    Example:
    http://www.imdb.com/title/tt1274725/

  73. There is a new version on http://github.com/FabianBeiner/PHP-IMDB-Grabber – I hope it fixes all of the reported issues here. Changes: http://github.com/FabianBeiner/PHP-IMDB-Grabber/commits/master/ – and remember: Bugs only through GitHub!

  74. Claudiu Dan

    Hello!
    I would appreciate some help from you guys!
    I am trying to install this script on a WordPress blog. Could you please tell me exactly what I should do to make this work only in single posts?
    Thank you!

  75. This is not meant as WordPress plug-in, sorry.

  76. Linus

    Hi Fabian!
    Awesome class you made!
    Everything works like a charm except pulling the Cast data.
    Can you do this?
    Could you also write a function that pulls the high-resolution picture from IMDB?

    I really appreciate it; I’m using it for my own internal movie database so I can just browse and search for a good movie.

    Best regards.

  77. Hi! I use this awesome class for one of my websites. Everything worked fine until today: with your updated code I get N/A for every field except “Title” and “URL”. Is it because IMDB has changed their website?

    Thanks!

  78. With the latest version (4.4.1) everything is working well. Only country was wrong this morning. And again, SUPPORT ONLY THROUGH GITHUB OR EMAIL!

  79. joseph

    Hi,
    I’ve been using your IMDB grabber for months, but it has not been working for the last 2-3 days. Has IMDB changed its layout again? Could you please make an update?
    thanks~

  80. Fabian Beiner

    http://github.com/FabianBeiner/PHP-IMDB-Grabber – v5.0.1 is a rewrite with a few new functions and a small caching system.

  81. pru

    Still not working :( I tried with “batman” and “Tomorrow Never Dies”. imdb.tests.php isn’t working either.

  82. Gosh, how many times did I tell you that I want support questions not on this blog? This belongs to David and I guess he doesn’t want us to use his blog as bug reporting database.

    Anyway, it’s working like a charm. Check that the cache and posters directories are writable and that cURL is installed.
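Those three checks can be scripted. A sketch with a hypothetical setupProblems() helper; it assumes the cache and posters directories sit directly under the directory you pass in (the posters path matches the getcwd() . '/posters/' line quoted earlier in the thread, the cache location is an assumption).

```php
<?php
// Hypothetical helper: list the setup problems mentioned above for a given base directory.
// Assumption: the class writes to "cache" and "posters" subdirectories of $baseDir.
function setupProblems($baseDir) {
    $problems = array();
    if (!extension_loaded('curl')) {
        $problems[] = 'cURL extension is not installed';
    }
    foreach (array('cache', 'posters') as $sub) {
        $path = rtrim($baseDir, '/') . '/' . $sub;
        if (!is_dir($path) || !is_writable($path)) {
            $problems[] = $path . ' is missing or not writable';
        }
    }
    return $problems;
}

print_r(setupProblems(getcwd()));
```

An empty array means none of the three known causes applies.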

  83. pru

    I’m so sorry. Yes, it works like a charm; it was cURL that was not installed :( Shame on me.

  84. Torbjorn Ljung

    Hello!
    I’m very new to PHP.
    When I run your code (the version without cURL), I get lots of error messages like:

    Warning: file_get_contents(http://www.imdb.com/title/tt0139134/" onclick="(new Image()).src='/rg/find-media-title/media_strip/images/b.gif?link=/title/tt0139134/';) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\xampp\htdocs\fabian.php on line 43
    and
    Notice: Undefined offset: 1 in C:\xampp\htdocs\fabian.php on line 49

    Do you know what’s going on?
    /Torbjorn Ljung
    Sweden
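The warning itself shows the cause of the 404: the search-result link was captured together with its onclick attribute, because an ungreedy `(.*)` followed by `">` keeps going until the first `">` it finds, which here is at the end of the onclick value. The "Undefined offset: 1" notice comes from reading `$matches[1]` after a pattern that did not match. A sketch of a capture that cannot run past the closing quote; firstHref() is a hypothetical helper, and the sample HTML is shortened from the warning above.

```php
<?php
// Hypothetical helper: pull the link target out of an <a> tag.
// ([^"]*) cannot run past the closing quote of href, unlike an ungreedy (.*)
// followed by ">", which keeps going until the first "> it finds.
function firstHref($html) {
    return preg_match('#<a href="([^"]*)"#i', $html, $m) ? $m[1] : '';
}

$html = '<a href="/title/tt0139134/" onclick="(new Image()).src=\'/rg/find-media-title/\';">Cruel Intentions</a>';
echo firstHref($html), "\n";   // /title/tt0139134/
```

Returning '' when nothing matches (instead of indexing `$m[1]` unconditionally) is also what keeps the "Undefined offset" notice away.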
