IMDB Grabber Class by Fabian Beiner
One of my readers, Fabian Beiner, took my PHP-based IMDB Grabber script a step further and made it into a useful class. The class can also grab more information than my script did. Check it out!
The PHP
<?php
/**
 * @author  Fabian Beiner <fabianDOTbeinerATgmailDOTcom>
 * @version 2.1alpha
 *
 * @comment Original idea by David Walsh <davidATdavidwalshDOTname>, thanks! Your blog rocks ;)
 *          I did this script in the middle of the night while being ill, no guarantee for anything!
 *
 * @license http://creativecommons.org/licenses/by-sa/3.0/de/deed.en_US
 *          Creative Commons Attribution-Share Alike 3.0 Germany
 *
 * Yay, after two days IMDB changed their layout... Great! :( Also added a fallback if cURL is missing.
 */
class IMDB {
    // Declared up front so the class also runs cleanly on current PHP versions.
    public $gotCurl, $url = '';
    public $movie, $director, $url_director, $plot, $release_date, $mpaa;
    public $run_time, $rating, $votes, $country, $url_country;

    function __construct($url) {
        $this->gotCurl = extension_loaded('curl');
        $imdb_content  = $this->imdbHandler($url);

        // Every field is scraped with a regular expression tied to IMDB's markup.
        $this->movie        = trim($this->getMatch('|<title>(.*) \((.*)\)</title>|Uis', $imdb_content));
        $this->director     = trim($this->getMatch('|<h5>Director:</h5><a href="(.*)">(.*)</a><br/>|Uis', $imdb_content, 2));
        $this->url_director = trim($this->getMatch('|<h5>Director:</h5><a href="(.*)">(.*)</a><br/>|Uis', $imdb_content));
        $this->plot         = trim($this->getMatch('|<h5>Plot:</h5>(.*) <a|Uis', $imdb_content));
        $this->release_date = trim($this->getMatch('|<h5>Release Date:</h5> (.*) \((.*)\) <a|Uis', $imdb_content));
        $this->mpaa         = trim($this->getMatch('|<h5><a href="/mpaa">MPAA</a>:</h5> (.*)</div>|Uis', $imdb_content));
        $this->run_time     = trim($this->getMatch('|Runtime:</h5>(.*) (.*)</div>|Uis', $imdb_content));
        $this->rating       = trim($this->getMatch('|<div class="meta"><b>(.*)</b>|Uis', $imdb_content));
        $this->votes        = trim($this->getMatch('| <a href="ratings" class="tn15more">(.*) votes</a>|Uis', $imdb_content));
        $this->country      = trim($this->getMatch('|<h5>Country:</h5><a href="(.*)">(.*)</a></div>|Uis', $imdb_content, 2));
        $this->url_country  = trim($this->getMatch('|<h5>Country:</h5><a href="(.*)">(.*)</a></div>|Uis', $imdb_content));
    }

    // Download a page with cURL, or with file_get_contents() if cURL is missing.
    function fetch($url) {
        if ($this->gotCurl) {
            $ch = curl_init();
            curl_setopt_array($ch, array(
                CURLOPT_URL            => $url,
                CURLOPT_HEADER         => false,
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT        => 10
            ));
            $data = curl_exec($ch);
            curl_close($ch);
            return $data;
        }
        return file_get_contents($url);
    }

    function imdbHandler($input) {
        if (!$this->getMatch('|^http://(.*)$|Uis', $input)) {
            // Not a URL: treat the input as a title and take the first search hit.
            $tmpUrl     = 'http://us.imdb.com/find?s=all&q='.str_replace(' ', '+', $input).'&x=0&y=0';
            $data       = $this->fetch($tmpUrl);
            $foundMatch = $this->getMatch('|<p style="margin:0 0 0.5em 0;"><b>Media from <a href="(.*)">(.*)</a> ((.*))</b></p>|Uis', $data);
            if ($foundMatch) {
                $this->url = 'http://www.imdb.com'.$foundMatch;
            } else {
                $this->url = '';
                return '';
            }
        } else {
            $this->url = $input;
        }
        return str_replace("\n", '', (string)$this->fetch($this->url));
    }

    function getMatch($regex, $content, $index = 1) {
        // Return an empty string instead of raising a notice when nothing matches.
        if (preg_match($regex, (string)$content, $matches) && isset($matches[(int)$index])) {
            return $matches[(int)$index];
        }
        return '';
    }

    function showOutput() {
        if ($this->url) {
            $content  = '<h2>Film</h2><p>'.$this->movie.'</p>';
            $content .= '<h2>Director</h2><p><a href="http://www.imdb.com'.$this->url_director.'">'.$this->director.'</a></p>';
            $content .= '<h2>Plot</h2><p>'.$this->plot.'</p>';
            $content .= '<h2>Release Date</h2><p>'.$this->release_date.'</p>';
            $content .= '<h2>MPAA</h2><p>'.$this->mpaa.'</p>';
            $content .= '<h2>Run Time</h2><p>'.$this->run_time.' minutes</p>';
            $content .= '<h2>Full Details</h2><p><a href="'.$this->url.'">'.$this->url.'</a></p>';
            $content .= '<h2>Rating</h2><p>'.$this->rating.'</p>';
            $content .= '<h2>Votes</h2><p>'.$this->votes.' votes</p>';
            $content .= '<h2>Country</h2><p><a href="http://www.imdb.com'.$this->url_country.'">'.$this->country.'</a></p>';
            echo $content;
        } else {
            echo 'Sorry, nothing found! :(';
        }
    }
}

// Examples :)
$imdb = new IMDB('http://www.imdb.com/title/tt0367882/');
$imdb->showOutput(); // showOutput() echoes the markup itself
echo '<hr>';
$imdb = new IMDB('Cruel Intentions');
$imdb->showOutput();
echo '<hr>';
$imdb = new IMDB('I guess this movie name doesnt exists');
$imdb->showOutput();
?>
Thanks to Fabian for sharing! Enjoy!
Hey Dave,
Have you seen this DOM scraping/parser class, very handy for scraping any site!!
http://simplehtmldom.sourceforge.net/
Has an example for scraping Slashdot and Google!
Paul
I’m not Dave, but I thought about using this class for my version of this script, too. In the end there was no reason to use a class almost 1,000 lines long for a script with <100 lines. ;)
he he yeah fair point !!
My post was more of a heads up and something that might be handy to have in the arsenal for future projects!!
Strange, I only see a white page when using this script.
It only shows something when I comment out the examples and put
echo "blabla";
at the end of the script. It seems that as soon as I call
new IMDB('..');
it stops working…
Does anyone know why?
Maybe put
error_reporting(E_ALL);
at the very top of the PHP file and see if there are any errors. Maybe you don’t have the cURL extension installed, and depending on your setup there won’t be any error messages. Did this help?
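A white page usually means PHP stopped on an error while error display is switched off. A minimal sketch for making errors visible while debugging, assuming the host allows `ini_set()` (some shared hosts do not):

```php
<?php
// Report every notice, warning and error while debugging.
error_reporting(E_ALL);
// Also print them in the page; assumes the host permits ini_set().
ini_set('display_errors', '1');

// Quick check for the extension the class relies on.
echo extension_loaded('curl') ? "cURL is available\n" : "cURL is missing\n";
```

Remove these lines again once the script works, so visitors never see raw error messages.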
oh dude that’s it, CURL was (is) missing…
any way to use it without installing it ? (shared hosting with no power…)
Yes, it’s quite easy. I’ve changed my script a little and pasted it here. It’s using file_get_contents now instead of cURL. I hope this works for you :-)
IMDB changed their layout a little bit. I’ve sent an updated script to David; in the meantime, anyone interested can get it here: http://mypaste.ja-s.de/1597. I also added a simple fallback if cURL is missing (especially for Jonas ;)).
Very nice, I’m going to integrate this into my site :)
If something doesn’t work anymore, tell us here. :)
Your wonderful PHP class isn’t getting any information about the director; they probably changed something in the layout. I set up a workaround by inserting that information manually, but it would be great to have it fixed. Thank you in advance!
Hey Guido, you finally gave me a reason to update the class. Check out the new (but untested :P) version here: http://fingerkribbeln.de/src/imdb.phps.
Nice work, but with some URLs I don’t get any result at all, for example http://us.imdb.com/Title?0118883 and http://imdb.com/title/tt1060277/
Is it because there is no www in them?
I tried the first one with the www URL and now it works! (http://www.imdb.com/title/tt0118883/) Is there some way we can convert all URLs to that format?
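One way to bring the URL variants from this comment into the www form is to pull out the numeric title ID and rebuild the address. The sketch below is only an illustration: `normalizeImdbUrl()` is a made-up helper, not part of Fabian's class, and it only knows the two URL shapes mentioned here.

```php
<?php
/**
 * Hypothetical helper: rewrite known IMDB URL variants to
 * http://www.imdb.com/title/ttNNNNNNN/ or return null if no ID is found.
 */
function normalizeImdbUrl($url) {
    // New-style URLs, e.g. http://imdb.com/title/tt1060277/
    if (preg_match('~imdb\.com/title/tt(\d+)~i', $url, $m)) {
        return 'http://www.imdb.com/title/tt' . $m[1] . '/';
    }
    // Old-style URLs, e.g. http://us.imdb.com/Title?0118883
    if (preg_match('~imdb\.com/Title\?(\d+)~i', $url, $m)) {
        return 'http://www.imdb.com/title/tt' . $m[1] . '/';
    }
    return null;
}

echo normalizeImdbUrl('http://us.imdb.com/Title?0118883'), "\n";
echo normalizeImdbUrl('http://imdb.com/title/tt1060277/'), "\n";
```

Calling it before handing the URL to the class would let every supported variant end up at the same address.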
works fine here :-)
what about the ability to grab the film poster too ?
Hey Mattias, hey jmmv! Thanks for your feedback. Please give http://fingerkribbeln.de/src/imdb.phps a try. Cheers!
Really nice, works like a charm now!
Is it possible to make a function that grabs the title from a string like Crank.High.Voltage.2009.DVDRip.XviD-BeStDivX when we just want ‘crank high voltage’? I think we could do this with regular expressions, but I’m totally green in that subject :(
Of course, it’s quite easy actually. But I don’t see any reason to support that.
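For anyone who wants to try it anyway, here is a rough sketch of the regular-expression approach. `titleFromReleaseName()` is a hypothetical helper, and it assumes the title is followed by a four-digit year (19xx or 20xx), which will not hold for every release name or for films whose title is itself a year.

```php
<?php
/**
 * Hypothetical helper: guess a movie title from a release name such as
 * "Crank.High.Voltage.2009.DVDRip.XviD-BeStDivX".
 */
function titleFromReleaseName($release) {
    // Dots and underscores stand in for spaces in release names.
    $name = str_replace(array('.', '_'), ' ', $release);
    // Keep only what precedes the first standalone year, if there is one.
    if (preg_match('/^(.+?)\s+(?:19|20)\d{2}\b/', $name, $m)) {
        $name = $m[1];
    }
    return strtolower(trim($name));
}

echo titleFromReleaseName('Crank.High.Voltage.2009.DVDRip.XviD-BeStDivX'), "\n"; // crank high voltage
```

The result can then be passed to the class as a search term instead of a URL.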
I understand that.
I made a similar method for grabbing the poster yesterday, but sometimes it doesn’t display the image. It seems to be a problem to link to it directly. Have you noticed this as well?
Well, no idea. I neither use nor heavily test this class… ;-)
hm… stopped working with the new version :(
I see this error now:
Warning: curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in httpdocs/imdb.php on line 48
(with the version before today it was working)
I have again changed a lot of stuff, please give it a try: http://fingerkribbeln.de/src/imdb.phps
Got a problem displaying the posters, IMDB must have changed something :(
I don’t have the latest source with me, but I guess I’ll add it to GitHub or so these days and check again, if everything is working. Also let me know any missing stuff… ;)
I don’t know if I got the latest source three posts up, but the one I have doesn’t display the posters in Firefox. It seems to work fine in Opera and Chrome, though!
Check GitHub: http://github.com/FabianBeiner/PHP-IMDB-Grabber/blob/master/imdb.class.php :)
Sorry to repeat myself, but with the latest version I don’t see the poster in any browser after trying to print it :(
Oh, I just tried it. Looks like IMDB only allows you to view the image if you visited IMDB before (to prevent hotlinking). Maybe I’ll find a solution… Maybe not.
Okay, I uploaded the latest version to GitHub again. The poster stuff is working too, at least my tests here are fine: http://github.com/FabianBeiner/PHP-IMDB-Grabber :)
Nice work man!
Is it possible to get the cast of actors displayed in a nice way?
Basically yes, but I wouldn’t spam all actors, maybe only the first 3-5? May I ask where you’re using my script? Just curious. :)
I’m trying to build a nice and easy catalog of my DVD collection. I know there is software like DVD Profiler, but I think it is bloated and ugly.
But I find it hard trying to use cURL myself :(
Hi, I haven’t given up yet :)
I found this nice script, http://www.edmondscommerce.co.uk/blog/php/php-save-images-using-curl/, which can save the poster image to the hard drive. It works like a charm. It also speeds up page loading if you want to display many posts at once.
Only problem is the url may look like this: http://ia.media-imdb.com/images/M/MV5BMTEwOTE1NzUzODNeQTJeQWpwZ15BbWU3MDczODQ3NDE@._V1._SX95_SY140_.jpg
and when it gets saved the format is without http://ia.media-imdb.com/images/M/
If you then want to link to the image you just saved on the hard drive, how do you solve that?
Ahh ok it’s in the basename($i) in the example, my bad
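To spell out the `basename()` point: the function returns everything after the last slash of the URL, which is the name the saved file ends up with. A small sketch, where `localPosterPath()` and the `posters` directory are assumptions made for illustration and not names taken from the linked script:

```php
<?php
/**
 * Hypothetical helper: build the local path of a saved poster from its
 * remote URL. The "posters" directory name is an assumption.
 */
function localPosterPath($posterUrl, $dir = 'posters') {
    // basename() keeps only the part after the last "/".
    return rtrim($dir, '/') . '/' . basename($posterUrl);
}

$posterUrl = 'http://ia.media-imdb.com/images/M/MV5BMTEwOTE1NzUzODNeQTJeQWpwZ15BbWU3MDczODQ3NDE@._V1._SX95_SY140_.jpg';
echo '<img src="' . htmlspecialchars(localPosterPath($posterUrl)) . '">', "\n";
```

The same expression can be used both when saving the file and when writing the `<img>` tag, so the two always agree.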
I’ve updated the Git repository at GitHub (http://github.com/FabianBeiner/PHP-IMDB-Grabber) and added a small image-caching function. Give it a try! :)
Hi Fabian & Walsh,
Your class had been working up until yesterday. I use it to echo the movies’ user rating only,
but since yesterday it shows blanks. Has IMDB changed its layout, or is it something else?
I do not know PHP well, so I cannot check it.
I would appreciate it if you could look at it.
Thanks
Yes, IMDB changed their layout, again. I’ve updated my class at GitHub, give it a try: http://github.com/FabianBeiner/PHP-IMDB-Grabber :-)
Seems like some data won’t be grabbed :(
IMDB is annoying. Anyway, the class is updated at GitHub.
Love your work, but seems like the ‘tagline’ sometimes also gets the ‘plot’, try with the movie Valkyrie for example
any hint on how to grab genre and actors? :)
Tagline: Many saw evil. They dared to stop it.
Plot: Based on actual events, a plot to assassinate Hitler is unfurled during the height of WWII.
I don’t see any problems with plot and tagline getting mixed up. Genre would not be that hard; cast, on the other hand, is. I currently don’t have the time to update the class, but feel free to try it yourself.
Nevermind, Github is updated.
Hi Fabian ,
Nice Work Man !!!
It would be nice if you updated the class to return the cast as an array (a list of actors),
with every actor’s name, poster link and character name.
thanks in advance
Hey guys, I finally got the time to update the class (now at version 4.0)… ;-)
Check it out at GitHub: http://github.com/FabianBeiner/PHP-IMDB-Grabber (and please start to use the issue tracker there!)
Basically I’ve added the cast thingy and updated most of the functions (for example, it returns all countries now, not only the first one). I’ve also renamed a few functions (for example getDirectorUrl to getDirectorAsUrl), so make *SURE* you check the example!
Have fun.
Oh, and KIM: I just get the real names of the actors, not their images nor their name in the movie. I don’t see a reason to spam this out… ;)
Hmmm, will give it a try!
Thanks for the update! It’s really hard for me to get all the actors into an array, but it’s probably not so hard for you as the developer of the class.
Nice work, and I will keep you informed of any changes at IMDB ;)
Regards
Hello Fabian ,
I found a strange bug. Please take any Sci-Fi title,
for example: http://www.imdb.com/title/tt0499549/
Genre: Action | Adventure | Sci-Fi
The word “Sci-Fi” is the problem!
It does not matter whether it is in the middle or at the end.
There is no other problem with the filter:
const IMDB_GENRE = '#(\w+)#Ui';
It grabs everything except the word “Sci-Fi”!
Do you know why? :)
Thanks
Fixed in newest version.
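The pattern quoted in the report looks as if the blog stripped part of it, and how the class was actually fixed is not shown here. The likely culprit can still be illustrated: `\w` matches letters, digits and the underscore, but not a hyphen. In this simplified sketch "Sci-Fi" gets split in two; in a pattern that is anchored by surrounding markup it would not match at all.

```php
<?php
$genres = 'Action | Adventure | Sci-Fi';

// \w+ stops at the hyphen, so "Sci-Fi" comes back in two pieces.
preg_match_all('#\w+#', $genres, $broken);
print_r($broken[0]); // Action, Adventure, Sci, Fi

// Allowing a hyphen inside the character class keeps the genre whole.
preg_match_all('#[\w-]+#', $genres, $fixed);
print_r($fixed[0]); // Action, Adventure, Sci-Fi
```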
Hi Fabian !
Please send me your email address. I want to thank you by sending a donation to keep this project updated!
Thanks again :)
Howdy,
since I got some enhancements from Vladimir Kovacevic, I’ve just added them and updated the script once again. The latest version is again at GitHub: http://github.com/FabianBeiner/PHP-IMDB-Grabber
My e-mail address is in the header, if you really want to do this. :) Thanks then. And tell me, per mail or here, where do you use this script? :) I’m so curious! ;)
I just Did !
Please Check your Paypal ACC .
Thanks :D
Thank you for this great PHP class! I love it!
And I think I found a small bug. When adding certain movies, the getDirector() function returns not only the directors but also the rest of the IMDb page, quite a big chunk of data. This only happens with a few movies, for instance “Monsters, Inc.”. I have tried repeatedly and it is consistent, though I only have one setup to check this with, so I cannot be sure that my system is not to blame. I’m fairly new to PHP, so I failed to see what, if anything, is wrong with the code.
Keep up the good work! Thanks again!
You’re right, Andreas. Thanks for telling! Fixed version is found at http://github.com/FabianBeiner/PHP-IMDB-Grabber :-)
Fabian
I saw that the script works fine for you and your friends, but not for me. I got the following:
Parse error: syntax error, unexpected T_STRING, expecting T_OLD_FUNCTION or T_FUNCTION or T_VAR or ‘}’ in /imdb.class.php on line 18.
Would appreciate if you can help me. thanks
Carlos
Are you using PHP4?
@Fabian Beiner:
Yes Fabian 4.4.9
Thanks for your prompt reply
This class doesn’t work with PHP4, and it never will. Since August 2008 (or so) PHP4 isn’t even supported by PHP itself anymore, so it’s seriously time for an update (or another hoster). Sorry…
@Fabian Beiner:
Thanks, should go to another hoster.
Rgds
Thank you Fabian for the quick fix!
Seems to be working great now!
Just another small note. Sometimes different movies get the same poster.
I’ve been importing a number of movies into a database, and sometimes the posters get mixed up and different movies use the same .jpg (e.g. “Crazy in Alabama” and “Leap Year” do that for me). Maybe the newly imported poster file should get a different name if the file already exists, rather than just returning the one that’s already there. Adding a couple of random numbers to the end would also do the trick, I suppose.
Again, if I knew how to fix it myself I’d do it and give you the code, but I failed to get it right or even to see how this problem could occur.
Thanks again for a nice imdb grabber!
I can’t reproduce that. The images are named after the IMDB id which is unique anyway.
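Naming cached posters after the title ID is one way to guarantee a unique file per movie. The following is only a sketch of that idea, not the code from the class on GitHub; `posterFilenameForMovie()` is a hypothetical helper that takes the movie page URL rather than the poster image URL.

```php
<?php
/**
 * Hypothetical helper: derive a poster file name from the movie's IMDB
 * URL. Returns null when the URL contains no title ID.
 */
function posterFilenameForMovie($movieUrl) {
    // The "tt" ID in the movie URL is unique per title.
    if (preg_match('~/title/(tt\d+)~', $movieUrl, $m)) {
        return $m[1] . '.jpg';
    }
    return null;
}

echo posterFilenameForMovie('http://www.imdb.com/title/tt0367882/'), "\n"; // tt0367882.jpg
```

Two different movies can then never share a file, whatever the poster image itself is called on IMDB's servers.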
I had another look at the code. The poster jpegs do not seem to be named after the IMDB id but rather by stripping everything but numbers from the name of the poster jpeg file at IMDB, which sometimes resulted in my problem of different movies using the same poster.
I fixed this by replacing
basename($sUrl))
with
basename($this->_sUrl))
on the line in saveImage() that creates the $sFilename variable.
Hi guys, I’m trying to save the information to my own database for the first time. How do I remove the “See more” link from the tagline? It seems to break my SQL query.
OK, I think I got it covered with an easy fix :)
I saved the tagline and plot in variables and split them at '<', because I don’t want the “See more” link after the tagline, and sometimes there is a link after the plot that says “Summary” or something like it.
$fixedTagline=explode('<',$tagline);
$fixedPlot=explode('<',$plot);
echo $fixedTagline[0]; does the trick
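The same trick can be wrapped in a small function. This is a sketch only: `textBeforeFirstTag()` is a made-up name, and the sample tagline markup below is invented for the example rather than copied from IMDB.

```php
<?php
/**
 * Hypothetical helper: keep only the text in front of the first HTML tag.
 */
function textBeforeFirstTag($html) {
    // Limit of 2: everything after the first "<" stays in one discarded piece.
    $parts = explode('<', $html, 2);
    return trim($parts[0]);
}

// Invented sample markup, for illustration only.
$tagline = 'Many saw evil. They dared to stop it. <a href="taglines">See more</a>';
echo textBeforeFirstTag($tagline), "\n"; // Many saw evil. They dared to stop it.
```

Cutting off the link does not make the text safe for SQL by itself. Taglines can contain quotes, so the value still needs escaping or a prepared statement before it goes into a query.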
Hi, anyone still reading here? :) Image function does not work again, seems like IMDB have renamed the div or something.
Yes. Check http://github.com/FabianBeiner/PHP-IMDB-Grabber
Fabian Beiner’s latest code at GitHub works fantastically, except that it throws an error when you try to view a series page. Try searching for “last airbender”: it redirects to the series page rather than the movie page and then gives an error when it tries to get the cast. Nice work, thanks a lot!
@Fabian Beiner: The code works great. I have one question: it gets .jpeg, but can it also get .gif?
Nope, actually I’ve never seen a .gif as poster, so it only handles .jpg at the moment. Please add an issue at GitHub (http://github.com/FabianBeiner/PHP-IMDB-Grabber/issues) for *ANYTHING* you want.
@Fabian Beiner: Warning: imagecreatefromjpeg() [function.imagecreatefromjpeg]: ‘http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif’ is not a valid JPEG file in G:\xampp\htdocs\imdb.class.php on line 125
Basically it happens if IMDB doesn’t have a poster on their web page.
example…
http://www.imdb.com/title/tt1274725/
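One way to avoid that warning is to look at the file extension before handing the URL to `imagecreatefromjpeg()`. The sketch below uses a hypothetical `looksLikeJpeg()` helper and is not necessarily how the class handles it; it only checks the URL, not the actual file contents.

```php
<?php
/**
 * Hypothetical helper: true if the URL path ends in .jpg or .jpeg.
 */
function looksLikeJpeg($url) {
    // Ignore any query string and compare the extension case-insensitively.
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, array('jpg', 'jpeg'), true);
}

$poster = 'http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif';
if (!looksLikeJpeg($poster)) {
    echo "No JPEG poster available, skipping image processing.\n";
}
```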
There is a new version on http://github.com/FabianBeiner/PHP-IMDB-Grabber – I hope it fixes all of the reported issues here. Changes: http://github.com/FabianBeiner/PHP-IMDB-Grabber/commits/master/ – and remember: Bugs only through GitHub!
Hello!
I would appreciate some help from you guys!
I am trying to install this script on a WordPress blog. Can you please tell me exactly what I should do to make this work only in Single Post?
Thank you!
This is not meant as a WordPress plug-in, sorry.
Hi Fabian!
Awesome class you made!
Everything works like a charm except pulling the Cast data.
Can you do this?
Can you also add a function that pulls the high-resolution picture from IMDB?
I’d really appreciate it. I’m using it for my own internal movie database so I can just browse and search for a good movie.
Best regards.
Hi! I use this awesome class for one of my websites. Everything worked fine until today. With your updated code I get n/A for every piece of information except “Title” and “URL”. Is it because IMDB has changed their website?
Thanks!
With the latest version (4.4.1) everything is working well. Only country was wrong this morning. And again, SUPPORT ONLY THROUGH GITHUB OR EMAIL!
Hi,
I’ve been using your IMDB grabber for months, but it has not been working for the last 2-3 days. Has IMDB changed its layout again? Could you please make an update?
thanks~
http://github.com/FabianBeiner/PHP-IMDB-Grabber – v5.0.1 is a rewrite with a few new functions and a small caching system.
still not working :( tried with “batman” and “Tomorrow Never Dies”. imdb.tests.php not working either
Gosh, how many times did I tell you that I want support questions not on this blog? This belongs to David and I guess he doesn’t want us to use his blog as bug reporting database.
Anyway, it’s working like a charm. Check that the cache and posters directories are writable and that cURL is installed.
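Those checks can be automated with a few lines. This is a rough sketch under assumptions: `grabberPreflight()` is a hypothetical helper, and the directory names `cache` and `posters` are taken from the comment above rather than from the class itself.

```php
<?php
/**
 * Hypothetical helper: list everything that would stop the grabber from
 * working. An empty array means all checks passed.
 */
function grabberPreflight(array $dirs) {
    $problems = array();
    if (!extension_loaded('curl')) {
        $problems[] = 'The curl extension is not loaded.';
    }
    foreach ($dirs as $dir) {
        if (!is_dir($dir)) {
            $problems[] = "Directory '$dir' does not exist.";
        } elseif (!is_writable($dir)) {
            $problems[] = "Directory '$dir' is not writable.";
        }
    }
    return $problems;
}

// Directory names are assumptions; adjust them to your installation.
foreach (grabberPreflight(array('cache', 'posters')) as $problem) {
    echo $problem, "\n";
}
```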
I’m so sorry. Yes, it works as a charm, it was curl that was not installed :( shame on me
Hello !
I’m very new to PHP.
When I run your code, the one without cURL, I get lots of error messages like:
Do you know what’s going on ?
/Torbjorn Ljung
Sweden