Download a URL’s Content Using PHP cURL

Downloading the content at a specific URL is common practice on the internet, especially given the increased use of web services and APIs offered by Amazon, Alexa, Digg, etc. PHP's cURL library, which is often enabled in default shared hosting configurations, lets web developers do this.

The Code

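/* gets the data from a URL */
function get_data($url) {
	$ch = curl_init();
	$timeout = 5; // seconds to wait while connecting (not a limit on the whole transfer)
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
	$data = curl_exec($ch); // the page content, or false on failure
	curl_close($ch);
	return $data;
}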
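
The get_data() function returns false whenever anything goes wrong, and echoing false prints nothing, which may explain some of the "blank page" reports in the discussion. Below is a minimal sketch of a more defensive variant, assuming PHP 7 or later. The name get_data_checked(), the 20-second total timeout, and the restriction to http:// and https:// URLs are illustrative assumptions, not part of the original function.

```php
<?php
// Hypothetical variant of get_data(): throws with a reason instead of returning false.
function get_data_checked(string $url): string {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // seconds allowed to connect
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);       // seconds allowed for the whole request (assumed value)
    // Accept only http:// and https:// URLs, so a user-supplied file:// URL is refused.
    curl_setopt($ch, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
    $data = curl_exec($ch);
    if ($data === false) {
        $error = curl_error($ch); // e.g. DNS failure, timeout, unsupported protocol
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status returned for $url");
    }
    return $data;
}
```

Wrapping the call in try/catch and echoing $e->getMessage() then shows why a request failed instead of printing an empty page.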

The Usage

$returned_content = get_data('http://davidwalsh.name');
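
Calling get_data() in a loop (Pete asks about this for roughly 1000 URLs in the discussion) downloads one URL at a time. One alternative is PHP's curl_multi_* functions, which run several transfers concurrently. The sketch below assumes PHP 7 or later; get_many() is a hypothetical helper, and for very long URL lists you would probably want to add handles in smaller batches rather than all at once.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with curl_multi_*.
// Returns [url => body], with false for transfers that failed.
function get_many(array $urls, int $timeout = 5): array {
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait up to 1s for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the CURLE_* result code of each finished transfer (CURLE_OK = success).
    $finished = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        $finished[] = $info;
    }

    $results = [];
    foreach ($handles as $url => $ch) {
        $ok = false;
        foreach ($finished as $info) {
            if ($info['handle'] === $ch && $info['result'] === CURLE_OK) {
                $ok = true;
            }
        }
        $results[$url] = $ok ? curl_multi_getcontent($ch) : false;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```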

Alternatively, you can call the file_get_contents function on a remote URL, but many hosts disable this through the allow_url_fopen setting.
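
Whether file_get_contents() may open remote URLs is controlled by the allow_url_fopen php.ini setting. A minimal sketch that checks the setting at runtime and falls back to cURL when it is off; remote_fopen_allowed() and fetch_url() are hypothetical helper names, not PHP functions.

```php
<?php
// Hypothetical helper: true when file_get_contents() is allowed to open URLs.
function remote_fopen_allowed(): bool {
    // ini_get() returns the setting as a string such as "1" or "".
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: use file_get_contents() when the host permits it,
// otherwise fall back to cURL. Returns the body, or false on failure.
function fetch_url(string $url) {
    if (remote_fopen_allowed()) {
        return @file_get_contents($url); // @ hides the warning; failure still returns false
    }
    if (!function_exists('curl_init')) {
        return false; // neither method is available on this host
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
```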

Discussion

  1. For your script we can also add a User Agent:

    $userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)';
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    
    Some other options I use:
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    
    • Ghulam Rasool

      Hi Shawn,
      Using the user agent option helped me sort out my problem.

      Thanks

  2. Alternatively you can use the PHP DOM:

    $keywords = array();
    $domain = array('http://davidwalsh.name');
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = FALSE;
    foreach ($domain as $key => $value) {
        // @ suppresses the warnings DOMDocument raises for malformed HTML
        @$doc->loadHTMLFile($value);
        $anchor_tags = $doc->getElementsByTagName('a');
        foreach ($anchor_tags as $tag) {
            $keywords[] = strtolower($tag->nodeValue);
        }
    }

    Keep in mind this is not a tested piece of code: I took parts from a working script I created and cut out several of the checks I put in to remove whitespace, duplicates, and more.

  3. Excellent additions Shawn — thank you for posting them!

  4. And with this great power comes great responsibility =)

  5. Very true Chris. It’s up to the developer to use it for good or evil. I suppose I’ve used it for both in the past.

    For downloading remote XML or text files, this script has been golden.

  6. KP

    Great script! Does anyone know how to use that script to save the content it gathers to a file locally on the server?

  7. @KP: Check out my other article, Basic PHP File Handling — Create, Open, Read, Write, Append, Close, and Delete, here:

    http://davidwalsh.name/basic-php-file-handling-create-open-read-write-append-close-delete

  8. Usman

    I am trying to use this function “get_data($url)”, but it gives a blank page when I echo it. Can anybody please help me?

  9. @Usman: There are a few reasons why you may get a blank page. You may not have cURL installed on the server. The other possibility is that you need to “echo” the content before you close the connection; someone brought this issue to me the other day.

  10. Usman

    Hello David,
    I am still unable to get a result from it. I have checked (using phpinfo()) that cURL is installed, but it’s giving a blank page. When I try it from the PHP command line, it works.

    • Hi, this script works for me, but unfortunately it fails on URLs from the same domain as the calling script. I can’t see any error in error.log

  11. Dru

    Works like a charm!

  12. Works just like… file_get_contents! Thanks.

  13. Ajay

    The code is very effective, but the problem is that it returns all the HTML tags as well. Is there any way to get rid of them?

  14. bit

    This code is way too short; even php.net probably has a longer version! Beware: if you use this to let other users make the URL requests, they can easily use it to upload malicious code, whole new pages, or huge files (like MP3s or movies) that will eat up all your bandwidth.

  15. Do you know of a way to have it click a link on a page? I’m trying to work with another company’s registration form. It’s a stupid ASP page. On the first page it puts ?DoctorId=13074 at the end of the URL. On the next page, with the registration form, it dynamically generates a random string in a hidden input box that gets posted with the form. So is there any way I can have it click a link once it loads a page?

  16. Thomas Alexander

    Hi,
    I’m using cURL to get details from an API. One of my API calls returns a ZIP file,
    and I’d like to force a download of that ZIP file. How can I do this with cURL?

  17. Joel Kiskola

    David Walsh’s code does not output anything for me.
    Why?

    I did include PHP tags before and after both snippets.

  18. @Joel: because you have to add
    echo $returned_content;
    after the last line ($returned_content = get_data('http://davidwalsh.name');)

  19. I want to click a link after getting the contents of a webpage.
    How do I do it?

  20. George

    I would like to remove the XML declaration from the returned content.

    I am appending the gathered data to an existing PHP/XML file and do not want it there.

    Is there a simple solution?

  21. FMdB

    Hi!

    I’m trying to parse AJAX with PHP. The problem is:

    when I read the URL source, the AJAX part isn’t visible, and I only grab the HTML from the rest of the site.

    How can I solve this problem? Any ideas?

  22. Kelly

    Is there a way to use cURL in PHP like you can on the command line, i.e.:

    curl http://mydomain.com/picture.jpg -o "PATH_TO_SAVE_TO"

    This would download a picture from a website and put it in a folder on my server. It works from Terminal, but I cannot find the equivalent in PHP.

    If anyone knows the answer to this, I would greatly appreciate it.

  25. george

    @Kelly: yes,

    Place something like this in your PHP. There are 3 options:

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL,
            "http://www.whateveryouwant.com.php.html.xml");
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $xml_language = curl_exec($ch);
    curl_close($ch);
    echo $xml_language; // print the fetched XML/HTML into the current page
    

    You have options using cURL:
    return the data with a database-driven string, which returns the data and appends it to your PHP, HTML, XML, etc. Very handy, especially for Flash and others; see worldwideweather.com (forum).

    This trick allows Flash to read an external XML file for its language and database info. Using PHP to call the user-specific info, you can write the Flash XML on the fly. This script returns the user’s language interface for Flash: PHP calls the XML file for the user’s language (for a Spanish user, ES is appended to the XML URL), and the file is read and written into the PHP script itself, very, very fast.

    // $line is assumed to be a user row fetched from the database earlier.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL,
            "http://www.verdegia.com/Files/System/TEST/Language/M_TEXT_" . $line["Language"] . ".xml");
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $xml_language = curl_exec($ch);
    curl_close($ch);
    echo $xml_language;
    

    Return the data into an external XML file from a PHP user-specific database call: this gets data specific to the user and generates a file on the fly (XML, PHP, HTML, whatever):

    	
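    // Assumes an open mysqli connection in $mysqli (get_result() needs the mysqlnd driver);
    // $tbl_name and $myusername are set elsewhere.
    $stmt = $mysqli->prepare("SELECT * FROM $tbl_name WHERE username = ?");
    $stmt->bind_param('s', $myusername); // placeholder avoids SQL injection
    $stmt->execute();
    $results = $stmt->get_result();
    while ($line = $results->fetch_assoc()) {
        $file = "http://www.worldweatheronline.com/feed/weather.ashx?q=" . urlencode($line["Postcode"]) . "&format=xml&num_of_days=5&key=6c7e92e827155910100801";
    }

    // Stream the weather feed straight into a local file.
    $ch = curl_init($file);
    $fp = fopen("../Files/System/TEST/temp.xml", "w");
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    $file = "../Files/System/TEST/temp.xml";
    $fp = fopen($file, "r");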
    

    Hope this helps!

  26. Can we use this function to parse all the content at a URL?

  27. @Indonesia: Except there are a lot more options. I believe it’s possible to get the whole HTTP response using cURL, and I believe that is not true with file_get_contents().

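  28. $sql = "UPDATE staff SET
    staffNo = ?,
    f_name = ?,
    l_name = ?,
    sex = ?,
    DOB = ?,
    position = ?,
    salary = ?,
    hiredate = ?,
    contact_id = ?,
    branchNo = ?
    WHERE staffNo = ?";
    // Assumes an open mysqli connection in $mysqli; the ? placeholders quote each value and prevent SQL injection.
    $stmt = $mysqli->prepare($sql) or die("Cannot query the database. " . $mysqli->error);
    $stmt->bind_param('sssssssssss', $staff_no, $fname, $lname, $sex, $dob, $position,
        $salary, $hiredate, $contact_id, $branch_no, $staff_no);
    $stmt->execute() or die("Cannot query the database. " . $stmt->error);
    echo "Database Updated.";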

  29. Great,

    It is useful for getting XML or images from other sites if the server is not able to get content with fopen.

    Thanks

  31. Nice.
    PHP provides two other methods to fetch a URL: cURL and fsockopen.
    To see how to use them, you can check this example: http://www.bin-co.com/php/scripts/load/

  32. How do I write the returned content into a new file?

  33. Hi,

    Can you put the cURL call in a loop? I have a list of about 1000 URLs that I want to ‘hit’ so the caches build up. Can I just put the above code into a loop, or will that be too resource-heavy?

    Thanks

    Pete

  34. Thanks for the code. Great!

  35. Myister

    It is very possible to put this into an automatic crawler for user-submitted sites, or even to make an automatic crawler out of this. The code is short, but it works for only one page at a time. To make it look at multiple pages you have to do some minor PHP coding, but nothing major.

    I am working on a script right now that uses the code above and just keeps crawling based on the links on the initially crawled web page. A non-stop spider script! They are already out there, but I like to say I can make one too.

    The script will also take the meta tags (description and keywords) and place them into a database too, giving me a search engine rather than a user-submitted directory.

    If you would like to join the team, simply email me at justin2009@gmail.com

  36. Learnphp123

    I want to extract the images and the first paragraph from the page at a URL. How can I do that?

    • Hi there, I can’t post code here, but I can provide you with my class, which extracts a particular tag (it could be any HTML tag) from the returned page.

      Regards,
      M. Singh

  38. Comrade

    A simple question: how can I speed up downloading with cURL? It is damn slow; it sometimes takes 45 seconds to download a 4 KB page.

    • Ivan Ivković

      It depends upon the configuration of the host server.

  39. How do I work with HTTPS? I have a site that doesn’t load when I try to open it over HTTPS; it simply returns a 405 error. Any help is welcome; please mail me at msingh@ekomkaar.com

  40. http://www.zerospeedsensor.com/

  41. i.e. it doesn’t return any output.

  42. Octav

    Very strange. This function returns only relatively small pages.
    It works if the source code has fewer than 200 lines.
    If the web page is bigger, it won’t return anything. Not even errors.
    The same thing happens with file_get_contents.

    PHP Version 5.2.11
    memory_limit – 128M
    max_execution_time – 30
    error_reporting(E_ALL)

    Any idea?

  43. Hey David,

    I searched the net to find rough code with which I can check “reciprocal” backlink status.

    This finally helped me. :)

    I modified it for my needs.

    To check backlinks
    0)
    {
    echo 'found';
    }else{
    echo 'Not found';
    }
    }

    $remote_url = 'http://www.listsdir.com/';
    $mmb_url = 'http://mymoviesbuzz.com/titles/';

    $returned_content = get_data($remote_url,$mmb_url);
    ?>

    Thanks.

  44. Thanks, this script helped me move my WordPress content to a new host.

  45. harish

    I think it is good practice to use CURLOPT_USERAGENT in cURL scripts.

  46. In order to read sites encrypted by SSL, like Google Calendar feeds, you must set these CURL options:

    curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
    curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
    
  47. Hello David,
    How can I download a file from a remote URL? I’ve tried using your method, but no luck :(

  48. How can I log in with cURL?

  49. How do I repeat the process?

  50. Thunderbird

    How can I download the contents of a website that requires login??

    • You need to set up a session for that and send the session cookies in the request headers, so the site treats it as a normal login. For further details you can contact me at msingh@ekomkaar.com

  51. Vinoth Kumar

    I’m running a web hosting website. My domain provider gave me some HTTP APIs. I tried to implement them, but I’m getting an empty response from cURL. It’s an HTTPS URL, and I used

    curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
    curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
    

    options in my cURL code, but I’m still getting an empty response. Can anyone help me with this? I’m new to cURL :’(

  52. Giu87

    Is it possible to retrieve the code embedded in an HTML tag (e.g. Flash)?

    More precisely, on the LinkedIn page for a skill:

    http://www.linkedin.com/skills/skill/Java?trk=skills-pg-search,

    there is a graphic obtained by an tag, which returns an image.

    When I take source code of the page or when I use file_get_contents() php function, I can obtain only the returned tag.

    I can see all this information in Firefox’s analysis of the page, but I want an automatic script.

    Any solution?

  53. Mika Andrianarijaona

    Thank you, the code is working fine for me. You saved me ;)

  54. andrei

    Thank you! I looked for this code for quite some time.

  55. David

    Thank you!

    I wasn’t fully getting it, and just wanted a script I could copy and paste, make sure things were working, then modify from there. This was the only one I could find that actually returned the content. Thank you!

    I’ve been here on a few occasions and appreciate every aspect of the user friendly design! And every article is quality. Thanks!
    (Wow, being positive in a somewhat general way like that kind of resembles the ever infamous spam comments. Sorry :-/ )

  56. Thanks a lot for the script!

  58. I am trying to run cURL on localhost, and I have changed php.ini. There are no errors; only a blank page comes up. Are there any other settings in php.ini or Apache that I need to change?

  59. [...] you use file_gets_contents($website) or cURL to load a website, does it load the whole website? I am mostly interested about using [...]

  60. [...] get_data is a wrapper function for curl. [...]

  61. Tom

    When I use the PHP curl function, it always wants to first return (as in echo) the contents of the URL when I only want it assigned to a variable. Is there a way to stop this from happening? I used your code exactly and simply called it from the main program. The behavior is the same whether I call the PHP program from the command line or via a browser.

  62. Tom

    Please ignore my previous post. For some unknown reason, I was overlooking a simple echo statement in the midst of my sloppy code. Duh…

    It works just fine!

  63. Thanks. It works. I have one question: is it possible to filter the result? I mean, I want to publish some of the content but not the rest.
    Thank you

  64. The only things that may be missing are potential redirects, potential sessions, and maybe a few other things (the browser user agent, as mentioned). E.g. if you download a file, you will often be redirected or need to use sessions. The solution will be something like this:

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);             // follow redirects
    curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookie.txt');  // write received cookies to this file
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookie.txt'); // send cookies read from this file
    
  65. Thanks. It works.

  66. $url = "https://api.dailymotion.com/video/xz9frh?fields=price_details";
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate checks (insecure)
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    $data = curl_exec($ch);
    curl_close($ch);
    $returned_content = $data;
    echo $returned_content;
    $error = substr($returned_content, 2, 5);
    echo $error;

  67. [...] very well. You will need to change the user agent and use a different one. You could also use CURL, for example http://davidwalsh.name/curl-download MyWebExpression (website development) [...]

  68. I have a question. Sorry, but the code above doesn’t work for me because I’m not familiar with PHP cURL.
    I have a form with an image in it, basically a certificate. The user has two choices: either print or download. How can I download it with an image/jpeg content-type?

  69. Breen

    Nice. This is the preferred way to get HTML.

    file_get_contents for URLs is getting close to being a train wreck. It can be turned off or on unpredictably by hosts, and it seems incompatible with many modern Linux distributions out of the box. file_get_contents seems to use its own rules for name resolution and often times out or is extremely slow. There seems to be no consistent fix for this.

    Don’t use file_get_contents. Use cURL. Combined with the Simple DOM Parser, it is powerful stuff.

  70. shafiul

    Thanks for the code. It works well when I try to store the contents of a page from the intranet or a local server, but it is not working when I try to load a page from the internet, say http://www.google.com or any other site. If there is any solution to this problem, please mail me.
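
Several commenters (KP, Kelly, and others) ask how to save a download to a file on the server, the PHP counterpart of `curl URL -o PATH`. Below is a minimal sketch, assuming the web server user can write to the target path. download_to_file() is a hypothetical helper, and following redirects and failing on HTTP error codes are choices made here, not part of the article's get_data().

```php
<?php
// Hypothetical helper: stream a URL straight into a local file (like `curl URL -o PATH`).
// Returns true on success; on failure, removes the partial file and returns false.
function download_to_file(string $url, string $path): bool {
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false; // could not open the destination for writing
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body to $fp instead of returning it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the real file
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP 400+ responses as failures
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $ok = curl_exec($ch);                           // true on success, false on failure
    curl_close($ch);
    fclose($fp);
    if ($ok === false) {
        @unlink($path); // don't leave a partial or empty file behind
        return false;
    }
    return true;
}
```

If the content is already in a string (for example from get_data()), file_put_contents($path, $returned_content) is simpler; CURLOPT_FILE has the advantage of writing to disk as the data arrives rather than holding the whole file in memory.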

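Ajay and others ask how to get rid of the HTML tags in the downloaded content. Below is a minimal sketch using PHP's built-in strip_tags() and html_entity_decode(). html_to_text() is a hypothetical helper, and the regular expression that drops script and style blocks is a rough heuristic rather than a real HTML parser; the DOMDocument approach shown in the discussion is more robust for extracting specific elements.

```php
<?php
// Hypothetical helper: turn fetched HTML into plain text.
function html_to_text(string $html): string {
    // Drop <script> and <style> blocks entirely; strip_tags() would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    $text = strip_tags($html);
    // Decode entities such as &amp; into the characters they represent.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace left behind by the removed tags.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```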