Download a URL’s Content Using PHP cURL
Downloading content at a specific URL is common practice on the internet, especially due to increased usage of web services and APIs offered by Amazon, Alexa, Digg, etc. PHP's cURL library, which often comes with default shared hosting configurations, allows web developers to complete this task.
The Code
/* gets the data from a URL */
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
The Usage
$returned_content = get_data('http://davidwalsh.name');
Alternatively, you can use the file_get_contents function remotely, but many hosts don't allow this.
Comments
Alternatively you can use the PHP DOM:
$keywords = array();
$domain = array('http://davidwalsh.name');
$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
foreach ($domain as $key => $value) {
    @$doc->loadHTMLFile($value);
    $anchor_tags = $doc->getElementsByTagName('a');
    foreach ($anchor_tags as $tag) {
        $keywords[] = strtolower($tag->nodeValue);
    }
}
Keep in mind this is not a tested piece of code; I took parts from a working script I have created and cut out several of the checks I’ve put in to remove whitespace, duplicates, and more.
For your script we can also add a User Agent:
$userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)';
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
Some other options I use:
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
Excellent additions Shawn — thank you for posting them!
And with this great power, comes great responsibility =)
Very true Chris. It’s up to the developer to use it for good or evil. I suppose I’ve used it for both in the past.
For downloading remote XML or text files, this script has been golden.
Great script! Does anyone know how to use this script to save the content it gathers to a file locally on the server?
@KP: Check out my other article, Basic PHP File Handling — Create, Open, Read, Write, Append, Close, and Delete, here:
http://davidwalsh.name/basic-php-file-handling-create-open-read-write-append-close-delete
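For the simple case, a minimal sketch of saving fetched content might look like the following. The helper name save_content() and the example path are made up for illustration; the usage line assumes the get_data() function from the article is already defined.

```php
<?php
// Hypothetical helper (not from the article): write already-fetched content to a local file.
// Returns the number of bytes written, or false if there was nothing to write or the write failed.
function save_content($content, $path)
{
    if ($content === false || $content === null) {
        return false; // the download itself failed, so there is nothing to save
    }
    // LOCK_EX keeps two requests from writing the same file at the same time
    return file_put_contents($path, $content, LOCK_EX);
}

// Usage, assuming get_data() from the article is defined:
// $bytes = save_content(get_data('http://davidwalsh.name'), '/tmp/davidwalsh.html');
```

The web server user needs write permission on the target directory for this to work.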
I am trying to use this function “get_data($url)”, but it gives a blank page when I echo it. Can anybody please help me?
@Usman: There are a few reasons why you may get a blank page. You may not have cURL installed on the server. The other possibility is that you need to “echo” the content before you close the connection; someone brought this issue to me the other day.
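When the result is blank, it can help to ask cURL what went wrong, since curl_exec() returns false on failure. The sketch below is one way to do that; get_data_debug() is a hypothetical variant of the article's function, not part of it.

```php
<?php
// Hypothetical variant of get_data() that reports why a request failed
// instead of silently returning an empty result.
function get_data_debug($url, $timeout = 5)
{
    if (!function_exists('curl_init')) {
        return array('data' => false, 'error' => 'cURL extension is not loaded', 'status' => 0);
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch); // false on failure
    $error = ($data === false) ? curl_error($ch) : '';
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no HTTP response arrived
    curl_close($ch);
    return array('data' => $data, 'error' => $error, 'status' => $status);
}

// Usage:
// $result = get_data_debug('http://davidwalsh.name');
// echo ($result['data'] === false) ? 'Request failed: ' . $result['error'] : $result['data'];
```

An empty body together with a 3xx status usually means the server answered with a redirect, which the original function does not follow.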
Hello David,
I am still unable to get a result. I have checked (using phpinfo()) that cURL is installed, but it’s still giving a blank page. When I try it from the PHP command line, it works.
Hi, this script works for me but unfortunately fails on URLs from the same domain as the calling script. I can’t see any error in error.log.
Works like a charm!
Works just like…. file_get_contents! Thanks.
The code is very effective, but the problem is that it returns all the HTML tags as well. Is there any way to get rid of them?
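One possible approach is to run the fetched markup through PHP's strip_tags(). The sketch below is only an illustration; html_to_text() is a made-up helper name, and it treats the content as UTF-8.

```php
<?php
// Hypothetical helper: reduce fetched HTML to plain text.
function html_to_text($html)
{
    // Drop <script> and <style> blocks first; strip_tags() would keep their contents
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    $text = strip_tags($html);
    // Decode entities such as &amp; and collapse runs of whitespace
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Usage, assuming get_data() from the article is defined:
// echo html_to_text(get_data('http://davidwalsh.name'));
```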
This code is way too short; even php.net probably has a longer version! Beware: if you use this to let other users make the URL requests, they can easily use it to upload malicious code, whole new pages, or huge files like MP3s or movies that will eat up all your bandwidth.
Do you know of a way to have it click a link on a page? I’m trying to work with another company’s registration form. It’s a stupid ASP page. On the first page it puts ?DoctorId=13074 at the end of the URL. On the next page, with the registration form, it dynamically generates a random string in a hidden input box that gets posted with the form. So is there any way I can have it click a link once it loads a page?
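cURL cannot click anything, but a script can request the link's URL itself, read the hidden fields out of the returned form, and post them back. The following is a rough sketch under that assumption; both helper names and the URLs in the usage comment are hypothetical, and a real form may also need session cookies.

```php
<?php
// Hypothetical helper: collect name => value pairs of all hidden inputs in an HTML page.
function extract_hidden_fields($html)
{
    $fields = array();
    if (trim($html) === '') {
        return $fields;
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings about sloppy real-world markup
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if (strtolower($input->getAttribute('type')) === 'hidden' && $name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Hypothetical helper: submit form fields with an HTTP POST and return the response body.
function post_form($url, array $fields)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

// Usage sketch (URLs and the extra field are placeholders):
// $form = get_data('http://example.com/register.asp?DoctorId=13074');
// $fields = extract_hidden_fields($form) + array('email' => 'me@example.com');
// $result = post_form('http://example.com/register.asp', $fields);
```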
Hi,
I’m using cURL to get details from an API call. One of my API calls returns a zip file.
I’d like to force a download of that zip file. How can I do this with cURL?
David Walsh’s code does not output anything for me.
Why?
I did include PHP tags before and after both code blocks.
@Joel: because you have to add:
echo $returned_content;
after the last line ($returned_content = get_data('http://davidwalsh.name');)
I want to click a link after getting the contents of a web page.
How do I do it?
I would like to remove the xml declaration from the returned url.
I am appending the gathered data to an existing php/xml file and do not want it.
Is there a simple solution?
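One simple option is a regular expression that removes the declaration from the start of the string before appending it. This is only a sketch; strip_xml_declaration() is a made-up name, and it assumes the declaration, if present, is the first thing in the data.

```php
<?php
// Hypothetical helper: remove a leading XML declaration (the version/encoding line)
// from a string; anything else is returned unchanged.
function strip_xml_declaration($xml)
{
    return preg_replace('/^\s*<\?xml\b[^>]*\?>\s*/i', '', $xml, 1);
}

// Usage, assuming get_data() from the article is defined:
// $fragment = strip_xml_declaration(get_data('http://example.com/feed.xml'));
```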
Hi!
I’m trying to parse AJAX content with PHP. The problem is:
when I read the URL source, the AJAX part isn’t visible, and I only grab the HTML from the rest of the site.
How can I solve this problem? Any ideas?
Is there a way to use cURL in PHP like you can on the command line? For example:
curl http://mydomain.com/picture.jpg -o "PATH_TO_SAVE_TO"
This would download a picture from a website and put it in a folder on my server. It works from Terminal but I cannot find the equivalent in PHP.
If anyone knows the answer to this I would greatly appreciate it.
@Kelly: yes, place something like this in your PHP. There are 3 options:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.whateveryouwant.com.php.html.xml");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml_language = curl_exec($ch);
curl_close($ch);
echo $xml_language;
You have options using cURL:
Return the data in a database-driven string. This returns the data and appends it to your PHP, HTML, XML, etc. VERY HANDY, especially for Flash and others; see: worldwideweather.com – forum.
This trick allows Flash to read an external XML file for its language and database info. Using PHP to look up the user-specific info, you can write the Flash XML on the fly. This script returns the user's language interface for Flash: if the user's language is Spanish, ES is appended to the XML URL, and the file is read and written into the PHP output itself. Very, very fast.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.verdegia.com/Files/System/TEST/Language/M_TEXT_" . $line['Language'] . ".xml");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml_language = curl_exec($ch);
curl_close($ch);
echo $xml_language;
Return the data in an external XML file from a user-specific PHP database call. This gets data specific to the user and generates the file on the fly (XML, PHP, HTML, whatever):
// Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO on current versions
$sql = "SELECT * FROM $tbl_name WHERE username='$myusername'";
$results = mysql_query($sql);
while ($line = mysql_fetch_assoc($results)) {
    $file = "http://www.worldweatheronline.com/feed/weather.ashx?q=" . $line['Postcode'] . "&format=xml&num_of_days=5&key=6c7e92e827155910100801";
}
$ch = curl_init($file);
$fp = @fopen("../Files/System/TEST/temp.xml", "w");
curl_setopt($ch, CURLOPT_FILE, $fp); // write the response straight into the file
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
$file = "../Files/System/TEST/temp.xml";
$fp = fopen($file, "r");
?>
HOPE THIS HELPS
Can we use this function to parse all content in a url?
@Indonesia: Except there are a lot more options. I believe it’s possible to get the whole HTTP response using cURL, and I believe that is not true with file_get_contents().
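As an illustration of the cURL side of that claim, setting CURLOPT_HEADER makes curl_exec() return the response headers in front of the body, and CURLINFO_HEADER_SIZE says where the headers end. The two helper names below are hypothetical; this is a sketch, not a complete HTTP client.

```php
<?php
// Hypothetical helper: split a raw response (as returned with CURLOPT_HEADER enabled)
// into its header block and body, given the header size reported by cURL.
function split_http_response($raw, $headerSize)
{
    return array(
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    );
}

// Hypothetical helper: fetch a URL and return headers and body together, or false on failure.
function get_full_response($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // include response headers in the output
    $raw = curl_exec($ch);
    if ($raw === false) {
        curl_close($ch);
        return false;
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);
    return split_http_response($raw, $headerSize);
}
```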
$sql = "UPDATE staff SET
staffNo = $staff_no,
f_name=$fname,
l_name=$lname,
sex=$sex,
DOB=$dob,
position=$position,
salary=$salary,
hiredate=$hiredate,
contact_id=$contact_id,
branchNo=$branch_no
WHERE staffNo=$staff_no";
$query = mysql_query($sql) or die("Cannot query the database." . mysql_error());
echo "Database Updated.";
Great,
It is useful for getting XML or images from another site if the server is not able to get content with fopen.
Thanks
Nice,
PHP provides two other methods to fetch a URL: cURL and fsockopen.
To use them you can check this example: http://www.bin-co.com/php/scripts/load/
How do I write the returned content into a new file?
Hi,
Can you put the curl call in a loop, i have a list of about 1000 urls that i want to ‘hit’ so the caches build up, can i just chuck the above code into a loop or will that be too resource heavy?
Thanks
Pete
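A plain loop is a reasonable starting point. The sketch below reuses one cURL handle for the whole list and records the HTTP status per URL; warm_urls() and its parameters are made up for illustration, and whether 1000 requests in one run is too heavy depends on the servers involved.

```php
<?php
// Hypothetical helper: request each URL in turn, reusing one cURL handle,
// and return the HTTP status code (0 when the request failed) per URL.
function warm_urls(array $urls, $timeout = 5, $pauseMicroseconds = 0)
{
    $statuses = array();
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the bodies out of the output
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2); // cap the whole transfer as well
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_exec($ch);
        $statuses[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if ($pauseMicroseconds > 0) {
            usleep($pauseMicroseconds); // be gentle on the target server
        }
    }
    curl_close($ch);
    return $statuses;
}

// Usage sketch: $statuses = warm_urls($listOfUrls, 5, 100000);
```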
Thanks for the code..Great!
It is very possible to put this into an automatic crawler for user-submitted sites, or even make an automatic crawler out of this. The code is short, but it works for only one page at a time. To make it look at multiple pages you have to do some minor PHP coding, but nothing major.
I am working on a script right now that uses the code above and just keeps crawling based on the links that are on the initial web page crawled. A non-stop spider script! They are already out there, but I like to say I can make one too.
The script will also take the meta tags (description and keywords) and place them into a database too, thus giving me a search engine and not a user-submitted directory.
if you would like to join the team simply e-mail me at justin2009@gmail.com
I want to extract the images present in the URL and first paragraph from the url. How can I do that?
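One way to approach this is to load the fetched HTML into DOMDocument and read the img and p elements. The function name below is hypothetical, and the sketch returns the src attributes as written in the page, so relative URLs are not resolved.

```php
<?php
// Hypothetical helper: pull all image URLs and the first paragraph's text out of an HTML string.
function extract_images_and_first_paragraph($html)
{
    $result = array('images' => array(), 'first_paragraph' => null);
    if (trim($html) === '') {
        return $result;
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings about sloppy real-world markup
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') !== '') {
            $result['images'][] = $img->getAttribute('src');
        }
    }
    $paragraphs = $doc->getElementsByTagName('p');
    if ($paragraphs->length > 0) {
        $result['first_paragraph'] = trim($paragraphs->item(0)->textContent);
    }
    return $result;
}

// Usage, assuming get_data() from the article is defined:
// $info = extract_images_and_first_paragraph(get_data('http://davidwalsh.name'));
```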
Hi there, I can’t post code here, but I can provide you my class, which extracts a particular tag from the returned page; it could be any HTML tag.
Regards,
M. Singh
A simple question: how can I speed up downloading with cURL? It is damn slow and sometimes takes 45 seconds to download a 4 KB page.
It depends upon the configuration of the host server.
How do I work with HTTPS? I have a site which does not load when I try to open it over HTTPS; it simply returns a 405 error. Any help, please mail me at msingh@ekomkaar.com
http//www.zerospeedsensor.com/
i.e., it doesn’t return any output.
Very strange. This function only returns relatively small pages.
It works if the source code has under 200 lines.
If the web page is bigger, it won’t return anything. Not even errors.
Same thing happens with file_get_contents.
PHP Version 5.2.11
memory_limit – 128M
max_execution_time – 30
error_reporting(E_ALL)
Any idea?
Hey David,
I searched the net for rough code with which I can get the “reciprocal” backlink status.
This finally helped me. :)
I modified it as per my need.
To check backlinks
0)
{
echo 'found';
}else{
echo 'Not found';
}
}
$remote_url = 'http://www.listsdir.com/';
$mmb_url = 'http://mymoviesbuzz.com/titles/';
$returned_content = get_data($remote_url,$mmb_url);
?>
Thanks.
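Part of the code in the comment above was lost, so the following is not a reconstruction of it. It is only a sketch of how a backlink check could work: fetch the remote page, then look for a link whose href starts with your own URL. The helper name has_backlink() is an assumption.

```php
<?php
// Hypothetical helper: report whether any link in the HTML points at (or below) $targetUrl.
function has_backlink($html, $targetUrl)
{
    if (trim($html) === '') {
        return false;
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings about sloppy real-world markup
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // stripos() === 0 means the href starts with the target URL, ignoring case
        if (stripos($anchor->getAttribute('href'), $targetUrl) === 0) {
            return true;
        }
    }
    return false;
}

// Usage, assuming the article's one-argument get_data() is defined:
// echo has_backlink(get_data('http://www.listsdir.com/'), 'http://mymoviesbuzz.com/titles/') ? 'found' : 'Not found';
```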
Thanks, this script helped me move my WordPress content to a new host.
I think it’s good practice to use CURLOPT_USERAGENT in cURL scripts…
In order to read sites encrypted by SSL, like Google Calendar feeds, you must set these CURL options:
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
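Note that turning both checks off also removes the protection against talking to an impersonated server. If the HTTPS request fails only because PHP cannot find a CA certificate bundle, an alternative is to keep verification on and point cURL at one. In this sketch the helper name and the bundle path are assumptions; substitute the location of a real PEM bundle on your server.

```php
<?php
// Hypothetical helper: cURL options that keep certificate checks enabled.
// $caBundlePath is an assumed location of a PEM file with trusted CA certificates.
function verified_ssl_options($caBundlePath)
{
    return array(
        CURLOPT_SSL_VERIFYPEER => true, // verify the server certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // verify the certificate matches the host name
        CURLOPT_CAINFO         => $caBundlePath,
    );
}

// Usage sketch (URL and path are placeholders):
// $ch = curl_init('https://example.com/feed');
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt_array($ch, verified_ssl_options('/path/to/cacert.pem'));
// $data = curl_exec($ch);
```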
Hello David,
How can I download a file from a remote URL? I’ve tried using your method but no luck :(
How can I log in with cURL?
How do I repeat the process?
How can I download the contents of a website that requires login??
You need to set up a session for that and pass the session cookies in the request headers so the site treats it as a normal login. For further details you can contact me at msingh@ekomkaar.com
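As a rough sketch of that idea, cURL can store the session cookie in a file and send it back on the next request. Everything specific here is an assumption: the helper name, the URLs, and the form field names all depend on the site, and sites that add hidden tokens to the login form need an extra step.

```php
<?php
// Hypothetical helper: log in with a form POST, keep the session cookie in a file,
// then request a protected page with that cookie. Returns the page body or false.
function fetch_after_login($loginUrl, array $credentials, $protectedUrl, $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // where cookies are saved
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // where cookies are read from

    // Step 1: submit the login form
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($credentials));
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return false;
    }

    // Step 2: fetch the protected page on the same handle, so the session cookie is sent
    curl_setopt($ch, CURLOPT_URL, $protectedUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back from POST to GET
    $page = curl_exec($ch);
    curl_close($ch);
    return $page;
}

// Usage sketch (all values are placeholders):
// $page = fetch_after_login('http://example.com/login', array('user' => 'me', 'pass' => 'secret'),
//                           'http://example.com/members', '/tmp/cookies.txt');
```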
I’m running a web hosting website. My domain provider gave me some HTTP APIs. I tried to implement them, but I’m getting an empty response from cURL. It’s an HTTPS URL and I used
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
params in my cURL call, but I’m still getting an empty response. Can anyone help me with this? I’m new to cURL :’(
Is it possible to retrieve the code inserted into an HTML tag (e.g. Flash)?
More precisely, @ the linkedin page of a skill:
http://www.linkedin.com/skills/skill/Java?trk=skills-pg-search,
there is a graphic obtained by an tag, which returns an image.
When I take source code of the page or when I use file_get_contents() php function, I can obtain only the returned tag.
I can see on the Firefox analysis of the page all these information, but I want an automatic script.
Any solution?