Yahoo SEO Domain Result Grabber

By David Walsh on February 5, 2008

I released my PHP Google Grabber script about a month ago and it was a big hit, even spawning Python and Groovy versions. Obtaining the number of pages indexed in Google by simply providing a domain name (or several, if you loop the function) can save you a lot of time. I run this script on a monthly basis to keep track of my customers' websites -- many of them use CMSs we've built, so I get to take a peek at how they're doing SEO-wise.

Although Yahoo! isn't nearly as relevant as Google in the search department, Yahoo! is still the most visited website on the internet. Since I already had the basic framework of the code built (from my Google Grabber), I thought it might be beneficial to take a few moments to Yahoo!ize it.

The Code

/* return result number */
function get_yahoo_results($domain = 'davidwalsh.name')
{
	// get the result content
	$content = file_get_contents('https://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2F'.urlencode($domain).'&bwm=p&bwms=p&fr2=seo-rd-se');

	// parse to get results
	$pages = str_replace(array(' ',')','('),'',get_match('/Pages (.*) /isU',$content));
	$inlinks = str_replace(array(' ',')','('),'',get_match('/Inlinks (.*) /isU',$content));

	// default to 0 when nothing was matched
	$return = array();
	$return['pages'] = $pages ? $pages : 0;
	$return['inlinks'] = $inlinks ? $inlinks : 0;

	// return result
	return $return;
}

/* helper: does the regex */
function get_match($regex,$content)
{
	preg_match($regex,$content,$matches);
	// return an empty string when the pattern does not match
	return isset($matches[1]) ? $matches[1] : '';
}
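
To see what the parsing step does, here is a minimal sketch run against a made-up fragment. The `$sample` string is an assumption for illustration only; the real Site Explorer markup isn't reproduced in this post and may well differ.

```php
<?php
// Hypothetical fragment standing in for the fetched page content.
$sample = 'Pages (204) Inlinks (518) ';

// The U modifier makes (.*) lazy, so the capture stops at the first
// space that follows the label.
preg_match('/Pages (.*) /isU', $sample, $matches);
$pages = str_replace(array(' ', ')', '('), '', $matches[1]);   // "(204)" becomes "204"

preg_match('/Inlinks (.*) /isU', $sample, $matches);
$inlinks = str_replace(array(' ', ')', '('), '', $matches[1]); // "(518)" becomes "518"
```

Both values come back as strings, which is why the function above can pass them straight to echo.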

The Usage

$domains = array('davidwalsh.name','digg.com','yahoo.com','cnn.com','dzone.com','some-domain-that-doesnt-exist.com');
foreach($domains as $domain)
{
	$result = get_yahoo_results($domain);
	echo $domain,': ',$result['pages'],' pages, ',$result['inlinks'],' inlinks',"\n";
}

//davidwalsh.name: 204 pages, 518 inlinks
//digg.com: 20,700,000 pages, 14,300,000 inlinks
//yahoo.com: 1,290,000,000 pages, 4,650,000 inlinks
//cnn.com: 7,510,000 pages, 1,090,000 inlinks
//dzone.com: 776,000 pages, 15,000 inlinks
//some-domain-that-doesnt-exist.com: 0 pages, 0 inlinks

As with my Google Grabber, you may need to adjust how the script connects to Yahoo! depending on your hosting environment. cURL may be the best option for you.
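
If file_get_contents() can't open remote URLs on your host, a cURL-based fetch might look something like the sketch below. The helper names `build_yahoo_url()` and `fetch_url()` and the 10-second timeout are my own assumptions for illustration, not part of the original script, and it assumes PHP's cURL extension is installed.

```php
<?php
/* hypothetical helper: build the same Site Explorer URL used in get_yahoo_results() */
function build_yahoo_url($domain)
{
	return 'https://siteexplorer.search.yahoo.com/search?p='.urlencode('http://'.$domain).'&bwm=p&bwms=p&fr2=seo-rd-se';
}

/* hypothetical helper: fetch a URL with cURL, returning '' on failure */
function fetch_url($url, $timeout = 10)
{
	$ch = curl_init($url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds allowed for connecting
	curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // seconds allowed for the whole request
	$content = curl_exec($ch);

	// curl_exec() returns false on failure; normalize that to an empty string
	return $content === false ? '' : $content;
}
```

Inside get_yahoo_results(), the file_get_contents() line would then become $content = fetch_url(build_yahoo_url($domain)); and the parsing code stays the same.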
