Yahoo SEO Domain Result Grabber


I released my PHP Google Grabber script about a month ago and it was a big hit, even spawning Python and Groovy versions. Obtaining the number of pages indexed in Google by simply providing a domain name (or several, if you loop the function) can save you a lot of time. I run this script monthly to keep track of my customers' websites; many of them use CMSs we've built, so I get to take a peek at how they're doing SEO-wise.

Although Yahoo! isn't nearly as relevant as Google in the search department, Yahoo! is still the most visited website on the internet. Since I already had the basic framework of the code built (from my Google Grabber), I thought it might be beneficial to take a few moments to Yahoo!ize it.

The Code

/* return result number */
function get_yahoo_results($domain = 'davidwalsh.name')
{
	// get the result content
	$content = file_get_contents('https://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2F'.$domain.'&bwm=p&bwms=p&fr2=seo-rd-se');

	// parse to get results
	$pages = str_replace(array(' ',')','('),'',get_match('/Pages (.*) /isU',$content));
	$inlinks = str_replace(array(' ',')','('),'',get_match('/Inlinks (.*) /isU',$content));

	$return['pages'] = $pages ? $pages : 0;
	$return['inlinks'] = $inlinks? $inlinks : 0;

	// return result
	return $return;
}

/* helper: does the regex */
function get_match($regex,$content)
{
	// return the first captured group, or false when nothing matched
	return preg_match($regex,$content,$matches) ? $matches[1] : false;
}

The Usage

$domains = array('davidwalsh.name','digg.com','yahoo.com','cnn.com','dzone.com','some-domain-that-doesnt-exist.com');
foreach($domains as $domain)
{
	$result = get_yahoo_results($domain);
	echo $domain,': ',$result['pages'],' pages, ',$result['inlinks'],' inlinks',"\n";
}

//davidwalsh.name: 204 pages, 518 inlinks
//digg.com: 20,700,000 pages, 14,300,000 inlinks
//yahoo.com: 1,290,000,000 pages, 4,650,000 inlinks
//cnn.com: 7,510,000 pages, 1,090,000 inlinks
//dzone.com: 776,000 pages, 15,000 inlinks
//some-domain-that-doesnt-exist.com: 0 pages, 0 inlinks

As with my Google Grabber, you may need to adjust the method of connecting to Yahoo! based on your hosting environment. cURL may be the best option for you.
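
If file_get_contents() can't open remote URLs on your host (the allow_url_fopen setting is often disabled), a cURL-based fetch is one alternative. The following is only a minimal sketch, not part of the original script: fetch_url() and its $timeout parameter are hypothetical names introduced here for illustration, and it assumes the PHP cURL extension is installed.

```php
<?php
/* hypothetical helper: fetch a URL with cURL instead of file_get_contents() */
function fetch_url($url, $timeout = 10)
{
	$ch = curl_init($url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
	curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // give up after $timeout seconds

	$content = curl_exec($ch);

	// curl_exec() returns false on failure; normalize that to an empty string
	// so the regex helper simply finds no match
	return $content === false ? '' : $content;
}
```

To try it, replace the file_get_contents(...) call in get_yahoo_results() with fetch_url(...) and pass the same URL. A failed request then yields an empty string, so both counts fall back to 0.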

Discussion

  1. kenny

    Hi, David,

    how do I use this please? do you have a example or something like that?

    Thanks for sharing.

  2. this code working fine, thanks
