Object Indexing vs. Array Collection
The Setup & Goal
Let's say we have one large text document and a bunch of keywords that we want to parse the document for. We don't care how many times a keyword appears -- we just care that it's been used. When we find a keyword, we need to record that the keyword has been found so that we can check for it later.
Inefficient Method: Array Collection and Search
The first method of recording that a keyword has been found is pushing the keyword into an array:
//Assume an array called "foundKeywords" was defined above
if (shouldSave(keyword)) {
	foundKeywords.push(keyword);
}
At the end of the document search, we'd end up with an array like:
//the foundKeywords array looks like:
//['keyword1', 'keyword2', 'keyword2', ...]
When it comes to checking this array for the existence of a given keyword, this method proves inefficient. Why? Because we'd need to loop through the array, searching until we found the given keyword (if ever). Those are a lot of "wasted," fruitless cycles, even if we break the loop as soon as the keyword is found. Inefficient is the only word to describe this process.
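To make the cost concrete, here is a minimal sketch of what that array check amounts to -- a linear scan. The helper name `wasFound` is made up for illustration; `foundKeywords` is assumed to be the array built above:

```javascript
// Linear scan: in the worst case we touch every element before giving up.
function wasFound(foundKeywords, keyword) {
	for (var i = 0; i < foundKeywords.length; i++) {
		if (foundKeywords[i] === keyword) {
			return true; // break out as soon as we hit a match
		}
	}
	return false; // scanned the whole array for nothing
}
```

The early `return` is the "break the loop" optimization mentioned above -- it helps the average case, but the worst case still walks the entire array.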
The Efficient Method: Object with Index
The fastest method of storing keywords for later reference is an object (in JavaScript) or associative array (in PHP). Instead of pushing the keyword onto an array, we add the keyword as an index on a master object, giving it the value 1:
//Assume an object {} called "foundKeywords" was defined above
if (shouldSave(keyword)) {
	foundKeywords[keyword] = 1;
}
Why is this faster? No wasted cycles looking blindly through an array. The check is quick and simple:
if (foundKeywords[keyword]) {
	//FOUND!
	//do something
}
It's either an index or it's not! In PHP, we'd save the keyword to an associative array:
//Assume an array called "$found_keywords" was defined above
if (shouldSave($keyword)) {
	$found_keywords[$keyword] = 1;
}

//Later: checking if the keyword was there...
if ($found_keywords[$keyword]) { //or array_key_exists($keyword, $found_keywords)
	//FOUND!
}
In a word...awesome. Not only fast but simple!
I cannot provide a benchmark because the speed of execution depends on how large the keyword array is. Suffice it to say that, for the sake of simplicity and speed, using an object with keyword indexes is definitely the way to go!
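To see the whole technique end to end, here's a small self-contained sketch. The `findKeywords` helper, the sample text, and the keyword list are all made up for illustration; `shouldSave` is simplified down to a plain substring test:

```javascript
// Hypothetical helper for illustration: scan a text for keywords and
// record each hit as an object index rather than an array entry.
function findKeywords(text, keywords) {
	var foundKeywords = {};
	for (var i = 0; i < keywords.length; i++) {
		if (text.indexOf(keywords[i]) !== -1) {
			foundKeywords[keywords[i]] = 1; // record by index, not by push
		}
	}
	return foundKeywords;
}

// Later: constant-time existence checks, no array scanning.
var found = findKeywords('the quick brown fox jumps over the lazy dog',
                         ['fox', 'dog', 'cat']);
if (found['fox']) {
	//FOUND!
}
```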
PHP has an “in_array()” function…
Yes, but I imagine that internal method does a similar operation.
Indeed, array lookups (like what in_array does) are linear (O(n)) whereas hashtable lookups (like what David does) are close to constant (O(1)). That means that as the array size increases, the in_array search will get slower on average, whereas the hashtable lookup will stay consistent.
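The same O(n) vs. O(1) contrast can be sketched in JavaScript (timing code is omitted on purpose, since absolute numbers depend on the engine; the point is what each check has to do):

```javascript
// Build both structures with the same n entries.
var n = 100000;
var list = [];
var table = {};
for (var i = 0; i < n; i++) {
	list.push('kw' + i);
	table['kw' + i] = 1;
}

// O(n): indexOf walks the array until it finds a match (or hits the end).
var inList = list.indexOf('kw' + (n - 1)) !== -1;

// O(1) on average: a single hash lookup, independent of n.
var inTable = !!table['kw' + (n - 1)];
```

Searching for the last element is the worst case for the array, but makes no difference at all to the object lookup.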
in_array() = slow
isset($arr[$key]) = fast
i’ve made a test:
the results are:
“indexes” time: 1.93250012398
“keys” time: 1.93542504311
“manual” time: 1.94083094597
they are almost identical, though “indexes” are a little bit faster
@David,
concerning JS, your method has one more drawback – since you are operating with Objects, not Arrays, you can’t use “.length” to know the number of unique found words.
so it is better to store something more than just “foundKeywords[keyword] = 1;”,
but it looks not very pretty (imho).
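For what it's worth, the count can still be recovered from the object. A sketch, assuming an ES5-era environment where Object.keys is available (in older engines, a for...in loop with hasOwnProperty does the same job):

```javascript
var foundKeywords = { keyword1: 1, keyword2: 1, keyword3: 1 };

// Object.keys returns the unique "found" keywords as an array,
// which restores the .length convenience the commenter misses.
var count = Object.keys(foundKeywords).length;

// Fallback for older engines: count own properties manually.
var manualCount = 0;
for (var key in foundKeywords) {
	if (foundKeywords.hasOwnProperty(key)) {
		manualCount++;
	}
}
```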
i’ve changed the test a little bit:
i’ve tried it about 30-40 times. the most noticeable difference is:
“indexes”. populating time: 1.89304304123
“keys”. populating time: 1.9226899147
it means that the main difference is the speed of array population. simple arrays are faster than hashes, but anyway – almost identical.
something has happened with the code:
David, it seems that the parser is not working. Can you please edit/merge.
… $i < 100000; $i++) {
	$a[rndstr()] = 1;
}
$dt = microtime(true) - $t;
echo '"keys". populating time: '.$dt;
?>
I believe using “keyword in foundKeywords” is faster than “foundKeywords[keyword]” in JS.
That’s a good thought. Have metrics?
I just tested it. If the key exists, “obj[key]” is faster, if not, “key in obj” is. However, the difference is negligible. The object size does not seem to affect the speed of either method.
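Beyond timing, the two checks aren't quite interchangeable, which is worth noting here: the truthiness test gives the wrong answer for falsy stored values, while the `in` operator only asks whether the key exists. A small sketch (the key names are made up):

```javascript
var foundKeywords = { apple: 1, banana: 0 };

// Truthiness check: fails when the stored value happens to be falsy.
var viaBracket = !!foundKeywords['banana'];

// "in" check: tests key existence regardless of the stored value.
var viaIn = 'banana' in foundKeywords;
```

With the article's convention of always storing 1, the two behave the same; the difference only bites if other values creep in.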
@David, very interesting tip – after reading other comments, it would be nice to see what your test results are.
My opinion:
@seelts: I don’t know what you want to prove with your code. In both cases you just insert into an array, but you never try to access the data! Therefore your runtime will always be O(n) [big O notation], where n is the number of words you insert. Every single insert is O(1), and your marginal differences in the populating time follow from the runtime of your rand().
@david: theoretically you should be right. If you use indices you will need O(1) to get the data or to know that there is none. If you have to cycle through the array you will need O(n) in the worst case (last or not in the array) or O(1) in the best case (first element) [if the array is not sorted!]. The real runtimes depend on the implementation of array/object and a lot on the number of array elements.
Good tip. Although, instead of setting the value to 1, I’d personally set it to “true”. :)
What about array.indexOf()? It returns -1 when none is found, so you can check for that. It’s bound to be faster than a manual index search. Not sure if this is faster than the object lookup (probably not).

Also, for PHP arrays, unless you’re assuming null values, checking for a key can be done with isset(), which is kind of like the JS check – but if the value is null it returns false. isset() is extremely fast.

I think indexOf still does a linear search. A hashtable lookup would be quicker. So something like this: {'a': true, 'c': true, 'dfg': true} – and then check if data[whatever] is true. :)

I’m pretty sure you’re right (that the hash lookup is faster). But the use case is stripped down. In real life, you would find yourself requiring other operations, and for those a true representation of your stack could be much more useful.

For example – Object iteration is much slower than an Array’s, and an Object lacks many array tools, as simple as the length property. And although indexOf might be slower, I’m not so sure it will be slow enough to think about dumping it. JS is quite fast nowadays. There is no reason not to use the native tools it provides instead of hacking it up.
Pretty cool. Going to try these out soon.
As pointed out, if($found_keywords[$keyword]) requires an isset() test (or array_key_exists($keyword, $found_keywords)) to avoid a notice when the key is missing.
Good ideas about checking up the items in JS!
And what about associative Arrays in JavaScript?
I’m using this very technique to handle string matching for looking up common words in a dictionary. Basically I want to know whether a string exists in a list of terms (in JavaScript). The dictionary is well over 10,000 terms. Even a sorted array with a binary search was slow to look up, and since this check would be done on the user’s interaction with a keyup event, it needed to be fast. So I defined my dictionary as an object literal: {"foo":true,"bar":true}. This was fast.

However, it did raise the question: just how large could that dictionary object get? And since booleans are represented as an integer (right?), that’s 32 bits of memory per term when all I need is one bit. Is that a waste of memory, and should I be concerned with the execution environment? Slim and lean is the key here. Does using null or void 0 offer better memory usage ({"foo":null,"bar":null})? Which is best: dict["foo"] === null vs dict["foo"] === undefined vs dict["foo"] === true?
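One caveat worth sketching for that last question: if the dictionary stores undefined (or void 0) as its values, a strict comparison against undefined cannot distinguish a present term from a missing one, whereas the `in` operator (or hasOwnProperty) can. The term names below are made up:

```javascript
var dict = { foo: undefined, bar: undefined };

// Ambiguous: a stored undefined and a missing key compare the same.
var hasFoo = dict['foo'] === undefined;  // present term
var hasBaz = dict['baz'] === undefined;  // "baz" was never added, yet same result

// Unambiguous: "in" checks for the key itself, not its value.
var inFoo = 'foo' in dict;
var inBaz = 'baz' in dict;
```

So null or true values keep the cheap `dict[term] === value` test unambiguous; undefined values force you into the (still fast, but different) `in` check.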