Tag Archives: search algorithm

Computer Algorithms: Shortest Path in a Graph

Introduction

Since with graphs we can represent real-life problems it’s almost clear why we would need an efficient algorithm that calculates the shortest path between two vertices. Getting back to our example of a road map we can use such an algorithm in order to find the shortest path between two cities. This example, of course, is very basic indeed, but it can give us a clear example of where shortest path can be applied.

In the other hand, we can model an enormous field of real-life problems using graphs – not only road maps. As we already know, whenever we have relations between different abstract objects we can refer an efficient graph algorithm.

OK, so we need a shortest path algorithm, but before we proceed with the exact algorithm first we’ll need to answer some questions and give some definitions.

Overview

First we need a definition of the terms distance and path between two nodes. A path is considered to be the sequence of vertices (or edges if you wish) between two vertices i and j. Of course we assume that there might be no path between any to vertices in the graph! Also we assume that this definition relates both for directed and undirected graphs. After we have the definition of a path we can proceed by defining a “distance”, which is said to be the number of edges in the path between i and j.

Path and Distance
First we need to define what’s a path and a distance between two vertices in order to continue searching for the shortest path!
Continue reading Computer Algorithms: Shortest Path in a Graph

Computer Algorithms: Interpolation Search

Overview

I wrote about binary search in my previous post, which is indeed one very fast searching algorithm, but in some cases we can achieve even faster results. Such an algorithm is the “interpolation search” – perhaps the most interesting of all searching algorithms. However we shouldn’t forget that the data must follow some limitations. In first place the array must be sorted. Also we must know the bounds of the interval.

Why is that? Well, this algorithm tries to follow the way we search a name in a phone book, or a word in the dictionary. We, humans, know in advance that in case the name we’re searching starts with a “B”, like “Bond” for instance, we should start searching near the beginning of the phone book. Thus if we’re searching the word “algorithm” in the dictionary, you know that it should be placed somewhere at the beginning. This is because we know the order of the letters, we know the interval (a-z), and somehow we intuitively know that the words are dispersed equally. These facts are enough to realize that the binary search can be a bad choice. Indeed the binary search algorithm divides the list in two equal sub-lists, which is useless if we know in advance that the searched item is somewhere in the beginning or the end of the list. Yes, we can use also jump search if the item is at the beginning, but not if it is at the end, in that case this algorithm is not so effective.

So the interpolation search is based on some simple facts. The binary search divides the interval on two equal sub-lists, as shown on the image bellow.

Binary search basic approach
The binary search algorithm divides the list in two equal sub-lists!

What will happen if we don’t use the constant ½, but another more accurate constant “C”, that can lead us closer to the searched item.

Interpolation search
The interpolation search algorithm tries to improve the binary search!

Continue reading Computer Algorithms: Interpolation Search

Computer Algorithms: Binary Search

Overview

The binary search is perhaps the most famous and best suitable search algorithm for sorted arrays. Indeed when the array is sorted it is useless to check every single item against the desired value. Of course a better approach is to jump straight to the middle item of the array and if the item’s value is greater than the desired one, we can jump back again to the middle of the interval. Thus the new interval is half the size of the initial one.

Binary search basic implementation
Basic implementation of binary search

If the searched value is greater than the one placed at the middle of the sorted array, we can jump forward. Again on each step the considered list is getting half as long as the list on the previous step, as shown on the image bellow.

Binary search - basic implementation
Binary search - basic implementation

Implementation

Here’s a sample implementation of this algorithm on PHP. Obviously the nature of this approach is guiding us to a recursive implementation, but as we know, sometimes recursion can be dangerous. That’s why here we can see either the recursive and iterative solution. Continue reading Computer Algorithms: Binary Search

Computer Algorithms: Sequential Search

Overview

This is the easiest to implement and the most frequently used search algorithm in practice. Unfortunately the sequential search is also the most ineffective searching algorithm. However, it is so commonly used that it is appropriate to consider several ways to optimize it. In general the sequential search, also called linear search, is the method of consecutively check every value in a list until we find the desired one.

Basic Implementation

The most natural approach is to loop through the list until we find the desired value. Here’s an implementation on PHP using FOR loop, something that can be easily written into any other computer language.

This is really the most ineffective implementation. There are two big mistakes in this code. First of all we calculate the length of the list on every iteration of the array, and secondly after we find the desired element, we don’t break the loop, but continue to loop through the array.

Forward Linear Search

Yes, if the element is repeated without the “break” we can find its last occurrence, but if not the loop will iterate over the end of the array with no practical value.

Optimization of the forward sequential search

… and javascript:

Optimized forward linear search

Even with this little optimization the algorithm remains ineffective. As we can see, on every iteration we have two conditional expressions. First we check whether we’ve reached the end of the list, and then we check whether the current element equals to the searched element. So the question is can we reduce the number of the conditional expressions?

Searching in reverse order

Yes, we can reduce the number of comparison instructions from the forward approach of the linear search algorithm by using reverse order searching. Although it seems to be pretty much the same by reversing the order of the search we can discard one of the conditional expressions.

Note that we need to adjust index because of $index—expression.

Indeed here we have only one conditional expression, but the problem is that this implementation is correct ONLY when the element exists in the list, which is not always true. If the element doesn’t appears into the list, then this code can lead to an infinite loop. OK, but how can we stop the loop even when the list doesn’t contain the desired value? The answer is, by adding the searched value to the list.

Sentinel

The above problem can be solved by inserting the desired item as a sentinel value. Thus we’re sure that the list contains the value, so the loop will stop for sure even if at the beginning the value didn’t appear to be part of the list.

Using setinel in sequential search

This approach can be used to overcome the problem of the reverse linear search approach from the previous section.

Complexity

As I said at the beginning of this post this is one of the most ineffective searching algorithms. Of course the best case is when the searched value is at the very beginning of the list. Thus on the first comparison we can find it. On the other hand the worst case is when the element is located at the very end of the list. Assuming that we don’t know where the element is and the possibility to be anywhere in the list is absolutely equal, then the complexity of this algorithm is O(n).

Different cases

We must remember, however, that the algorithm’s complexity can vary depending on whether the element occurs once.

Is it so ineffective?

Sequential search can be very slow compared to binary search on an ordered list. But actually this is not quite true. Sequential search can be faster than binary search for small arrays, but it is assumed that for n < 8 the sequential search is faster.

Application

The linear search is really very simple to implement and most web developers go to the forward implementation, which is the most ineffective one. On the other hand this algorithm is quite useful when we search in an unordered list. Yes, searching in an ordered list is something that can dramatically change the search algorithm. Actually searching and sorting algorithms are often used together.

A typical case is pulling something from a database, usually in form of a list and then search for some value in it. Unfortunately in most of the cases the database orders the returned result set and yet most of the developers perform a consecutive search over the list. Yet again when the list is ordered it is better to use binary search instead of sequential search.
Let’s say we have a CSV file containing the usernames and the names of our users.

Username,Name
jamesbond007,James Bond
jsmith,John Smith
...

Now we fetch these values into an array.

// work case
$arr = array(
    array('name' =&gt; 'James Bond', 'username' =&gt; 'jamesbond007'),
    array('name' =&gt; 'John Smith', 'username' =&gt; 'jsmith')
);

Now using sequential search …

// using a sentinel
$x = 'jsmith';
$arr[] = array('username' =&gt; $x, 'name' =&gt; '');
$index = 0;

while ($arr[$index++]['username'] != $x);

if ($index &lt; count($arr)) {
    echo "Hello, {$arr[$index-1]['name']}";
} else {
    echo "Hi, guest!";
}