# Computer Algorithms: Insertion Sort

## Overview

Sorted data can dramatically change the speed of our program, therefore sorting algorithms are something quite special in computer science. For instance searching in a sorted list is faster than searching in an unordered list.

There are two main approaches in sorting – by comparing the elements and without comparing them. A typical algorithm from the first group is insertion sort. This algorithm is very simple and very intuitive to implement, but unfortunately it is not so effective compared to other sorting algorithms as quicksort and merge sort. Indeed insertion sort is useful for small sets of data with no more than about 20 items.

Insertion sort it is very intuitive method of sorting items and we often use it when we play card games. In this case the player often gets an unordered set of playing cards and intuitively starts to sort it. First by taking a card, making some comparisons and then putting the card on the right position.

So let’s say we have an array of data. In the first step the array is unordered, but we can say that it consists of two sub-sets: sorted and unordered, where on the first step the only item in the sorted sub-set is its first item. If the length of the array is n the algorithm is considered completed in n-1 steps. On each step our sorted subset is growing with one item. The thing is that we take the first item from the unordered sub-set and with some comparisons we put it into its place in the sorted sub-set, like on the diagram bellow.

Continue reading Computer Algorithms: Insertion Sort

# Computer Algorithms: Jump Search

## Overview

In my previous article I discussed how the sequential (linear) search can be used on an ordered lists, but then we were limited by the specific features of the given task. Obviously the sequential search on an ordered list is ineffective, because we consecutively check every one of its elements. Is there any way we can optimize this approach? Well, because we know that the list is sorted we can check some of its items, but not all of them. Thus when an item is checked, if it is less than the desired value, we can skip some of the following items of the list by jumping ahead and then check again. Now if the checked element is greater than the desired value, we can be sure that the desired value is hiding somewhere between the previously checked element and the currently checked element. If not, again we can jump ahead. Of course a good approach is to use a fixed step. Let’s say the list length is n and the step’s length is k. Basically we check list(0), then list(k-1), list(2k-1) etc. Once we find the interval where the value might be (m*k-1 < x <= (m+1)*k – 1), we can perform a sequential search between the last two checked positions. By choosing this approach we avoid a lot the weaknesses of the sequential search algorithm. Many comparisons from the sequential search here are eliminated.

## How to choose the step’s length

We know that it is a good practice to use a fixed size step. Actually when the step is 1, the algorithm is the traditional sequential search. The question is what should be the length of the step and is there any relation between the length of the list (n) and the length of the step (k)? Indeed there is such a relation and often you can see sources directly saying that the best length k = √n. Why is that?

Well, in the worst case, we do n/k jumps and if the last checked value is greater than the desired one, we do at most k-1 comparisons more. This means n/k + k – 1 comparisons. Now the question is for what values of k this function reaches its minimum. For those of you who remember maths classes this can be found with the formula -n/(k^2) + 1 = 0. Now it’s clear that for k = √n the minimum of the function is reached.

Of course you don’t need to prove this every time you use this algorithm. Instead you can directly assign √n to be the step length. However it is good to be familiar with this approach when trying to optimize an algorithm.

Let’s cosider the following list: (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610). Its length is 16. Jump search will find the value of 55 with the following steps.

## Implementation

Let’s see an example of jump search, written in PHP. Continue reading Computer Algorithms: Jump Search

# Computer Algorithms: Linear Search in Sorted Lists

## Overview

The expression “linear search in sorted lists” itself sounds strange. Why should we use this algorithm for sorted lists when there are lots of other algorithms that are far more effective? As I mentioned in

my previous post the sequential search is very ineffective in most of the cases and it is primary used for unordered lists. Indeed sometimes it is more useful first to sort the data and then use a faster algorithm like the binary search. On the other hand the analysis shows that for lists with less than ten items the linear search is much faster than the binary search. Although, for instance, binary search is more effective on sorted lists, sequential search can be a better solution in some specific cases with minor changes. The problem is that when developers hear the expression “sorted list” they directly choose an algorithm different from the linear search. Perhaps the problem lays in the way we understand what an ordered list is?

### What is a sorted list?

We used to think that this list (1, 1, 2, 3, 5, 8, 13) is sorted. Actually we think so because it is … sorted, but the list (3, 13, 1, 3, 3.14, 1.5, -1) is also sorted, except that we don’t know how. Thus we can think that any array is sorted, although it is not always obvious how. There are basically two cases when sequential search can be very useful. First when the list is very short or when we know in advance that there are some values that are very frequently searched. Continue reading Computer Algorithms: Linear Search in Sorted Lists

# Computer Algorithms: Sequential Search

## Overview

This is the easiest to implement and the most frequently used search algorithm in practice. Unfortunately the sequential search is also the most ineffective searching algorithm. However, it is so commonly used that it is appropriate to consider several ways to optimize it. In general the sequential search, also called linear search, is the method of consecutively check every value in a list until we find the desired one.

## Basic Implementation

The most natural approach is to loop through the list until we find the desired value. Here’s an implementation on PHP using FOR loop, something that can be easily written into any other computer language.

This is really the most ineffective implementation. There are two big mistakes in this code. First of all we calculate the length of the list on every iteration of the array, and secondly after we find the desired element, we don’t break the loop, but continue to loop through the array.

Yes, if the element is repeated without the “break” we can find its last occurrence, but if not the loop will iterate over the end of the array with no practical value.

### Optimization of the forward sequential search

… and javascript:

Even with this little optimization the algorithm remains ineffective. As we can see, on every iteration we have two conditional expressions. First we check whether we’ve reached the end of the list, and then we check whether the current element equals to the searched element. So the question is can we reduce the number of the conditional expressions?

## Searching in reverse order

Yes, we can reduce the number of comparison instructions from the forward approach of the linear search algorithm by using reverse order searching. Although it seems to be pretty much the same by reversing the order of the search we can discard one of the conditional expressions.

Note that we need to adjust index because of \$index—expression.

Indeed here we have only one conditional expression, but the problem is that this implementation is correct ONLY when the element exists in the list, which is not always true. If the element doesn’t appears into the list, then this code can lead to an infinite loop. OK, but how can we stop the loop even when the list doesn’t contain the desired value? The answer is, by adding the searched value to the list.

## Sentinel

The above problem can be solved by inserting the desired item as a sentinel value. Thus we’re sure that the list contains the value, so the loop will stop for sure even if at the beginning the value didn’t appear to be part of the list.

This approach can be used to overcome the problem of the reverse linear search approach from the previous section.

## Complexity

As I said at the beginning of this post this is one of the most ineffective searching algorithms. Of course the best case is when the searched value is at the very beginning of the list. Thus on the first comparison we can find it. On the other hand the worst case is when the element is located at the very end of the list. Assuming that we don’t know where the element is and the possibility to be anywhere in the list is absolutely equal, then the complexity of this algorithm is O(n).

### Different cases

We must remember, however, that the algorithm’s complexity can vary depending on whether the element occurs once.

### Is it so ineffective?

Sequential search can be very slow compared to binary search on an ordered list. But actually this is not quite true. Sequential search can be faster than binary search for small arrays, but it is assumed that for n < 8 the sequential search is faster.

## Application

The linear search is really very simple to implement and most web developers go to the forward implementation, which is the most ineffective one. On the other hand this algorithm is quite useful when we search in an unordered list. Yes, searching in an ordered list is something that can dramatically change the search algorithm. Actually searching and sorting algorithms are often used together.

A typical case is pulling something from a database, usually in form of a list and then search for some value in it. Unfortunately in most of the cases the database orders the returned result set and yet most of the developers perform a consecutive search over the list. Yet again when the list is ordered it is better to use binary search instead of sequential search.
Let’s say we have a CSV file containing the usernames and the names of our users.

``````Username,Name
jamesbond007,James Bond
jsmith,John Smith
...
``````

Now we fetch these values into an array.

``````// work case
\$arr = array(
array('name' =&gt; 'James Bond', 'username' =&gt; 'jamesbond007'),
array('name' =&gt; 'John Smith', 'username' =&gt; 'jsmith')
);
``````

Now using sequential search …

``````// using a sentinel
\$x = 'jsmith';
\$arr[] = array('username' =&gt; \$x, 'name' =&gt; '');
\$index = 0;