Tag Archives: sorting algorithm

Computer Algorithms: Sorting in Linear Time

Radix Sort

The first question when we see the phrase “sorting in linear time” should be – where’s the catch? Indeed there’s a catch and the thing is that we can’t sort just anything in linear time. Most of the time we can speak on sorting integers in linear time, but as we can see later this is not the only case.

Since we speak about integers, we can think of a faster sorting algorithm than usual. Such an algorithm is the counting sort, which can be very fast in some cases, but also very slow in others, so it can be used carefully. Another linear time sorting algorithm is radix sort.

Introduction

Count sort is absolutely brilliant and easy to implement. In case we sort integers in the range [n, m] on the first pass we just initialize a zero filled array with length m-n. Than on the second pass we “count” the occurrence of each integer. On the third pass we just sort the integers with an ease.

However we have some problems with that algorithm. What if we have only few items to sort that are very far from each other like [2, 1, 10000000, 2]. This will result in a very large unused data. So we need a dense integer sequence. This is important because we must know in advance the nature of the sequence which is rarely sure.

That’s why we need to use another linear time sorting algorithm for integers that doesn’t have this disadvantage. Such an algorithm is the radix sort.

Overview

The idea behind the radix sort is simple. We must look at our “integer” sequence as a string sequence. OK, to become clearer let me give you an example. Our sequence is [12, 2, 23, 33, 22]. First we take the leftmost digit of each number. Thus we must compare [_2, 2, _3, _3, _2]. Clearly we can assume that since the second number “2” is only a one digit number we can fill it up with a leading “0”, to become 02 or _2 in our example: [_2, _2, _3, _3, _2]. Now we sort this sequence with a stable sort algorithm.

What is a Stable Sort Algorithm

A stable sort algorithm is an algorithm that sorts a list by preserving the positions of the elements in case they are equal. In terms of PHP this means that:

array(0 => 12, 1=> 13, 2 => 12);

Will be sorted as follows:

array(0 => 12, 2 => 12, 1 => 13);

Thus the third element becomes second following the first element. Note that the third and the first element are equal, but the third appears later in the sequence so it remains later in the sorted sequence.

In the radix sort example, we need a stable sort algorithm, because we need to worry about only one position of digit we explore.

So what happens in our example after we sort the sequence?

As we can see we’re far from a sorted sequence, but what if we proceed with the next “position” – the decimal digit?

Than we end up with this:

Now we have a sorted sequence, so let’s summarize the algorithm in a short pseudo code.

Pseudo Code

The simple approach behind the radix sort algorithm can be described as pseudo code, assuming that we’re sorting decimal integers.

1. For each digit at position 10^0 to 10^n
1.1. Sort the numbers by this digit using a stable sort algorithm;

The thing is that here we talk about decimal, but actually this algorithm can be applied equally on any numeric systems. That is why it’s called “radix” sort.

Thus we can sort binary numbers, hexadecimals etc.

It’s important to note that this algorithm can be also used to sort strings alphabetically.

[ABC, BBC, ABA, AC]
[__C, __C, __A, __C] => [ABA, ABC, BBC, AC]
[_B_, _B_, _B_, _A_] => [AC, ABA, ABC, BBC]
[___, A__, A__, B__] => [AC, ABA, ABC, BBC]

That is simply correct because we can assume that our alphabet is another 27 digit numeric system (in case of the Latin alphabet).

Complexity

As I said in the beginning radix sort is a linear time sorting algorithm. Let’s see why. First we depend on the numeric system. Let’s assume we have a decimal numeric system – then we have N passes sorting 10 digits which is simply 10*N. In case of K digit numeric system our algorithm will be O(K*N) which is linear.

However you must note that in case we sort N numbers in an N digit numeric system the complexity will become O(N^2)!

We must also remember that in order to implement radix sort and a supporting stable sort algorithm we need an extra space.

Application

Sorting integers can be faster than sorting just anything, so any time we need to implement a sorting algorithm we must carefully investigate the input data. And that’s also the big disadvantage of this algorithm – we must know the input in advance, which is rarely the case.

Computer Algorithms: Heap and Heapsort

Introduction

Heapsort is one of the general sorting algorithms that performs in O(n.log(n)) in the worst-case, just like merge sort and quicksort, but sorts in place – as quicksort. Although quicksort’s worst-case sorting time is O(n2) it’s often considered that it beats other sorting algorithms in practice. Thus in practice quicksort is “faster” than heapsort. In the same time developers tend to consider heapsort as more difficult to implement than other n.log(n) sorting algorithms.

In the other hand heapsort uses a special data structure, called heap, in order to sort items in place and this data structure is quite useful in some specific cases. Thus to understand heapsort we first need to understand what is a heap.

So first let’s take a look at what is a heap.

Overview

A heap is a complete binary tree, where all the parents are greater than their children (max heap). If all the children are greater than their parents it is considered to call the heap a min-heap. But first what is a complete binary tree? Well, this is a binary tree, where all the levels are full, except the last one, where all the items are placed on the left (just like on the image below).

Complete Binary Tree
A complete binary tree is a structure where all the levels are completely full, except the last level, where all the items are placed on the left!
Continue reading Computer Algorithms: Heap and Heapsort

Computer Algorithms: Jump Search

Overview

In my previous article I discussed how the sequential (linear) search can be used on an ordered lists, but then we were limited by the specific features of the given task. Obviously the sequential search on an ordered list is ineffective, because we consecutively check every one of its elements. Is there any way we can optimize this approach? Well, because we know that the list is sorted we can check some of its items, but not all of them. Thus when an item is checked, if it is less than the desired value, we can skip some of the following items of the list by jumping ahead and then check again. Now if the checked element is greater than the desired value, we can be sure that the desired value is hiding somewhere between the previously checked element and the currently checked element. If not, again we can jump ahead. Of course a good approach is to use a fixed step. Let’s say the list length is n and the step’s length is k. Basically we check list(0), then list(k-1), list(2k-1) etc. Once we find the interval where the value might be (m*k-1 < x <= (m+1)*k – 1), we can perform a sequential search between the last two checked positions. By choosing this approach we avoid a lot the weaknesses of the sequential search algorithm. Many comparisons from the sequential search here are eliminated.

How to choose the step’s length

We know that it is a good practice to use a fixed size step. Actually when the step is 1, the algorithm is the traditional sequential search. The question is what should be the length of the step and is there any relation between the length of the list (n) and the length of the step (k)? Indeed there is such a relation and often you can see sources directly saying that the best length k = √n. Why is that?

Well, in the worst case, we do n/k jumps and if the last checked value is greater than the desired one, we do at most k-1 comparisons more. This means n/k + k – 1 comparisons. Now the question is for what values of k this function reaches its minimum. For those of you who remember maths classes this can be found with the formula -n/(k^2) + 1 = 0. Now it’s clear that for k = √n the minimum of the function is reached.

Of course you don’t need to prove this every time you use this algorithm. Instead you can directly assign √n to be the step length. However it is good to be familiar with this approach when trying to optimize an algorithm.

Let’s cosider the following list: (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610). Its length is 16. Jump search will find the value of 55 with the following steps.

Jump search basic implementation
Jump search skips some of the items of the list in order to improve performance!

Implementation

Let’s see an example of jump search, written in PHP. Continue reading Computer Algorithms: Jump Search

Computer Algorithms: Linear Search in Sorted Lists

Overview

The expression “linear search in sorted lists” itself sounds strange. Why should we use this algorithm for sorted lists when there are lots of other algorithms that are far more effective? As I mentioned in

my previous post the sequential search is very ineffective in most of the cases and it is primary used for unordered lists. Indeed sometimes it is more useful first to sort the data and then use a faster algorithm like the binary search. On the other hand the analysis shows that for lists with less than ten items the linear search is much faster than the binary search. Although, for instance, binary search is more effective on sorted lists, sequential search can be a better solution in some specific cases with minor changes. The problem is that when developers hear the expression “sorted list” they directly choose an algorithm different from the linear search. Perhaps the problem lays in the way we understand what an ordered list is?

What is a sorted list?

We used to think that this list (1, 1, 2, 3, 5, 8, 13) is sorted. Actually we think so because it is … sorted, but the list (3, 13, 1, 3, 3.14, 1.5, -1) is also sorted, except that we don’t know how. Thus we can think that any array is sorted, although it is not always obvious how. There are basically two cases when sequential search can be very useful. First when the list is very short or when we know in advance that there are some values that are very frequently searched. Continue reading Computer Algorithms: Linear Search in Sorted Lists

Friday Algorithms: Sorting a Set of Integers – Far Quicker than Quicksort!

Yes! It’s really really fast, and it’s far quicker than the quicksort algorithm, which is considered as the fastest sorting algorithm in practice. However how it’s possible to be faster than the quicksort, which is the fastest algorithm?! Is that true? Actually it’s true, but only in few cases. It works with integers, you’ve to know the first and the last element from that set and you’ve to be sure that every element is unique

Imagine you’ve a set of numbers all of them greater than 1 and lesser than 1000. Of course you’re not suppose to have all of the integers between 1 and 1000, but only few of them – think of 500 numbers between 1 and 1000! Here’s important to note – that this is only an example, you can have far more than only few numbers between 1 and 1000 – what about the numbers between 1 and 1,000,000 – this is a big set, isn’t it.

The question is – if there are so many constraints, why should I use that algorithm instead of quicksort, or another sorting algorithm, that works with everything. The answer is clear – yes, you’d prefer quicksort if you’ve to sort some arbitrary data, but when it comes to integers, and you’ve, let’s say, 1,000,000 integers, my advice is – use this algorithm!

Sorting the Set

1. First Pass

First we have an unsorted array, but we know the minimum and maximum of the set.

an unsorted array of integers

On the first pass initialize an empty array with as many elements, as they are between the first and the last element of the set – for a set between 1 and 1000 – that will be an array with 1000 elements – each of which will be a zero in the beginning.

temporary array with 0 on every element

Than loop trough the set and for every element in the set – you should put a 1 on it’s place

sort a set - first pass

Now we have an array of 0 and 1.

2. Second Pass

After the first pass, you’d guess what you’ve to do – loop trough the second array and print the keys of the elements different from 0 – those that are 1.

a sorted set of integers

Now the array is sorted!

Source Code

var a = [34, 203, 3, 746, 200, 984, 198, 764];
 
function setSort(arr)
{
    var t = [], len = arr.length;
    for (var i = 0; i < 1000; i++) {
        t[i] = 0;
    }
 
    for (i = 0; i < len; i++) {
        t[arr[i]] = 1;
    }
 
    for (i = 0; i < 1000; i++) {
        if (1 == t[i]) {
            console.log(i);
        }
    }
}
 
setSort(a);

Now that you’ve seen that algorithm, perhaps you’d guess that it’s no so difficult to change from integers to any other set, and once again I should say that in many cases this is the best algorithm for sorting! Very often quicksort is preferred, but not always there isn’t something faster!