
Computer Algorithms: Sorting in Linear Time

Radix Sort

The first question when we see the phrase “sorting in linear time” should be – where’s the catch? Indeed there’s a catch: we can’t sort just anything in linear time. Most of the time we speak of sorting integers in linear time, but as we’ll see later this is not the only case.

Since we’re speaking about integers, we can think of a faster sorting algorithm than usual. One such algorithm is counting sort, which can be very fast in some cases but also very slow in others, so it must be used carefully. Another linear time sorting algorithm is radix sort.

Introduction

Counting sort is absolutely brilliant and easy to implement. In case we sort integers in the range [n, m], on the first pass we just initialize a zero-filled array of length m-n+1, one counter for each possible value. Then on the second pass we “count” the occurrences of each integer. On the third pass we simply output the integers in sorted order.
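To make these three passes concrete, here is a minimal counting sort sketch in PHP. The function name and parameters are illustrative, and it assumes we know the range [$min, $max] in advance:

function counting_sort(array $input, $min, $max)
{
    // Pass 1: zero-filled array with one counter per possible value
    $count = array_fill(0, $max - $min + 1, 0);

    // Pass 2: count the occurrences of each integer
    foreach ($input as $number) {
        $count[$number - $min]++;
    }

    // Pass 3: output each value as many times as it occurred
    $sorted = array();
    for ($i = 0; $i <= $max - $min; $i++) {
        for ($j = 0; $j < $count[$i]; $j++) {
            $sorted[] = $i + $min;
        }
    }
    return $sorted;
}

// counting_sort(array(3, 1, 2, 1), 1, 3) returns array(1, 1, 2, 3)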

However this algorithm has a problem. What if we have only a few items to sort that are very far from each other, like [2, 1, 10000000, 2]? This results in a very large, mostly unused counting array. So we need a dense integer sequence, which means we must know the nature of the sequence in advance – and that is rarely the case.

That’s why we need another linear time sorting algorithm for integers, one that doesn’t have this disadvantage. Such an algorithm is radix sort.

Overview

The idea behind radix sort is simple. We must look at our “integer” sequence as if it were a sequence of strings. OK, to make this clearer let me give you an example. Our sequence is [12, 2, 23, 33, 22]. First we take the rightmost (least significant) digit of each number. Thus we must compare [_2, 2, _3, _3, _2]. Since the second number “2” has only one digit, we can fill it up with a leading “0” so it becomes 02, or _2 in our notation: [_2, _2, _3, _3, _2]. Now we sort the sequence by this digit with a stable sort algorithm.

What is a Stable Sort Algorithm

A stable sort algorithm is an algorithm that sorts a list while preserving the relative order of equal elements. In terms of PHP this means that:

array(0 => 12, 1=> 13, 2 => 12);

Will be sorted as follows:

array(0 => 12, 2 => 12, 1 => 13);

Thus the third element becomes second, right after the first element. Note that the third and the first element are equal, but since the third appears later in the input sequence it also remains later in the sorted sequence.

In the radix sort example we need a stable sort algorithm because on each pass we only look at a single digit position; stability guarantees that the order established by the previous passes is preserved.

So what happens in our example after we sort the sequence by the rightmost digit?

[12, 2, 22, 23, 33]

As we can see we’re far from a sorted sequence, but what if we proceed with the next “position” – the tens digit?

Then we end up with this:

[2, 12, 22, 23, 33]

Now we have a sorted sequence, so let’s summarize the algorithm in a short pseudo code.

Pseudo Code

The simple approach behind the radix sort algorithm can be described with the following pseudo code, assuming that we’re sorting decimal integers.

1. For each digit at position 10^0 to 10^n
1.1. Sort the numbers by this digit using a stable sort algorithm;
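In PHP this might look like the sketch below. It assumes non-negative decimal integers, and the per-digit pass distributes the numbers into ten buckets, which plays the role of the stable sort:

function radix_sort(array $input)
{
    if (empty($input)) {
        return $input;
    }
    $max = max($input);

    // One pass per decimal digit, least significant digit first
    for ($exp = 1; $max / $exp >= 1; $exp *= 10) {
        $buckets = array_fill(0, 10, array());
        foreach ($input as $number) {
            $digit = (int)($number / $exp) % 10;
            $buckets[$digit][] = $number;
        }
        // Reading the buckets in order keeps the sort stable
        $input = array();
        foreach ($buckets as $bucket) {
            foreach ($bucket as $number) {
                $input[] = $number;
            }
        }
    }
    return $input;
}

// radix_sort(array(12, 2, 23, 33, 22)) returns array(2, 12, 22, 23, 33)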

The thing is that here we talk about decimal numbers, but actually this algorithm can be applied equally well to any numeral system. That is why it’s called “radix” sort.

Thus we can sort binary numbers, hexadecimal numbers, etc.

It’s important to note that this algorithm can be also used to sort strings alphabetically.

[ABC, BBC, ABA, AC_]                         (the shorter word “AC” is padded with a trailing blank)
[__C, __C, __A, ___] => [AC, ABA, ABC, BBC]
[_C_, _B_, _B_, _B_] => [ABA, ABC, BBC, AC]
[A__, A__, B__, A__] => [ABA, ABC, AC, BBC]

That is correct because we can treat our alphabet as a 27-symbol numeral system (in case of the Latin alphabet): the 26 letters plus a blank used for padding, which sorts before any letter.
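Here is a minimal PHP sketch of this string variant. Padding with a trailing blank and sorting the character buckets with ksort() are implementation choices of this example:

function radix_sort_strings(array $words)
{
    // Pad with trailing blanks so every word has the same length
    $len = max(array_map('strlen', $words));
    $words = array_map(function ($word) use ($len) {
        return str_pad($word, $len); // pads with spaces on the right
    }, $words);

    // One stable pass per character position, rightmost first
    for ($pos = $len - 1; $pos >= 0; $pos--) {
        $buckets = array();
        foreach ($words as $word) {
            $buckets[$word[$pos]][] = $word;
        }
        ksort($buckets); // the blank ' ' sorts before 'A'..'Z'
        $words = array();
        foreach ($buckets as $bucket) {
            foreach ($bucket as $word) {
                $words[] = $word;
            }
        }
    }
    return array_map('rtrim', $words);
}

// radix_sort_strings(array('ABC', 'BBC', 'ABA', 'AC')) returns array('ABA', 'ABC', 'AC', 'BBC')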

Complexity

As I said in the beginning, radix sort is a linear time sorting algorithm. Let’s see why. The number of passes depends on the length of the numbers. Assume we sort N decimal numbers of at most K digits each – then we make K passes, each of which is a stable sort over the N numbers, which is simply O(K*N). As long as K is a constant, this is linear in N.

However, you must note that in case the numbers are so long that K grows with N – say, N numbers of N digits each – the complexity becomes O(N^2)!

We must also remember that radix sort and its supporting stable sort algorithm need extra space.

Application

Sorting integers can be faster than sorting arbitrary data, so any time we need to implement a sorting algorithm we must carefully investigate the input data. And that’s also the big disadvantage of this algorithm – we must know the input in advance, which is rarely the case.

Computer Algorithms: Topological Sort Revisited

Introduction

We already know what topological sort of a directed acyclic graph is. So why do we need a revision of this algorithm? First of all, I never mentioned its complexity, so to understand why we need a revision let’s go over the algorithm again.

We have a directed acyclic graph (DAG). There are no cycles, so we can put all the vertices of the graph in an order such that if there’s a directed edge (u, v), u precedes v in that order.

[Figure: Topological Sort]

The process of putting all the vertices of the DAG in such an order is called topological sorting. It’s commonly used in task scheduling or while finding the shortest paths in a DAG.

The algorithm itself is pretty simple to understand and code. We must start from the vertices that have no predecessors.
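A minimal PHP sketch of this idea (this queue-based variant is known as Kahn’s algorithm; the adjacency list format is an assumption of the example):

function topological_sort(array $graph)
{
    // $graph is an adjacency list: every vertex appears as a key,
    // mapped to the array of its successors
    $inDegree = array_fill_keys(array_keys($graph), 0);
    foreach ($graph as $u => $successors) {
        foreach ($successors as $v) {
            $inDegree[$v]++;
        }
    }

    // Start from the vertices that have no predecessors
    $queue = array();
    foreach ($inDegree as $v => $degree) {
        if ($degree == 0) {
            $queue[] = $v;
        }
    }

    $order = array();
    while (!empty($queue)) {
        $u = array_shift($queue);
        $order[] = $u;
        foreach ($graph[$u] as $v) {
            if (--$inDegree[$v] == 0) {
                $queue[] = $v; // $v has no unprocessed predecessors left
            }
        }
    }
    return $order; // contains all vertices only if the graph is a DAG
}

// topological_sort(array('a' => array('b', 'c'), 'b' => array('c'), 'c' => array()))
// returns array('a', 'b', 'c')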

[Figure: Topological Sort – step 1]
Continue reading Computer Algorithms: Topological Sort Revisited

Computer Algorithms: Longest Increasing Subsequence

Introduction

A very common problem in computer programming is finding the longest increasing (decreasing) subsequence in a sequence of numbers (usually integers). Actually this is a typical dynamic programming problem.

Dynamic programming describes a huge class of computer science problems that can be categorized by the way they are solved. Unlike divide and conquer, where we merge fairly equal sub-solutions in order to get one single solution of the problem, in dynamic programming we usually try to find an optimal sub-solution and then grow it.

Once we have an optimal sub-solution, on each step we try to extend it in order to cover the whole problem. Thus a typical member of the dynamic programming class is finding the longest increasing subsequence.
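To illustrate, here is a minimal sketch of the classic O(n^2) dynamic programming solution in PHP, where the sub-solution $lengths[$i] is the length of the longest increasing subsequence ending at index $i:

function longest_increasing_subsequence_length(array $sequence)
{
    $n = count($sequence);
    if ($n == 0) {
        return 0;
    }

    // $lengths[$i]: longest increasing subsequence ending at index $i
    $lengths = array_fill(0, $n, 1);
    for ($i = 1; $i < $n; $i++) {
        for ($j = 0; $j < $i; $j++) {
            // Extend the sub-solution ending at $j if that improves $i
            if ($sequence[$j] < $sequence[$i] && $lengths[$j] + 1 > $lengths[$i]) {
                $lengths[$i] = $lengths[$j] + 1;
            }
        }
    }
    return max($lengths);
}

// longest_increasing_subsequence_length(array(3, 1, 4, 1, 5, 9, 2, 6))
// returns 4 (e.g. the subsequence 1, 4, 5, 9)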

However this problem is interesting because it can be related to graph theory. Let’s find out how. Continue reading Computer Algorithms: Longest Increasing Subsequence

Computer Algorithms: Strassen’s Matrix Multiplication

Introduction

Strassen’s method of matrix multiplication is a typical divide and conquer algorithm. We’ve already seen some divide and conquer algorithms, such as merge sort and Karatsuba’s fast multiplication of large numbers. However, let’s go over what’s behind the divide and conquer approach once again.

Unlike dynamic programming, where we “grow” the solutions of sub-problems in order to get the final solution, here we are talking more about joining sub-solutions together. The sub-problems are fairly equal parts of the general problem, and merging their solutions is somehow well defined.

A typical example is the merge sort algorithm. In merge sort we have two sorted arrays, and all we want is to get the array representing their union, again sorted. Of course, the tricky part in merge sort is the merging itself. That’s because we have to pass through the two arrays, A and B, and compare each “pair” of items, one from A and one from B. A bit off topic, but this is the weak point of merge sort: although its worst-case time complexity is O(n.log(n)), quicksort is often preferred in practice because there’s no “merge” – quicksort just concatenates the two sub-arrays. Note that in quicksort the sub-arrays aren’t of equal length in general, and although its worst-case time complexity is O(n^2) it often outperforms merge sort.
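To illustrate, here is a minimal PHP sketch of that merge step, assuming $a and $b are already sorted. Note the extra output array – exactly the overhead quicksort avoids:

function merge_sorted(array $a, array $b)
{
    $merged = array();
    $i = $j = 0;

    // Compare a pair of items, one from each array, and take the smaller
    while ($i < count($a) && $j < count($b)) {
        if ($a[$i] <= $b[$j]) {
            $merged[] = $a[$i++];
        } else {
            $merged[] = $b[$j++];
        }
    }

    // Append whatever remains in either array
    while ($i < count($a)) {
        $merged[] = $a[$i++];
    }
    while ($j < count($b)) {
        $merged[] = $b[$j++];
    }
    return $merged;
}

// merge_sorted(array(1, 3, 5), array(2, 4)) returns array(1, 2, 3, 4, 5)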

This merge sort example shows how merging the solutions of two sub-problems sometimes isn’t a trivial task. Thus we must be careful when applying any divide and conquer approach.

History

Volker Strassen is a German mathematician born in 1936. He is well known for his work on probability, but in computer science and algorithms he’s mostly recognized for his matrix multiplication algorithm, which is still one of the main methods outperforming the general matrix multiplication algorithm.

Strassen first published this algorithm in 1969, proving that the n^3 algorithm isn’t the optimal one. Actually the solution given by Strassen is only slightly better, but his contribution is enormous because it sparked much more research on matrix multiplication, which led to faster approaches such as the Coppersmith-Winograd algorithm with O(n^2.3737). Continue reading Computer Algorithms: Strassen’s Matrix Multiplication

Computer Algorithms: Prim’s Minimum Spanning Tree

Introduction

Along with Kruskal’s minimum spanning tree algorithm, there’s another general algorithm that solves the problem – the algorithm of Prim.

As we already know, the algorithm of Kruskal works in a pretty natural and logical way. Since we’re trying to build an MST, which is naturally built from the minimal edges of the graph (G), we sort them in non-descending order and start building the tree.

[Figure: The algorithm of Kruskal]

During the whole process of building the final minimum spanning tree Kruskal’s algorithm keeps a forest of trees. The number of trees in that forest decreases on each step and finally we get the minimum weight spanning tree.

A key point in Kruskal’s approach is the way we get the “next” edge from G that should be added to one of the trees of the forest (or that should connect two trees of the forest). The only thing we should be careful about is to choose an edge connecting two vertices, u and v, that aren’t already in the same tree. That’s all.
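That membership check is commonly implemented with a union-find (disjoint set) structure. A minimal PHP sketch with illustrative names – each vertex initially points to itself in $parent:

function find(array &$parent, $v)
{
    // Follow parent links up to the root of v's tree
    while ($parent[$v] != $v) {
        $parent[$v] = $parent[$parent[$v]]; // path halving
        $v = $parent[$v];
    }
    return $v;
}

function kruskal_accepts_edge(array &$parent, $u, $v)
{
    $rootU = find($parent, $u);
    $rootV = find($parent, $v);
    if ($rootU == $rootV) {
        return false; // u and v are already in the same tree
    }
    $parent[$rootU] = $rootV; // merge two trees of the forest
    return true;
}

// $parent = array(0 => 0, 1 => 1, 2 => 2);
// kruskal_accepts_edge($parent, 0, 1) returns true,
// a second call with the edge (1, 0) returns false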

[Figure: Kruskal’s tricky part]

An important feature of Kruskal’s algorithm is that it builds the MST just by sorting the edges by weight – it doesn’t care about a particular starting vertex.

At the same time there’s another algorithm that builds an MST – the algorithm of Prim, developed by Robert Prim in 1957. Continue reading Computer Algorithms: Prim’s Minimum Spanning Tree