
Computer Algorithms: Sorting in Linear Time

Radix Sort

The first question when we see the phrase “sorting in linear time” should be – where’s the catch? Indeed there is a catch: we can’t sort just anything in linear time. Most of the time we speak of sorting integers in linear time, but as we’ll see later, integers are not the only case.

Since we speak about integers, we can think of a faster sorting algorithm than usual. Such an algorithm is counting sort, which can be very fast in some cases but very slow in others, so it must be used carefully. Another linear time sorting algorithm is radix sort.

Introduction

Counting sort is absolutely brilliant and easy to implement. In case we sort integers in the range [n, m], on the first pass we just initialize a zero-filled array of length m-n+1. Then on the second pass we “count” the occurrences of each integer. On the third pass we simply output the integers in sorted order.
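
Here’s a minimal sketch of the idea in PHP, assuming integers known to lie in a range [$min, $max]; the function name and parameters are illustrative, not from the original post.

<?php
// A minimal counting sort sketch for integers known to lie in [$min, $max].
function countingSort(array $input, $min, $max)
{
    // Pass 1: a zero-filled array with one slot per possible value.
    $count = array_fill(0, $max - $min + 1, 0);

    // Pass 2: count the occurrences of each integer.
    foreach ($input as $value) {
        $count[$value - $min]++;
    }

    // Pass 3: write the integers back out in sorted order.
    $sorted = array();
    foreach ($count as $offset => $occurrences) {
        for ($i = 0; $i < $occurrences; $i++) {
            $sorted[] = $min + $offset;
        }
    }
    return $sorted;
}

// countingSort(array(3, 1, 2, 1), 1, 3) returns array(1, 1, 2, 3)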

However we have some problems with that algorithm. What if we have only a few items to sort that are very far from each other, like [2, 1, 10000000, 2]? This results in a very large, mostly unused array. So we need a dense integer sequence, which means we must know the nature of the input in advance – and that is rarely the case.

That’s why we need to use another linear time sorting algorithm for integers that doesn’t have this disadvantage. Such an algorithm is the radix sort.

Overview

The idea behind radix sort is simple. We must look at our “integer” sequence as a string sequence. OK, to make it clearer let me give you an example. Our sequence is [12, 2, 23, 33, 22]. First we take the rightmost (least significant) digit of each number. Thus we must compare [_2, 2, _3, _3, _2]. Clearly, since the second number “2” is only a one-digit number, we can fill it up with a leading “0” to become 02, or _2 in our example: [_2, _2, _3, _3, _2]. Now we sort this sequence with a stable sort algorithm.

What is a Stable Sort Algorithm

A stable sort algorithm is an algorithm that sorts a list while preserving the relative order of equal elements. In terms of PHP this means that:

array(0 => 12, 1 => 13, 2 => 12);

Will be sorted as follows:

array(0 => 12, 2 => 12, 1 => 13);

Thus the third element becomes second, right after the first element. Note that the third and the first element are equal, but since the third appears later in the input it remains later in the sorted sequence.
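
For reference, a sketch assuming PHP 8.0 or newer, where the built-in sorts are guaranteed stable: asort() sorts by value while keeping the key association, producing exactly the result above.

<?php
$numbers = array(0 => 12, 1 => 13, 2 => 12);
asort($numbers); // stable since PHP 8.0: equal values keep their original order
print_r($numbers); // Array ( [0] => 12 [2] => 12 [1] => 13 )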

In the radix sort example we need a stable sort algorithm, because on each pass we only look at a single digit position – elements that are equal in that digit must keep the order established by the previous passes.

So what happens in our example after we sort the sequence by the last digit? We get [12, 02, 22, 23, 33].

As we can see we’re far from a sorted sequence, but what if we proceed with the next “position” – the tens digit?

Then we end up with this: [02, 12, 22, 23, 33].

Now we have a sorted sequence, so let’s summarize the algorithm in short pseudo code.

Pseudo Code

The simple approach behind the radix sort algorithm can be described as pseudo code, assuming that we’re sorting decimal integers.

1. For each digit at position 10^0 to 10^n
1.1. Sort the numbers by this digit using a stable sort algorithm;
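
Translated into PHP (assuming PHP 7+ for intdiv()), the pseudo code could look like the sketch below – an LSD radix sort for non-negative decimal integers that buckets by the current digit, which is a stable pass; the names are illustrative.

<?php
// A sketch of the pseudo code above: LSD radix sort for non-negative
// decimal integers. Each pass distributes the numbers into 10 buckets
// by one digit, preserving their relative order, which keeps it stable.
function radixSort(array $input)
{
    if (empty($input)) {
        return $input;
    }
    $divisor = 1;
    $max = max($input);

    // One stable pass per digit position: 10^0, 10^1, ...
    while (intdiv($max, $divisor) > 0) {
        $buckets = array_fill(0, 10, array());

        // Distribute by the current digit, keeping the relative order.
        foreach ($input as $number) {
            $digit = intdiv($number, $divisor) % 10;
            $buckets[$digit][] = $number;
        }

        // Collect the buckets back into a single sequence.
        $input = array();
        foreach ($buckets as $bucket) {
            foreach ($bucket as $number) {
                $input[] = $number;
            }
        }
        $divisor *= 10;
    }
    return $input;
}

// radixSort(array(12, 2, 23, 33, 22)) returns array(2, 12, 22, 23, 33)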

The thing is that here we talk about the decimal system, but actually this algorithm can be applied equally well to any numeral system. That is why it’s called “radix” sort.

Thus we can sort binary numbers, hexadecimals, etc.

It’s important to note that this algorithm can also be used to sort strings alphabetically. The only catch is the padding: a shorter string must be padded on the right with a “blank” (here “_”) that sorts before every letter, otherwise “AC” would wrongly end up after “ABC”.

[ABC, BBC, ABA, AC_]
[__C, __C, __A, ___] => [AC_, ABA, ABC, BBC]
[_C_, _B_, _B_, _B_] => [ABA, ABC, BBC, AC_]
[A__, A__, B__, A__] => [ABA, ABC, AC_, BBC]

That is correct simply because we can treat our alphabet as a 27-digit numeral system – the 26 Latin letters plus the blank, which is the smallest “digit”.
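
A possible PHP sketch of the string variant – the padding character (a space) and the helper name are my own choices for the illustration, not the post’s:

<?php
// Radix sort for uppercase Latin strings. Shorter strings are padded
// on the right with a space, which sorts before every letter, so "AC"
// correctly comes before "BBC" but after "ABA" and "ABC".
function radixSortStrings(array $input)
{
    $width = max(array_map('strlen', $input));

    // Right-pad every string to the same width.
    $padded = array_map(function ($s) use ($width) {
        return str_pad($s, $width);
    }, $input);

    // One stable pass per character position, rightmost first.
    for ($pos = $width - 1; $pos >= 0; $pos--) {
        $buckets = array();
        foreach ($padded as $string) {
            $buckets[ord($string[$pos])][] = $string;
        }
        ksort($buckets); // visit the character codes in ascending order
        $padded = array();
        foreach ($buckets as $bucket) {
            foreach ($bucket as $string) {
                $padded[] = $string;
            }
        }
    }
    return array_map('rtrim', $padded);
}

// radixSortStrings(array('ABC', 'BBC', 'ABA', 'AC'))
// returns array('ABA', 'ABC', 'AC', 'BBC')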

Complexity

As I said in the beginning, radix sort is a linear time sorting algorithm. Let’s see why. Each pass is a counting sort over the possible digit values – in the decimal system 10 of them – which costs O(N) for N numbers. If the numbers have K digits we make K passes, so the algorithm is O(K*N), which is linear for a fixed K.

However, you must note that in case we sort N numbers of up to N digits each, the complexity becomes O(N^2)!

We must also remember that in order to implement radix sort with a supporting stable sort algorithm, we need extra space.

Application

Sorting integers can be faster than sorting just anything, so any time we need to implement a sorting algorithm we must carefully investigate the input data. And that’s also the big disadvantage of this algorithm – we must know the input in advance, which is rarely the case.

Computer Algorithms: Prim’s Minimum Spanning Tree

Introduction

Along with Kruskal’s minimum spanning tree algorithm, there’s another general algorithm that solves the problem – the algorithm of Prim.

As we already know, the algorithm of Kruskal works in a pretty natural and logical way. Since we’re trying to build an MST, which is naturally built from the minimal edges of the graph (G), we sort them in non-descending order and start building the tree.

The algorithm of Kruskal

During the whole process of building the final minimum spanning tree Kruskal’s algorithm keeps a forest of trees. The number of trees in that forest decreases on each step and finally we get the minimum weight spanning tree.

A key point in Kruskal’s approach is the way we get the “next” edge from G that should be added to one of the trees of the forest (or to connect two trees from the forest). The only thing we must make sure of is that the chosen edge connects two vertices – u and v – that aren’t already in the same tree. That’s all.
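
The usual way to answer “are u and v already in the same tree?” is a disjoint-set (union-find) structure. A minimal sketch in PHP – not the post’s own code, just an illustration:

<?php
// A minimal disjoint-set (union-find) sketch for Kruskal's algorithm.
class DisjointSet
{
    private $parent = array();

    public function find($v)
    {
        if (!isset($this->parent[$v])) {
            $this->parent[$v] = $v; // each vertex starts as its own root
        }
        if ($this->parent[$v] !== $v) {
            // Path compression: point directly at the root.
            $this->parent[$v] = $this->find($this->parent[$v]);
        }
        return $this->parent[$v];
    }

    public function union($u, $v)
    {
        $this->parent[$this->find($u)] = $this->find($v);
    }

    public function connected($u, $v)
    {
        return $this->find($u) === $this->find($v);
    }
}

// For each edge (u, v) taken in non-descending weight order:
// if (!$set->connected($u, $v)) { add the edge to the MST; $set->union($u, $v); }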

Kruskal’s Tricky Part

An important feature of Kruskal’s algorithm is that it builds the MST just by sorting the edges by their weight – it doesn’t care about a particular starting vertex.

At the same time there’s another algorithm that builds an MST – the algorithm of Prim, designed by Robert Prim in 1957. Continue reading Computer Algorithms: Prim’s Minimum Spanning Tree

Computer Algorithms: Shortest Path in a Directed Acyclic Graph

Introduction

We saw how to find the shortest path in a graph with positive edges using Dijkstra’s algorithm. We also know how to find the shortest paths from a given source node to all other nodes even when there are negative edges, using the Bellman-Ford algorithm. Now we’ll see that there’s a faster algorithm, running in linear time, that can find the shortest paths from a given source node to all other reachable vertices in a directed acyclic graph, also known as a DAG.

Because the DAG is acyclic we don’t have to worry about negative cycles. As we already know, it’s pointless to speak about shortest paths in the presence of a negative cycle, because we can “loop” over it and practically make our path shorter and shorter.

Negative Cycles
The presence of a negative cycle makes our attempt to find the shortest path pointless!

Thus we had two problems to overcome with the Dijkstra and Bellman-Ford algorithms. First of all, Dijkstra’s algorithm needs all edge weights to be positive, and secondly, we didn’t want cycles. Well, we can handle both cases in this algorithm. Continue reading Computer Algorithms: Shortest Path in a Directed Acyclic Graph

Computer Algorithms: Bellman-Ford Shortest Path in a Graph

Introduction

As we saw in the previous post, the algorithm of Dijkstra is very useful when it comes to finding all the shortest paths in a weighted graph. However it has one major problem! It doesn’t work correctly when some of the edges have negative lengths.

We know that the algorithm works perfectly with positive edges, and that is absolutely normal, because its greedy choice relies on the triangle inequality.

Dijkstra's Approach
Since all the edges are positive we get the closest one!

Since Dijkstra’s algorithm makes use of a priority queue, we normally get the shortest edge adjacent to the starting point first. In our very basic example we’ll first get the edge (S, A) with length 3.
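
In PHP the priority queue could be SplPriorityQueue. It extracts the highest priority first, so a common trick is to insert the negated length to get min-queue behavior. A small sketch – the edge (S, A) with length 3 is from the example, the other lengths are made up for the illustration:

<?php
// SplPriorityQueue extracts the HIGHEST priority first, so inserting
// the negated edge length makes the shortest adjacent edge come out first.
$queue = new SplPriorityQueue();
$queue->insert('(S, A)', -3);
$queue->insert('(S, B)', -5); // illustrative length
$queue->insert('(S, C)', -7); // illustrative length

echo $queue->extract(); // (S, A) – the adjacent edge with length 3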

However, when it comes to negative edges we can no longer rely on a priority queue, so we need a different, yet working, solution. Continue reading Computer Algorithms: Bellman-Ford Shortest Path in a Graph

Computer Algorithms: Dijkstra Shortest Path in a Graph

Introduction

We already know how we can find the shortest paths in a graph starting from a given vertex. Practically we modified breadth-first search in order to calculate the distances from s to all other nodes reachable from s. We know that this works because BFS walks through the graph level by level.
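
A sketch of that modified BFS over an adjacency list – the graph layout and the names are illustrative, not taken from the post:

<?php
// BFS computing the distance (in edges) from $source to every
// reachable vertex of an adjacency-list graph.
function bfsDistances(array $graph, $source)
{
    $distance = array($source => 0);
    $queue = new SplQueue();
    $queue->enqueue($source);

    while (!$queue->isEmpty()) {
        $u = $queue->dequeue();
        foreach ($graph[$u] as $v) {
            if (!isset($distance[$v])) {
                // First time we reach $v – this is its shortest distance,
                // because BFS walks through the graph level by level.
                $distance[$v] = $distance[$u] + 1;
                $queue->enqueue($v);
            }
        }
    }
    return $distance;
}

// $graph = array('S' => array('A', 'B'), 'A' => array('C'), 'B' => array(), 'C' => array());
// bfsDistances($graph, 'S') returns array('S' => 0, 'A' => 1, 'B' => 1, 'C' => 2)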

BFS Shortest Paths
BFS is often used to find shortest paths between a starting node (s) and all other reachable nodes in a graph!

Some sources give a very simple explanation of how BFS finds the shortest paths in a graph. We must just think of the graph as a set of balls connected through strings.

The Graph as Balls and Strings
We can think of a graph as a set of balls connected through strings!

As we can see, by lifting the ball called “S” all the other balls fall down. The closest balls are directly connected to “S” – this is the first level – while the outermost balls are those with the longest paths.

The Graph as Balls and Strings Levels
Breadth-first search works much like the image above – it explores the graph level by level, thus we’re sure that all the paths are the shortest!

Clearly, edges like the one between A and B don’t matter for our BFS algorithm, because they don’t make the path from S to C through B any shorter. This is also known as the triangle inequality: the sum of the lengths of two sides of a triangle is always greater than the length of the third side.

Triangle inequality
What the triangle inequality tells us is that if we have a direct edge between two nodes, that must be the shortest path between them!

We must only answer the question: is BFS the best algorithm to find the shortest path between any two nodes of a graph? This is a reasonable question, because with BFS we don’t only find the shortest path between two given vertices i and j – we also get the shortest paths between i and all other vertices of G. That is information we don’t actually need, but can we find the shortest path between i and j without computing it? Continue reading Computer Algorithms: Dijkstra Shortest Path in a Graph