Computer Algorithms: Radix Sort

Introduction

Algorithms always depend on the input. We saw that general purpose sorting algorithms as insertion sort, bubble sort and quicksort can be very efficient in some cases and inefficient in other. Indeed insertion and bubble sort are considered slow, with best-case complexity of O(n2), but they are quite effective when the input is fairly sorted. Thus when you have a sorted array and you add some “new” values to the array you can sort it quite effectively with insertion sort. On the other hand quicksort is considered one of the best general purpose sorting algorithms, but while it’s a great algorithm when the data is randomized it’s practically as slow as bubble sort when the input is almost or fully sorted.

Now we see that depending on the input algorithms may be effective or not. For almost sorted input insertion sort may be preferred instead of quicksort, which in general is a faster algorithm.

Just because the input is so important for an algorithm efficiency we may ask are there any sorting algorithms that are faster than O(n.log(n)), which is the average-case complexity for merge sort and quicksort. And the answer is yes there are faster, linear complexity algorithms, that can sort data faster than quicksort, merge sort and heapsort. But there are some constraints!

Everything sounds great but the thing is that we can’t sort any particular data with linear complexity, so the question is what rules the input must follow in order to be sorted in linear time.

Such an algorithm that is capable of sorting data in linear O(n) time is radix sort and the domain of the input is restricted – it must consist only of integers.

Overview

Let’s say we have an array of integers which is not sorted. Just because it consists only of integers and because array keys are integers in programming languages we can implement radix sort.

First for each value of the input array we put the value of “1” on the key-th place of the temporary array as explained on the following diagram.

Radix sort first pass
Radix sort first pass

If there are repeating values in the input array we increment the corresponding value in the temporary array. After “initializing” the temporary array with one pass (with linear complexity) we can sort the input.

Radix sort second pass
Radix sort second pass

Implementation

Implementing radix sort is very easy in fact, which is great. The thing is that old-school programming languages weren’t so flexible and we needed to initialize the entire temporary array. That leads to another problem – we must know the interval of values from the input. Fortunately nowadays programming languages and libraries are more flexible so we can initialize our temporary array even if we don’t know the interval of input values, as on the example bellow. PHP is somewhere in the middle – it’s flexible enough to build-up arrays in the memory without knowing their size in advance, but we still must ksort them.

$list = array(4, 3, 5, 9, 7, 2, 4, 1, 6, 5);
 
function radix_sort($input)
{
    $temp = $output = array();
	$len = count($input);
 
    for ($i = 0; $i < $len; $i++) {
		$temp[$input[$i]] = ($temp[$input[$i]] > 0) 
			? ++$temp[$input[$i]]
			: 1;
    }
 
    ksort($temp);
 
    foreach ($temp as $key => $val) {
		if ($val == 1) {
			$output[] = $key; 
		} else {
			while ($val--) {
				$output[] = $key;
			}
        }
    }
 
    return $output;
}
 
// 1, 2, 3, 4, 4, 5, 5, 6, 7, 9
print_r(radix_sort($list));

The problem is that PHP needs ksort – which is completely foolish as we’re trying to sort an array using “another” sorting method, but to overcome this you must know the interval of values in advance and initialize a temporary array with 0s, as on the example bellow.

define(MIN, 1);
define(MAX, 9);
$list = array(4, 3, 5, 9, 7, 2, 4, 1, 6, 5);
 
function radix_sort(&$input)
{
    $temp = array();
	$len = count($input);
 
	// initialize with 0s
    $temp = array_fill(MIN, MAX-MIN+1, 0);
 
    foreach ($input as $key => $val) {
    	$temp[$val]++;
    }
 
    $input = array();
    foreach ($temp as $key => $val) {
	if ($val == 1) {
		$input[] = $key;
	} else {
		while ($val--) {
			$input[] = $key;
		}
	}
    }
}
 
// 4, 3, 5, 9, 7, 2, 4, 1, 6, 5
var_dump($list);
 
radix_sort(&$list);
 
// 1, 2, 3, 4, 5, 5, 6, 7, 8, 9
var_dump($list);

Here the input is modified during the sorting process and it’s used as result.

Complexity

The complexity of radix sort is linear, which in terms of omega means O(n). That is a great benefit in performance compared to O(n.log(n)) or even worse with O(n2) as we can see on the following chart.

Linear function compared to n.log(n) and n^2
Linear function compared to n.log(n) and n^2

Why using radix sort

1. It’s fast

Radix sort is very fast compared to other sorting algorithms as we saw on the diagram above. This algorithm is very useful in practice because in practice we often sort sets of integers.

Pros of radix sort

2. It’s easy to understand and implement

Even a beginner can understand and implement radix sort, which is great. You need no more than few loops to implement it.

Why NOT using radix sort

1. Works only with integers

If you’re not sure about the input better do not use radix sort. We may think that our input consists only of integers and we can go for radix sort, but what if in the future someone passes floats or strings to our routine.

Cons of radix sort

2. Requires additional space

Radix sort needs additional space – at least as much as the input.

Final Words

Radix sort is restricted by the input’s domain, but I must say that in practice there are tons of cases where only integers are sorted. This is when we get some data from the db based on primary keys – typically primary in database tables are integers as well. So practically there are lots of cases of sorting integers, so radix sort may be one very, very useful algorithm and it is so cool that it is also easy to implement.

7 thoughts on “Computer Algorithms: Radix Sort

  1. Two comments:

    In any language where you don’t have to explicitly initialize an array’s size, it is implicitly being initialized to a default size, and any time you try to add an element beyond the size of the array, a new, larger array must be allocated, and the contents of the first array copied to the second array. There’s no such thing as magic arrays that can change size at no cost.

    Because of this, your solution is actually potentially VERY slow, depending on the highest value of an integer in the list to sort, and how often the array must be reallocated and copied. It’s important to know how things work under the covers to understand their performance implications.

    Second comment: this algorithm is NOT linear. There’s no such thing as a linear sorting algorithm. O(n log n) is the best that’s possible. Here is an explanation on why radix sort is actually O(n log n): http://www.jimloy.com/computer/radix.htm

  2. I may have been hasty in declaring that radix sort is not linear. Right after posting the comment, I found another page explaining why it IS linear, or close to it. I don’t know which is true, so you can ignore my second comment. The first one still stands though. At least until that one is proven wrong too. 🙂

  3. I think this is counting sort. Radix sort sorts the data based on the digits. Eg: sort all the digits based on its LSB then move towards sorting based on its MSB.
    Radix sort internally uses counting sort(the one you’ve mentioned above) to sort the data. Reference: ‘Introduction To Algorithms’ by CLRS

  4. works only with arrays of *small* *positive* integers. If your data is made of a sparse set of random large integers, you end up consuming a very large amount of memory in the first pass.

  5. If I am not wrong this version will create a stack with only 10 values, (each value will be an array of values with the same current digit). This will sort any kind of array as long as they have the same count of digits! For integers with leading zeros (if any) quotes are required: e.g: ‘00012’


    -1)
    goto LOOP;

    echo implode(', ', $L)."\n\n";
    }
    ?>

Leave a Reply

Your email address will not be published. Required fields are marked *