Computer Algorithms: Order Statistics

Introduction

We know that finding the minimum in a list of integers is a fairly simple task, but what about finding the i-th smallest element? Then the task isn’t that trivial and we have to think for a different approach.

First of all there are some very basic and intuitive approaches. Since finding the minimum is so easy, can we just find the minimum, than exclude it from the list and then search the minimum again until we find the i-th smallest element.

Finding the Minimums

That is a pure brute-force-like algorithm and it is extremely slow. In this case if we’re looking for the 99-th smallest element into an array of 100 items it will be quite inefficient. In other words this isn’t the best approach.

Another fairly intuitive approach is to sort the list in first place and then search the i-th element.

Sort and Seach
First we can sort the list and then search for the i-th element!

This is better than the our first attempt because we’ll need the time to sort the array and then search (in linear time) the i-th element.

In this case we need to find out which of the sorting algorithms we will use. Will it be merge sort (with constant O(n.lg(n)) complexity) or quicksort (with O(n2) in the worst case, but O(n.lg(n)) average complexity) or bubble sort (O(n^n) in the best-case scenario) it’s a developer choice.

However there is one very clever and yet more efficient approach, based on some observations.

Overview

If we’re looking for the i-th element and we decided that the list must be sorted first, we don’t need to fully sort it in order to find the desired element.

In case the list is sorted it’s easy to find which is the i-th element. However if the i-th element is in its place, the only thing we need to know is that the items on the left side of the i-th element are smaller and the items on the right side are greater. We don’t need the left and the right side ordered.

Don't need ordered sub-lists

In the other hand this approach looks very much like quicksort. There during the sorting process we put the items smaller than the “pivot” on its left and the items greater than the pivot on its right. After that partitioning we executed quicksort on the left and on the right sub-lists.

Here the approach is similar with very small changes. First we choose a pivot. Then we make two partitions of the list – one left sub-list with all the elements with smaller values than the pivot and one right sub-list with all the elements with a greater value than the pivot.

Choose a pivot and partition
Just like quicksort we chose a pivot and then we partition the list into two sub-lists!

Now we check the length of the left sub-list. If it is greater than i we continue recursively with the left sub-list and again we’re searching for the i-th element.

In case the length of the left sub-list is smaller than i, we continue with the right sub-list. However this time we don’t search for the i-th element, but for the i – length(LEFT).

Implementation

The following implementation is in PHP. It’s important to note that at each step we need two non-empty sub-list. That is why we take the pivot (by extracting the last item of the list) and then making two sub-lists. In case one of the sub-lists is empty we append the pivot in it. Thus we’re always partitioning the list into two non-empty sub-lists.

$list = array(3,4,5,7,8,2,5,6,9,0,1);
 
function partition($list, $pivot)
{
	$left = $right = array();
 
	$len = count($list);
	for ($i = 0; $i < $len; $i++) {
		if ($list[$i] <= $pivot) {
			$left[] = $list[$i];
		} else {
			$right[] = $list[$i];
		}
	}
 
	if (count($left) == 0) {
		$left[] = $pivot;
	} else {
		$right[] = $pivot;
	} 
 
	return array($left, $right);
}
 
function order_statistic($list, $i)
{
	if (count($list) == 1) {
		return $list[0];
	}
 
	// ceate a non empty partitions
	// extract the pivot from the list and
	// in case one of the sub-lists is empty
	// add the pivot there!
	$pivot = array_pop($list);
	list($left, $right) = partition($list, $pivot);
 
	if (count($left) >= $i) {
		return order_statistic($left, $i);
	} else {
		return order_statistic($right, $i - count($left));
	}
}
 
// 4
echo order_statistic($list, 5);

Application

Finding the minimum and maximum is easy, however sometimes we don’t search for them, but for the second, third or i-th smallest element. Then our task becomes a bit more difficult. This algorithm can be useful in many practical cases and shows us how different kind of algorithms may be related – exactly as this algorithm is related to quicksort in its principles.

Leave a Reply

Your email address will not be published. Required fields are marked *