We know that finding the minimum in a list of integers is a fairly simple task, but what about finding the i-th smallest element? Then the task isn’t that trivial and we have to think for a different approach.
First of all there are some very basic and intuitive approaches. Since finding the minimum is so easy, can we just find the minimum, than exclude it from the list and then search the minimum again until we find the i-th smallest element.
To find the minimum value into an array of items itsn’t difficult. There are not many options to do that. The most natural approach is to take the first item and to compare its value against the values of all other elements. Once we find a smaller element we continue the comparisons with its value. Finally we find the minimum.
First thing to note is that we pass through the array with n steps and we need exactly n-1 comparisons. It’s clear that this is the optimal solution, because we must check all the elements. For sure we can’t be sure that we’ve found the minimum (maximum) value without checking every single value. Continue reading Computer Algorithms: Minimum and Maximum→
When it comes to sorting items by comparing them merge sort is one very natural approach. It is natural, because simply divides the list into two equal sub-lists then sort these two partitions applying the same rule. That is a typical divide and conquer algorithm and it just follows the intuitive approach of speeding up the sorting process by reducing the number of comparisons. However there are other “divide and conquer” sorting algorithms that do not follow the merge sort scheme, while they have practically the same success. Such an algorithm is quicksort.
Back in 1960 C. A. R. Hoare comes with a brilliant sorting algorithm. In general quicksort consists of some very simple steps. First we’ve to choose an element from the list (called a pivot) then we must put all the elements with value less than the pivot on the left side of the pivot and all the items with value greater than the pivot on its right side. After that we must repeat these steps for the left and the right sub-lists. That is quicksort! Simple and elegant!
I wrote about binary search in my previous post, which is indeed one very fast searching algorithm, but in some cases we can achieve even faster results. Such an algorithm is the “interpolation search” – perhaps the most interesting of all searching algorithms. However we shouldn’t forget that the data must follow some limitations. In first place the array must be sorted. Also we must know the bounds of the interval.
Why is that? Well, this algorithm tries to follow the way we search a name in a phone book, or a word in the dictionary. We, humans, know in advance that in case the name we’re searching starts with a “B”, like “Bond” for instance, we should start searching near the beginning of the phone book. Thus if we’re searching the word “algorithm” in the dictionary, you know that it should be placed somewhere at the beginning. This is because we know the order of the letters, we know the interval (a-z), and somehow we intuitively know that the words are dispersed equally. These facts are enough to realize that the binary search can be a bad choice. Indeed the binary search algorithm divides the list in two equal sub-lists, which is useless if we know in advance that the searched item is somewhere in the beginning or the end of the list. Yes, we can use also jump search if the item is at the beginning, but not if it is at the end, in that case this algorithm is not so effective.
So the interpolation search is based on some simple facts. The binary search divides the interval on two equal sub-lists, as shown on the image bellow.
What will happen if we don’t use the constant ½, but another more accurate constant “C”, that can lead us closer to the searched item.
In my previous article I discussed how the sequential (linear) search can be used on an ordered lists, but then we were limited by the specific features of the given task. Obviously the sequential search on an ordered list is ineffective, because we consecutively check every one of its elements. Is there any way we can optimize this approach? Well, because we know that the list is sorted we can check some of its items, but not all of them. Thus when an item is checked, if it is less than the desired value, we can skip some of the following items of the list by jumping ahead and then check again. Now if the checked element is greater than the desired value, we can be sure that the desired value is hiding somewhere between the previously checked element and the currently checked element. If not, again we can jump ahead. Of course a good approach is to use a fixed step. Let’s say the list length is n and the step’s length is k. Basically we check list(0), then list(k-1), list(2k-1) etc. Once we find the interval where the value might be (m*k-1 < x <= (m+1)*k – 1), we can perform a sequential search between the last two checked positions. By choosing this approach we avoid a lot the weaknesses of the sequential search algorithm. Many comparisons from the sequential search here are eliminated.
How to choose the step’s length
We know that it is a good practice to use a fixed size step. Actually when the step is 1, the algorithm is the traditional sequential search. The question is what should be the length of the step and is there any relation between the length of the list (n) and the length of the step (k)? Indeed there is such a relation and often you can see sources directly saying that the best length k = √n. Why is that?
Well, in the worst case, we do n/k jumps and if the last checked value is greater than the desired one, we do at most k-1 comparisons more. This means n/k + k – 1 comparisons. Now the question is for what values of k this function reaches its minimum. For those of you who remember maths classes this can be found with the formula -n/(k^2) + 1 = 0. Now it’s clear that for k = √n the minimum of the function is reached.
Of course you don’t need to prove this every time you use this algorithm. Instead you can directly assign √n to be the step length. However it is good to be familiar with this approach when trying to optimize an algorithm.
Let’s cosider the following list: (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610). Its length is 16. Jump search will find the value of 55 with the following steps.