Introduction
What’s the fastest way to sort the sequence [9, 3, 0, 5, 4, 1, 2, 6, 8, 7]? The question is a bit tricky, because the input is rather special. First of all, it contains only integers, and fortunately they are all different. That’s great, although we know that in practice it’s almost impossible to count on such a lucky coincidence. Here, however, we can sort the sequence very quickly.
We can pass through the integers once and, using an auxiliary array, put each of them at the index equal to its value. We know in advance that this is going to work really well, because they are all different.
There is one major problem with this solution: it assumes that all the integers are different. If they are not, several values map to the same index, so that index has to hold all of them.
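The idea above can be sketched in a few lines of Python. This is only an illustration: the function name `place_by_value` is made up for this example, and it assumes a non-empty list of distinct, non-negative integers.

```python
def place_by_value(values):
    """Sort distinct non-negative integers by writing each one to its own index."""
    # Auxiliary array with one slot per possible value (0 .. max).
    slots = [None] * (max(values) + 1)
    for v in values:
        slots[v] = v  # safe only because every value is assumed to be distinct
    # Skip the slots that were never filled.
    return [v for v in slots if v is not None]


print(place_by_value([9, 3, 0, 5, 4, 1, 2, 6, 8, 7]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```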
That is why we can use bucket sort.
Overview
Bucket sort is a perfect fit for the sequence above. We must know in advance that the integers are fairly well distributed over an interval (i, j). Then we can divide this interval into N equal sub-intervals (or buckets) and put each number into its corresponding bucket. Finally, every bucket that contains more than one number is sorted with some simple sorting algorithm, such as insertion sort.
The point is that we know the integers are well distributed, so we expect that there won’t be many buckets with more than one number inside.
That is why the sequence [1, 2, 3, 2, 1, 2, 3, 1], whose numbers crowd into only three distinct values, won’t be sorted faster than [4, 3, 1, 2, 9, 5, 4, 8].
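One common way to find the bucket for a number is to scale its position inside the interval to a bucket index. The sketch below is an assumption about how this could be done, not the only option: `bucket_index`, `lo`, `hi` and `k` are names chosen for this example, with `lo` and `hi` standing for the known bounds of the interval and `k` for the number of buckets.

```python
def bucket_index(value, lo, hi, k):
    """Return the index (0 .. k-1) of the equal-width bucket that value falls into."""
    if hi == lo:
        return 0  # degenerate interval: everything shares one bucket
    idx = int((value - lo) * k / (hi - lo))
    # value == hi would compute index k, so clamp it into the last bucket.
    return min(idx, k - 1)


numbers = [4, 3, 1, 2, 9, 5, 4, 8]
print([bucket_index(v, 1, 9, 4) for v in numbers])
# [1, 1, 0, 0, 3, 2, 1, 3]
```

The computation is a constant number of arithmetic operations per element, so it adds no more than O(n) work overall.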
Pseudo Code
1. Let n be the length of the input list A, and let B be a list of n empty buckets;
2. For each element A[i] of A, with b the index of the bucket it belongs to:
   2.1. If B[b] is not empty
        2.1.1. Put A[i] into B[b] using insertion sort;
   2.2. Else B[b] := A[i];
3. Concatenate B[1 .. n] into one sorted list;
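The steps above could look roughly like this in Python. It is a minimal sketch under a few assumptions: the buckets are equal-width sub-intervals of [min, max], the number of buckets `k` defaults to the length of the input, and each bucket is filled first and then sorted with insertion sort, which is a small variation on inserting every element in sorted order as it arrives.

```python
def insertion_sort(items):
    """Sort a list in place with insertion sort and return it."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items


def bucket_sort(values, k=None):
    """Return a sorted copy of values using k equal-width buckets."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all elements are equal, nothing to sort
    k = k or len(values)
    buckets = [[] for _ in range(k)]
    for v in values:
        # Scale the value into 0 .. k-1; the maximum is clamped into the last bucket.
        idx = min(int((v - lo) * k / (hi - lo)), k - 1)
        buckets[idx].append(v)
    result = []
    for bucket in buckets:
        result.extend(insertion_sort(bucket))
    return result


print(bucket_sort([4, 3, 1, 2, 9, 5, 4, 8]))
# [1, 2, 3, 4, 4, 5, 8, 9]
```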
Complexity
The running time of bucket sort is not fixed; it depends on the input. In the average case, however, the complexity of the algorithm is O(n + k), where n is the length of the input sequence and k is the number of buckets.
The problem is that its worst-case performance is O(n^2), which makes it as slow as bubble sort. This happens when most of the elements end up in the same bucket.
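To see how the worst case comes about, it helps to look at how many elements land in each bucket. The snippet below is only an illustration with a made-up helper, `bucket_sizes`, which assumes equal-width buckets over [min, max]. A single outlier is enough to push almost everything into one bucket, and that bucket then has to be sorted by the quadratic algorithm.

```python
def bucket_sizes(values, k):
    """Return how many elements fall into each of k equal-width buckets."""
    lo, hi = min(values), max(values)
    sizes = [0] * k
    for v in values:
        idx = min(int((v - lo) * k / (hi - lo)), k - 1) if hi > lo else 0
        sizes[idx] += 1
    return sizes


spread = [9, 3, 0, 5, 4, 1, 2, 6, 8, 7]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one outlier stretches the interval

print(bucket_sizes(spread, 10))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(bucket_sizes(skewed, 10))  # [9, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```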
Application
Like the other two linear-time sorting algorithms (radix sort and counting sort), bucket sort depends heavily on the input. The main thing we should be aware of is the way the input data is dispersed over an interval.
Another crucial factor is the number of buckets, which can dramatically improve or worsen the performance of the algorithm.
This makes bucket sort ideal when we know in advance that the input is well dispersed.
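The effect of the number of buckets can be shown with a small experiment. This is a sketch rather than a benchmark: `bucket_profile` is a hypothetical helper that assumes equal-width buckets and reports the size of the largest bucket (the work left for the inner sort) together with the number of empty buckets (wasted space and extra time spent scanning them).

```python
def bucket_profile(values, k):
    """Return (size of the largest bucket, number of empty buckets) for k buckets."""
    lo, hi = min(values), max(values)
    sizes = [0] * k
    for v in values:
        # Equal-width buckets over [lo, hi]; the maximum is clamped into the last one.
        idx = min(int((v - lo) * k / (hi - lo)), k - 1) if hi > lo else 0
        sizes[idx] += 1
    return max(sizes), sizes.count(0)


data = [9, 3, 0, 5, 4, 1, 2, 6, 8, 7]
for k in (1, 5, 10, 1000):
    print(k, bucket_profile(data, k))
# 1 (10, 0)
# 5 (2, 0)
# 10 (1, 0)
# 1000 (1, 990)
```

With too few buckets everything piles up in one place, while with too many the algorithm spends most of its time and memory on buckets that stay empty.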
How do we decide on the interval size here? In your second example we could have decided to choose buckets of size (0-5, 5-10, 10-15, 15-20, 20-25). What is the best approach to decide bucket size?
Hi,
I have found your explanations quite useful and I have tried to apply them to test the different sorts presented in your blog. The result (in golang) can be found at: https://github.com/thoroc/go_test/blob/master/sort.go
Why do we use bucket sort? What problem is encountered in other sorting algorithms that makes us need bucket sort? What is the problem statement for bucket sort?
How do you count number of swaps and comparison in Bucket sort? Do you count swaps and comparisons from Insertion sort only or something more?
To answer the why question above: most sorting algorithms are at best O(n log n) and at worst O(n^2) or worse. Bucket sort is O(n), assuming n is somewhat large and your buckets are relatively small, as mentioned in the article. That’s the worst-case time under those assumptions. http://stackoverflow.com/questions/7311415/how-is-the-complexity-of-bucket-sort-is-onk-if-we-implement-buckets-using-lin
For an array with 10,000 elements, that’s a difference between 10,000 and ~100,000 in the best case and between 10,000 and 100,000,000 in the worst case.
How did you decide on how many buckets to use? And how do you calculate which array element goes into which bucket? If we go by your second example of using ranges to represent buckets, don’t you think there is an overhead in calculating which bucket an element belongs to?