Typical data sequences for testing sorting algorithms

There is not "best sorting algorithm" for all data. Various algorithms have their own strength and weaknesses. For example some "sense" already sorted or "almost sorted" data sequences and perform faster on such sets. In this sense Knuth math analysis is insufficient althouth "worst time" estimates are useful.

As for input data it is useful to distinguish between the following broad categories that all should be used in testing (random number sorting is a very artificial test and as such the estimate it provides does not have much practical value, unless we know that other cases behave similarly or better):

Completely randomly reshuffled array (this is the only test that naive people use in evaluating sorting algorithms) .
Already sorted array (you need to see how horrible Quicksort is on this case and how good shellsort is ;-). This is actually a pretty important case as often sorting is actually resorting of previously sorted data done after minimal modifications of the data set. There are three imortant case of already sorted array
- Array of distinct elements sorted in "right" direction for which no any reordering is required (triangular array).
- Already sorted array consisting of small number of identical elements (stairs). The worst case is retangular array in when single element is present (all values are identical).
- Already sorted in reverse order array (many algorithms, such as insertion sort work slow on such an array)
Array that consisted of merged already sorted arrays(Chainsaw array). Arrays can be sorted
1. in right direction
2. in opposite direction of have arrays
3. both sorted in right and opposite direction (one case is "symmetrical chainsaw").
Array consisting of small number of identical elements (sometimes called or "few unique" case). If number of distint elements is large this is the case similar to chainsaw but without advantage of preordeing. So it can ge generated by "inflicting" certain number of permuations on chainsaw array. Worst case is when there is just two values of elements in the array (binary array). Quicksort is horrible on such data. Many other algorithms work slow on such an array.
Already sorted in right direction array with N permutations (with N from 0.1 to 10% of the size). Insrtion soft does well on such arrays. Shellsort also is quick. Quick sort do not adapt well to nearly sorted data.
Already sorted array in reverse order array with N permutations
Large data sets with normal distribution of keys.
Pseudorandom data (daily values of S&P500 or other index for a decade or two might be a good test set here

Learn Technology

Wednesday, May 18, 2011

Typical data sequences for testing sorting algorithms

Artikel Terkait:

Comments :

Post a Comment

About Me

Archive

Blog List