Sigo a vueltas con la ordenación.

Una pequeña discusión el la talk-page de sorting algoritms de Wikipedia, sobre un argumento que ya me dió Javifields en su día.

Comparison sort lower bound

I think a reference to Timsort (Python adaptive mergesort) should be done. Also, with adaptive comparison sorting algorithms in mind, I think there’s not yet a proof that log(n!) (n log n) is the best average order achievable. All that’s been proved is that worst case for any comparison sorting algorithm is n log n.

Re: Timsort. It’s certainly an interesting algorithm, particularly because it comes very close to the lower limit on the number of comparisons. The problem may be that there has been very litte scientific research into it. Tim Peters published some notes on it, but they consist mainly of benchmarks. Any article on it would be very brief, or fall foul of the “no original research” rule.
Re: O(n log n) bound. As far as I can see it’s basic information theory. Assuming n distinct keys, every comparison yields at most 1 bit of information, and that only in the case when you perform a comparison that splits the remaining permutation space exactly in two halves. So, assuming you have n! permutations with equal probability, lg(n!) comparisons on average is the absolute minimum. Grotendeels Onschadelijk 01:38, 19 August 2007 (UTC)
Re: I’m the one above. I know this basic information theory, Grotendeels, but if I’m right thats only a proof for the worst case, since by making a first sequential pass on the array you can use adaptive algorithms (even quicksort is adaptive!) to do less tan log(n!) comps. Also, an optimum alogithm, MUST do a first sequential pass on the array ( for O(n) case if ordered or inverted). Then the bound is O(n log n) ONLY if u can’t use the array pre-order. Maybe I’m wrong, but I think it’s possible to make an algorithm with an O(n log n)worst case bound, half that comparations on average, and O(n) best case. I have an idea to make it, but i’m unable to translate that to code without an enormous use of memory (at least n log(n)).Azrael60 23:28, 20 August 2007 (UTC)
Yes, it is possible to get algorithms with O(n log n) worst case. Mergesort is an example. Yes, it is possible to get O(n) for a limited number of cases. However, under the assumptions
  • all keys are distinct, i.e. every comparison will give a>b or a<b, and
  • the input is a random permutation, chosen uniformly from the set of all possible permutations,
it is impossible to even determine which order the input is in, in less that log2(n!) comparisons on average.
The short argument is this: The Shannon entropy of a random permutation, distributed uniformly over all permutations of n elements is log2(n!) bits. Since a comparison can give only two results, the maximum amount of information it can give is 1 bit. Therefore after k comparisons the remaining entropy of the permutation, given the results of those comparisons, is at least log2(n!) - k on average. To perform the sort, we need complete information, that is, we need the remaining entropy to be 0. So we need at least k >= log2(n!). As far as I can see I have made no worst case assumtions here, just the two assumptions noted above. Note, however, that this differs from the worst case argument, in that it does not allow rounding up to the nearest integer. For example, for n = 3, the lower bound for the worst case is 3, the lower bound for the average case as shown above is approximately 2.5849625007211561, while the highest lower bound for the average case is 8/3. Grotendeels Onschadelijk 01:18, 25 August 2007 (UTC)
I have added a slightly improved version of this argument under Comparison sort#Lower bound for the average number of comparisons. Grotendeels Onschadelijk 15:46, 25 August 2007 (UTC)”
A continuación el argumento en mayor profundidad (tendré que pensar en ello), en la página de cota mínima.

Lower bound for the average number of comparisons

The Shannon entropy of a random permutation, distributed uniformly over all permutations of n elements is log2(n!) bits. Since a comparison can give only two results, the maximum amount of information it provides is 1 bit. Therefore after k comparisons the remaining entropy of the permutation, given the results of those comparisons, is at least log2(n!) - k on average. To perform the sort, complete information is needed, so the remaining entropy must be 0. It follows that k must be at least log2(n!).

Note that this differs from the worst case argument given above, in that it does not allow rounding up to the nearest integer. For example, for n = 3, the lower bound for the worst case is 3, the lower bound for the average case as shown above is approximately 2.58, while the highest lower bound for the average case is 8/3.

In the case that multiple items may have the same key, there is no obvious statistical interpretation for the term “average case”, so an argument like the above cannot be applied without making specific assumptions about the distribution of keys.”

Leave a Reply