# VBForums CodeBank > CodeBank - Visual Basic 6 and earlier >  VB - Quick Sort algorithm (very fast sorting algorithm)

## CVMichael

The example is sorting string arrays, but you can modify it to use any type of arrays, just replace "As String" to the type you need...

VB Code:
Option Explicit
 Private Sub Form_Load()
    Dim MyStrArray() As String, K As Long, Q As Long
    ReDim MyStrArray(1 To 10)
    
    Randomize
    
    Debug.Print "Unsorted strings:"
    For K = LBound(MyStrArray) To UBound(MyStrArray)
        
        ' create a random string
        MyStrArray(K) = String(10, " ")
        For Q = 1 To 10
            Mid$(MyStrArray(K), Q, 1) = Chr(Asc("A") + Fix(26 * Rnd))
        Next Q
        
        ' print the string to the immediate window
        Debug.Print MyStrArray(K)
    Next K
    
    ' sort the array
    QuickSort MyStrArray, LBound(MyStrArray), UBound(MyStrArray)
    
    ' print the sorted string to the immediate window
    Debug.Print vbNewLine & "Sorted strings:"
    For K = LBound(MyStrArray) To UBound(MyStrArray)
        Debug.Print MyStrArray(K)
    Next K
End Sub
 Private Sub QuickSort(C() As String, ByVal First As Long, ByVal Last As Long)
    '
    '  Made by Michael Ciurescu (CVMichael from vbforums.com)
    '  Original thread: [url]http://www.vbforums.com/showthread.php?t=231925[/url]
    '
    Dim Low As Long, High As Long
    Dim MidValue As String
    
    Low = First
    High = Last
    MidValue = C((First + Last) \ 2)
    
    Do
        While C(Low) < MidValue
            Low = Low + 1
        Wend
        
        While C(High) > MidValue
            High = High - 1
        Wend
        
        If Low <= High Then
            Swap C(Low), C(High)
            Low = Low + 1
            High = High - 1
        End If
    Loop While Low <= High
    
    If First < High Then QuickSort C, First, High
    If Low < Last Then QuickSort C, Low, Last
End Sub
 Private Sub Swap(ByRef A As String, ByRef B As String)
    Dim T As String
    
    T = A
    A = B
    B = T
End Sub

----------


## aidan

Good stuff

----------


## freescale

Yes, it's very usefull.

But i've a small (?) problem with it...

My array has 35 elements.

Every time when i try to search in this one, at element 4, my program ends in an endless loop.

The debugger says, that Mid is 32, and the High Value 33.
So it starts always again at 32 (rounding error).

Is there any way to fix that?

PS: Sorry 4 my bad english  :Wink:  I#m still working on it....

Regards from Frankfurt/Germany,
 :Duck:  freescale

----------


## CVMichael

I tested it for 35 elements, and it worked perfectrly. I also tried with 34, and 36 elements (just in case), and it still worked fine...

Maybe there's something wrong with your computer ?

What VB version do you have ?

----------


## freescale

I am currently using Visual Basic 6 without any Service Packs.
But i've just saw that i've posted in the wrong thread. I ment the Function ArrBinarySearch 
in the thread http://vbforums.com/showthread.php?t=231934

----------


## Hack

> I am currently using Visual Basic 6 without any Service Packs.
> But i've just saw that i've posted in the wrong thread. I ment the Function ArrBinarySearch 
> in the thread http://vbforums.com/showthread.php?t=231934


You really should apply at least SP5 to your VB installation.

----------


## freescale

Yes, at home i had this.
But here at work it isn't possible, because all the software runs on a software server.
So i haven't the possibility to install anything.

----------


## mc900ken

Ok so I set up a small program that uses this to sort data files containing address information to sort by zip code.

I read the lines in and put the zips into the array.  At the same time I put the record number in a separate array.  

I send the zip array into the sorting procedure.  And anytime I run the swap routine on the zip code array I run the swap on the record number array.

At then end I have a list of record numbers sorted in the correct order for zip code sort.

I then open the input file for random access read and get records one at a time in the order of the record number array and then print them to an output file.

My problem and question is this.  When I retrieve the records and print them at the end the process in very slow compared to every other step in the process.  Is there anything anyone knows I can do to make this process much faster?

Thank you,

----------


## Ben321

Does VB have a built in function that can either sort arrays, or automatically find the minimum or maximum values in arrays?

----------


## CVMichael

Not VB6... that's why we have to make our own functions for that.

VB.NET does ! that's why we all should switch to .NET and don't look back.

----------


## lilley_peter

Just an observation that made the code a fraction quicker for me:

On line 54 which reads:
Swap C(Low), C(High)

This is being called even when Low and High "meet" e.g. they both refer to the same array element.

I just put an if statement around them e.g.

if low <> high then
Swap C(Low), C(High)
endif

----------


## yrwyddfa

This sort routine has a bug in it.

(Low + High) \ 2 might overflow, replace with Low + (High - Low) \ 2 won't.

----------


## Jonney

> this sort routine has a bug in it.
> 
> (low + high) \ 2 might overflow, replace with low + (high - low) \ 2 won't.


good catch!

----------


## CVMichael

> good catch!


Correct! it MIGHT overflow! ... let's see WHEN?

So, a Long data type is 32 bits, but we are using a signed integer, so 31 bits, and that is 2,147,483,648... those are *array items*...
Now lets see how much memory you would need to store that many items (so that you get the overflow)

So, internally, each item is a pointer to another location in the memory (basically an array of pointers) each one pointing to a string structure (a BSTR, I can't find anywhere the actual structure of BSTR), similar to this one, and that structure is 12 bytes (on a x86 processor) (and there is another pointer to the actual string... and lets say on average the string is 10 bytes? so let's do the math...

2^31 * 4 * 12 * 10 = 1.0307922e+12
that number is in bytes... so let's convert that into GBytes:
(2^31 * 4 * 12 * 10) / 1073741824 = 960 GBytes

So... you will get an overflow error if the total memory your array takes is approximately (with a few assumptions) 960 GBytes that has to be stored in RAM (by the way)... and again... this is if your average string is 10 bytes long... and since all strings are unicode, that means 5 characters each string

So... in conclusion... let's add another operation to the loop to slow it down, because we might get an overflow sometime in the future when computers will have a few terabytes of RAM... it might not be that much far away...

But.... good catch though  :Wink: 


[Edit]... if you port this code to VB.net then the Long data type is 64 bits.... so.... do you still think you might get an overflow?

----------


## georgekar

For strings  


```
Private Sub QuicksortString(List() As String, ByVal min As Integer, ByVal max As Integer)
Dim med_value As String
Dim hi As Integer
Dim lo As Integer
Dim i As Integer
    If max <= min Then Exit Sub
    i = Int((max - min + 1) * Rnd + min)
    med_value = List(i)
     List(i) = List(min)
    lo = min
    hi = max
    Do
        Do While List(hi) >= med_value
            hi = hi - 1
            If hi <= lo Then Exit Do
        Loop
        If hi <= lo Then
            List(lo) = med_value
            Exit Do
        End If
        List(lo) = List(hi)
       lo = lo + 1
        Do While List(lo) < med_value
            lo = lo + 1
            If lo >= hi Then Exit Do
        Loop
        If lo >= hi Then
            lo = hi
            List(hi) = med_value
            Exit Do
        End If
        
        ' Swap the lo and hi values.
        List(hi) = List(lo)
    Loop
    QuicksortString List(), min, lo - 1
    QuicksortString List(), lo + 1, max
End Sub
```

----------


## CVMichael

> For strings...


OK... so the code I posted in this thread 11 years ago, is also sorting strings.... so what's the point of your post? you have to be more descriptive than that! Or is this the quality of posts we should expect these days?

Woow... why is there a "Rnd" in there? is this what my code evolved into?  :EEK!:  since when randomising the search gives faster result? (assuming that's why you post the code?)

----------


## georgekar

You have fun...
This is not a code to make someone smarter..it just a sorting routine. I am not the inventor of quicksort. It is very fast, but as you can see the random isn't actually the good point  (it is a point of interest). The bad news first: This function uses integer offsets and not long. So anyone can change that with long offsets. The good news: This function uses integers so it is faster. The very good news: This is very fast because no waste time provided for swapping the value that it is in semi swap state. As you can see there isn't a swap function or a a call like that one on the code that you provide. Why? Because we get value as a "middle" (or by random...not the middle always, why middle is better....do you know why?) and that value putting in the right position...because at that position we hold the value in an other "middle" value..So not needed for swap function. And that’s why the point of my post is for improving the readers knowledge and not to conflict with your scope of posting
Hello From Sunny Greece...
Have a nice day.

----------


## georgekar

Don't get it personal CVMichalel but your code is fault. I check it without running, while I explain the code I post...



```
    i = Int((max - min + 1) * Rnd + min)
    med_value = List(i)
     List(i) = List(min)    ' this is the beautiful point of the code.
```

When we enter the outer Do Loop we have change the List(i) with List(min) so we have 2 times List(min) item.
The Rnd function needed to us to pick any value not the value at the middle..There is  always a best item to pick, and that item isn't in the middle. So with the rnd function...maybe we found it..In worse scenario we are the same as for the choice of middle...In that scenario all items are in the wrong position. 
The outer loop must find the place of chosen as "middle".
In the first inner loop we setting the hi pointer to the item less than"middle".
If we don't find any smaller then this is true hi<=lo  and we put in List(lo) the "middle" item. So  we found the right position...we perform a swap in three steps, two steps out of outer loop and one in the inner loop.

Lets see what your routine do at the same situation.
You pick the middle as actually middle (not a problem here), so we say that this item is the lowest from all other
You skip the first inner loop, because you get an early false
You going through all items in the second loop....

if you don't find any item less that the middle....you get a BIG error because your pointer goes beyond limit...
So you have to place as in code that i post (i can't remember which part is mine...but I remember that I do some changes), a line with a if clause like that
If high < low Then Exit Do ' I change <= to < because you have this in the outer loop "Loop While Low <= High"

So perhaps you correct your code, we have a High<Low condition.
So the next "if" clause is skipped. Because you "Loop While Low <= High"  your loop end without nothing doing
next ...the First < High  is false (high<low  and first is low now)
next ....the Low<Last  is true (low is first now and last is the first high)...
So you call exactly the same QuickSort C, Low, Last  .FOREVER 
And for all your life...Your code is not working my friend...for 11 years.
Have you ever put your code in an application??

----------


## yrwyddfa

> Correct! it MIGHT overflow! ... let's see WHEN?
> 
> So, a Long data type is 32 bits, but we are using a signed integer, so 31 bits, and that is 2,147,483,648... those are *array items*...
> Now lets see how much memory you would need to store that many items (so that you get the overflow)
> 
> So, internally, each item is a pointer to another location in the memory (basically an array of pointers) each one pointing to a string structure (a BSTR, I can't find anywhere the actual structure of BSTR), similar to this one, and that structure is 12 bytes (on a x86 processor) (and there is another pointer to the actual string... and lets say on average the string is 10 bytes? so let's do the math...
> 
> 2^31 * 4 * 12 * 10 = 1.0307922e+12
> that number is in bytes... so let's convert that into GBytes:
> ...


Yes, it's always possible. 

This calculation refers to the array index, not the array contents; you only need greater than 2^30 items in the array, regardless of element size which is just over a billion items, which is not insofar as your gigantic amount that unreasonable to expect.

Nevertheless, even if this limit is way beyond the expected maximum ever likley, having that bug in to save one operation (which is likely optimised out by the compiler) is premature optimisation and is therefore evil.

Still saving 0.0000000000000000000000000001 extra seconds available might be worthwhile, I suppose - especially if you're sorting on greater than a billion records ....

 :Wink:

----------


## yrwyddfa

For reference, anyway. you shouldn't be middling the partitions - although it's a very common thing to do. The principle is quite clear; if the list is almost sorted the partitions will be partitioned into one each side leading to performance of n-squared which is no better than a bubblesort. Using a random number as shown above protects against this, but, even better is to use the median of three technique.

Sorting nearly sorted lists is incredibly common: for instance, naive programmers will put a new item in a sorted list, and then run quicksort on it to insert it into it's correct place. The code will work, but it's very (very) inefficient.

----------


## georgekar

The quicksort example in #15 uses the median of three technique.

----------


## asbo

*CVMichael*

Nice job.

For the first time in my life, I attended to sorting and got such beautiful satisfaction :)

Works from box.
Super fast and efficiently.

I got the total list of files & folders of %SystemDrive%
~15'000 objects with total lentgh of list as ~800'000 bytes,
preliminary sorted in Far by size.

*Timing below:*  
1-st line - without sorting;
2-nd line - the same code with calling sort.
Tiks - QueryPerfomanceCounter



```
~200'000'000 ticks / ~ 0.075 sec
~400'000'000       / ~ 0.150
```

I expected much worse results 
(not from the code, but from the *sorting of strings* itself, especially in such volumes)

Next I compared both lists - source, aftersorted in Notepad++, and target, sorted by this routine.
They are identical.

CVMichael, thanks once more.
.

----------


## wqweto

> The quicksort example in #15 uses the median of three technique.


For posterity: this is *not* true as of today (see post date).

cheers,
</wqw>

----------


## passel

> For posterity: this is *not* true as of today (see post date).
> 
> cheers,
> </wqw>


Since post #15 hasn't been edited, I don't think it is not true as of today, but hasn't been true since posted.
But, I guess you mean not true up to today. 

He is picking a value at random in the range, rather than taking the median of three values in the range. That is simpler codewise than dealing with the issue of not having three values left to take the median of.

----------


## wqweto

Ooops, sorry for the poor language -- "hasn't been true since posted" is what I meant.

Quicksort algorithm is nice to know but then everyone is using ADO.Recordset Sort method to do the job :-))

cheers,
</wqw>

----------

