# Visual Basic > Visual Basic 6 and Earlier >  Duplicate image file finder (Help required)

## k_zeon

Hi. I have lots of pictures and am trying to create a library.
I have made an app that uses metadata to rename and move files into this library.
However, when I started, I did not think through the naming convention.

I now have quite a few dupes that I would like to remove; then I will rename the files again once I only have one of each image.

I was thinking of going through a folder, hashing each file, then seeing which ones are duplicates, and deleting those files.
What would be the best & fastest way to compare image files (ie jpg)? The filenames would be different but some images would be dupes.

I have seen other apps that run and then group the images to show dupes; however, the one I like is not free (Duplicate Photo Finder 1.7.0.0).
It will scan a folder and then show images of dupes with filename and a tick box. Each group then keeps the 1st image and lets you delete all the rest.


What would be the logic to use for something like this? How would it look?

tks

----------


## baka

I have made small tools for my needs when I need to check 100k+ pictures
(they're assets for my game, so it could be a button, or bitmap fonts, small stuff, but also bigger pictures etc.)

so, what I do is just take the filename and size, and I also scan the dimensions using
GdipLoadImageFromFile, GdipGetImageHeight and GdipGetImageWidth,
and that's it. so far it works.
and I use the FindFirstFile API (recursive) to scan the folders.

but say u get 2 files with identical size + dimensions; at that point u could do a simple crc32 compare between the two.
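
a minimal sketch of those GDI+ flat-API calls, for anyone following along (GetImageSize and m_Token are illustrative names of mine; GDI+ has to be started once before any Gdip* call):

```
Private Type GdiplusStartupInput
    GdiplusVersion As Long
    DebugEventCallback As Long
    SuppressBackgroundThread As Long
    SuppressExternalCodecs As Long
End Type

Private Declare Function GdiplusStartup Lib "gdiplus" (token As Long, inputbuf As GdiplusStartupInput, Optional ByVal outputbuf As Long = 0) As Long
Private Declare Function GdiplusShutdown Lib "gdiplus" (ByVal token As Long) As Long
Private Declare Function GdipLoadImageFromFile Lib "gdiplus" (ByVal FileName As Long, image As Long) As Long
Private Declare Function GdipGetImageWidth Lib "gdiplus" (ByVal image As Long, Width As Long) As Long
Private Declare Function GdipGetImageHeight Lib "gdiplus" (ByVal image As Long, Height As Long) As Long
Private Declare Function GdipDisposeImage Lib "gdiplus" (ByVal image As Long) As Long

Private m_Token As Long

Private Function GetImageSize(ByVal sFile As String, W As Long, H As Long) As Boolean
    Dim si As GdiplusStartupInput, hImage As Long
    If m_Token = 0 Then                 ' start GDI+ once
        si.GdiplusVersion = 1
        GdiplusStartup m_Token, si
    End If
    ' GdipLoadImageFromFile takes a WCHAR*, so pass StrPtr of the VB string
    If GdipLoadImageFromFile(StrPtr(sFile), hImage) = 0 Then
        GdipGetImageWidth hImage, W     ' dimensions come back ByRef
        GdipGetImageHeight hImage, H
        GdipDisposeImage hImage
        GetImageSize = True
    End If
End Function
```

(call GdiplusShutdown m_Token when the app closes)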

----------


## k_zeon

> I have made small tools for my needs when I need to check 100k+ pictures
> (they're assets for my game, so it could be a button, or bitmap fonts, small stuff, but also bigger pictures etc.)
> 
> so, what I do is just take the filename and size, and I also scan the dimensions using
> GdipLoadImageFromFile, GdipGetImageHeight and GdipGetImageWidth,
> and that's it. so far it works.
> and I use the FindFirstFile API (recursive) to scan the folders.
> 
> but say u get 2 files with identical size + dimensions; at that point u could do a simple crc32 compare between the two.


So what logic do you use to list dupes? Do you scan all files first, then add a file to an array where it matches, etc.?
Would you be willing to share some code? tks

----------


## baka

yeah. create the list first.
read all files into an array;
that array should include:
- filename
- size
- width * (optional)
- height * (optional)
- crc32 (empty for now)
- flag * (optional)

after u have added all the files to the array, u sort it by size.
I always use quicksort.
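
as a sketch, that per-file record could be a UDT like this (the bpicData name matches the quicksort further down; the field names here are just illustrative, and .Sort is the extra key the quicksort compares):

```
Private Type bpicData
    FileName As String
    Size As Long        ' file size in bytes
    Width As Long       ' optional, from GDI+
    Height As Long      ' optional, from GDI+
    Crc32 As Long       ' 0 until a compare is actually needed
    Flag As Byte        ' optional, e.g. marked-for-delete
    Sort As Long        ' sort key, e.g. a copy of Size
End Type
```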

after that u scan all files. example:

```
For a = 1 To last - 1
    For b = a + 1 To last
        If data(b).Size = data(a).Size Then Analyze a, b Else Exit For
    Next b
Next a
```

in Analyze u check:
- if u have the dimensions (width/height), u compare those first (usually if u have a program to show pictures, u already have that parameter); if not, better do a crc32
- scan both files using crc32 (there's examples u can find here) and after u got the Long value, u compare the two
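
for reference, a minimal table-driven crc32 of a whole file might look like this (standard &HEDB88320 polynomial; the unsigned right shifts are done with masked integer division because VB6 Longs are signed):

```
Private crcTable(0 To 255) As Long

Private Sub InitCrc32()
    Dim i As Long, j As Long, c As Long
    For i = 0 To 255
        c = i
        For j = 0 To 7
            If c And 1& Then
                c = (((c And &HFFFFFFFE) \ 2&) And &H7FFFFFFF) Xor &HEDB88320
            Else
                c = ((c And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
            End If
        Next j
        crcTable(i) = c
    Next i
End Sub

Private Function FileCrc32(ByVal sFile As String) As Long
    Dim buf() As Byte, i As Long, crc As Long, f As Integer
    If crcTable(1) = 0 Then InitCrc32   ' build the table once
    f = FreeFile
    Open sFile For Binary Access Read As #f
    crc = &HFFFFFFFF
    If LOF(f) > 0 Then
        ReDim buf(0 To LOF(f) - 1)
        Get #f, 1, buf
        For i = 0 To UBound(buf)
            ' crc = (crc >> 8) Xor table((crc Xor byte) And &HFF)
            crc = (((crc And &HFFFFFF00) \ &H100&) And &HFFFFFF) _
                  Xor crcTable((crc Xor buf(i)) And &HFF)
        Next i
    End If
    Close #f
    FileCrc32 = crc Xor &HFFFFFFFF
End Function
```

two files are then duplicates (almost certainly) when FileCrc32 returns the same Long for both.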

now if u want to show the pictures u found, u can either do that directly with a "delete" button, like <left> delete: name, location <right> delete: name, location,
or scan all files first and show a result like "(17 duplicates found)" where u can click "auto-delete" or "selected-delete" or something, your choice.

I mean, it's quite straightforward, nothing hard to do.
u can use many different recursive functions to get folders/files (I use FindFirstFile)
and crc32 examples should also be here if u do a search.
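
for what it's worth, the recursive scan can also be sketched with plain Dir$ instead of the FindFirstFile API (ScanFolder is an illustrative name; note Dir$ is not re-entrant, so subfolders are collected first and recursed into after the loop):

```
Private Sub ScanFolder(ByVal sPath As String, Files As Collection)
    Dim f As String, Subs As New Collection, d As Variant
    If Right$(sPath, 1) <> "\" Then sPath = sPath & "\"
    f = Dir$(sPath & "*.*", vbDirectory)
    Do While Len(f)
        If f <> "." And f <> ".." Then
            If GetAttr(sPath & f) And vbDirectory Then
                Subs.Add sPath & f      ' recurse later, Dir$ is not re-entrant
            Else
                Files.Add sPath & f
            End If
        End If
        f = Dir$
    Loop
    For Each d In Subs
        ScanFolder CStr(d), Files
    Next d
End Sub
```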

everything else is just the hard work of adding all the code for the UI.

a quicksort should also be here somewhere; here's an example of one of my own:



```
Private Sub QuickSort(c() As bpicData, ByVal First As Long, ByVal Last As Long)
    Dim Low As Long, High As Long
    Dim MidValue As Long
    Dim Tmp As bpicData

    Low = First
    High = Last
    MidValue = c((First + Last) \ 2).Sort

    Do
        While c(Low).Sort < MidValue
            Low = Low + 1
        Wend

        While c(High).Sort > MidValue
            High = High - 1
        Wend

        If Low <= High Then
            ' swap the two records
            Tmp = c(Low): c(Low) = c(High): c(High) = Tmp
            Low = Low + 1
            High = High - 1
        End If
    Loop While Low <= High

    If First < High Then QuickSort c, First, High
    If Low < Last Then QuickSort c, Low, Last
End Sub
```

here I use .Sort, which is a special value I created for this purpose, but u can use .Size.
(Tmp needs to be the same type as the array, bpicData here)

----------


## k_zeon

> yeah. create the list first.
> read all files into an array,
> that array should include:
> filename
> size
> width * (optional)
> height * (optional)
> crc32 (empty for now)
> flag * (optional)
> ...


tks for the reply. in the meantime I did have a go.

First I read all files and hashed them.
Then I created a GroupArray; if the hash number already exists, I add the file to the same group.
If the hash did not exist, then I create a new GroupArray (ie 2) and so on.
here's a snippet



```
FileCount = UBound(MyDupes(DupeArray).FileName) + 1
ReDim Preserve MyDupes(DupeArray).FileName(FileCount)
MyDupes(DupeArray).FileName(FileCount) = FileName
```
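
a fuller sketch of that grouping step might look like this (the DupeGroup UDT, GroupCount and AddToGroup are illustrative names; the real MyDupes layout may differ):

```
Private Type DupeGroup
    Hash As Long
    FileName() As String
End Type

Private MyDupes() As DupeGroup
Private GroupCount As Long

Private Sub AddToGroup(ByVal Hash As Long, ByVal FileName As String)
    Dim i As Long, FileCount As Long
    For i = 1 To GroupCount                 ' look for an existing group
        If MyDupes(i).Hash = Hash Then Exit For
    Next i
    If i > GroupCount Then                  ' not found: start a new group
        GroupCount = GroupCount + 1
        ReDim Preserve MyDupes(GroupCount)
        MyDupes(GroupCount).Hash = Hash
        ReDim MyDupes(GroupCount).FileName(0)
        MyDupes(GroupCount).FileName(0) = FileName
    Else                                    ' found: append to that group
        FileCount = UBound(MyDupes(i).FileName) + 1
        ReDim Preserve MyDupes(i).FileName(FileCount)
        MyDupes(i).FileName(FileCount) = FileName
    End If
End Sub
```

any group that ends up with more than one filename is a set of dupes to show in the grid.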

once I have my array filled with groups of more than 1 file, I then add them to a grid.

as I add them, I get the image with GDI and put a small image in column 1.

then it's a case of clicking a button and selecting all the copies, leaving one,

and then the delete button will put those files in the recycle bin.
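
one common way to send files to the recycle bin from VB6 (rather than Kill, which deletes permanently) is SHFileOperation with the FOF_ALLOWUNDO flag; a minimal sketch, where RecycleFile is an illustrative wrapper:

```
Private Type SHFILEOPSTRUCT
    hWnd As Long
    wFunc As Long
    pFrom As String
    pTo As String
    fFlags As Integer
    fAnyOperationsAborted As Long
    hNameMappings As Long
    lpszProgressTitle As String
End Type

Private Declare Function SHFileOperation Lib "shell32.dll" _
    Alias "SHFileOperationA" (lpFileOp As SHFILEOPSTRUCT) As Long

Private Const FO_DELETE As Long = &H3
Private Const FOF_ALLOWUNDO As Integer = &H40
Private Const FOF_NOCONFIRMATION As Integer = &H10

Private Sub RecycleFile(ByVal sFile As String)
    Dim op As SHFILEOPSTRUCT
    With op
        .wFunc = FO_DELETE
        .pFrom = sFile & vbNullChar & vbNullChar  ' list must be double-null terminated
        .fFlags = FOF_ALLOWUNDO Or FOF_NOCONFIRMATION
    End With
    SHFileOperation op
End Sub
```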

----------


## baka

yeah, it looks good.
and there's many ways of doing this.
using a listview that u fill is good if there's 2+ found; that way u can compare them all.

----------

