01-17-2011, 10:17 PM
Here's a duplicate file finder Perl script that I've modified a few times. It groups files by size and then compares the candidates by their MD5 hashes to find exact duplicates. This is VERY useful if you have copies of music or other files sitting in the same directory tree under different names, because the script will detect those copies and you can delete them manually to free up some space on your hard drive.
I deliberately didn't make it remove files automatically, so that you can decide for yourself whether or not to delete them (if you'd rather be prompted, see the sketch after the script).
Code:
#!/usr/bin/perl -w
use strict;
use File::Find;
use Digest::MD5;

my %files;      # size in bytes => list of file paths with that size
my $wasted = 0; # bytes that could be reclaimed by removing duplicates

# Walk the tree starting at the given directory (default: current dir).
find(\&check_file, $ARGV[0] || ".");

local $" = "\n";    # interpolate arrays one element per line
foreach my $size (sort { $b <=> $a } keys %files) {
    # Only files that share a size can possibly be identical.
    next unless @{ $files{$size} } > 1;
    my %md5;
    foreach my $file (@{ $files{$size} }) {
        # Three-arg open with a lexical handle; skip unreadable files.
        open(my $fh, '<', $file) or next;
        binmode($fh);
        # Files with the same MD5 digest are duplicates of each other.
        push @{ $md5{ Digest::MD5->new->addfile($fh)->hexdigest } }, $file;
        close($fh);
    }
    foreach my $hash (keys %md5) {
        next unless @{ $md5{$hash} } > 1;
        print "\n\n($size bytes) Duplicate Files:\n";
        print "@{$md5{$hash}}\n\n";
        # Every copy beyond the first one is wasted space.
        $wasted += $size * (@{ $md5{$hash} } - 1);
    }
}

# Insert thousands separators into the byte count for readability.
1 while $wasted =~ s/^([-+]?\d+)(\d{3})/$1,$2/;

print "\n";
print "######################################################\n";
print "\n";
print " You have $wasted bytes total in duplicate files\n";
print "\n";
print "######################################################\n";
print "\n";

# Record each regular file under its size; stat(_) reuses the -f result.
sub check_file {
    -f && push @{ $files{ (stat(_))[7] } }, $File::Find::name;
}
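If you'd rather have the script ask you instead of deleting by hand, here's a minimal sketch of an interactive helper. Note that prompt_delete is a hypothetical name, not part of the script above; you'd call it as prompt_delete(@{$md5{$hash}}) right after a duplicate group is printed.
Code:
# Hypothetical helper, not in the original script: keeps the first copy
# of a duplicate group and asks before unlinking each remaining one.
sub prompt_delete {
    my @copies = @_;
    my $keep = shift @copies;            # always keep one copy
    foreach my $dup (@copies) {
        print "Delete '$dup' (keeping '$keep')? [y/N] ";
        my $answer = <STDIN>;
        last unless defined $answer;     # stop asking on EOF
        chomp $answer;
        if (lc($answer) eq 'y') {
            unlink($dup) or warn "Could not delete $dup: $!\n";
        }
    }
}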
Save it in the directory that you want to scan and run it by its filename from the command prompt, or pass a starting directory as the first argument. It recurses into subfolders, so it will compare files from different folders as well.
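For example, assuming you saved it as dupfinder.pl (pick any name you like):
Code:
perl dupfinder.pl
perl dupfinder.pl C:\Music
The first form scans the current directory; the second scans the directory you name.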
Enjoy