
AceInfinity · 01-17-2011, 10:17 PM

Here's a duplicate file finder Perl script that I've modified a few times. It groups files by size and then compares the MD5 hashes of files that share the same size to find duplicates. This is very useful if you have copies of music or other files in the same directory under different names: it will detect those copies, and you can delete them manually to free up space on your hard drive.

I deliberately didn't make it remove files automatically, so you can decide for yourself whether or not to delete them.

Code:
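#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Map of file size (bytes) => list of paths with that size
my %files;
my $wasted = 0;

# Walk the directory given on the command line, or the current one
find(\&check_file, $ARGV[0] // ".");

# Join interpolated arrays with newlines so each path prints on its own line
local $" = "\n";

# Check the largest sizes first
foreach my $size (sort { $b <=> $a } keys %files) {
  # A file with a unique size cannot have a duplicate
  next unless @{$files{$size}} > 1;

  # Group same-sized files by the MD5 digest of their contents
  my %md5;
  foreach my $file (@{$files{$size}}) {
    open(my $fh, '<', $file) or next;   # skip files that cannot be opened
    binmode($fh);
    push @{$md5{Digest::MD5->new->addfile($fh)->hexdigest}}, $file;
    close($fh);
  }

  # Report every digest shared by two or more files
  foreach my $hash (keys %md5) {
    next unless @{$md5{$hash}} > 1;
    print "\n\n($size bytes) Duplicate Files:\n";
    print "@{$md5{$hash}}\n\n";
    # Every copy beyond the first is wasted space
    $wasted += $size * (@{$md5{$hash}} - 1);
  }
}

# Insert thousands separators, e.g. 1234567 becomes 1,234,567
1 while $wasted =~ s/^([-+]?\d+)(\d{3})/$1,$2/;

print "\n";
print "######################################################\n";
print "\n";
print "  You have $wasted bytes total in duplicate files\n";
print "\n";
print "######################################################\n";
print "\n";

# File::Find callback: record each regular file under its size
sub check_file {
  -f && push @{$files{(stat(_))[7]}}, $File::Find::name;
}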

Put this in the directory you want to look through and run it by its filename from the command prompt, or pass a directory path as the first argument. It searches subfolders recursively, so it will also find duplicates across different folders.

Enjoy

AceInfinity · 01-22-2011, 01:33 AM

No one active in Perl programming, I'm assuming? :)

Even if you don't understand it, I recommend using this script to find duplicate files. It first groups files by size, then compares the MD5 hash of each file within a group. Files with identical contents always have the same MD5 hash, so every duplicate will be detected; if the hashes differ, the files are not the same (for example, one of them has been modified).
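A minimal sketch of the same two-stage check (size first, then MD5), using in-memory strings instead of real files so it runs without touching the disk. The names %contents, %by_size and @duplicate_sets and the sample data are made up for illustration; md5_hex comes from the same Digest::MD5 module the script uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical in-memory "files": name => contents
my %contents = (
  'song.mp3'      => 'ID3 fake audio data',
  'song copy.mp3' => 'ID3 fake audio data',
  'other.mp3'     => 'ID3 fake audio DATA',   # same size, different bytes
  'notes.txt'     => 'short',
);

# Stage 1: group names by size (length in bytes)
my %by_size;
push @{ $by_size{ length $contents{$_} } }, $_ for sort keys %contents;

# Stage 2: within each size group, group names by the MD5 of their contents
my @duplicate_sets;
for my $size (keys %by_size) {
  my $names = $by_size{$size};
  next unless @$names > 1;    # a unique size cannot have a duplicate
  my %by_md5;
  push @{ $by_md5{ md5_hex($contents{$_}) } }, $_ for @$names;
  push @duplicate_sets, [ sort @$_ ] for grep { @$_ > 1 } values %by_md5;
}

print "@$_\n" for @duplicate_sets;
```

It prints `song copy.mp3 song.mp3`: other.mp3 has the same size as the two copies but different bytes, so its hash doesn't match, and notes.txt is never hashed because no other entry shares its size.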

Enjoy :)

Caaz · 02-14-2011, 08:53 PM

This is very interesting. I might actually use this later since I have a ton of images I need to sort through. It's pretty simple for what it does as well. I like.

AceInfinity · 02-14-2011, 09:06 PM

You'll find it very useful. I'm surprised there aren't more people who have experience in Perl scripting. They are missing out on lots of good things you can do with the programming language.

eax · 04-30-2011, 03:06 AM

(02-14-2011, 09:06 PM)Infinity Wrote: You'll find it very useful. I'm surprised there aren't more people who have experience in Perl scripting. They are missing out on lots of good things you can do with the programming language.

I personally like Ruby; it has the best bits of Perl and Python. Nice, useful script though. Bad music in the YT vid. :P

AceInfinity · 04-30-2011, 01:20 PM

(04-30-2011, 03:06 AM)eax Wrote: I personally like Ruby, it has the best bits of Perl and Python. Nice useful script though. Bad music in the YT vid.

I found it through AudioSwap, or whatever that YouTube feature is called. I had already used the rest of my YouTube playlist from iTunes, so I didn't know what else to add. The music is only there so you don't get bored listening to silence; it has no effect on the actual video. You can mute it if you want.

Bengan · 05-08-2011, 09:28 AM

What do I do with that code?

AceInfinity · 05-08-2011, 01:28 PM

(05-08-2011, 09:28 AM)Bengan Wrote: What to do with that code?

Save it as a .pl file and run it with the Perl interpreter from the command line. On Windows you need a Perl distribution such as ActivePerl installed; most Linux and macOS systems already include perl.

andrewgail · 05-14-2011, 11:49 AM

A script to find and remove duplicate files in one or more directories. The program gets a speed-up by keeping file reads to a minimum. In many cases, it only reads small chunks from unique files, and only files with duplicates are read completely.

AceInfinity · 05-14-2011, 02:58 PM

(05-14-2011, 11:49 AM)andrewgail Wrote: A script to find and remove duplicate files in one or more directories. The program gets a speed-up by keeping file reads to a minimum. In many cases, it only reads small chunks from unique files, and only files with duplicates are read completely.

You don't know what you're talking about, sorry to say.

1) It doesn't remove the duplicate files.
2) It's not a compiled program; it's a script that gets interpreted.
3) It doesn't read small chunks of files first. Files with a unique size are never opened, and every file that shares its size with another file is hashed in full, because that is how it determines which files are duplicates.
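For comparison, the chunk-first idea from the quoted post can be sketched in Perl. This is not what the script above does: here, same-sized files are first compared by the MD5 of a short prefix, and only files whose prefixes match are hashed in full. The helpers prefix_digest, full_digest and find_dupes, the 4096-byte prefix length and the temporary test files are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: MD5 of only the first $len bytes of a file
sub prefix_digest {
  my ($path, $len) = @_;
  open(my $fh, '<', $path) or return undef;
  binmode($fh);
  my $n = read($fh, my $buf, $len);
  close($fh);
  return defined $n ? md5_hex($buf) : undef;
}

# Hypothetical helper: MD5 of the whole file, as the script above computes it
sub full_digest {
  my ($path) = @_;
  open(my $fh, '<', $path) or return undef;
  binmode($fh);
  my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
  close($fh);
  return $hex;
}

# Hypothetical: given same-sized files, return groups of identical files.
# Only files whose prefixes collide are read in full.
sub find_dupes {
  my ($paths, $prefix_len) = @_;
  my %by_prefix;
  for my $p (@$paths) {
    my $d = prefix_digest($p, $prefix_len);
    push @{ $by_prefix{$d} }, $p if defined $d;
  }
  my @groups;
  for my $cand (grep { @$_ > 1 } values %by_prefix) {
    my %by_full;
    for my $p (@$cand) {
      my $d = full_digest($p);
      push @{ $by_full{$d} }, $p if defined $d;
    }
    push @groups, [ sort @$_ ] for grep { @$_ > 1 } values %by_full;
  }
  return @groups;
}

# Demo: four 10,000-byte files in a temporary directory
my $dir  = tempdir(CLEANUP => 1);
my %data = (
  a => 'x' x 10_000,            # identical to b
  b => 'x' x 10_000,
  c => 'y' . ('x' x 9_999),     # differs in the first byte
  d => ('x' x 9_999) . 'y',     # same 4 KB prefix as a and b, differs at the end
);
my %path;
for my $name (sort keys %data) {
  $path{$name} = File::Spec->catfile($dir, $name);
  open(my $out, '>', $path{$name}) or die "cannot write $path{$name}: $!";
  binmode($out);
  print {$out} $data{$name};
  close($out);
}

my @groups = find_dupes([ map { $path{$_} } sort keys %path ], 4096);
print "@$_\n" for @groups;
```

Only the paths of a and b are printed: c is ruled out after reading its first 4 KB, while d passes the prefix check but fails the full hash. The trade-off is an extra open and read for every file whose prefix matches another's, in exchange for never fully reading files that differ early.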
