|
| |||
| Command line to remove duplicate files? I have a Fedora 6 system and want to remove some duplicate files. I have about 1,500 jpg images on my XP machine on the LAN and setup the directory containing the photos as a share, then mounted it in Linux with cifs, so now I can use Linux tools on the directory. I heard of a program called fdupes and it seemed like the perfect thing so I installed it and tried it out. It worked great and found all duplicates but did not remove the dupes. Instead I got asked for every set of dupes, which one I want to keep and they would list the found dupes. I enter 1 to save the first one and then on to question 2 for the next set, etc. There are over a hundred dupes in this directory and to have to answer what one for each dupe is really time consuming and there has to be a better way to do this. Maybe automating the process with a shell script or a different approach altogether, I need your help to figure this out. Answering 1 to fdupes did not delete the duplicate files though, they still remain. This could be the way the permissions are set for the cifs mount, permissions are such: [ohmster@ohmster test]$ ls -la KatharinaTisch072* -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072(1).jpg -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072.jpg [ohmster@ohmster test]$ rw, r, r, maybe I need to chmod them all to writable but still fdupes will ask me hundreds of questions for each duplicate found. All I want is one copy of the file and am not to particular which one it is. I would rather have the plane one ending in 072.jpg rather than 072(1).jpg or 072_0.jpg but it is not a very big deal. Can anybody show me an automated method to remove dupes from thousands of files in a directory or two please? Thanks. -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is Message Body, not Subject!) to pass my spam filter. |
| |||
| Re: Command line to remove duplicate files? Ohmster wrote: > [ohmster@ohmster test]$ ls -la KatharinaTisch072* > -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072(1).jpg > -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072.jpg > [ohmster@ohmster test]$ > > rw, r, r, maybe I need to chmod them all to writable but still fdupes will > ask me hundreds of questions for each duplicate found. All I want is one > copy of the file and am not to particular which one it is. I would rather > have the plane one ending in 072.jpg rather than 072(1).jpg or 072_0.jpg > but it is not a very big deal. If the following is true: - all the files end in ".jpg" - each pair of files *always* named "file.jpg" and "file(1).jpg" - all files in the same directory then you can just do rm *(1).jpg If that is not the case, then provide more info about how duplicates are supposed to be identified and about file names, possibly providing sample inputs. |
| |||
| Re: Command line to remove duplicate files? In alt.os.linux pk <pk@pk.pk>: > pk wrote: [..] > rm *\(1\).jpg Or find . -name "*([0-9]).jpg" -exec rm "{}" \; Try first, to be sure: find . -name "*([0-9]).jpg" -exec ls "{}" \; -- Michael Heiming (X-PGP-Sig > GPG-Key ID: EDD27B94) mail: echo zvpunry@urvzvat.qr | perl -pe 'y/a-z/n-za-m/' #bofh excuse 47: Complete Transient Lockout |
| |||
| Re: Command line to remove duplicate files? On 03/12/08 12:11, Michael Heiming wrote: > In alt.os.linux pk <pk@pk.pk>: >> pk wrote: > [..] > >> rm *\(1\).jpg > > Or > > find . -name "*([0-9]).jpg" -exec rm "{}" \; > > Try first, to be sure: > > find . -name "*([0-9]).jpg" -exec ls "{}" \; > Or to check that file.jpg exists before removing file(1).jpg for I in `find . -type f -name "*([0-9]).jpg"`; do if test -e `echo $I | sed 's/([0-9])//'`; then rm -f $I; fi; done |
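The existence check in the loop above can be sketched in a form that also copes with spaces and parentheses in file names. This is only an illustration under stated assumptions: the function name `remove_numbered_copies` is made up for this note, and it assumes every duplicate is named `name(N).jpg` with a single digit N, as in the loop it mirrors.

```shell
# remove_numbered_copies DIR  (hypothetical helper, not from the thread)
# Delete "name(N).jpg" below DIR only when the plain "name.jpg" also exists.
remove_numbered_copies() {
    # find hands the names to sh as separate arguments, so spaces and
    # parentheses in file names are never re-split by the shell.
    find "$1" -type f -name '*([0-9]).jpg' -exec sh -c '
        for copy in "$@"; do
            # Strip the "(N)" that sits directly before ".jpg".
            original=$(printf "%s\n" "$copy" | sed "s/([0-9])\.jpg\$/.jpg/")
            if [ -e "$original" ]; then
                rm -f -- "$copy"
            fi
        done
    ' sh {} +
}
```

Called as `remove_numbered_copies .`, it leaves a numbered copy alone whenever its plain counterpart is missing. It compares names only, not file contents.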
| |||
| Re: Command line to remove duplicate files? On Wed, 12 Mar 2008 10:05:39 -0500, Ohmster <root@dev.nul.invalid> wrote: > I have a Fedora 6 system and want to remove some duplicate files. I have > about 1,500 jpg images on my XP machine on the LAN and setup the directory > containing the photos as a share, then mounted it in Linux with cifs, so > now I can use Linux tools on the directory. I heard of a program called > fdupes and it seemed like the perfect thing so I installed it and tried it > out. It worked great and found all duplicates but did not remove the dupes. > Instead I got asked for every set of dupes, which one I want to keep and > they would list the found dupes. I enter 1 to save the first one and then > on to question 2 for the next set, etc. There are over a hundred dupes in > this directory and to have to answer what one for each dupe is really time > consuming and there has to be a better way to do this. Maybe automating the > process with a shell script or a different approach altogether, I need your > help to figure this out. Answering 1 to fdupes did not delete the duplicate > files though, they still remain. This could be the way the permissions are > set for the cifs mount, permissions are such: > > [ohmster@ohmster test]$ ls -la KatharinaTisch072* > -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072(1).jpg > -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072.jpg > [ohmster@ohmster test]$ > > rw, r, r, maybe I need to chmod them all to writable but still fdupes will > ask me hundreds of questions for each duplicate found. All I want is one > copy of the file and am not to particular which one it is. I would rather > have the plane one ending in 072.jpg rather than 072(1).jpg or 072_0.jpg > but it is not a very big deal. > > Can anybody show me an automated method to remove dupes from thousands of > files in a directory or two please? Not being able to delete is likly a problem with how it is mounted. 
As for the command line: $ fdupes -f . |xargs rm This will delete the dupes in the current directory, except the first of each set. You can also check out stripdups on my site. I haven't updated it since I discovered fdupes, but it does have the advantage that it will let you edit the list of files to be deleted in one step, and it should also find matches between different file types. Make sure that nothing has been short circuited in the file unless that is your intent. It requires ImageMagick to be installed. If you use it, READ it first. I've got to get ready for work. Michael C. -- mjchappell@verizon.net http://mcsuper5.freeshell.org/ Whether you think you can or whether you think you can't, you're right! - Henry Ford |
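A caveat on piping a file list into plain `xargs rm`: xargs splits its input on whitespace, so a name containing a space is not passed through intact. The sketch below is a different technique from fdupes, shown only as an illustration: it groups files by MD5 checksum with `md5sum` and `awk`, keeps the shortest name in each group (which favors `072.jpg` over `072(1).jpg` or `072_0.jpg`), and deletes one name per line. The function names are hypothetical, `-maxdepth` assumes GNU or BSD find, and names containing backslashes or newlines are not handled.

```shell
# list_extra_copies DIR  (hypothetical helper)
# Print every file directly in DIR whose content matches another file,
# except the one with the shortest name in each group.
list_extra_copies() {
    find "$1" -maxdepth 1 -type f -exec md5sum {} + |
    awk '{
        hash = $1
        name = substr($0, 35)   # 32 hex digits + 2 separator characters
        if (!(hash in keep)) { keep[hash] = name; next }
        if (length(name) < length(keep[hash])) {
            print keep[hash]    # the earlier, longer name is the extra copy
            keep[hash] = name
        } else {
            print name
        }
    }'
}

# remove_extra_copies DIR  (hypothetical helper)
# Delete what list_extra_copies reports, reading one name per line.
remove_extra_copies() {
    list_extra_copies "$1" | while IFS= read -r file; do
        rm -- "$file"
    done
}
```

Running `list_extra_copies .` on its own first gives a dry run of what `remove_extra_copies .` would delete.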
| |||
| Re: Command line to remove duplicate files? On 2008-03-12, Michael C. <mjchappell@verizon.net> wrote: > On Wed, 12 Mar 2008 10:05:39 -0500, > Ohmster <root@dev.nul.invalid> wrote: [..] > > Not being able to delete is likly a problem with how it is mounted. I think you are right. I did mount the shared directory as such: [ohmster@ohmster test]$ sudo mount -t cifs -o credentials=/home/ohmster/scripts/cifsauth,directio,uid=ohmster,gid=ohmster,rw,dir_mode=0755,file_mode=0644,iocharset=utf8 //missy/de /home/ohmster/test This caused all files from the XP machine on the LAN to mount with these permissions: (Please excuse the word wrap) -rw-r--r-- 1 ohmster ohmster 140446 Mar 11 17:01 UlrikeEsszimmer092.jpg -rw-r--r-- 1 ohmster ohmster 151222 Mar 11 17:01 UlrikeEsszimmer093.jpg -rw-r--r-- 1 ohmster ohmster 189014 Mar 11 17:01 UlrikeEsszimmer094.jpg -rw-r--r-- 1 ohmster ohmster 178665 Mar 11 17:01 UlrikeEsszimmer095.jpg -rw-r--r-- 1 ohmster ohmster 186719 Mar 11 17:01 UlrikeEsszimmer096.jpg Running fdupes seemed to work but the files did not get deleted. [ohmster@ohmster test]$ fdupes -d . [1] ./KatharinaTisch072.jpg [2] ./KatharinaTisch072(1).jpg Set 1 of 95, preserve files [1 - 2, all]: 1 [+] ./KatharinaTisch072.jpg [-] ./KatharinaTisch072(1).jpg [1] ./KatharinaTisch071.jpg [2] ./KatharinaTisch071(1).jpg Set 2 of 95, preserve files [1 - 2, all]: [ohmster@ohmster test]$ (Control-c to kill the process) [ohmster@ohmster test]$ ls -la KatharinaTisch072* -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072(1).jpg -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072.jpg [ohmster@ohmster test]$ See? Not deleted. 
Trying to chmod 666 on all files made them all rw like this: [ohmster@ohmster test]$ chmod 666 * [ohmster@ohmster test]$ ls -la KatharinaTisch072* -rw-rw-rw- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072(1).jpg -rw-rw-rw- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072.jpg [ohmster@ohmster test]$ Still they don't delete. Even using sudo before the command won't work. Your sample command would work as kick ass if I could get over this mounted directory thing... [ohmster@ohmster test]$ fdupes -f . |xargs rm rm: cannot remove `./KatharinaTisch072(1).jpg': Permission denied rm: cannot remove `./KatharinaTisch071(1).jpg': Permission denied rm: cannot remove `./KatharinaTisch070(1).jpg': Permission denied Let's see how they are mounted... [ohmster@ohmster test]$ mount /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) /dev/hda1 on /boot type ext3 (rw) tmpfs on /dev/shm type tmpfs (rw) none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) //missy/de on /home/ohmster/test type cifs (rw,mand) [ohmster@ohmster test]$ This would be the last entry, /home/ohmster/test. > As for commandline: > > $ fdupes -f . |xargs rm > > This will delete dupes except the first of set in the current > directory. Man this command line would be bitching as hell if only it were not for the mounted directory I think, it is just what I wanted. I could of course just copy the files over, delete the dupes, then copy them back again but I wanted to do an exercise in Linux. I wonder what I would have to do in order to be able to delete files from this mount? I will put some test files in there from Windows and see if I can delete them from Linux. Hmmmm, not allowed. 
-rw-rw-rw- 1 ohmster ohmster 178665 Mar 11 17:01 UlrikeEsszimmer095.jpg -rw-rw-rw- 1 ohmster ohmster 186719 Mar 11 17:01 UlrikeEsszimmer096.jpg -rw-r--r-- 1 ohmster ohmster 2705 Mar 12 17:53 zzzzfile1.txt -rw-r--r-- 1 ohmster ohmster 2705 Mar 12 17:53 zzzzfile2.txt [ohmster@ohmster test]$ rm *.txt rm: cannot remove `zzzzfile1.txt': Permission denied rm: cannot remove `zzzzfile2.txt': Permission denied [ohmster@ohmster test]$ Try making them all rw now. [ohmster@ohmster test]$ chmod 666 * [ohmster@ohmster test]$ ls -la *.txt -rw-rw-rw- 1 ohmster ohmster 2705 Mar 12 17:53 zzzzfile1.txt -rw-rw-rw- 1 ohmster ohmster 2705 Mar 12 17:53 zzzzfile2.txt [ohmster@ohmster test]$ rm *.txt rm: cannot remove `zzzzfile1.txt': Permission denied rm: cannot remove `zzzzfile2.txt': Permission denied [ohmster@ohmster test]$ Still not allowed. What is up with that? I cannot even delete my own files? My mount command is rw but sets the files to 644. Maybe if I mount them as 666? Let's try it. [ohmster@ohmster ~]$ sudo mount -t cifs -o credentials=/home/ohmster/scripts/cifsauth,directio,uid=ohmster,gid=ohmster,rw,dir_mode=0755,file_mode=0666,iocharset=utf8 //missy/de /home/ohmster/test [ohmster@ohmster ~]$ mount /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) /dev/hda1 on /boot type ext3 (rw) tmpfs on /dev/shm type tmpfs (rw) none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) //missy/de on /home/ohmster/test type cifs (rw,mand) [ohmster@ohmster ~]$ -rw-rw-rw- 1 ohmster ohmster 178665 Mar 11 17:01 UlrikeEsszimmer095.jpg -rw-rw-rw- 1 ohmster ohmster 186719 Mar 11 17:01 UlrikeEsszimmer096.jpg -rw-rw-rw- 1 ohmster ohmster 2705 Mar 12 17:53 zzzzfile1.txt -rw-rw-rw- 1 ohmster ohmster 2705 Mar 12 17:53 zzzzfile2.txt Yeah, they mounted as rw, try again now. No, still getting permission denied. 
Hmmm, this is not working, maybe Windows XP is not allowing it as a share, time to check share properties. Yeah, that was it. I moved the folder in XP to J:\share and gave it a share permission of full for user "Linux", then on the Security tab, added user "Linux" and gave special full permissions on this directory, its contents, and all of its subdirectories. I do have a user "Linux" on the XP system as an administrator but hide him from the login welcome screen with Tweak UI for XP. Now I remounted it as 666 and have no problems at all deleting files. Time to try xargs again. [ohmster@ohmster test]$ fdupes -f . |xargs rm [ohmster@ohmster test]$ man fdupes [ohmster@ohmster test]$ fdupes . [ohmster@ohmster test]$ Wow, boy oh boy did THAT ever work! Thank you Michael, that is just what I needed. Way cool brudder. :) > You can also check out stripdups on my site. I haven't updated it > since I discovered fdupes, but it does have the advantage that it will > let you edit the list of files to be deleted in one step, it should > also find matches between different file types. Make sure that > nothing has been short circuited in the file unless that is your > intent. It requires Imagemagick to be installed. If you use it READ > it first, I've got to get ready for work. > > Michael C. I tried reading the man page for xargs and cannot quite get the gist of it. Can you, in just a few lines and in layman's terms, tell me just what xargs does? Thank you. -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is MESSAGE BODY, not Subject!) to pass my spam filter. |
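Two things explain why `chmod 666` and `file_mode=0666` made no difference here. On Unix-like systems, removing a file needs write permission on the directory that holds it, not on the file itself. And on a share like this one, the mode bits shown by `ls` come from the mount options while the Windows side decides what is actually allowed, which matches the fix found above. A quick way to check a mount before running a bulk delete is to create and remove a scratch file. The helper name `can_modify_dir` is made up for this sketch.

```shell
# can_modify_dir DIR  (hypothetical helper)
# Succeeds only if a scratch file can be created in DIR and removed again.
can_modify_dir() {
    # Creating the probe fails if the directory is missing or not writable.
    probe=$(mktemp "$1/.probe.XXXXXX" 2>/dev/null) || return 1
    # rm -f still reports failure when the removal itself is refused.
    rm -f -- "$probe" 2>/dev/null || return 1
    [ ! -e "$probe" ]
}

if can_modify_dir "${TMPDIR:-/tmp}"; then
    echo "scratch file created and removed"
else
    echo "cannot create or remove files here" >&2
fi
```

If the probe can be created but not removed, it is left behind as a hidden `.probe.*` file in that directory.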
| |||
| Re: Command line to remove duplicate files? On 2008-03-12, pk <pk@pk.pk> wrote: > Ohmster wrote: > >> [ohmster@ohmster test]$ ls -la KatharinaTisch072* >> -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072(1).jpg >> -rw-r--r-- 1 ohmster ohmster 162726 Mar 11 16:46 KatharinaTisch072.jpg >> [ohmster@ohmster test]$ >> >> rw, r, r, maybe I need to chmod them all to writable but still fdupes will >> ask me hundreds of questions for each duplicate found. All I want is one >> copy of the file and am not to particular which one it is. I would rather >> have the plane one ending in 072.jpg rather than 072(1).jpg or 072_0.jpg >> but it is not a very big deal. > > If the following is true: > > - all the files end in ".jpg" > - each pair of files *always* named "file.jpg" and "file(1).jpg" > - all files in the same directory > > then you can just do > > rm *(1).jpg > > If that is not the case, then provide more info about how duplicates are > supposed to be identified and about file names, possibly providing sample > inputs. I already did the "fdupes -f |xargs rm" thing so they are all gone now, but the method was Patricia072(1).jpg or Patricia072_0.jpg, Patricia072_1.jpg, etc., so far as I can tell. Thanks for helping. -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is MESSAGE BODY, not Subject!) to pass my spam filter. |
| |||
| Re: Command line to remove duplicate files? On 2008-03-12, pk <pk@pk.pk> wrote: > pk wrote: > >> then you can just do >> >> rm *(1).jpg > > That was of course meant to be > > rm *\(1\).jpg > Why is that, you have to escape certain characters in bash? -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is MESSAGE BODY, not Subject!) to pass my spam filter. |
| |||
| Re: Command line to remove duplicate files? On 2008-03-12, Michael Heiming <michael+USENET@www.heiming.de> wrote: > In alt.os.linux pk <pk@pk.pk>: >> pk wrote: > [..] > >> rm *\(1\).jpg > > Or > > find . -name "*([0-9]).jpg" -exec rm "{}" \; > > Try first, to be sure: > > find . -name "*([0-9]).jpg" -exec ls "{}" \; > This is all very interesting, I am keeping all of this stuff in a text reference for just this sort of thing. Thank you Michael. -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is MESSAGE BODY, not Subject!) to pass my spam filter. |
| |||
| Re: Command line to remove duplicate files? On 2008-03-12, Douglas O'Neal <oneal@dbi.udel.edu> wrote: > On 03/12/08 12:11, Michael Heiming wrote: >> In alt.os.linux pk <pk@pk.pk>: >>> pk wrote: >> [..] >> >>> rm *\(1\).jpg >> >> Or >> >> find . -name "*([0-9]).jpg" -exec rm "{}" \; >> >> Try first, to be sure: >> >> find . -name "*([0-9]).jpg" -exec ls "{}" \; >> > > Or to check that file.jpg exists before removing file(1).jpg > > for I in `find . -type f -name "*([0-9]).jpg"`; do if test -e `echo $I | > sed 's/([0-9])//'`; then rm -f $I; fi; done Yep, definitly keeping all of this stuff for reference, thank you Doug. -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is MESSAGE BODY, not Subject!) to pass my spam filter. |
| |||
| Re: Command line to remove duplicate files? Ohmster wrote: >> rm *\(1\).jpg >> > > Why is that, you have to escape certain characters in bash? Of course, since "(" and ")" have special meaning in bash (and in most if not all shells, for that matter). |
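To make the point about "(" and ")" concrete, here is a small sketch using a scratch directory (the directory and file names are invented for the illustration). In bash with default settings, an unquoted `rm *(1).jpg` stops with a syntax error before rm ever runs. Backslashes, single quotes, and double quotes all turn the parentheses into ordinary characters while leaving the `*` free to match.

```shell
# Scratch directory with one plain file and one numbered copy.
demo=$(mktemp -d)
touch "$demo/pic.jpg" "$demo/pic(1).jpg"

# Three equivalent ways to pass literal parentheses to the glob.
# Each pattern expands to the single file pic(1).jpg.
escaped=$(printf '%s\n' "$demo"/*\(1\).jpg)
single=$(printf '%s\n' "$demo"/*'(1)'.jpg)
double=$(printf '%s\n' "$demo"/*"(1)".jpg)

printf '%s\n' "$escaped" "$single" "$double"
```

Swapping `printf '%s\n'` for `ls` before using `rm` is the same "try first" habit suggested earlier in the thread.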
| |||
| Re: Command line to remove duplicate files? On Wed, 12 Mar 2008 23:00:01 -0500, Ohmster <ohmster@dev.nul.invalid> wrote: > > I tried reading the man page for xargs and cannot quite get the gist of > it. Can you, in just a few lines and in layman's terms, tell me just > what xargs does? Thank you. In its simplest usage it executes a command passed as a parameter on all words passed to it on stdin until EOF is reached. Words are separated by unescaped whitespace (spaces, tabs or newlines not preceded by a backslash or inside of quotes.) #Create boring files mkdir /tmp/boring for i in 1 2 3 4 5 ; do touch /tmp/boring/$i ; done echo /tmp/boring/* |xargs ls -l echo /tmp/boring/* |xargs rm rm -fr /tmp/boring In this example you could easily have accomplished the listing with: $ ls -l /tmp/boring/* But that might cause trouble if you have several thousand files in the directory. The shell will expand the filenames if you use a wildcard, and you might get an error indicating the line is too long. (Yes, I'm aware you wouldn't usually use the trailing /* in this case, but it may be needed when using other commands.) Generally it's useful if you'll be passing an unknown and possibly large number of parameters to a program. HTH, Michael C. -- mjchappell@verizon.net http://mcsuper5.freeshell.org/ It's what you learn after you know it all that counts. |
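The word splitting and batching described above can be watched directly with `echo` standing in for the real command. This is a sketch for illustration only: `-n 2` caps each run at two words so the batches are visible, and `-0` (available in GNU and BSD xargs) switches the separator to a NUL byte so a name with a space survives.

```shell
# xargs appends the words it reads from stdin to the given command.
# With -n 2 it starts a new "echo rm" for every two words.
batches=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo rm)
printf '%s\n' "$batches"
# prints:
#   rm 1 2
#   rm 3 4
#   rm 5

# A name containing a space is split into two words by default ...
split=$(printf '%s\n' 'my photo.jpg' | xargs -n 1 echo)
printf '%s\n' "$split"

# ... but stays whole when names are NUL-terminated and read with -0.
whole=$(printf '%s\0' 'my photo.jpg' | xargs -0 -n 1 echo)
printf '%s\n' "$whole"
```

Without `-n`, xargs packs as many words into each run as the system's argument limit allows, which is why it copes with lists too long for a single command line.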
| |||
| Re: Command line to remove duplicate files? On 2008-03-13, Michael C. <mjchappell@verizon.net> wrote: > On Wed, 12 Mar 2008 23:00:01 -0500, > Ohmster <ohmster@dev.nul.invalid> wrote: >> >> I tried reading the man page for xargs and cannot quite get the gist of >> it. Can you, in just a few lines and in layman's terms, tell me just >> what xargs does? Thank you. > > In its simplest usage it executes a command passed as a parameter on > all words passed to it on stdin until EOF is reached. Words are > separated by unescaped whitespace (spaces, tabs or newlines not > preceded by a backslash or inside of quotes.) > [..] > But that might cause trouble if you have several thousand files in the > directory. The shell will expand the filenames if you use a wildcard, > and you might get an error indicating the line is too long. (Yes, I'm > aware you wouldn't usually use the trailing /* in this case, but it > may be needed when using other commands.) > > Generally it's useful if you'll be passing an unknown and possibly > large number of parameters to a program. > > HTH, > > Michael C. So what xargs does is pretty much pass on large numbers to the command that comes next and this could be very large numbers, more than the command itself can accept. Something like a huge file list might generate, maybe thousands of files in a list, and then pass them on to another command like rm. $ fdupes -f . |xargs rm So fdupes is to output a list of duplicate files but omit the first of the duplicates from each set, which is omitted from the generated list, the list could potentially be thousands of files in length. Then this possibly huge number of files is passed on through xargs to the rm command which does remove the files. Hey that is kind of neat, I get it now, you came up with a perfect example of how to use the xargs command for my need and explained it pretty well. I get it now. Thanks Michael. 
You were right about the mounting too, I have to mount a directory share in such a way that the shared files really are read/write to the logged in mounter for the share or else deleting files will fail because of permission problems. Good deal now, situation is solved and I have a great understanding of what went on. Oh I love when this happens. :) I like this Linux stuff and am getting a bit older now and tired of doing electronic circuit repair, especially now that TV sets are all becoming modular with very expensive parts instead of fairly cheap parts like resistors, transistors, and capacitors. Now the LCD, Plasma, and DLP sets need panels (Thousands of dollars and not available.) and various circuit modules like power supply ($130->$500), digital or main boards ($400-$600), X or Y boards (Plasma - several hundred dollars each), mercury vapor lamps ($200-$600 for a freaking light bulb!), and other really expensive stuff. I hear "Oh I could buy a new one" so many times and that means no work, no money that I am getting sick of it. A good tech with experience in Florida can make $20 an hour but these jobs are hard to find or get piece work which is slowing down so bad you cannot live on it anymore. Does Linux or network work, or small network and systems pay decent and are there jobs where you can do work on computer, systems, and networks these days? I would sure love to do something else rather than solder parts and not get paid anymore. :( I used to make such good money doing that stuff too but it is getting hard now, parts are no bigger than a half a grain of rice or ICs have 300 legs on a stamp flat on a circuit board. This sucks, I need something different to do. I hope that some folks that do this for a living will chime in here with advice and some work stories so I can get a feel for it. Thanks for your help, Michael. -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is MESSAGE BODY, not Subject!) 
to pass my spam filter. |
| |||
| Re: Command line to remove duplicate files? On 2008-03-13, pk <pk@pk.pk> wrote: > Ohmster wrote: > >>> rm *\(1\).jpg >>> >> >> Why is that, you have to escape certain characters in bash? > > Of course, since "(" and ")" have special meaning in bash (and in most if > not all shells, for that matter). > Ahhh, thank you! Noted and now deleted. Thanks! -- ~Ohmster | ohmster /a/t/ ohmster dot com Put "messageforohmster" in message body (That is MESSAGE BODY, not Subject!) to pass my spam filter. |