| |||
| How to backup a huge amount of tiny files Hi, We have RHEL 4 running on a dual Xeon (5110) 1.6 GHz with 8 GB RAM and 2.5" 10K rpm SAS disks. We are trying to back up a directory that holds roughly 10,000,000 XML files, each between 1 KB and 10 KB in size. When we ran tar cvfz backup.tgz /hugedir, it ran for more than 10 hours without coming close to finishing. So, my question is: how do you back up a huge number of tiny files? TIA, Bob |
| |||
| Re: How to backup a huge amount of tiny files On Thu, 26 Jul 2007 06:44:00 -0700, rcrios wrote: > We are trying to do a backup of a directory which has more or less > 10.000.000 of xml files. The files size varies between 1K and 10K. > > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than > 10 hours and we weren't even close to finish it. > > So, my question is: how to do a backup of a huge amount of tiny files? Let it run more than 10 hours, perhaps? Let it run overnight. -- "Bother!" said Pooh, as Christopher Robin pleaded to be spanked again. |
| |||
| Re: How to backup a huge amount of tiny files rcrios******.com wrote: > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than > 10 hours and we weren't even close to finish it. it has to be a *.tar? if not mirdir perhaps http://sourceforge.net/projects/mirdir -- EOS www.photo-memories.be Running KDE 3.5.7 / openSUSE 10.2 |
| |||
| Re: How to backup a huge amount of tiny files rcrios******.com wrote: > Hi, > > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- > SAS 2.5" 10K rpm. > > We are trying to do a backup of a directory which has more or less > 10.000.000 of xml files. The files size varies between 1K and 10K. > > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than > 10 hours and we weren't even close to finish it. > > So, my question is: how to do a backup of a huge amount of tiny files? > It is never a good idea to have this many files in a single directory!! I would split the files over 1000 directories and then run tar on each of them. -- Dawid Michalczyk http://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_ |
| |||
| Re: How to backup a huge amount of tiny files On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote: > rcr...******.com wrote: > > Hi, > > > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- > > SAS 2.5" 10K rpm. > > > We are trying to do a backup of a directory which has more or less > > 10.000.000 of xml files. The files size varies between 1K and 10K. > > > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than > > 10 hours and we weren't even close to finish it. > > > So, my question is: how to do a backup of a huge amount of tiny files? > > It is never a good idea to have this many files in a single directory!! > I would split the files over 1000 and then use tar on it. > > -- > Dawid Michalczyk http://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_ The files are already split into several dirs. We tried to back up only one dir per day, but it exceeded our maintenance window. We tried to use tar because our backup solution (backupexec) wasn't able to do it. I haven't tested mirdir, but I don't think it will work: if I can't even do a simple copy, verifying whether something changed will be worse. Thanks anyway... |
| |||
| Re: How to backup a huge amount of tiny files rcrios******.com wrote: > On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote: >> rcr...******.com wrote: >>> Hi, >>> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- >>> SAS 2.5" 10K rpm. >>> We are trying to do a backup of a directory which has more or less >>> 10.000.000 of xml files. The files size varies between 1K and 10K. >>> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than >>> 10 hours and we weren't even close to finish it. >>> So, my question is: how to do a backup of a huge amount of tiny files? >> It is never a good idea to have this many files in a single directory!! >> I would split the files over 1000 and then use tar on it. >> >> -- >> Dawid Michalczyk http://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_ > > The files are already splited into several dirs. We tried to backup > only one dir per day, but it exceded out maintenace window. We tried > to use tar because our backup solution (backupexec) wasn't able to do > it. I don't think splitting this many files over several dirs will make much difference. I would try splitting them over 1000 dirs and then back up each of those dirs individually. That should help. The main problem is that you have an enormous number of files in one dir. -- Dawid Michalczyk http://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_ |
| |||
| Re: How to backup a huge amount of tiny files On Thu, 26 Jul 2007 23:46:34 +0000, rcrios wrote: > On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote: >> rcr...******.com wrote: >> > Hi, >> >> > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- >> > SAS 2.5" 10K rpm. >> >> > We are trying to do a backup of a directory which has more or less >> > 10.000.000 of xml files. The files size varies between 1K and 10K. >> >> > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than >> > 10 hours and we weren't even close to finish it. >> >> > So, my question is: how to do a backup of a huge amount of tiny files? >> >> It is never a good idea to have this many files in a single directory!! >> I would split the files over 1000 and then use tar on it. >> >> -- >> Dawid Michalczyk > > The files are already splited into several dirs. We tried to backup > only one dir per day, but it exceded out maintenace window. We tried > to use tar because our backup solution (backupexec) wasn't able to do > it. > > I didn't tested mirdir, but I think that will not work because if I > can't do a simple copy, I think that verify if something changed will > be worse. > > Thanks anyway...

I have never had that many files to manage, but it doesn't seem like it would be hard. I don't know what your files are like, but you could tar them up in smaller batches. Put the filenames into files, and use the -T option to tar?

Try this in a test directory:

    mkdir test
    cd test
    # create one small file per (letter, letter, number) combination
    for ACHAR in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
      for BCHAR in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        for NUMBER in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do
          echo $ACHAR$BCHAR$NUMBER > $ACHAR$BCHAR$NUMBER
        done
      done
    done

This makes a bunch of files, 13520 to be exact (26 x 26 x 20). I will sort them by the first letter, putting the filenames into ../tar_folder/[a-z].filenames.

I know my files and how many there are. You need to figure out what your limits are and build your filename lists accordingly. In this case, using 26 groups, each tarball holds 520 files. Put only as many file names as your tar can handle into each list. Use logic of some sort when choosing how to break up your files into tarballs: consider file sizes, file types, author, dates, or whatever fits when choosing your approach.

    mkdir ../tar_folder
    for ACHAR in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
      # one list of names and one tarball per leading letter
      ls $ACHAR* > ../tar_folder/$ACHAR.filenames
      tar -vcf ../tar_folder/$ACHAR.tar \
          -T ../tar_folder/$ACHAR.filenames
    done

If you must make one huge tarball, you could tar up your tarballs:

    tar -cf ../tarballs.tar ../tar_folder/*

stonerfish |
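The batching idea in the post above depends on file names that group neatly by first letter. A more general sketch, assuming GNU or BSD `find`, `split` and `tar`, cuts one sorted name list into fixed-size chunks and feeds each chunk to `tar -T`. The function name `make_batches`, the `names.*` and `batch.*.tar` file names, and the batch size are illustrative choices, not something from the thread, and file names containing newlines would break the list format.

```shell
#!/bin/sh
# Sketch only: archive a directory of many small files in fixed-size batches.
# make_batches and the names.* / batch.*.tar naming are hypothetical.
make_batches() {
    src=$1      # directory holding the files
    out=$2      # directory that receives the name lists and tarballs
    batch=$3    # number of file names per tarball

    mkdir -p "$out" || return 1

    # List regular files relative to $src, then cut the list into
    # chunks of $batch lines: names.aa, names.ab, ...
    (cd "$src" && find . -type f | sort) | split -l "$batch" - "$out/names."

    for list in "$out"/names.*; do
        [ -e "$list" ] || continue          # no files found at all
        suffix=${list##*.}
        # -C makes tar resolve the listed names relative to $src;
        # no "v", so nothing is written to the console per file.
        tar -cf "$out/batch.$suffix.tar" -C "$src" -T "$list" || return 1
    done
}
```

Calling `make_batches /hugedir /backup/batches 50000` would produce one tarball per 50,000 names (both paths are placeholders). Because each batch is independent, a run that is cut short by the maintenance window only loses the batch in progress.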
| |||
| Re: How to backup a huge amount of tiny files On Thu, 26 Jul 2007 23:46:34 +0000, rcrios typed this message: > On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote: >> rcr...******.com wrote: >> > Hi, >> >> > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- >> > SAS 2.5" 10K rpm. >> >> > We are trying to do a backup of a directory which has more or less >> > 10.000.000 of xml files. The files size varies between 1K and 10K. >> >> > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more >> > than 10 hours and we weren't even close to finish it. >> >> > So, my question is: how to do a backup of a huge amount of tiny >> > files? >> >> It is never a good idea to have this many files in a single directory!! >> I would split the files over 1000 and then use tar on it. >> >> -- >> Dawid Michalczyk http://www.comp.eonworks.com _Linux SysAdmin and >> Webmaster scripts_ > > The files are already splited into several dirs. We tried to backup only > one dir per day, but it exceded out maintenace window. We tried to use > tar because our backup solution (backupexec) wasn't able to do it. > > I didn't tested mirdir, but I think that will not work because if I > can't do a simple copy, I think that verify if something changed will be > worse. > > Thanks anyway...

Funny thing about Linux: you can run several jobs or applications at the same time. IMO, your main problem is that the output of 10 million of anything is huge and time consuming.

Suggestions:
1. Split the job into several simultaneously submitted jobs.
2. Use a connected machine to make backups outside the maintenance window.
3. Add a RAID drive to mirror the existing drive.
4. Use a locking mechanism to lock the directory while a low-priority backup job runs during the day.
5. Alternate directory backups: 2 million today, 2 million tomorrow, etc., whatever fits the window.
6. Weekend shutdown: do maintenance and a full backup, then do differential backups during normal maintenance cycles. |
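Suggestion 1 in the post above (several jobs at once) could look roughly like the following, which starts one `tar` per top-level subdirectory and keeps a fixed number running in parallel. This is a sketch under assumptions: it relies on the `-print0`, `-0` and `-P` extensions found in GNU and BSD `find`/`xargs`, and `backup_parallel` with its three arguments is a made-up name. Whether parallel jobs actually help on a single disk depends on how seek-bound the workload is, so it is worth timing on a small sample first.

```shell
#!/bin/sh
# Sketch only: one tar job per top-level subdirectory, several at a time.
# backup_parallel and its arguments are hypothetical names.
backup_parallel() {
    src=$1    # parent directory whose subdirectories are archived
    out=$2    # destination directory for the tarballs
    jobs=$3   # how many tar processes to run at once

    mkdir -p "$out" || return 1
    # Absolute paths, because the worker shells are given them as arguments.
    out=$(cd "$out" && pwd) || return 1
    src=$(cd "$src" && pwd) || return 1

    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$jobs" -I{} sh -c '
            name=$(basename "$1")
            # Archive the subdirectory relative to its parent, without "v".
            tar -czf "$2/$name.tgz" -C "$3" "$name"
        ' sh {} "$out" "$src"
}
```

For example, `backup_parallel /hugedir /backup 4` would keep four compressing `tar` processes busy, which suits a dual-CPU machine better than a single gzip stream (paths are placeholders).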
| |||
| Re: How to backup a huge amount of tiny files rcrios******.com <rcrios******.com>: > > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- > SAS 2.5" 10K rpm. > > We are trying to do a backup of a directory which has more or less > 10.000.000 of xml files. The files size varies between 1K and 10K. > > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than > 10 hours and we weren't even close to finish it. Skip the "v" in "cvfz". That'll only slow it down with writes to the console. This is all going to depend on the speed of the HDD the files are on. I'd suggest copying the directory to whichever machine you own that has the fastest HDD, then creating the archive there instead. That'll let the original machine carry on as usual, and the archiving can be done at leisure. -- Any technology distinguishable from magic is insufficiently advanced. (*) Linux Counter #80292 - - http://www.faqs.org/rfcs/rfc1855.html Please, don't Cc: me. |
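The copy-first approach in the post above can be sketched with a tar pipe that omits "v", so nothing is printed per file. The function below copies between two local directories so that it can be tried safely; `copy_tree` is a hypothetical name. To copy to another machine, the receiving `tar` would run under `ssh` instead, as shown in the comment, where `otherhost` and `/staging` are placeholders rather than anything from the thread.

```shell
#!/bin/sh
# Sketch only: copy a tree through a tar pipe, with no per-file console output.
# copy_tree is a hypothetical helper name.
copy_tree() {
    src=$1    # directory to copy from
    dst=$2    # directory to copy into (created if missing)

    mkdir -p "$dst" || return 1
    # The first tar writes the archive to stdout, the second unpacks it.
    tar -cf - -C "$src" . | tar -xf - -C "$dst"
}

# Across machines the same pipe would look like this (placeholders only):
#   tar -cf - -C /hugedir . | ssh otherhost 'tar -xf - -C /staging'
```

Once the copy exists on the faster machine, compression and archiving can run there without competing with the application for the original disk.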
| |||
| Re: How to backup a huge amount of tiny files rcrios******.com <rcrios******.com>: > > I didn't tested mirdir, but I think that will not work because if I > can't do a simple copy, I think that verify if something changed will > be worse. You're backing up a directory structure that's still being written to?!? And you think that's going to work? You need to re-think this. -- Any technology distinguishable from magic is insufficiently advanced. (*) Linux Counter #80292 - - http://www.faqs.org/rfcs/rfc1855.html Please, don't Cc: me. |
| |||
| Re: How to backup a huge amount of tiny files On 2007-07-26 15:44, rcrios******.com wrote: > Hi, > > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- > SAS 2.5" 10K rpm. > > We are trying to do a backup of a directory which has more or less > 10.000.000 of xml files. The files size varies between 1K and 10K. > > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than > 10 hours and we weren't even close to finish it. > > So, my question is: how to do a backup of a huge amount of tiny files? > > TIA, > > Bob > Make a /hugedir with reiserfs instead of ext3 that I suspect you have now, and it will handle this much better. /bb |
| |||
| Re: How to backup a huge amount of tiny files On 2007-07-27 01:46, rcrios******.com wrote: > On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote: >> rcr...******.com wrote: >>> Hi, >>> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- >>> SAS 2.5" 10K rpm. >>> We are trying to do a backup of a directory which has more or less >>> 10.000.000 of xml files. The files size varies between 1K and 10K. >>> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than >>> 10 hours and we weren't even close to finish it. >>> So, my question is: how to do a backup of a huge amount of tiny files? >> It is never a good idea to have this many files in a single directory!! >> I would split the files over 1000 and then use tar on it. >> >> -- >> Dawid Michalczyk http://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_ > > The files are already splited into several dirs. We tried to backup > only one dir per day, but it exceded out maintenace window. We tried > to use tar because our backup solution (backupexec) wasn't able to do > it. > > I didn't tested mirdir, but I think that will not work because if I > can't do a simple copy, I think that verify if something changed will > be worse. > > Thanks anyway... > Once you have filled a dir with 10000000 files, it doesn't matter if you later split the files into a btree structure: used slots in a directory are never released, so even if only a few files remain, the directory itself is still huge. You must make a new dir, move the contents into it, delete the old dir, and then rename the new dir to the old name. /bb |
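The four steps in the post above (new dir, move, delete, rename) can be sketched as a small function. This is an illustration under assumptions: `rebuild_dir` and the `.new.$$` suffix are made-up names, the application must not be writing to the directory while it runs, and the new directory gets default ownership and permissions, which would need to be restored by hand if they matter.

```shell
#!/bin/sh
# Sketch only: replace a bloated directory with a freshly created one.
# rebuild_dir and the ".new.$$" suffix are hypothetical.
rebuild_dir() {
    old=${1%/}              # strip a trailing slash, if any
    new="$old.new.$$"       # sibling directory on the same filesystem

    mkdir "$new" || return 1
    # find moves every entry, dotfiles included, and avoids the
    # "argument list too long" error that "mv $old/*" would hit.
    # One mv per entry is slow; GNU "mv -t DIR" with "-exec ... +" is faster.
    find "$old" -mindepth 1 -maxdepth 1 -exec mv {} "$new"/ \; || return 1
    rmdir "$old" || return 1    # fails if anything was left behind
    mv "$new" "$old"
}
```

Because both directories sit on the same filesystem, each move is a rename and no file data is copied.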
| |||
| Re: How to backup a huge amount of tiny files /bb writes: > Make a /hugedir with reiserfs instead of ext3 that I suspect you have > now, and it will handle this much better. You may also want to consider JFS and XFS. In any case, ext is probably the worst filesystem for your application. The performance of your application must be terrible. -- John Hasler john@dhh.gt.org Dancing Horse Hill Elmwood, WI USA |
| |||
| Re: How to backup a huge amount of tiny files On 2007-07-26, rcrios******.com <rcrios******.com> wrote: > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM -- > SAS 2.5" 10K rpm. > > We are trying to do a backup of a directory which has more or less > 10.000.000 of xml files. The files size varies between 1K and 10K. > > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than > 10 hours and we weren't even close to finish it. > > So, my question is: how to do a backup of a huge amount of tiny files?

A few pieces of information seem to be missing:
- what is the actual volume?
- what kind of filenames do you have?
- how many subdirs did you create, and are they to stay or not?
- what is your destination? Another machine, another disk, tape?

How about this as a general idea:
- make a number of subdirs (say 1000 dirs of 10.000 files, or 100 of 100.000)
- make a script to loop over all subdirs
  - if the subdir doesn't exist in the backup: rsync the subdir
  - if the subdir in the backup is older, a file was changed: rsync the subdir

Maintenance window (you don't say how long it is): just run that script for as long as you can. Then stop it, and rerun it in the next window.

-- There is an art, it says, or rather, a knack to flying. The knack lies in learning how to throw yourself at the ground and miss. Douglas Adams |
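The loop described in the post above might be sketched as follows. It departs from the post in one respect, stated plainly: instead of comparing against the backup subdirectory's own timestamp, it records a stamp file only after a subdirectory has synced successfully, so an interrupted run is retried in the next window. All names here (`sync_subdirs`, the `.stamps` directory, the time budget in seconds) are hypothetical, and the `cp -a` fallback exists only so the sketch still runs where `rsync` is not installed. Note that a directory's modification time changes when entries are added, removed or renamed, not when an existing file is edited in place, so this check suits files that are written once.

```shell
#!/bin/sh
# Sketch only: sync subdirectories one by one until a time budget runs out.
# sync_subdirs, .stamps and the budget argument are hypothetical.
sync_subdirs() {
    src=$1       # parent directory containing the subdirs
    dst=$2       # backup location
    budget=$3    # seconds available in this maintenance window

    deadline=$(( $(date +%s) + budget ))
    mkdir -p "$dst/.stamps" || return 1

    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue
        [ "$(date +%s)" -lt "$deadline" ] || break   # window is over
        name=$(basename "$dir")
        stamp="$dst/.stamps/$name"

        # Skip subdirs whose last successful sync is still current.
        if [ -e "$stamp" ] && ! [ "$dir" -nt "$stamp" ]; then
            continue
        fi

        if command -v rsync >/dev/null 2>&1; then
            rsync -a "$dir" "$dst/$name/" || return 1
        else
            mkdir -p "$dst/$name" && cp -a "$dir." "$dst/$name/" || return 1
        fi
        # Record the source directory's mtime as the sync point.
        touch -r "$dir" "$stamp"
    done
}
```

Running `sync_subdirs /hugedir /backup/hugedir 14400` each night would work through the subdirectories for up to four hours and pick up where it left off the next time (paths and budget are placeholders).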