  #1 (permalink)  
Old 07-26-2007, 05:50 AM
rcrios@gmail.com
Tablet PC Guest
 
Posts: n/a
How to backup a huge amount of tiny files

Hi,

We have a RHEL 4 box running on a dual Xeon (5110) 1.6 GHz -- 8 GB RAM --
SAS 2.5" 10K rpm.

We are trying to back up a directory that holds roughly 10,000,000
XML files. File sizes vary between 1K and 10K.

When we tried tar cvfz backup.tgz /hugedir, it ran for more than
10 hours and wasn't even close to finishing.

So, my question is: how do we back up a huge number of tiny files?

TIA,

Bob

  #2 (permalink)  
Old 07-26-2007, 07:30 AM
Dan C
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

On Thu, 26 Jul 2007 06:44:00 -0700, rcrios wrote:

> We are trying to do a backup of a directory which has more or less
> 10.000.000 of xml files. The files size varies between 1K and 10K.
>
> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
> 10 hours and we weren't even close to finish it.
>
> So, my question is: how to do a backup of a huge amount of tiny files?


Let it run more than 10 hours, perhaps? Let it run overnight.


--
"Bother!" said Pooh, as Christopher Robin pleaded to be spanked again.


  #3 (permalink)  
Old 07-26-2007, 08:00 AM
EOS
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

rcrios******.com wrote:

> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
> 10 hours and we weren't even close to finish it.


Does it have to be a *.tar?
If not, perhaps mirdir:
http://sourceforge.net/projects/mirdir
--
EOS
www.photo-memories.be
Running KDE 3.5.7 / openSUSE 10.2

  #4 (permalink)  
Old 07-26-2007, 12:20 PM
Dawid Michalczyk
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

rcrios******.com wrote:
> Hi,
>
> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
> SAS 2.5" 10K rpm.
>
> We are trying to do a backup of a directory which has more or less
> 10.000.000 of xml files. The files size varies between 1K and 10K.
>
> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
> 10 hours and we weren't even close to finish it.
>
> So, my question is: how to do a backup of a huge amount of tiny files?
>

It is never a good idea to have this many files in a single directory!!
I would split the files over 1000 directories and then use tar on each.

--
Dawid Michalczyk
http://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_

  #5 (permalink)  
Old 07-26-2007, 03:50 PM
rcrios@gmail.com
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote:
> rcr...******.com wrote:
> > Hi,

>
> > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
> > SAS 2.5" 10K rpm.

>
> > We are trying to do a backup of a directory which has more or less
> > 10.000.000 of xml files. The files size varies between 1K and 10K.

>
> > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
> > 10 hours and we weren't even close to finish it.

>
> > So, my question is: how to do a backup of a huge amount of tiny files?

>
> It is never a good idea to have this many files in a single directory!!
> I would split the files over 1000 and then use tar on it.
>
> --
> Dawid Michalczykhttp://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_


The files are already split into several dirs. We tried to back up
only one dir per day, but it exceeded our maintenance window. We tried
tar because our backup solution (Backup Exec) wasn't able to do
it.

I haven't tested mirdir, but I don't think it will work: if I
can't even do a simple copy, verifying whether something changed will
be worse.

Thanks anyway...


  #6 (permalink)  
Old 07-26-2007, 05:20 PM
Dawid Michalczyk
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

rcrios******.com wrote:
> On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote:
>> rcr...******.com wrote:
>>> Hi,
>>> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
>>> SAS 2.5" 10K rpm.
>>> We are trying to do a backup of a directory which has more or less
>>> 10.000.000 of xml files. The files size varies between 1K and 10K.
>>> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
>>> 10 hours and we weren't even close to finish it.
>>> So, my question is: how to do a backup of a huge amount of tiny files?

>> It is never a good idea to have this many files in a single directory!!
>> I would split the files over 1000 and then use tar on it.
>>
>> --
>> Dawid Michalczykhttp://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_

>
> The files are already splited into several dirs. We tried to backup
> only one dir per day, but it exceded out maintenace window. We tried
> to use tar because our backup solution (backupexec) wasn't able to do
> it.


I don't think splitting this many files over several dirs will make much
difference. I would try splitting over 1000 dirs, and then back up each
of those dirs individually. That should help. The main problem is that
you have an enormous number of files in one dir.
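
One way the split could look, as a minimal sketch only. It assumes bash, a find that supports -maxdepth and -print0, and cksum; the function name shard_dir and the bucket layout are made up for illustration and are not from this thread.

```shell
#!/bin/bash
# shard_dir SRC DEST [BUCKETS]
# Hypothetical helper: moves every regular file directly under SRC into
# DEST/NNN/, where NNN is a checksum of the file name modulo BUCKETS.
shard_dir() {
    local src=$1 dest=$2 buckets=${3:-1000}
    local path name sum bucket
    # find streams the names, so the shell never expands a huge glob
    find "$src" -maxdepth 1 -type f -print0 |
    while IFS= read -r -d '' path; do
        name=${path##*/}
        # cksum prints "CRC LENGTH"; keep only the CRC
        sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
        bucket=$(printf '%03d' $(( sum % buckets )))
        mkdir -p "$dest/$bucket"
        mv -- "$path" "$dest/$bucket/"
    done
}

# Example (paths are placeholders):
#   shard_dir /hugedir /hugedir.sharded 1000
```

Starting a cksum process per file is slow at this scale, so treat this as an illustration of the layout rather than something tuned for 10 million files.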

--
Dawid Michalczyk
http://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_

  #7 (permalink)  
Old 07-26-2007, 06:30 PM
jellybean stonerfish
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

On Thu, 26 Jul 2007 23:46:34 +0000, rcrios wrote:

> On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote:
>> rcr...******.com wrote:
>> > Hi,

>>
>> > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
>> > SAS 2.5" 10K rpm.

>>
>> > We are trying to do a backup of a directory which has more or less
>> > 10.000.000 of xml files. The files size varies between 1K and 10K.

>>
>> > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
>> > 10 hours and we weren't even close to finish it.

>>
>> > So, my question is: how to do a backup of a huge amount of tiny files?

>>
>> It is never a good idea to have this many files in a single directory!!
>> I would split the files over 1000 and then use tar on it.
>>
>> --
>> Dawid Michalczyk

>
> The files are already splited into several dirs. We tried to backup
> only one dir per day, but it exceded out maintenace window. We tried
> to use tar because our backup solution (backupexec) wasn't able to do
> it.
>
> I didn't tested mirdir, but I think that will not work because if I
> can't do a simple copy, I think that verify if something changed will
> be worse.
>
> Thanks anyway...



I have never had that many files to manage, but it doesn't seem like it
would be hard. I don't know what your files are like, but you could tar
them up in smaller batches. Put the filenames into files, and use the -T
option to tar?

Try this in a test directory:

mkdir test
cd test

# create 26 * 26 * 20 = 13520 small files named aa1 .. zz20
for ACHAR in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
  for BCHAR in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    for NUMBER in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do
      echo "$ACHAR$BCHAR$NUMBER" > "$ACHAR$BCHAR$NUMBER"
    done
  done
done

This makes a bunch of files, 13520 to be exact. I will sort them by the
first letter, putting the filenames into ../tar_folder/[a-z].filenames.
I know my files and how many there are; you need to figure out what
your limits are and build your filename lists accordingly. In this case,
with 26 groups, each tarball holds 520 files. Put only as many filenames
into each list as your tar can handle. Use logic of some sort
when choosing how to break up your files into tarballs. Consider file
sizes, file types, author, dates, or whatever when choosing your approach.



mkdir ../tar_folder
for ACHAR in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
  # list every file starting with this letter, then archive that list
  ls "$ACHAR"* > "../tar_folder/$ACHAR.filenames"
  tar -vcf "../tar_folder/$ACHAR.tar" \
      -T "../tar_folder/$ACHAR.filenames"
done

If you must make one huge tarball, you could tar up your tarballs:

tar -cf ../tarballs.tar ../tar_folder/*
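
When the filenames don't group neatly by first letter, the same -T idea can be driven by a fixed batch size instead. A rough sketch, assuming bash, a tar that accepts -T, and file names without embedded newlines; tar_in_batches and the batch.* naming are hypothetical.

```shell
#!/bin/bash
# tar_in_batches SRC OUT [FILES_PER_BATCH]
# Hypothetical helper: writes lists of at most FILES_PER_BATCH names to
# OUT/batch.aa, OUT/batch.ab, ... and makes one tarball per list.
tar_in_batches() {
    local src=$1 out=$2 per_batch=${3:-100000} list
    mkdir -p "$out" || return 1
    out=$(cd "$out" && pwd)      # absolute path, because we cd below
    (
        cd "$src" || exit 1
        # one name per line, cut into fixed-size list files
        find . -maxdepth 1 -type f | split -l "$per_batch" - "$out/batch."
        for list in "$out"/batch.*; do
            case $list in *.tar) continue ;; esac   # skip tarballs from earlier runs
            [ -e "$list" ] || continue              # no lists: SRC had no files
            tar -cf "$list.tar" -T "$list"          # no -v, no per-file output
        done
    )
}

# Example (paths are placeholders):
#   tar_in_batches /hugedir/part01 /backup/part01 100000
```

Each tarball can be moved off the machine as soon as it is written, so an interrupted run only has to redo the unfinished batch.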

stonerfish

  #8 (permalink)  
Old 07-26-2007, 10:00 PM
noi ance
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

On Thu, 26 Jul 2007 23:46:34 +0000, rcrios typed this message:

> On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote:
>> rcr...******.com wrote:
>> > Hi,

>>
>> > We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
>> > SAS 2.5" 10K rpm.

>>
>> > We are trying to do a backup of a directory which has more or less
>> > 10.000.000 of xml files. The files size varies between 1K and 10K.

>>
>> > When we tried to do a tar cvfz backup.tgz /hugedir, we spent more
>> > than 10 hours and we weren't even close to finish it.

>>
>> > So, my question is: how to do a backup of a huge amount of tiny
>> > files?

>>
>> It is never a good idea to have this many files in a single directory!!
>> I would split the files over 1000 and then use tar on it.
>>
>> --
>> Dawid Michalczykhttp://www.comp.eonworks.com _Linux SysAdmin and
>> Webmaster scripts_

>
> The files are already splited into several dirs. We tried to backup only
> one dir per day, but it exceded out maintenace window. We tried to use
> tar because our backup solution (backupexec) wasn't able to do it.
>
> I didn't tested mirdir, but I think that will not work because if I
> can't do a simple copy, I think that verify if something changed will be
> worse.
>
> Thanks anyway...


Funny thing about Linux: you can run several jobs or applications at the
same time.

IMO, you have a couple of problems:
The output of 10 million of anything is huge and time consuming.

Suggestions:
1 Split the job into several jobs submitted simultaneously.

2 Use a connected machine to make backups outside the maintenance window.

3 Add a RAID drive to mirror the existing drive.

4 Use a locking mechanism to lock the directory while a low-priority backup
job runs during the day.

5 Alternate directory backups: 2 million today, 2 million tomorrow, etc.,
whatever fits the window.

6 Weekend shutdown: do maintenance and a full backup, then during normal
maintenance cycles do differential backups.
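
Suggestion 1 could be sketched roughly like this, assuming bash, the files already split into subdirectories, and an xargs that supports -0, -I and -P (GNU and BSD xargs do; POSIX does not require it). The function name and the job count are placeholders.

```shell
#!/bin/bash
# tar_subdirs_parallel SRC OUT [JOBS]
# Hypothetical helper: makes one compressed tarball per immediate
# subdirectory of SRC, running up to JOBS tar processes at once.
tar_subdirs_parallel() {
    local src=$1 out=$2 jobs=${3:-4}
    mkdir -p "$out" || return 1
    out=$(cd "$out" && pwd)      # absolute path, because we cd below
    (
        cd "$src" || exit 1
        # each subdirectory name is substituted for {} in one tar command
        find . -mindepth 1 -maxdepth 1 -type d -print0 |
            xargs -0 -P "$jobs" -I{} tar -czf "$out/{}.tar.gz" {}
    )
}

# Example (paths are placeholders):
#   tar_subdirs_parallel /hugedir /backup 4
```

Whether this actually helps depends on where the time goes: extra jobs can keep both CPUs busy with gzip, but if the disk is the bottleneck they may just add seeking.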




  #9 (permalink)  
Old 07-26-2007, 10:40 PM
s. keeling
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

rcrios******.com <rcrios******.com>:
>
> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
> SAS 2.5" 10K rpm.
>
> We are trying to do a backup of a directory which has more or less
> 10.000.000 of xml files. The files size varies between 1K and 10K.
>
> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
> 10 hours and we weren't even close to finish it.


Skip the "v" in "cvfz". That'll only slow it down with writes to the
console.

This is all going to depend on the speed of the HDD they're on.
I'd suggest making a copy of the directory to whatever machine you own
which has the fastest HDD, then creating the archive there instead.
That'll let the original machine carry on as usual, and the archiving
can be done at leisure.
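
A sketch of the copy step, with no "v" anywhere. It streams the tree through a pair of tar processes; copy_tree is a made-up name, and the ssh variant in the comment assumes a reachable host, here called otherhost as a placeholder.

```shell
#!/bin/bash
# copy_tree SRC DEST
# Hypothetical helper: copies SRC into DEST by piping one tar into another.
# Without -v, nothing is printed per file.
copy_tree() {
    local src=$1 dest=$2
    mkdir -p "$dest" || return 1
    tar -cf - -C "$src" . | tar -xf - -C "$dest"
}

# Example (paths are placeholders):
#   copy_tree /hugedir /mnt/fastdisk/hugedir
#
# To land the copy on another machine, the receiving tar can run over ssh:
#   tar -cf - -C /hugedir . | ssh otherhost 'tar -xf - -C /mnt/fastdisk/hugedir'
```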


--
Any technology distinguishable from magic is insufficiently advanced.
(*) Linux Counter #80292
- - http://www.faqs.org/rfcs/rfc1855.html Please, don't Cc: me.

  #10 (permalink)  
Old 07-26-2007, 10:40 PM
s. keeling
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

rcrios******.com <rcrios******.com>:
>
> I didn't tested mirdir, but I think that will not work because if I
> can't do a simple copy, I think that verify if something changed will
> be worse.


You're backing up a directory structure that's still being written
to?!? And you think that's going to work?

You need to re-think this.


--
Any technology distinguishable from magic is insufficiently advanced.
(*) Linux Counter #80292
- - http://www.faqs.org/rfcs/rfc1855.html Please, don't Cc: me.

  #11 (permalink)  
Old 07-27-2007, 01:00 AM
birre
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

On 2007-07-26 15:44, rcrios******.com wrote:
> Hi,
>
> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
> SAS 2.5" 10K rpm.
>
> We are trying to do a backup of a directory which has more or less
> 10.000.000 of xml files. The files size varies between 1K and 10K.
>
> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
> 10 hours and we weren't even close to finish it.
>
> So, my question is: how to do a backup of a huge amount of tiny files?
>
> TIA,
>
> Bob
>


Put /hugedir on reiserfs instead of the ext3 that I suspect you have now, and
it will handle this much better.

/bb

  #12 (permalink)  
Old 07-27-2007, 01:00 AM
birre
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

On 2007-07-27 01:46, rcrios******.com wrote:
> On 26 jul, 17:15, Dawid Michalczyk <d...@eonworks.com> wrote:
>> rcr...******.com wrote:
>>> Hi,
>>> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
>>> SAS 2.5" 10K rpm.
>>> We are trying to do a backup of a directory which has more or less
>>> 10.000.000 of xml files. The files size varies between 1K and 10K.
>>> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
>>> 10 hours and we weren't even close to finish it.
>>> So, my question is: how to do a backup of a huge amount of tiny files?

>> It is never a good idea to have this many files in a single directory!!
>> I would split the files over 1000 and then use tar on it.
>>
>> --
>> Dawid Michalczykhttp://www.comp.eonworks.com _Linux SysAdmin and Webmaster scripts_

>
> The files are already splited into several dirs. We tried to backup
> only one dir per day, but it exceded out maintenace window. We tried
> to use tar because our backup solution (backupexec) wasn't able to do
> it.
>
> I didn't tested mirdir, but I think that will not work because if I
> can't do a simple copy, I think that verify if something changed will
> be worse.
>
> Thanks anyway...
>


Once you have filled a dir with 10000000 files, it doesn't matter if you later
split the files in a btree structure, since used slots in a directory are never
freed. Even if only a few files still exist, the directory itself stays huge.

You must make a new dir, move the contents to it, delete the old one, and then
rename the new dir to the old name.
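
Those steps might look like the sketch below. It assumes bash and that nothing is writing to the directory while it runs; rebuild_dir and the temporary .new.PID name are made up for illustration.

```shell
#!/bin/bash
# rebuild_dir DIR
# Hypothetical helper: moves everything in DIR into a freshly created
# directory, removes the old (now empty) DIR, and renames the new one
# into its place.
rebuild_dir() {
    local dir=${1%/} new
    new=$dir.new.$$
    mkdir "$new" || return 1
    # find batches the names, so hidden files are included and the
    # argument list never gets too long; inside sh, $0 is the target dir
    find "$dir" -mindepth 1 -maxdepth 1 \
        -exec sh -c 'mv -- "$@" "$0"' "$new" {} + || return 1
    rmdir "$dir" && mv "$new" "$dir"
}

# Example (path is a placeholder):
#   rebuild_dir /hugedir/part01
```

The new directory gets default ownership and permissions, so any special mode or owner on the old one would have to be reapplied by hand.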

/bb

  #13 (permalink)  
Old 07-27-2007, 04:00 AM
flupp
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

Not that I am too familiar with this kind of stuff, but wouldn't dd be
able to come to the rescue in such cases?

Kind regards,

flupp


  #14 (permalink)  
Old 07-27-2007, 04:50 AM
John Hasler
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

/bb writes:
> Make a /hugedir with reiserfs instead of ext3 that I suspect you have
> now, and it will handle this much better.


You may also want to consider JFS and XFS. In any case, ext is probably
the worst filesystem for your application. The performance of your
application must be terrible.
--
John Hasler
john@dhh.gt.org
Dancing Horse Hill
Elmwood, WI USA

  #15 (permalink)  
Old 07-27-2007, 10:40 AM
Rikishi 42
Tablet PC Guest
 
Posts: n/a
Re: How to backup a huge amount of tiny files

On 2007-07-26, rcrios******.com <rcrios******.com> wrote:

> We have a RHEL 4 running on a dual Xeon (5110) 1.6Ghz -- 8Gb RAM --
> SAS 2.5" 10K rpm.
>
> We are trying to do a backup of a directory which has more or less
> 10.000.000 of xml files. The files size varies between 1K and 10K.
>
> When we tried to do a tar cvfz backup.tgz /hugedir, we spent more than
> 10 hours and we weren't even close to finish it.
>
> So, my question is: how to do a backup of a huge amount of tiny files?


Some information seems to be missing:
- what is the actual volume?
- what kind of filenames do you have?
- how many subdirs did you create, and are they to stay, or not?
- what is your destination? Another machine, another disk, tape?



How about this for a general idea:
- make a number of subdirs (say 1000 dirs of 10,000 files,
or 100 of 100,000)
- make a script to loop over all subdirs
- if the subdir doesn't exist in the backup: rsync the subdir
- if the subdir in the backup is older, a file was changed: rsync the subdir


Maintenance window (you don't say how long it is): just run that script
for as long as you can. Then stop it, and rerun it in the next window.
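
A rough sketch of such a script, assuming bash and rsync are installed. The function name, the optional deadline argument (epoch seconds), and the example paths are placeholders, not anything from the original setup.

```shell
#!/bin/bash
# sync_subdirs SRC DEST [DEADLINE]
# Hypothetical helper: rsyncs each immediate subdirectory of SRC to DEST
# when the backup copy is missing or older than the source directory.
# Stops starting new subdirectories once DEADLINE (epoch seconds) passes.
sync_subdirs() {
    local src=${1%/} dest=${2%/} deadline=${3:-0} d name
    mkdir -p "$dest" || return 1
    for d in "$src"/*/; do
        [ -d "$d" ] || continue          # glob matched nothing
        if [ "$deadline" -gt 0 ] && [ "$(date +%s)" -ge "$deadline" ]; then
            return 0                     # window is over; rerun next time
        fi
        name=$(basename "$d")
        if [ ! -d "$dest/$name" ] || [ "$d" -nt "$dest/$name" ]; then
            rsync -a "$d" "$dest/$name/"
        fi
    done
}

# Example: run for at most four hours (paths are placeholders):
#   sync_subdirs /hugedir /backup/hugedir "$(( $(date +%s) + 4 * 3600 ))"
```

One caveat: a directory's timestamp changes when files are added, removed or renamed, not when an existing file is rewritten in place, so the "older" test can miss edits. Dropping that test and letting rsync compare every subdirectory is slower but safer.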

--
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
Douglas Adams