OVM image file backup with tar and pigz to speed up archive process

After installing Oracle Fusion Applications 11.1.7 on my OVM Server 3.2.6, I ended up with at least 500 GB of files for my Oracle VMs.
I spent almost 2 full days installing it, due to the *very long* installation process. So, in order not to redo it, I decided to back up those VMs.
My first attempt took another full day to tar everything and move it to my NAS server, and I was not satisfied with the processing and transfer rates.
I tried the Unix commands below, but the backup took too long, and life is too short to wait :-).

tar zcvf - /wwwdata | ssh root@ "cat > /backup/wwwdata.tar.gz"

ssh -T -c arcfour -o Compression=no -x "tar cf - /remote/path" | tar xf - -C .

time tar cf - OVM_FAH1117_FAHOST -P | pv -s $(du -sb OVM_FAH1117_FAHOST | awk '{print $1}') | gzip > OVM_FAH1117_FAHOST.tgz
 534MiB 0:00:22 [23.9MiB/s] [>                                                               ]  0% ETA 7:19:44

Fortunately, I'm not the only one having this issue. This blog post shows exactly what I was looking for: http://intermediatesql.com/linux/scrap-the-scp-how-to-copy-data-fast-using-pigz-and-nc/

I ran the following commands on my OVM Server, where the OVS repository resides. Of course, Oracle VM Server does not ship with the pigz and pv commands, and I did not want to use yum on my OVM server, to avoid any side effects. So I used a VirtualBox VM with Oracle Enterprise Linux installed, installed the missing packages there with yum (pv-1.4.12-1.x86_64.rpm and pigz-2.2.5-1.el5.x86_64.rpm), and copied the two missing binaries to my OVM server. It works like a charm!

  • /usr/bin/pv
  • /usr/bin/pigz
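
Because the two binaries were copied by hand rather than installed from a package, it can be worth checking that they are really on the PATH before starting a job that runs for hours. The following is only a minimal sketch; the function name check_tools is hypothetical and not part of my original setup.

```shell
#!/bin/bash
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent tool and
# returns non-zero if at least one tool is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}
```

A typical call before a backup would be: check_tools tar pv pigz || exit 1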

Below is the output of my commands:

[root@ovmserver326 ~]# cd /OVS/Repositories/0004fb00000300004b82295cf8272126/VirtualDisks/

# Parallel Compressing on the same physical disk 
[root@ovmserver326 VirtualDisks]# time tar -c 0004fb00001200002b46ccea59183d13.img | pv --size `du -csh 0004fb00001200002b46ccea59183d13.img | grep total | cut -f1` | pigz -9 > OVM_IDMHOST_READY_FUSION_1117.img.tgz
 500GiB 1:51:07 [76.8MiB/s] [====================================================================================>] 100%            

real    111m8.047s
user    327m18.463s
sys     27m36.859s

# Parallel Compressing over the network (mount NAS) 
[root@ovmserver326 VirtualDisks]# time tar -c 0004fb00001200002b46ccea59183d13.img | pv --size `du -csh 0004fb00001200002b46ccea59183d13.img | grep total | cut -f1` | pigz -9 > /mnt/backup/OVM_IDMHOST_READY_FUSION_1117.img.tgz
500GiB 2:24:21 [59.1MiB/s] [====================================================================================>] 100%            

real    144m21.822s
user    324m0.212s
sys     26m12.830s

# Parallel Compressing over the network (mount NAS) 
[root@ovmserver326 VirtualDisks]# time tar -c 0004fb00001200001cd0bab0e5b8d969.img | pv --size `du -csh 0004fb00001200001cd0bab0e5b8d969.img | grep total | cut -f1` | pigz -9 > /mnt/backup/OVM_FAHOST_FAH_1117_DATA_ONLY.img.tgz
 450GiB 3:02:42 [  42MiB/s] [=====================================================================================] 243%            

real    182m42.824s
user    518m4.517s
sys     31m11.339s
[root@ovmserver326 VirtualDisks]# 

# Parallel Compressing over the network (mount NAS) 
[root@ovmserver326 VirtualDisks]# cd /OVS/Repositories/0004fb00000300004b82295cf8272126/VirtualDisks/
[root@ovmserver326 VirtualDisks]# time tar -c 0004fb00001200002464e6cb85b4b50a.img | pv --size `du -csh 0004fb00001200002464e6cb85b4b50a.img | grep total | cut -f1` | pigz -9 > /mnt/backup/OVM_IDMHOST_FAH_1117_OS_DATA.img.tgz
 500GiB 2:21:31 [60.3MiB/s] [====================================================================================>] 100%            

real    141m31.386s
user    323m38.323s
sys     25m34.585s
[root@ovmserver326 VirtualDisks]# 
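
A likely explanation for the 243% shown in the third run (an assumption, I did not verify it): du without --apparent-size reports the blocks actually allocated on disk, so for a sparse image it returns less than the number of bytes tar reads and sends through pv. A minimal sketch that gives pv the apparent size in bytes instead is shown below. The function name compress_image is hypothetical, stat -c %s assumes GNU coreutils, and the fallback to cat and gzip when pv or pigz is absent is my addition for illustration.

```shell
#!/bin/bash
# Compress one disk image, giving pv a byte-exact size hint.
# $1 = image file, $2 = destination .tgz file.
# Falls back to plain cat / gzip when pv / pigz are not installed.
compress_image() {
    local src=$1 dest=$2 size meter=cat packer=gzip
    # Apparent size in bytes (what tar will read), not allocated blocks.
    size=$(stat -c %s "$src") || return 1
    if command -v pv >/dev/null 2>&1; then meter="pv --size $size"; fi
    if command -v pigz >/dev/null 2>&1; then packer=pigz; fi
    # $meter is left unquoted on purpose so "pv --size N" splits into words.
    tar -c -C "$(dirname "$src")" "$(basename "$src")" | $meter | $packer > "$dest"
}
```

For example: compress_image 0004fb00001200001cd0bab0e5b8d969.img /mnt/backup/OVM_FAHOST_FAH_1117_DATA_ONLY.img.tgz. The tar stream is slightly larger than the file itself (headers and padding), which is negligible for images of this size.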

If you are short on disk space, you can use the split command and move (or, even better, rsync) the generated split parts to your NAS server. I use the following *quick & dirty* script to automate my backup.


#tar cvz OVM_FAH1117_FAHOST/ /OVS/Repositories/0004fb000003000005560d44ec8c302d/VirtualDisks/0004fb0000120000a35d25ebd6d5df9f.img /OVS/Repositories/0004fb0000030000a6e1c3c4fea3ba92/VirtualDisks/0004fb0000120000d1dacb7fde8c97a0.img /OVS/Repositories/0004fb00000300004b82295cf8272126/VirtualDisks/0004fb000012000067f91ad9f96669b3.img /OVS/Repositories/0004fb000003000005560d44ec8c302d/VirtualDisks/0004fb00001200001cd0bab0e5b8d969.img | split -d -b 4096m - FAH11.1.7_FAHOST.tgz_


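# Assumes VM_CFG_FILE, TAR_FILENAME_WITHOUT_EXT, SLEEP_BEFORE_POLLING, BACKUP_MOUNT_PATH
# and sizeMinimum (in bytes, the split part size) are set earlier in the script.

# Build the tar argument list, starting with the VM configuration file.
vmdisk_to_tar="-C `dirname $VM_CFG_FILE` `basename $VM_CFG_FILE` "
for vmdisk in `grep disk /data/OVM_FAH1117_FAHOST/vm.cfg.orig`;
do
  if [[ $vmdisk =~ .*file.* ]]
  then
    # Keep only the image path of an entry like 'file:/path/to/disk.img,xvda,w'
    tmp=$(echo $vmdisk | awk -F':' '{ print $2 }' | awk -F',' '{ print $1 }')
    dirname_tmp=`dirname $tmp`
    basename_tmp=`basename $tmp`
    vmdisk_to_tar+="-C $dirname_tmp $basename_tmp "
  fi
done
echo " Processing files: $vmdisk_to_tar"

echo " Taring all files in background ..."
echo "tar cvz $vmdisk_to_tar | split -d -b 4096m - $TAR_FILENAME_WITHOUT_EXT.tgz_" > /tmp/nohup.out
# Run the whole pipeline in the background (backticks would run it in the foreground first).
nohup sh -c "tar cvz $vmdisk_to_tar | split -d -b 4096m - $TAR_FILENAME_WITHOUT_EXT.tgz_" >> /tmp/nohup.out 2>&1 &

echo " Sleep $SLEEP_BEFORE_POLLING before polling..."
sleep $SLEEP_BEFORE_POLLING

echo " Loop continuously..."
while true;
do
  nbTar=`ps -eaf | grep 'tar cvz' | grep -v grep | wc -l`
  for file in $(ls $TAR_FILENAME_WITHOUT_EXT.tgz_* 2>/dev/null);
  do
    sizeActual=$(du -b "$file" | cut -f 1)
    if ((sizeActual>=sizeMinimum)); then
      echo "$file OK to transfer via move to $BACKUP_MOUNT_PATH";
      #scp $file $SSH_SERVER
      #echo "Scp $file done. Now removing it !"
      #rm $file
      mv $file $BACKUP_MOUNT_PATH
    else
      echo "$file is still compressing";
    fi
  done
  if ((nbTar<=0)); then
    # tar and split have finished: move whatever is left, i.e. the last (smaller) part.
    echo "Moving the last split part"
    mv $TAR_FILENAME_WITHOUT_EXT.tgz_* $BACKUP_MOUNT_PATH 2>/dev/null
    exit 0
  else
    echo "Split is not finished"
  fi
  # $current was never set in the original; a timestamp is printed instead.
  echo "Infinite loop: $(date) no file to process"
  sleep $SLEEP_BEFORE_POLLING
done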
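
To restore from the split parts, they only need to be concatenated in order and piped back into tar. The sketch below is an illustration rather than part of my original script: the function name restore_parts and its two arguments are hypothetical, and it relies on the numeric suffixes written by split -d sorting correctly when the shell expands the glob.

```shell
#!/bin/bash
# Restore a backup that was written as split parts (prefix00, prefix01, ...).
# $1 = part prefix (e.g. FAH11.1.7_FAHOST.tgz_), $2 = directory to extract into.
restore_parts() {
    local prefix=$1 target=$2
    mkdir -p "$target" || return 1
    # The glob expands in sorted order, which matches the split -d numbering.
    cat "$prefix"* | tar xzf - -C "$target"
}
```

For example: restore_parts /mnt/backup/FAH11.1.7_FAHOST.tgz_ /data/restore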

About Chenda Mok

19 years of hands on experience in software design and development with emphasis on Enterprise Application Integration (EAI), Services Oriented Architecture (SOA) and Identity Management (IDM) solutions. I’m a software engineer, member of the professional service delivery team working for Salesforce. Prior to this, I worked for Oracle as Solution Architect, through SeeBeyond(06/2005), then SUN’s acquisition (04/2009). After my master’s degree in computer science in 1997; I always delivered consulting on architecture, design, implementation on integration’s field. I’m interested in architecture using EAI/SOA/IDM/BPM/Cloud technologies, software development and Java’s related technologies. I may blog about my work/activities at Salesforce, but I do not speak for my employer, past, present or future.