Occasionally I have to copy whole directory trees from one location to another. This copying typically falls into one of two categories:
- An entire directory from one location to another on the same computer
- An entire directory from one computer to another computer
There are essentially two techniques, PUSH and PULL, for copying whole trees from one computer to another. Below I’m going to cover several methods that make use of these two techniques, after first dealing with the simpler same-host case.
Copying directories on the same host
1. tar
    # copy SOURCEDIR into DESTDIR
    localhost% tar zcvf - SOURCEDIR | (cd DESTDIR; tar zxvf -)
This approach uses tar to archive the directory SOURCEDIR, writing the archive to STDOUT instead of a file (that is what the - after the f switch means). The pipe connects that STDOUT to the STDIN of everything inside the parentheses. The commands inside the parentheses run in a subshell: they first change directory to DESTDIR and then un-tar the stream of data arriving on STDIN.
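To try the pipeline end to end without touching real data, here is a minimal sketch that builds a throwaway tree under a temporary directory and copies it with the same tar pipe. The names work, SOURCEDIR and DESTDIR are placeholders, the v switch is dropped to keep the output quiet, and && replaces ; (my substitution, not the original command) so that nothing is unpacked if the cd fails.

```shell
#!/bin/sh
# Sketch: copy a tree on the same host with a tar pipe.
# Every path here is a throwaway placeholder under a temp directory.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # clean up when the script ends

mkdir -p "$work/SOURCEDIR/sub" "$work/DESTDIR"
echo "hello" > "$work/SOURCEDIR/sub/file.txt"
ln -s sub/file.txt "$work/SOURCEDIR/link"

cd "$work"
# Left side writes the archive to STDOUT; right side unpacks it from STDIN.
# The parentheses start a subshell, so the cd does not leak into this script.
tar zcf - SOURCEDIR | (cd DESTDIR && tar zxf -)

# Show what arrived; the symlink should still be a symlink.
ls -lR DESTDIR
```

The copy ends up in DESTDIR/SOURCEDIR because tar stores the paths exactly as they were given on the command line.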
2. cp
    # copy SOURCEDIR into DESTDIR
    # example 1
    localhost% cp -a SOURCEDIR DESTDIR/.
    # example 2
    localhost% cp -cdpR SOURCEDIR DESTDIR/.
    # example 3
    localhost% cp --preserve=context,mode,ownership,timestamps,links --no-dereference --recursive SOURCEDIR DESTDIR/.
All three examples above do essentially the same thing; each is just progressively more verbose. (Strictly speaking, GNU cp expands -a to -dR --preserve=all, so example 1 preserves slightly more than the other two.) Looking at example 3, the switches should be self-explanatory except for maybe --no-dereference. This switch tells cp not to follow symbolic links, but to create an equivalent link in the copy under DESTDIR instead.
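The effect of not dereferencing is easiest to see side by side. The sketch below is only an illustration on a throwaway tree (the directory names kept and followed are made up for the demo): it copies the same source once with -a and once with -RL, which tells cp to follow links.

```shell
#!/bin/sh
# Sketch: how cp treats a symlink with and without dereferencing.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # clean up when the script ends

mkdir -p "$work/SOURCEDIR" "$work/kept" "$work/followed"
echo "data" > "$work/SOURCEDIR/real.txt"
ln -s real.txt "$work/SOURCEDIR/link.txt"

# -a implies no-dereference: link.txt stays a symlink in the copy.
cp -a "$work/SOURCEDIR" "$work/kept/."

# -R with -L follows links: link.txt becomes a regular file holding the data.
cp -RL "$work/SOURCEDIR" "$work/followed/."

ls -l "$work/kept/SOURCEDIR" "$work/followed/SOURCEDIR"
```

In the first listing link.txt still points at real.txt; in the second it is an ordinary file, which is usually not what you want when duplicating a tree.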
BTW, I mention cp here because newer versions of cp can in fact duplicate a directory from one location to another on the same host. Older versions, particularly on some older Solaris releases that I maintain, lack these feature-rich switches, and so the tar method mentioned above is your only option there.
Pushing a directory from localhost --> remotehost
Why call this Push? Conceptually we are “pushing” a duplicate copy of a directory from one location to another, i.e. we are “pushing” this directory FROM the localhost TO a remotehost. If you can’t get your head around the term pushing, think of it as the tar command, pushing the copied directory out from localhost to some destination.
1. tar & ssh
    # copy SOURCEDIR from localhost to remotehost over ssh. Untarring begins in /home/user1
    localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost tar zxvf -

    # copy SOURCEDIR from localhost to remotehost over ssh. Untarring in DESTDIR
    # example 1
    localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost 'cd DESTDIR; tar zxvf -'
    # example 2
    localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost "(cd DESTDIR; tar zxvf -)"
    # example 3
    localhost% tar zcvf - SOURCEDIR | ssh -l user1 remotehost 'cd DESTDIR ; tar zxvf -'
    # example 4 (stores the archive as a file on remotehost instead of untarring it)
    localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost "cat > /DESTDIR/DESTFILE.tar.gz"
NOTE: If the OS you’re on doesn’t have a tar command that supports the z switch, such as with older versions of Solaris, then drop the z switches from both sides of the commands above.
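The push commands all share one shape: tar on the near side, a shell command string on the far side. As a sketch of that shape, here is a small wrapper. push_tree is a hypothetical helper of my own, not part of tar or ssh, and it takes the "runner" as its trailing arguments so it can be dry-run locally with sh -c standing in for ssh. It assumes DESTDIR contains no single quotes.

```shell
#!/bin/sh
# Sketch: push a directory through any command that runs a shell string.
# push_tree is a hypothetical helper; usage: push_tree SRC DESTDIR RUNNER...
push_tree() {
    src=$1
    dest=$2
    shift 2
    # "$@" is the runner, e.g. `ssh user1@remotehost` or `sh -c`.
    # && stops the far side from unpacking if its cd fails.
    tar zcf - "$src" | "$@" "cd '$dest' && tar zxf -"
}

# Local dry run on a throwaway tree: sh -c stands in for ssh.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # clean up when the script ends
mkdir -p "$work/SOURCEDIR" "$work/DESTDIR"
echo "payload" > "$work/SOURCEDIR/a.txt"

cd "$work"
push_tree SOURCEDIR "$work/DESTDIR" sh -c
ls -lR "$work/DESTDIR"

# A real transfer needs a reachable host, so it is left commented out:
# push_tree SOURCEDIR DESTDIR ssh user1@remotehost
```

Passing the runner as arguments keeps the tar logic in one place, which makes it easy to check the quoting locally before pointing it at a real host.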
2. scp
    # if DESTDIR does not exist yet: contents of SOURCEDIR copied to DESTDIR
    # (only what's in SOURCEDIR, not the directory itself);
    # if DESTDIR already exists, SOURCEDIR is copied under it
    localhost% scp -rp SOURCEDIR user1@remotehost:/home/user1/DESTDIR

    # SOURCEDIR copied under DESTDIR (DESTDIR must already exist)
    localhost% scp -rp SOURCEDIR user1@remotehost:/home/user1/DESTDIR/.
3. rsync & ssh
    # copy directory SOURCEDIR to DESTDIR
    # example 1
    localhost% rsync -avzH -e ssh --progress /SOURCEDIR user1@remotehost:/DESTDIR
    # example 2
    localhost% rsync -avzH -e'ssh' /SOURCEDIR user1@remotehost:/DESTDIR

    # copy contents of directory SOURCEDIR to DESTDIR
    localhost% rsync -avzH -e ssh --progress /SOURCEDIR/ user1@remotehost:/DESTDIR
Pulling a directory to localhost <-- remotehost
Why call this Pull? Conceptually we are “pulling” a directory from one location to another, to create a duplicate. Usually we are “pulling” this directory TO our localhost back FROM a remotehost. If you can’t get your head around the term pulling, think of it as a command being run on a remotehost which streams a directory’s content to your localhost, and then the localhost pulls this stream of data in.
1. tar & ssh
    # copy SOURCEDIR to DESTDIR
    # example 1
    localhost% ssh remotehost 'tar zcvf - SOURCEDIR' | tar zxvf -
    # example 2
    localhost% ssh -n remotehost 'tar zcvf - SOURCEDIR' | tar zxvf -
    # example 3
    localhost% ssh remotehost "( cd SOURCEDIR ; tar zcvf - SOURCEFILES ) " | tar zxvf -
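The pull direction is the mirror image: the shell command string runs on the far side and produces the stream, and tar on the near side unpacks it. The sketch below wraps that in a hypothetical helper, pull_tree (my own name, not an existing tool), with sh -c standing in for ssh so it can be tried locally. It assumes the paths contain no single quotes.

```shell
#!/bin/sh
# Sketch: pull a directory through any command that runs a shell string.
# pull_tree is a hypothetical helper; usage: pull_tree PARENT NAME RUNNER...
# It unpacks NAME into the current directory.
pull_tree() {
    parent=$1
    name=$2
    shift 2
    # The far side changes into PARENT and streams NAME to its STDOUT;
    # the near side reads that stream on STDIN and unpacks it.
    "$@" "cd '$parent' && tar zcf - '$name'" | tar zxf -
}

# Local dry run on a throwaway tree: sh -c stands in for ssh.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # clean up when the script ends
mkdir -p "$work/remote/SOURCEDIR" "$work/local"
echo "payload" > "$work/remote/SOURCEDIR/a.txt"

cd "$work/local"
pull_tree "$work/remote" SOURCEDIR sh -c
ls -lR "$work/local"

# A real transfer needs a reachable host, so it is left commented out:
# pull_tree /home/user1 SOURCEDIR ssh -n user1@remotehost
```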
    # copy SOURCEDIR to a tar file
    # example 1
    localhost% ssh remotehost 'tar zcvf - SOURCEDIR' | cat > DESTFILE.tar.gz
    # example 2
    localhost% ssh -n remotehost 'tar zcvf - SOURCEDIR' | cat > DESTFILE.tar.gz
    # example 3
    localhost% ssh -n remotehost "tar jcvf - SOURCEDIR" > DESTFILE.tar.bz2

    # NOTE: Example 3 just demonstrates that the "| cat" is actually redundant,
    # so it can be dropped if you like.
Example 1, from the second code block above, might result in what appears to be a corrupted DESTFILE.tar.gz file. For example, after creating DESTFILE.tar.gz on one of my hosts, it showed up like this:
    # corrupted archive while using example 1
    localhost% tar ztvf DESTFILE.tar.gz

    gzip: stdin: not in gzip format
    tar: Child returned status 1
    tar: Error exit delayed from previous errors

    # file details
    localhost% file DESTFILE.tar.gz
    DESTFILE.tar.gz: data

    localhost% ls -l | grep DESTFILE
    -rw-r--r-- 1 user1 users 102934 2009-07-19 23:10 DESTFILE.tar.gz
Fear not! First, you can try cleaning the file up with the command dos2unix DESTFILE.tar.gz. This kind of corruption appears to happen when ssh’ing to a user account whose ~/.bashrc file generates output.
Other times, input from STDIN will inadvertently get passed through ssh to the tar command running on the remote side. To stop ssh from reading STDIN at all, use the -n switch, as in example 2.
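A common way to run into this is calling ssh inside a loop that is itself reading from STDIN. The sketch below imitates that situation without needing a remote host: cat > /dev/null is only a stand-in for ssh (both read STDIN by default), and redirecting from /dev/null imitates what ssh -n does. The host names are made up.

```shell
#!/bin/sh
# Sketch: a command that reads STDIN inside a loop eats the loop's input.
list_hosts() { printf 'host1\nhost2\nhost3\n'; }

loop_without_n() {
    list_hosts | while read -r h; do
        # Stand-in for plain ssh: drains whatever is left on STDIN.
        cat > /dev/null
        echo "$h"
    done
}

loop_with_n() {
    list_hosts | while read -r h; do
        # Stand-in for ssh -n: STDIN comes from /dev/null instead.
        cat < /dev/null > /dev/null
        echo "$h"
    done
}

drained=$(loop_without_n)
protected=$(loop_with_n)

echo "without -n, hosts processed: $drained"
echo "with -n, hosts processed:"
echo "$protected"
```

The first loop only ever sees host1, because the stand-in swallowed host2 and host3 before read could get to them; the second loop sees all three.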
Finally, there are times when neither of these will fix your problem. For example, I use the program mailstat in my ~/.bashrc to display how much new email I have since the last time I logged in. The output from mailstat shows up inside of the DESTFILE.tar.gz file.
    # contents of DESTFILE.tar.gz corrupted by ~/.bashrc commands
    % more DESTFILE.tar.gz

      Total  Number Folder
      -----  ------ ------
      14467       4 folder1
       3891       1 /dev/null
       4424       1 formail +1 -eds >> lists/ls/$MLIST
     336579      13 formail +1 -eds >> lists/rh/$MLIST
       4849       1 formail +1 -eds >> lists/trilug
     228074      13 /home/user1/Mail/INBOX
     108279      16 /home/user1/Mail/main_boxes/razor-caught
      24320       1 lists/sans/newsbites
      57863      11 lists/sunsource/gridengine
      13710       1 main_boxes/spamassassin_caught
      87947      16 main_boxes/Trash
      -----  ------
     884403      78
    ...
    ...
    *** contents of tar ***
    ...
    ...
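It turns out that this type of problem occurs because ~/.bashrc should never include any commands that echo output to STDOUT or STDERR. Bash reads ~/.bashrc when it is started by sshd to run a remote command, so anything the file prints is mixed into the data stream coming back over ssh. These commands should really be relocated to either ~/.bash_login or ~/.bash_profile, which are only read by login shells. Relocating anything that echoes output to STDOUT or STDERR results in a correctly transferred DESTFILE.tar.gz.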
    # correct DESTFILE.tar.gz
    localhost% file DESTFILE.tar.gz
    DESTFILE.tar.gz: gzip compressed data, from Unix, last modified: Tue Jul 21 01:13:48 2009

    localhost% ls -l | grep DEST
    -rw-r--r-- 1 user1 users 102400 2009-07-21 00:49 DESTFILE.tar.gz
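This failure can be reproduced locally, which also gives a quick way to check an archive before trusting it. The sketch below is a simulation under stated assumptions: the echo stands in for a chatty ~/.bashrc, good.tar.gz and bad.tar.gz are made-up file names, and check_archive is a hypothetical helper that just wraps gzip -t.

```shell
#!/bin/sh
# Sketch: stray text ahead of the archive breaks the gzip stream.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # clean up when the script ends
mkdir "$work/SOURCEDIR"
echo "data" > "$work/SOURCEDIR/a.txt"
cd "$work"

# A clean stream, as produced when nothing else writes to STDOUT.
tar zcf - SOURCEDIR > good.tar.gz

# A contaminated stream: the echo plays the part of ~/.bashrc output.
{ echo "Total  Number  Folder"; tar zcf - SOURCEDIR; } > bad.tar.gz

# check_archive is a hypothetical helper; gzip -t tests stream integrity.
check_archive() {
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: not a clean gzip stream"
    fi
}

check_archive good.tar.gz
check_archive bad.tar.gz
```

If moving the commands out of ~/.bashrc is not practical, another widely used option is to have ~/.bashrc return early when the shell is not interactive (by testing whether $- contains i) before anything that prints.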
2. scp
    # if DESTDIR does not exist yet: contents of SOURCEDIR copied to DESTDIR
    # (only what's in SOURCEDIR, not the directory itself);
    # if DESTDIR already exists, SOURCEDIR is copied under it
    localhost% scp -rp user1@remotehost:/home/user1/SOURCEDIR DESTDIR

    # SOURCEDIR copied under DESTDIR (DESTDIR must already exist)
    localhost% scp -rp user1@remotehost:/home/user1/SOURCEDIR DESTDIR/.
3. rsync & ssh
    # copy directory SOURCEDIR to DESTDIR
    # example 1
    localhost% rsync -avzH -e ssh --progress user1@remotehost:/SOURCEDIR /DESTDIR
    # example 2
    localhost% rsync -avzH -e'ssh' user1@remotehost:/SOURCEDIR /DESTDIR

    # copy contents of directory SOURCEDIR to DESTDIR
    localhost% rsync -avzH -e ssh --progress user1@remotehost:/SOURCEDIR/ /DESTDIR
These sites proved useful for working out some of the finer points: