Pushing & Pulling Files Around Using tar, ssh, scp, & rsync

Occasionally I have to copy whole directory trees from one location to another. This copying typically falls into one of two categories:

  • An entire directory from one location to another on the same computer
  • An entire directory from one computer to another computer

There are essentially 2 techniques, PUSH & PULL, which can be used to copy whole trees from one location to another. Below I’m going to cover several methods that make use of these 2 techniques.

Copying directories on the same host

1. tar

# copy SOURCEDIR into DESTDIR
localhost% tar zcvf - SOURCEDIR | (cd DESTDIR; tar zxvf -)

This approach uses tar to archive the directory SOURCEDIR, writing the archive to STDOUT instead of a file (that is what the - after the f switch means). The pipe then connects that STDOUT to the STDIN of everything inside the parentheses. The commands inside the parentheses run in a subshell: they first change directory to DESTDIR, and then un-tar the stream of data coming in via STDIN.
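The pipeline can be tried in a scratch area before it is pointed at real data. What follows is only a sketch: the work directory and the file inside it are made up for the demonstration, the z and v switches are dropped to keep it quiet, and it uses tar's -C switch plus && (rather than ;) so that a failed cd cannot unpack into the wrong place.

```shell
# Scratch area; every name under $work is hypothetical, created for this demo.
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR/sub" "$work/DESTDIR"
echo "hello" > "$work/SOURCEDIR/sub/file.txt"

# Left side: archive SOURCEDIR (relative to $work) and write it to STDOUT.
# Right side: a subshell changes into DESTDIR, then unpacks what arrives on STDIN.
tar -C "$work" -cf - SOURCEDIR | (cd "$work/DESTDIR" && tar -xf -)

# diff -r exits 0 only if the two trees have identical contents.
diff -r "$work/SOURCEDIR" "$work/DESTDIR/SOURCEDIR" && echo "trees match"
```

Because the cd happens inside the parentheses, the shell you typed the command into never changes directory.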

2. cp

# copy SOURCEDIR into DESTDIR
# example 1
localhost% cp -a SOURCEDIR DESTDIR/.
# example 2
localhost% cp -cdpR SOURCEDIR DESTDIR/.
# example 3
localhost% cp --preserve=context,mode,ownership,timestamps,links --no-dereference --recursive SOURCEDIR DESTDIR/.

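All three examples above do essentially the same thing; each is just progressively more verbose. (Strictly speaking, GNU cp documents -a as -dR --preserve=all, so example 1 preserves a little more than the attributes spelled out in example 3.) Looking at example 3, the switches should be self-explanatory except for maybe --no-dereference. This switch tells cp not to follow symbolic links, but to create a matching link in the copy being made in DESTDIR.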
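To see the effect of not dereferencing links, here is a small sketch using example 1's -a form. The scratch directory, target.txt and link.txt are all invented for the illustration.

```shell
# Scratch area; all file names here are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR" "$work/DESTDIR"
echo "real data" > "$work/SOURCEDIR/target.txt"
# A relative symlink inside the tree being copied.
ln -s target.txt "$work/SOURCEDIR/link.txt"

cp -a "$work/SOURCEDIR" "$work/DESTDIR/."

# [ -L ] is true only for symlinks, so this confirms the link was
# recreated as a link rather than replaced by a copy of its target.
if [ -L "$work/DESTDIR/SOURCEDIR/link.txt" ]; then
    echo "link.txt is still a symlink"
fi
```

Since the link is relative, it still resolves inside the copy, pointing at the copied target.txt.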

BTW, I mention cp here because newer versions of cp can in fact be used to duplicate directories from one location to another on the same host. Older versions, particularly the ones on some older Solaris releases that I maintain, aren't as feature-rich, and so the tar method mentioned above is your only option there.

Pushing a directory from localhost --> remotehost

Why call this Push? Conceptually we are “pushing” a duplicate copy of a directory from one location to another, i.e. we are “pushing” this directory FROM the localhost TO a remotehost. If you can’t get your head around the term pushing, think of it as the tar command, pushing the copied directory out from localhost to some destination.

1. tar & ssh

# copy SOURCEDIR from localhost to remotehost over ssh. Untarring begins in /home/user1
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost tar zxvf -
 
# copy SOURCEDIR from localhost to remotehost over ssh. Untarring in DESTDIR
# example 1
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost 'cd DESTDIR; tar zxvf -'
# example 2
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost "(cd DESTDIR; tar zxvf -)"
# example 3
localhost% tar zcvf - SOURCEDIR | ssh -l user1 remotehost 'cd DESTDIR ; tar zxvf -'
# example 4
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost "cat > /DESTDIR/DESTFILE.tar.gz"

NOTE: If the OS you’re on doesn’t have a tar command that supports the z switch, such as with older versions of Solaris, then drop the z switches from both sides of the commands above.
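The quoting in the examples above matters: the quoted string is handed to the far side as a single unit, so both the cd and the tar run there. Here is a rough sketch of that idea wrapped in a helper. The names push_tree and REMOTE_CMD are made up, and REMOTE_CMD is set to sh -c purely so the sketch can run on one machine; for a real transfer it would be something like ssh user1@remotehost.

```shell
# REMOTE_CMD (hypothetical) names the command that runs a shell string
# "somewhere else". Real use: REMOTE_CMD="ssh user1@remotehost".
REMOTE_CMD="sh -c"

# push_tree SRC DEST (hypothetical helper): stream SRC into DEST on the far side.
push_tree() {
    src=$1
    dest=$2
    # The far-side commands travel as ONE quoted string, so the cd and the
    # tar both run over there instead of being split by the local shell.
    tar -C "$(dirname "$src")" -cf - "$(basename "$src")" |
        $REMOTE_CMD "cd '$dest' && tar -xf -"
}

# Demo on scratch directories (names invented for the example).
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR" "$work/DESTDIR"
echo "pushed" > "$work/SOURCEDIR/file.txt"
push_tree "$work/SOURCEDIR" "$work/DESTDIR"
ls "$work/DESTDIR/SOURCEDIR"
```

The single quotes around the destination are a simplification; a destination path that itself contains a single quote would need more careful escaping.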

2. scp

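# if DESTDIR doesn't exist yet: it is created as a copy of SOURCEDIR (only what's in SOURCEDIR, not the directory itself)
# if DESTDIR already exists: SOURCEDIR is copied under it, as DESTDIR/SOURCEDIR
localhost% scp -rp SOURCEDIR user1@remotehost:/home/user1/DESTDIR
 
# SOURCEDIR copied under DESTDIR (DESTDIR must already exist)
localhost% scp -rp SOURCEDIR user1@remotehost:/home/user1/DESTDIR/.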

3. rsync & ssh

# copy directory SOURCEDIR into DESTDIR (result: /DESTDIR/SOURCEDIR)
# example 1
localhost% rsync -avzH -e ssh --progress /SOURCEDIR user1@remotehost:/DESTDIR
# example 2
localhost% rsync -avzH -e'ssh' /SOURCEDIR user1@remotehost:/DESTDIR
 
# copy only the contents of directory SOURCEDIR into DESTDIR (note the trailing slash)
localhost% rsync -avzH -e ssh --progress /SOURCEDIR/ user1@remotehost:/DESTDIR
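The trailing slash on the source is the only difference between the two behaviours above, and it is easy to get wrong. Here is a local-only sketch (no ssh involved) that shows both side by side. The directory names with_dir and contents_only are invented, and the whole thing is skipped if rsync isn't installed.

```shell
# Scratch area; directory names are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR" "$work/with_dir" "$work/contents_only"
echo "data" > "$work/SOURCEDIR/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: SOURCEDIR itself is recreated inside the destination.
    rsync -a "$work/SOURCEDIR" "$work/with_dir"
    # Trailing slash: only what is inside SOURCEDIR is copied.
    rsync -a "$work/SOURCEDIR/" "$work/contents_only"
    # List where file.txt ended up in each case.
    find "$work/with_dir" "$work/contents_only" -type f
else
    echo "rsync not installed; skipping" >&2
fi
```

The same rule applies unchanged when either side is a user1@remotehost: path.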

Pulling a directory to localhost <-- remotehost

Why call this Pull? Conceptually we are “pulling” a directory from one location to another, to create a duplicate. Usually we are “pulling” this directory TO our localhost back FROM a remotehost. If you can’t get your head around the term pulling, think of it as a command being run on a remotehost which streams a directory’s content to your localhost, and then the localhost pulls this stream of data in.

1. tar & ssh

# copy SOURCEDIR from remotehost into the current directory on localhost
# example 1
localhost% ssh remotehost 'tar zcvf - SOURCEDIR' | tar zxvf -
# example 2
localhost% ssh -n remotehost 'tar zcvf - SOURCEDIR' | tar zxvf -
# example 3
localhost% ssh remotehost "( cd SOURCEDIR ; tar zcvf - SOURCEFILES ) " | tar zxvf -

# copy SOURCEDIR to a tar file
# example 1
localhost% ssh remotehost 'tar zcvf - SOURCEDIR' | cat > DESTFILE.tar.gz
# example 2
localhost% ssh -n remotehost 'tar zcvf - SOURCEDIR' | cat > DESTFILE.tar.gz
# example 3
localhost% ssh -n remotehost "tar jcvf - SOURCEDIR" > DESTFILE.tar.bz2
 
# NOTE: Example 3 just demonstrates that the "| cat" is actually redundant,
#       so it can be dropped if you like.

Example 1, from the second code block above, might result in what appears to be a corrupted DESTFILE.tar.gz file. For example, after creating DESTFILE.tar.gz on one of my hosts, it showed up like this:

# corrupted archive while using example 1
localhost% tar ztvf DESTFILE.tar.gz 
 
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
 
# file details
localhost% file DESTFILE.tar.gz 
DESTFILE.tar.gz: data
 
localhost% ls -l  | grep DESTFILE
-rw-r--r-- 1 user1  users  102934 2009-07-19 23:10 DESTFILE.tar.gz

Fear not! First, you can try cleaning the tar file up with the command dos2unix DESTFILE.tar.gz. This kind of corruption appears to happen when ssh'ing to a user account whose ~/.bashrc startup file generates output.

Other times, input from STDIN will inadvertently get fed into the tar command being run via ssh. To stop ssh from reading STDIN at all, use the -n switch (it redirects STDIN from /dev/null), as in example 2.
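One common way to run into this is calling ssh from inside a loop that is itself reading from STDIN. The sketch below imitates that without any network: greedy is a made-up stand-in for ssh that, like ssh without -n, reads whatever is waiting on STDIN. Redirecting from /dev/null is what -n does for you.

```shell
# greedy (hypothetical) stands in for ssh: it drains STDIN, then reports.
greedy() { cat >/dev/null; echo "visited $1"; }

# Unprotected: the first call swallows the remaining host names,
# so the loop ends after a single pass.
unprotected=$(printf 'host1\nhost2\nhost3\n' |
    while read -r h; do greedy "$h"; done)

# Protected: STDIN comes from /dev/null, so the list survives.
protected=$(printf 'host1\nhost2\nhost3\n' |
    while read -r h; do greedy "$h" </dev/null; done)

echo "unprotected run:"; echo "$unprotected"
echo "protected run:";   echo "$protected"
```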

Finally, there are times when neither of these will fix your problem. For example, I use the program mailstat in my ~/.bashrc to display how much new email I have since the last time I logged in. The output from mailstat shows up inside of the DESTFILE.tar.gz file.

# contents of DESTFILE.tar.gz corrupted by ~/.bashrc commands
% more DESTFILE.tar.gz 
 
  Total  Number Folder
  -----  ------ ------
  14467       4 folder1
   3891       1 /dev/null
   4424       1  formail +1 -eds >> lists/ls/$MLIST
 336579      13  formail +1 -eds >> lists/rh/$MLIST
   4849       1  formail +1 -eds >> lists/trilug
 228074      13 /home/user1/Mail/INBOX
 108279      16 /home/user1/Mail/main_boxes/razor-caught
  24320       1 lists/sans/newsbites
  57863      11 lists/sunsource/gridengine
  13710       1 main_boxes/spamassassin_caught
  87947      16 main_boxes/Trash
  -----  ------
 884403      78
...
...
*** contents of tar ***
...
...

It turns out that this type of problem occurs because ~/.bashrc shouldn't ever include any commands that echo output to STDOUT or STDERR. These commands should really be relocated to either ~/.bash_login or ~/.bash_profile. Relocating anything that echoes output to STDOUT or STDERR results in a correctly transferred DESTFILE.tar.gz.
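The effect is easy to reproduce on a single machine. In this sketch an echo plays the part of the chatty ~/.bashrc, writing text to STDOUT just before the archive arrives. The file names good.tar.gz and bad.tar.gz and the message are made up, and gzip -t is used only as a quick integrity check.

```shell
# Scratch area; all names are hypothetical.
work=$(mktemp -d)
mkdir "$work/SOURCEDIR"
echo "payload" > "$work/SOURCEDIR/file.txt"

# Simulated chatty startup file: stray text lands in front of the gzip stream.
{ echo "You have new mail"; tar -C "$work" -czf - SOURCEDIR; } > "$work/bad.tar.gz"

# Clean case: nothing but the archive reaches STDOUT.
tar -C "$work" -czf - SOURCEDIR > "$work/good.tar.gz"

# gzip -t exits non-zero when the file is not a valid gzip stream.
gzip -t "$work/good.tar.gz" && echo "good.tar.gz passes the gzip test"
gzip -t "$work/bad.tar.gz" 2>/dev/null || echo "bad.tar.gz fails the gzip test"
```

The archive data inside bad.tar.gz is intact; it is simply preceded by bytes that gzip doesn't recognise.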

# correct DESTFILE.tar.gz
localhost% file DESTFILE.tar.gz 
DESTFILE.tar.gz: gzip compressed data, from Unix, last modified: Tue Jul 21 01:13:48 2009
 
localhost% ls -l | grep DEST
-rw-r--r--  1 user1   users    102400 2009-07-21 00:49 DESTFILE.tar.gz
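If moving the commands out of ~/.bashrc isn't convenient, an alternative (not the approach described above, just a common idiom) is to guard them so they only run in interactive shells. This is a sketch of that guard: the variable name kind and the echoed message are placeholders, and the echo stands in for mailstat or whatever else prints output.

```shell
# $- holds the shell's option flags; it contains "i" only when the shell
# is interactive. A command run via "ssh host 'tar ...'" is not interactive.
case $- in
    *i*) kind="interactive" ;;
    *)   kind="non-interactive" ;;
esac

# Only produce output when a person is sitting at the terminal.
if [ "$kind" = "interactive" ]; then
    echo "You have new mail"   # placeholder for mailstat or similar
fi
```

Run as a script or through ssh with a command, this prints nothing, which is exactly what the tar stream needs.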

2. scp

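# if DESTDIR doesn't exist yet: it is created as a copy of SOURCEDIR (only what's in SOURCEDIR, not the directory itself)
# if DESTDIR already exists: SOURCEDIR is copied under it, as DESTDIR/SOURCEDIR
localhost% scp -rp user1@remotehost:/home/user1/SOURCEDIR DESTDIR
 
# SOURCEDIR copied under DESTDIR (DESTDIR must already exist)
localhost% scp -rp user1@remotehost:/home/user1/SOURCEDIR DESTDIR/.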

3. rsync & ssh

# copy directory SOURCEDIR into DESTDIR (result: /DESTDIR/SOURCEDIR)
# example 1
localhost% rsync -avzH -e ssh --progress user1@remotehost:/SOURCEDIR /DESTDIR
# example 2
localhost% rsync -avzH -e'ssh' user1@remotehost:/SOURCEDIR /DESTDIR
 
# copy only the contents of directory SOURCEDIR into DESTDIR (note the trailing slash)
localhost% rsync -avzH -e ssh --progress user1@remotehost:/SOURCEDIR/ /DESTDIR
