How to rsync certain files, exclude the rest, all while ignoring .svn directories?

I came across this question on the Stack Exchange site Unix & Linux. The question interested me so I answered it but thought I’d cross post it on my blog as well, given I took a pretty significant amount of time to put together a test case and write-up of how the solution ultimately worked.

Problem

I’m using rsync to copy some files from a share to another.

Recursively, I need to:

– delete files at the destination that are deleted in the origin
– Only sync php and js files
– exclude de rest of file types
– Don’t delete .svn/ directory in the destination

If I use this:

rsync -zavC --delete --include='*.php' --include='*.js' --exclude="*" /media/datacod/Test/ /home/lucas/Desktop/rsync/

Then rsync is not recursive because exclude=”*” excludes all files but also folders

If I add --include="*/" then the .svn/ directory gets deleted (it also gets included)

How can I solve this mind blasting dilemma?

Solution

The solution I ultimately came up with made use of a little known feature, at least to me, called filters. Filters allow you to play games with the includes/excludes by protecting portions based on regular expressions. Read on, I’ll discuss them further down.

1
2
rsync -avzC --filter='-rs_*/.svn*' --include="*/" --include='*.js' --include='*.php' \
     --exclude="*" --delete dir1/ dir2/

test data

To help determine if my solution was going to work or not I created some sample data so that I could test it out. For starters I wrote a script that would generate the data. Here’s that script, setup_svn_sample.bash:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#!/bin/bash
 
# setup .svn dirs
mkdir -p dir{1,2}/dir{1,2,3,4}/.svn
 
# fake data under .svn
mkdir -p dir1/dir{1,2,3,4}/.svn/origdir
mkdir -p dir2/dir{1,2,3,4}/.svn/keepdir
 
# files to not sync
touch dir1/dir{1,2,3,4}/file{1,2}
 
# files to sync
touch dir1/dir{1,2,3,4}/file1.js
touch dir1/dir{1,2,3,4}/file1.php

Running the above script produces the following directories (dir1 & dir2):

source dir

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
$ tree -a dir1
dir1
|-- dir1
|   |-- file1
|   |-- file1.js
|   |-- file1.php
|   |-- file2
|   `-- .svn
|       `-- origdir
|-- dir2
|   |-- file1
|   |-- file1.js
|   |-- file1.php
|   |-- file2
|   `-- .svn
|       `-- origdir
|-- dir3
|   |-- file1
|   |-- file1.js
|   |-- file1.php
|   |-- file2
|   `-- .svn
|       `-- origdir
`-- dir4
    |-- file1
    |-- file1.js
    |-- file1.php
    |-- file2
    `-- .svn
        `-- origdir

destination dir

1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ tree -a dir2
dir2
|-- dir1
|   `-- .svn
|       `-- keepdir
|-- dir2
|   `-- .svn
|       `-- keepdir
|-- dir3
|   `-- .svn
|       `-- keepdir
`-- dir4
    `-- .svn
        `-- keepdir

Running the above rsync command which includes the --filter below we can see that it’s only syncing the files that match the --include patterns:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
rsync -avzC --filter='-rs_*/.svn*' --include="*/" --include='*.js' --include='*.php' \
     --exclude="*" --delete dir1/ dir2/
sending incremental file list
dir1/file1.js
dir1/file1.php
dir2/file1.js
dir2/file1.php
dir3/file1.js
dir3/file1.php
dir4/file1.js
dir4/file1.php
 
sent 480 bytes  received 168 bytes  1296.00 bytes/sec
total size is 0  speedup is 0.00

Resulting dir2 afterwards:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
$ tree -a dir2
dir2
|-- dir1
|   |-- file1.js
|   |-- file1.php
|   `-- .svn
|       `-- keepdir
|-- dir2
|   |-- file1.js
|   |-- file1.php
|   `-- .svn
|       `-- keepdir
|-- dir3
|   |-- file1.js
|   |-- file1.php
|   `-- .svn
|       `-- keepdir
`-- dir4
    |-- file1.js
    |-- file1.php
    `-- .svn
        `-- keepdir

Why does it work?

The key piece to this script is to make use of the filters capability of rsync. Filters allow you to remove files from the matched set at various points in the command. So in our case we’re filtering any files that match the pattern */.svn*. The modifiers -rs_ tell the filter that we want to filter on both the source side as well as the target side.

excerpt from the FILTER NOTES section of rsync’s man page

– An s is used to indicate that the rule applies to the sending side. When a rule affects the sending side, it prevents files from being
transferred. The default is for a rule to affect both sides unless --delete-excluded was specified, in which case default rules become sender-side only. See also the hide (H) and show (S) rules, which are an alternate way to specify sending-side includes/excludes.

– An r is used to indicate that the rule applies to the receiving side. When a rule affects the receiving side, it prevents files from being deleted. See the s modifier for more info. See also the protect (P) and risk ® rules, which are an alternate way to specify receiver-side includes/excludes.

See man rsync for more details.

Tips for figuring this out (hint using --dry-run)

While describing how to do this I thought I’d mention the --dry-run switch to rsync. It’ extremely useful in seeing what will happen without having the rsync actually take place.

For Example

Using the following command will do a test run and show us the decision logic behind rsync:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
rsync --dry-run -avvzC --filter='-rs_*/.svn*' --include="*/" \
     --include='*.js' --include='*.php' --exclude="*" --delete dir1/ dir2/
sending incremental file list
[sender] showing directory dir3 because of pattern */
[sender] showing directory dir2 because of pattern */
[sender] showing directory dir4 because of pattern */
[sender] showing directory dir1 because of pattern */
[sender] hiding file dir1/file1 because of pattern *
[sender] showing file dir1/file1.js because of pattern *.js
[sender] hiding file dir1/file2 because of pattern *
[sender] showing file dir1/file1.php because of pattern *.php
[sender] hiding directory dir1/.svn because of pattern */.svn*
[sender] hiding file dir2/file1 because of pattern *
[sender] showing file dir2/file1.js because of pattern *.js
[sender] hiding file dir2/file2 because of pattern *
[sender] showing file dir2/file1.php because of pattern *.php
[sender] hiding directory dir2/.svn because of pattern */.svn*
[sender] hiding file dir3/file1 because of pattern *
[sender] showing file dir3/file1.js because of pattern *.js
[sender] hiding file dir3/file2 because of pattern *
[sender] showing file dir3/file1.php because of pattern *.php
[sender] hiding directory dir3/.svn because of pattern */.svn*
[sender] hiding file dir4/file1 because of pattern *
[sender] showing file dir4/file1.js because of pattern *.js
[sender] hiding file dir4/file2 because of pattern *
[sender] showing file dir4/file1.php because of pattern *.php
[sender] hiding directory dir4/.svn because of pattern */.svn*
delta-transmission disabled for local transfer or --whole-file
[generator] risking directory dir3 because of pattern */
[generator] risking directory dir2 because of pattern */
[generator] risking directory dir4 because of pattern */
[generator] risking directory dir1 because of pattern */
[generator] protecting directory dir1/.svn because of pattern */.svn*
dir1/file1.js
dir1/file1.php
[generator] protecting directory dir2/.svn because of pattern */.svn*
dir2/file1.js
dir2/file1.php
[generator] protecting directory dir3/.svn because of pattern */.svn*
dir3/file1.js
dir3/file1.php
[generator] protecting directory dir4/.svn because of pattern */.svn*
dir4/file1.js
dir4/file1.php
total: matches=0  hash_hits=0  false_alarms=0 data=0
 
sent 231 bytes  received 55 bytes  572.00 bytes/sec
total size is 0  speedup is 0.00 (DRY RUN)

In the above output you can see that the ./svn directories are being protected by our filter rule. Valuable insight for debugging the rsync.

References

Delete extraneous files from dest dir via rsync?
Above scripts in a tarball

This entry was posted in rsync, script, Syndicated, tutorials. Bookmark the permalink.

Comments are closed.