I came across this question on the Stack Exchange site Unix & Linux. The question interested me so I answered it but thought I’d cross post it on my blog as well, given I took a pretty significant amount of time to put together a test case and write-up of how the solution ultimately worked.
Problem
I’m using rsync to copy some files from a share to another.
Recursively, I need to:
- delete files at the destination that are deleted in the origin
- Only sync php and js files
- exclude the rest of file types
- Don't delete .svn/ directory in the destination

If I use this:

    rsync -zavC --delete --include='*.php' --include='*.js' --exclude="*" /media/datacod/Test/ /home/lucas/Desktop/rsync/

Then rsync is not recursive because exclude="*" excludes all files but also folders.

If I add --include="*/" then the .svn/ directory gets deleted (it also gets included).

How can I solve this mind blasting dilemma?
Solution
The solution I ultimately came up with made use of a feature that was little known, at least to me, called filters. Filters allow you to play games with the includes/excludes by protecting portions of the tree based on wildcard patterns. Read on, I'll discuss them further down.
    rsync -avzC --filter='-rs_*/.svn*' --include="*/" --include='*.js' --include='*.php' \
          --exclude="*" --delete dir1/ dir2/
test data
To help determine if my solution was going to work or not I created some sample data so that I could test it out. For starters I wrote a script that would generate the data. Here’s that script, setup_svn_sample.bash:
    #!/bin/bash

    # setup .svn dirs
    mkdir -p dir{1,2}/dir{1,2,3,4}/.svn

    # fake data under .svn
    mkdir -p dir1/dir{1,2,3,4}/.svn/origdir
    mkdir -p dir2/dir{1,2,3,4}/.svn/keepdir

    # files to not sync
    touch dir1/dir{1,2,3,4}/file{1,2}

    # files to sync
    touch dir1/dir{1,2,3,4}/file1.js
    touch dir1/dir{1,2,3,4}/file1.php
Running the above script produces the following directories (dir1 & dir2):
source dir
    $ tree -a dir1
    dir1
    |-- dir1
    |   |-- file1
    |   |-- file1.js
    |   |-- file1.php
    |   |-- file2
    |   `-- .svn
    |       `-- origdir
    |-- dir2
    |   |-- file1
    |   |-- file1.js
    |   |-- file1.php
    |   |-- file2
    |   `-- .svn
    |       `-- origdir
    |-- dir3
    |   |-- file1
    |   |-- file1.js
    |   |-- file1.php
    |   |-- file2
    |   `-- .svn
    |       `-- origdir
    `-- dir4
        |-- file1
        |-- file1.js
        |-- file1.php
        |-- file2
        `-- .svn
            `-- origdir
destination dir
    $ tree -a dir2
    dir2
    |-- dir1
    |   `-- .svn
    |       `-- keepdir
    |-- dir2
    |   `-- .svn
    |       `-- keepdir
    |-- dir3
    |   `-- .svn
    |       `-- keepdir
    `-- dir4
        `-- .svn
            `-- keepdir
Running the rsync command from above, including the --filter, we can see that it only syncs the files that match the --include patterns:
    rsync -avzC --filter='-rs_*/.svn*' --include="*/" --include='*.js' --include='*.php' \
          --exclude="*" --delete dir1/ dir2/
    sending incremental file list
    dir1/file1.js
    dir1/file1.php
    dir2/file1.js
    dir2/file1.php
    dir3/file1.js
    dir3/file1.php
    dir4/file1.js
    dir4/file1.php

    sent 480 bytes  received 168 bytes  1296.00 bytes/sec
    total size is 0  speedup is 0.00
Resulting dir2 afterwards:
    $ tree -a dir2
    dir2
    |-- dir1
    |   |-- file1.js
    |   |-- file1.php
    |   `-- .svn
    |       `-- keepdir
    |-- dir2
    |   |-- file1.js
    |   |-- file1.php
    |   `-- .svn
    |       `-- keepdir
    |-- dir3
    |   |-- file1.js
    |   |-- file1.php
    |   `-- .svn
    |       `-- keepdir
    `-- dir4
        |-- file1.js
        |-- file1.php
        `-- .svn
            `-- keepdir
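A quick way to double check a result like this, without reading the tree by eye, is to list anything in the destination that should not be there. This is a minimal sketch and not part of my original answer: the function name unexpected_files is made up for illustration, and it assumes that everything under a .svn directory is allowed to stay.

```shell
#!/bin/bash

# List regular files under "$1" that are neither *.js nor *.php,
# skipping anything inside a .svn directory.
# (hypothetical helper, not from the original answer)
unexpected_files() {
    find "$1" -name .svn -prune -o -type f ! -name '*.js' ! -name '*.php' -print
}

# Example: prints nothing if dir2 holds only .js/.php files outside .svn
# unexpected_files dir2
```

An empty result means the destination only holds the synced file types plus whatever lives under .svn.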
Why does it work?
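The key to this command is rsync's filter capability. Filter rules let you take files out of the matched set, and their modifiers control which side of the transfer a rule applies to. In our case the rule -rs_*/.svn* is an exclude rule (the leading -) for anything matching the pattern */.svn*; the underscore simply separates the rule from its pattern. The modifiers r and s tell rsync that the rule applies on both the receiving (target) side and the sending (source) side, so the .svn directories are neither transferred from the source nor deleted at the destination.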
excerpt from the FILTER NOTES section of rsync’s man page
- An s is used to indicate that the rule applies to the sending side. When a rule affects the sending side, it prevents files from being transferred. The default is for a rule to affect both sides unless --delete-excluded was specified, in which case default rules become sender-side only. See also the hide (H) and show (S) rules, which are an alternate way to specify sending-side includes/excludes.
- An r is used to indicate that the rule applies to the receiving side. When a rule affects the receiving side, it prevents files from being deleted. See the s modifier for more info. See also the protect (P) and risk (R) rules, which are an alternate way to specify receiver-side includes/excludes.
See man rsync for more details.
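The hide (H) and protect (P) rules mentioned in that excerpt give another way to spell the same idea. The following is a sketch rather than the command from my answer: it assumes that one H rule plus one P rule for */.svn* behaves like the single -rs_ exclude rule, and the function name sync_php_js is made up for the example. Try it with --dry-run added before trusting it on real data.

```shell
#!/bin/bash

# Sync only *.js and *.php from "$1" to "$2", deleting stale files in the
# destination but leaving .svn directories there alone.
# (hypothetical wrapper; H/P rules used in place of the -rs_ exclude rule)
sync_php_js() {
    # H: hide .svn from the sender, so it is never transferred
    # P: protect .svn on the receiver, so --delete leaves it in place
    rsync -avzC \
        --filter='H */.svn*' \
        --filter='P */.svn*' \
        --include='*/' --include='*.js' --include='*.php' \
        --exclude='*' --delete "$1" "$2"
}

# Example usage (note the trailing slashes, as in the command above):
# sync_php_js dir1/ dir2/
```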
Tips for figuring this out (hint: use --dry-run)
While describing how to do this I thought I'd mention the --dry-run switch to rsync. It's extremely useful for seeing what will happen without having the rsync actually take place.
For Example
The following command does a test run, and the extra verbosity (-avv rather than -av) makes rsync show us its decision logic:
    rsync --dry-run -avvzC --filter='-rs_*/.svn*' --include="*/" \
          --include='*.js' --include='*.php' --exclude="*" --delete dir1/ dir2/
    sending incremental file list
    [sender] showing directory dir3 because of pattern */
    [sender] showing directory dir2 because of pattern */
    [sender] showing directory dir4 because of pattern */
    [sender] showing directory dir1 because of pattern */
    [sender] hiding file dir1/file1 because of pattern *
    [sender] showing file dir1/file1.js because of pattern *.js
    [sender] hiding file dir1/file2 because of pattern *
    [sender] showing file dir1/file1.php because of pattern *.php
    [sender] hiding directory dir1/.svn because of pattern */.svn*
    [sender] hiding file dir2/file1 because of pattern *
    [sender] showing file dir2/file1.js because of pattern *.js
    [sender] hiding file dir2/file2 because of pattern *
    [sender] showing file dir2/file1.php because of pattern *.php
    [sender] hiding directory dir2/.svn because of pattern */.svn*
    [sender] hiding file dir3/file1 because of pattern *
    [sender] showing file dir3/file1.js because of pattern *.js
    [sender] hiding file dir3/file2 because of pattern *
    [sender] showing file dir3/file1.php because of pattern *.php
    [sender] hiding directory dir3/.svn because of pattern */.svn*
    [sender] hiding file dir4/file1 because of pattern *
    [sender] showing file dir4/file1.js because of pattern *.js
    [sender] hiding file dir4/file2 because of pattern *
    [sender] showing file dir4/file1.php because of pattern *.php
    [sender] hiding directory dir4/.svn because of pattern */.svn*
    delta-transmission disabled for local transfer or --whole-file
    [generator] risking directory dir3 because of pattern */
    [generator] risking directory dir2 because of pattern */
    [generator] risking directory dir4 because of pattern */
    [generator] risking directory dir1 because of pattern */
    [generator] protecting directory dir1/.svn because of pattern */.svn*
    dir1/file1.js
    dir1/file1.php
    [generator] protecting directory dir2/.svn because of pattern */.svn*
    dir2/file1.js
    dir2/file1.php
    [generator] protecting directory dir3/.svn because of pattern */.svn*
    dir3/file1.js
    dir3/file1.php
    [generator] protecting directory dir4/.svn because of pattern */.svn*
    dir4/file1.js
    dir4/file1.php
    total: matches=0  hash_hits=0  false_alarms=0 data=0

    sent 231 bytes  received 55 bytes  572.00 bytes/sec
    total size is 0  speedup is 0.00 (DRY RUN)
In the above output you can see that the .svn directories are being hidden on the sender side and protected on the generator (receiving) side by our filter rule. That is valuable insight when debugging an rsync command.
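To make the first-match-wins behaviour behind that output concrete, here is a toy model of the sender-side decisions in plain shell. It is only a sketch: the function name decide is made up, the rule list is hard-coded to the one used above, and the shell's case globbing only approximates rsync's own pattern matching (for example, a shell * also matches a /).

```shell
#!/bin/bash

# decide KIND PATH: KIND is "file" or "directory", PATH is relative to the
# transfer root. Rules are tried in command-line order; the first match wins.
# (hypothetical helper that mimics the [sender] lines of rsync -vv)
decide() {
    kind=$1
    path=$2
    name=${path##*/}                 # final path component

    # --filter='-rs_*/.svn*' (contains a slash, so match the whole path)
    case $path in
        */.svn*) echo "hiding $kind $path because of pattern */.svn*"; return ;;
    esac
    # --include='*/' (trailing slash: directories only)
    if [ "$kind" = directory ]; then
        echo "showing directory $path because of pattern */"; return
    fi
    # --include='*.js', --include='*.php', then --exclude='*'
    case $name in
        *.js)  echo "showing $kind $path because of pattern *.js" ;;
        *.php) echo "showing $kind $path because of pattern *.php" ;;
        *)     echo "hiding $kind $path because of pattern *" ;;
    esac
}

decide directory dir1
decide file dir1/file1.js
decide file dir1/file2
decide directory dir1/.svn
```

The four calls at the end print lines in the same shape as the [sender] lines of the dry run, minus the prefix. Moving the .svn rule below the */ include in this model makes dir1/.svn come out as "showing", which is the same effect that got the .svn directories deleted in the original question.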
References
- Delete extraneous files from dest dir via rsync?
- Above scripts in a tarball




