Background
I’ve used AWK for years but never had a reason to split on anything except spaces until now. Hard to believe, but I usually use sed to convert the separator characters to spaces first.
    # sed example
    echo "this|is|a|string" | sed 's/|/ /g'
So I was trying to use AWK’s field separator variable, FS, but it wasn’t splitting the first line of my data file. Here’s what my data file looks like:
    line1f1|line1f2|line1f3
    line2f1|line2f2|line2f3
    line3f1|line3f2|line3f3
…and here’s an example of what I was seeing with AWK:
    % awk '{FS="|";print $1,$2,$3}' datafile.txt
    line1f1|line1f2|line1f3
    line2f1 line2f2 line2f3
    line3f1 line3f2 line3f3
Surprisingly I had a hard time finding the solution via Google. Couldn’t get the right search terms 8-). So I’m putting this post on my blog in the hopes of saving someone else a couple of minutes in the future.
Solution
The answer is one of those DUH! moments that programmers are famous for. The key to understanding why this happens is to better understand how AWK actually works. In my haste to put the AWK line above together, I never really appreciated what AWK was actually doing. By the time AWK executes FS="|" inside the curly braces, it has already read the first line of my data file and split it into fields using the default separator (whitespace). The new FS only takes effect from the next line onward. Turns out there are essentially two ways to set the field separator variable correctly.
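To see the timing for yourself, here is a minimal sketch that prints NF (the number of fields AWK found) for each record. The two-line sample in the `data` variable is a hypothetical stand-in for my data file, and the `late` variable name is just for illustration.

```shell
# Hypothetical two-record sample in the same shape as datafile.txt
data='line1f1|line1f2|line1f3
line2f1|line2f2|line2f3'

# FS is assigned inside the main action, so record 1 has already been
# split on whitespace by the time the assignment runs.
late=$(printf '%s\n' "$data" | awk '{FS="|"; print NR ": NF=" NF}')
printf '%s\n' "$late"
```

On a POSIX-conforming AWK such as gawk or mawk, this should report `1: NF=1` followed by `2: NF=3`: one field for the first record, three for the second. Some older AWK implementations are known to behave differently here, so treat the exact output as dependent on your AWK.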
#1. on AWK’s command line
    # example setting AWK's field sep. via command line switch
    % awk -F"|" '{print $1,$2,$3}' datafile.txt
    line1f1 line1f2 line1f3
    line2f1 line2f2 line2f3
    line3f1 line3f2 line3f3
#2. within a BEGIN block
    # example setting FS in BEGIN block
    % awk 'BEGIN {FS="|"} {print $1,$2,$3}' datafile.txt
    line1f1 line1f2 line1f3
    line2f1 line2f2 line2f3
    line3f1 line3f2 line3f3
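As a quick sanity check, the sketch below runs both approaches against the same input and compares the results. It also tries a closely related variant that this post doesn’t otherwise cover: assigning FS with the standard `-v` option, which takes effect before any input is read. The `data` sample and the `by_*` variable names are assumptions made up for this example.

```shell
# Hypothetical two-record sample in the same shape as datafile.txt
data='line1f1|line1f2|line1f3
line2f1|line2f2|line2f3'

# 1. command line switch
by_flag=$(printf '%s\n' "$data" | awk -F'|' '{print $1,$2,$3}')
# 2. BEGIN block
by_begin=$(printf '%s\n' "$data" | awk 'BEGIN {FS="|"} {print $1,$2,$3}')
# Variant: -v assigns FS before the first record is read
by_var=$(printf '%s\n' "$data" | awk -v FS='|' '{print $1,$2,$3}')

printf '%s\n' "$by_flag"
[ "$by_flag" = "$by_begin" ] && [ "$by_flag" = "$by_var" ] && echo "all three match"
```

In every case FS is set before AWK reads the first record, which is why all the lines come out split.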
HTH someone in the future!
References
NOTE: For further details regarding my one-liner blog posts, check out my one-liner style guide primer.