Linux awk Command

Awk is a language for processing text files and a powerful text analysis tool. It reads text files line by line and provides features similar to those of a programming language, such as:

* Variable definition and computation
* Conditional statements and loops
* String processing and formatted output

These features make awk highly efficient at processing structured text such as CSV and log files.

The name "awk" is derived from the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

### Syntax

awk options 'pattern {action}' file

**Parameter Description:**

* `options`: Options that control the behavior of `awk`.
* `pattern`: A pattern used to match the input data. If omitted, `awk` operates on all lines.
* `{action}`: The action to execute on lines matching the pattern. If omitted, the default action is to print the entire line.

**Options Description:**

* `-F fs` or `--field-separator=fs`: Specifies the input field delimiter `fs`. By default, fields are separated by runs of whitespace (spaces and tabs). Use this option to specify a different delimiter.
* `-v var=value`: Sets the `awk` variable `var` to `value` before the script runs. This option can be used to pass external values into an `awk` script.
* `-f scriptfile`: Specifies a file containing an `awk` script. This allows larger `awk` scripts to be written in a file and loaded via the `-f` option.
* `-V` or `--version`: Displays version information for `awk` (GNU awk).
* `-h` or `--help`: Displays `awk` help information, including options and usage examples (GNU awk).

Here are some common awk command usages:

Print the entire line:

awk '{print}' file

Print specific columns:

awk '{print $1, $2}' file

Specify columns using a delimiter:

awk -F',' '{print $1, $2}' file

Print the line number:

awk '{print NR, $0}' file

Print lines matching a pattern, together with their line numbers:

awk '/pattern/ {print NR, $0}' file

Calculate the sum of a column:

awk '{sum += $1} END {print sum}' file

Print the maximum value:

awk 'NR == 1 || $1 > max {max = $1} END {print max}' file

Formatted output:

awk '{printf "%-10s %-10s\n", $1, $2}' file
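As a minimal sketch of how these one-liners combine, the following sums the second column of a small comma-separated file while skipping its header row. The sample data and the variable names `tmp` and `total` are invented for illustration only.

```shell
# Hypothetical sample data, created only for this illustration.
tmp=$(mktemp)
printf '%s\n' 'item,amount' 'apple,3' 'pear,5' 'plum,4' > "$tmp"

# -F, splits on commas; NR > 1 skips the header; END runs after the last line.
total=$(awk -F, 'NR > 1 { sum += $2 } END { print sum }' "$tmp")
echo "$total"    # prints 12

rm -f "$tmp"
```

The pattern `NR > 1` and the `END` block work together here: the first selects which lines contribute to the sum, the second prints the result once.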

* * *

## Basic Usage

The content of the text file `log.txt` is as follows:

2 this is a test
3 Do you like awk
This's a test
10 There are orange,apple,mongo
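To follow along, the sample file can be recreated with a here-document. This is only a sketch: the scratch directory and the variable names `workdir` and `line_count` are assumptions made for this illustration.

```shell
# Recreate the sample file in a scratch directory so the examples can be run as-is.
workdir=$(mktemp -d)
cat > "$workdir/log.txt" <<'EOF'
2 this is a test
3 Do you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Sanity check: NR in the END block holds the number of lines read.
line_count=$(awk 'END { print NR }' "$workdir/log.txt")
echo "$line_count"    # prints 4
```

Run `cd "$workdir"` before trying the commands below, and remove the directory with `rm -rf "$workdir"` when done.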

### Usage 1: Basic Syntax

awk '{  action }' filenames

**Note:** The awk script should be enclosed in single quotes `' '` so that the shell does not expand `$1`, `$4`, and similar field references.

Example 1: Output the 1st and 4th columns of each line

awk '{print $1, $4}' log.txt

Output:

2 a
3 like
This's
10 orange,apple,mongo

Example 2: Formatted output

`printf` can customize the output format (similar to C's printf):

awk '{printf "%-8s %-10s\n", $1, $4}' log.txt

Output:

2        a
3        like
This's
10       orange,apple,mongo
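The same format specifiers also work for numbers. As a small sketch with made-up input, `%-6s` left-aligns a string in six columns and `%6.2f` prints a number right-aligned with two decimals; the `|` in the format is only there to make the column boundary visible.

```shell
# Made-up input lines: a name and a price.
formatted=$(printf '%s\n' 'tea 2.5' 'coffee 3' |
  awk '{ printf "%-6s|%6.2f\n", $1, $2 }')
echo "$formatted"
# tea   |  2.50
# coffee|  3.00
```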

### Usage 2: Specifying the Delimiter with -F

The `-F` option specifies the delimiter between columns; it is equivalent to setting the built-in variable **FS (Field Separator)**.

Example 1: Using comma `,` as delimiter

awk -F, '{print $1, $2}' log.txt

Output:

2 this is a test
3 Do you like awk
This's a test
10 There are orange apple

Example 2: Using the built-in variable FS to set the delimiter

awk 'BEGIN { FS="," } {print $1, $2}' log.txt

Output is the same:

2 this is a test
3 Do you like awk
This's a test
10 There are orange apple

Example 3: Using multiple delimiters (space or comma)

`[ ,]` means that either a space or a comma is used as a delimiter.

awk -F '[ ,]' '{print $1, $2, $5}' log.txt

Output:

2 this test
3 Do awk
This's a
10 There apple
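Because a multi-character `-F` value is treated as a regular expression, adding `+` makes a run of separators count as one. The sketch below, using an invented input line, compares the field counts with and without `+`.

```shell
# Two consecutive spaces and ", " yield empty fields with '[ ,]' but not with '[ ,]+'.
line='10  There are orange, apple'
single=$(printf '%s\n' "$line" | awk -F '[ ,]'  '{ print NF }')
merged=$(printf '%s\n' "$line" | awk -F '[ ,]+' '{ print NF }')
echo "$single $merged"    # prints 7 5
```

With `[ ,]` every single space or comma ends a field, so the doubled separators produce two empty fields; `[ ,]+` avoids that.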

### Usage 3: Setting Variables with -v

The `-v` option assigns a variable *before* the AWK script starts executing, so the value is already available in a `BEGIN` block. It is often used to pass in values from the shell.

Syntax:

awk -v variable_name=value '{action}' filename

Example 1: Define a numeric variable and use it in calculations

awk -va=1 '{print $1, $1+a}' log.txt

Output:

2 3
3 4
This's 1
10 11

**Explanation:**

* `-va=1` (equivalent to `-v a=1`) defines a variable `a` with the value 1.
* `$1+a` adds the variable `a` to the first column. When `$1` is not a number (like `This's`), it is treated as 0 in arithmetic, so the result is `1`.

Example 2: Defining multiple variables simultaneously

awk -va=1 -vb=s '{print $1, $1+a, $1b}' log.txt

Output:

2 3 2s
3 4 3s
This's 1 This'ss
10 11 10s

**Explanation:**

* `-vb=s` defines a string variable `b` with the value `s`.
* `$1b` concatenates the first column with the variable `b` (e.g., `2s`, `10s`).

### Usage 4: Using -f to Call an External AWK Script

When the AWK script is long, you can write the logic in a separate file and load it with the `-f` option.

Syntax:

awk -f script_file filename

Assume we have a script file `cal.awk` with the following content:

{ print $1, $1 + 10 }

Execute the command:

awk -f cal.awk log.txt

Output:

2 12
3 13
This's 10
10 20

**Explanation:**

* `-f cal.awk` tells awk to read its program from the external script `cal.awk`.
* AWK runs the action defined in the script on each line of `log.txt`.

* * *

## Operators

| Operator | Description |
| --- | --- |
| `= += -= *= /= %= ^= **=` | Assignment |
| `?:` | C conditional expression |
| `\|\|` | Logical OR |
| `&&` | Logical AND |
| `~` and `!~` | Match regular expression and not match regular expression |
| `< <= > >= != ==` | Relational operators |
| Space | Concatenation |
| `+ -` | Addition, Subtraction |
| `* / %` | Multiplication, Division, and Modulus |
| `+ - !` | Unary plus, minus, and logical NOT |
| `^ **` | Exponentiation |
| `++ --` | Increment or decrement, as prefix or postfix |
| `$` | Field reference |
| `in` | Array membership |

Filter rows where the first column is greater than 2 (a non-numeric first field such as `This's` is compared as a string, which is why that row is also printed):

$ awk '$1>2' log.txt    # Command
# Output
3 Do you like awk
This's a test
10 There are orange,apple,mongo

Filter rows where the first column is equal to 2

$ awk '$1==2 {print $1,$3}' log.txt    # Command
# Output
2 is

Filter rows where the first column is greater than 2 AND the second column is 'Do'

$ awk '$1>2 && $2=="Do" {print $1,$2,$3}' log.txt    # Command
# Output
3 Do you
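Several of these operators can be combined in one expression. The following is a rough sketch on made-up input: it uses a relational operator, the conditional operator `?:`, and the regex match operator `~`. The variable name `limit` is an assumption for this example and is passed in with `-v`.

```shell
# "limit" is a name invented for this sketch; its value comes from the shell via -v.
labels=$(printf '%s\n' '2 this' '3 Do' '10 There' |
  awk -v limit=3 '{ print $1, ($1 >= limit ? "high" : "low"), ($2 ~ /^[A-Z]/ ? "capitalized" : "lowercase") }')
echo "$labels"
# 2 low lowercase
# 3 high capitalized
# 10 high capitalized
```

The comparisons are wrapped in parentheses because an unparenthesized `>` inside a `print` statement would be read as output redirection.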

* * *

## Built-in Variables

| Variable | Description |
| --- | --- |
| $n | The nth field of the current record, with fields separated by FS |
| $0 | The complete input record |
| ARGC | The number of command-line arguments |
| ARGIND | The index in ARGV of the file currently being processed (gawk only) |
| ARGV | Array containing the command-line arguments |
| CONVFMT | The conversion format for numbers (default is %.6g) |
| ENVIRON | An associative array of environment variables |
| ERRNO | Description of the last system error (gawk only) |
| FIELDWIDTHS | A space-separated list of field widths (gawk only) |
| FILENAME | The current filename |
| FNR | The record number within the current file |
| FS | The field separator (default is a single space, which splits on runs of whitespace) |
| IGNORECASE | If true, case-insensitive matching is performed (gawk only) |
| NF | The number of fields in the current record |
| NR | The total number of records read so far, i.e., the line number, starting from 1 |
| OFMT | The output format for numbers (default is %.6g) |
| OFS | The output field separator (default is a single space) |
| ORS | The output record separator (default is a newline) |
| RLENGTH | The length of the string matched by the match function |
| RS | The record separator (default is a newline) |
| RSTART | The starting position of the string matched by the match function |
| SUBSEP | The subscript separator for arrays (default is `"\034"`) |

$ awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1         5    1
log.txt    2    2         5    2
log.txt    2    3         3    3
log.txt    2    4         4    4

$ awk -F\' 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1    '    1    1
log.txt    2    2    '    1    2
log.txt    2    3    '    2    3
log.txt    2    4    '    1    4

# Print the overall record number NR and the per-file record number FNR
$ awk '{print NR,FNR,$1,$2,$3}' log.txt
---------------------------------------------
1 1 2 this is
2 2 3 Do you
3 3 This's a test
4 4 10 There are

# Specify the output separator
$ awk '{print $1,$2,$5}' OFS=" $ " log.txt
---------------------------------------------
2 $ this $ test
3 $ Do $ awk
This's $ a $
10 $ There $
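With a single input file, `NR` and `FNR` are always equal, so the difference only shows up with several files. The sketch below uses two throwaway files (the names `one.txt` and `two.txt` are arbitrary) and sets `OFS` in a `BEGIN` block instead of on the command line.

```shell
# Two throwaway files (names are arbitrary) to show how NR and FNR differ.
dir=$(mktemp -d)
printf '%s\n' 'a b' 'c d e' > "$dir/one.txt"
printf '%s\n' 'f'           > "$dir/two.txt"

# NR keeps counting across files, FNR restarts for each file,
# and $NF is the last field of the current record.
report=$(cd "$dir" && awk 'BEGIN { OFS = ":" } { print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
echo "$report"
# one.txt:1:1:2:b
# one.txt:2:2:3:e
# two.txt:3:1:1:f

rm -rf "$dir"
```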

* * *

## Using Regular Expressions, String Matching

# Output rows where the second column contains "th", and print the second and fourth columns
$ awk '$2 ~ /th/ {print $2,$4}' log.txt
---------------------------------------------
this a

**`~` is the regular expression match operator. The pattern is written between the slashes `//`.**

# Output rows containing "re"
$ awk '/re/ ' log.txt
---------------------------------------------
10 There are orange,apple,mongo
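A pattern can also be anchored and applied to a single field. As a sketch on a few inline sample lines, the following keeps only the lines whose first field consists entirely of digits.

```shell
# ^ and $ anchor the match to the whole field; [0-9]+ means one or more digits.
numeric_first=$(printf '%s\n' '2 this is a test' "This's a test" '10 There are' |
  awk '$1 ~ /^[0-9]+$/ { print $1 }')
echo "$numeric_first"
# 2
# 10
```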

* * *

## Ignoring Case

$ awk 'BEGIN{IGNORECASE=1} /this/' log.txt
---------------------------------------------
2 this is a test
This's a test
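`IGNORECASE` is a GNU awk extension and has no effect in other awk implementations. One portable alternative, sketched here on inline sample lines, is to lowercase the record with the standard `tolower` function before matching.

```shell
# tolower($0) returns a lowercased copy of the line; the line itself is printed unchanged.
matches=$(printf '%s\n' '2 this is a test' '3 Do you like awk' "This's a test" |
  awk 'tolower($0) ~ /this/')
echo "$matches"
# 2 this is a test
# This's a test
```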

* * *

## Negating a Pattern

$ awk '$2 !~ /th/ {print $2,$4}' log.txt
---------------------------------------------
Do like
a
There orange,apple,mongo

$ awk '!/th/ {print $2,$4}' log.txt
---------------------------------------------
Do like
a
There orange,apple,mongo
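A common use of negation is skipping lines that should not be processed. The sketch below, with invented input, drops comment lines (those starting with `#`) and lines that contain no fields at all.

```shell
# !/^#/ rejects comment lines; NF > 0 rejects empty and whitespace-only lines.
kept=$(printf '%s\n' '# comment' '' 'alpha 1' '  ' 'beta 2' |
  awk '!/^#/ && NF > 0')
echo "$kept"
# alpha 1
# beta 2
```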

* * *

## Awk Script

When writing awk scripts, pay attention to two keywords: `BEGIN` and `END`.

* `BEGIN { ... }` contains statements executed once, before any input is processed.
* `END { ... }` contains statements executed once, after all lines have been processed.
* `{ ... }` contains statements executed for each input line.

Assume there is a file (student grade table):

$ cat score.txt
 Marry 2143 78 84 77
 Jack 2321 66 78 45
 Tom 2122 48 77 71
 Mike 2537 87 97 95
 Bob 2415 40 57 62

Our awk script is as follows:

$ cat cal.awk
 #!/bin/awk -f
 # Run before processing
 BEGIN {
     math = 0
     english = 0
     computer = 0
     printf "NAME   NO.    MATH  ENGLISH COMPUTER    TOTAL\n"
     printf "---------------------------------------------\n"
 }
 # Run during processing
 {
     math+=$3
     english+=$4
     computer+=$5
     printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
 }
 # Run after processing
 END {
     printf "---------------------------------------------\n"
     printf "  TOTAL:%10d %8d %8d \n", math, english, computer
     printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
 }

Let's see the execution result:

$ awk -f cal.awk score.txt
 NAME   NO.    MATH  ENGLISH COMPUTER    TOTAL
 ---------------------------------------------
 Marry  2143     78       84       77      239
 Jack   2321     66       78       45      189
 Tom    2122     48       77       71      196
 Mike   2537     87       97       95      279
 Bob    2415     40       57       62      159
 ---------------------------------------------
   TOTAL:       319      393      350
 AVERAGE:     63.80    78.60    70.00
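The same per-line and `END` structure can answer other questions about the data. As a hedged sketch, the script below reports the student with the highest total; the file name `top.awk`, the scratch directory, and the shortened three-row sample are assumptions made for this illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Marry 2143 78 84 77' 'Jack 2321 66 78 45' 'Mike 2537 87 97 95' > "$dir/score.txt"

# top.awk is a hypothetical script name used only in this sketch.
cat > "$dir/top.awk" <<'EOF'
# Per-line block: track the student with the highest total so far.
{
    total = $3 + $4 + $5
    if (NR == 1 || total > best) {
        best = total
        name = $1
    }
}
# END block: report once all lines have been read.
END { printf "%s %d\n", name, best }
EOF

top=$(awk -f "$dir/top.awk" "$dir/score.txt")
echo "$top"    # prints: Mike 279

rm -rf "$dir"
```

Seeding `best` from the first record (`NR == 1`) avoids relying on the initial value of an unset variable.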

* * *

## Some More Examples

The hello world program for AWK is:

BEGIN { print "Hello, world!" }

Calculate the total size of a set of files:

$ ls -l *.txt | awk '{sum+=$5} END {print sum}'
--------------------------------------------------
666581

Find lines from a file with a length greater than 80:

awk 'length>80' log.txt

Print the multiplication table

seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
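The same table can also be produced without `seq` and `sed`, using two nested loops in a `BEGIN` block. This is only an alternative sketch; the function name `mult_table` and the variable `n` are invented here.

```shell
# mult_table is a helper name made up for this sketch; it prints an n-row table.
mult_table() {
    awk -v n="$1" 'BEGIN {
        for (row = 1; row <= n; row++)
            for (col = 1; col <= row; col++)
                printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
    }'
}

mult_table 9
```

Since the program consists of a `BEGIN` block only, awk does not read any input, so no file argument is needed.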

> More content:
>
> * http://www.gnu.org/software/gawk/manual/gawk.html
