AWK practical examples & usage – Complete guide

One of the most powerful data manipulation utilities is awk, a program that incorporates a wide range of data matching, modifying, and programming features. The name awk comes from the first letters of the last names of its three developers: Aho, Weinberger, and Kernighan. Like grep, awk is a pattern matching tool, but with the added ability to perform specified, often complex, operations on records, or on specific fields in records, after a pattern is matched.

In addition, awk is fully programmable, supporting the loops, conditional statements, and variables expected in a programming language.

One of the most important differences between awk and grep is awk’s ability to select records on the basis of the location of values within a record. In addition, awk can select pieces of a record for processing. This works only when the data is organized in a structured manner, as in a database.

Ways in which to use Awk

The awk utility reads data from files or from the output of another utility. In this section, several introductory forms of the awk command are used to manipulate data read from files.

  • All lines in the file records that contain anuj are displayed.
    $ awk '/anuj/ {print}' records
    anuj singh tomar:+919926604345:06:02:1987

  • When no action is given, print is the default action, so this shorter form also displays all lines that contain the target anuj.
    $ awk '/anuj/' records

  • Not specifying a pattern
    $ awk '{print}' records

    Every record of the entire file is displayed.

  • The first field of the file records is displayed. With the default separator (whitespace), the first field is the first word of each line.
    $ awk '{print $1}' records
    anuj
    satyendra
    raj

  • The first three fields of the file records are displayed in reverse order, without spaces between the fields.
    $ awk '{print $3 $2 $1}' records
    tomar:+919926604345:06:02:1987singhanuj
    narwariya:+919926547924:04:07:1987satyendra
    gurjar:+919977186990:08:07:1987kishoreraj
  • The three fields of the file records are displayed with spaces between the fields. Here -F: makes the colon the field separator.
    $ awk -F: '{print $3, $2,$1}' records
    06 +919926604345 anuj singh tomar
    04 +919926547924 satyendra narwariya
    08 +919977186990 raj kishore gurjar
    05 +919893110089 navdeep rajput
  • Prints all the records from the records file.
$ awk -F: '{print $0}' records

anuj singh tomar:+919926604345:06:02:1987
satyendra narwariya:+919926547924:04:07:1987
raj kishore gurjar:+919977186990:08:07:1987
navdeep rajput:+919893110089:05:06:1987
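
The effect of the field separator is easiest to see side by side. The following is a minimal sketch, assuming a scratch file named records.sample (a hypothetical name) that holds three of the records shown above.

```shell
# Build a small sample file; records.sample is a hypothetical name.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
satyendra narwariya:+919926547924:04:07:1987
raj kishore gurjar:+919977186990:08:07:1987
EOF

# Default separator (whitespace): $1 is only the first word of the name.
awk '{print $1}' records.sample

# Colon separator: $1 is the whole name and $2 is the phone number.
awk -F: '{print $1, $2}' records.sample
```

The first command prints single words such as anuj, while the second prints the full name followed by the number, because the split now happens at the colons.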

Identifying variables and strings of characters

$1 is a variable, and awk replaces every variable with its value. We can also create our own variables and use them in awk.

$ awk -v item='Name = ' '{print item,$1}' records

Name =  anuj
Name =  satyendra
Name =  raj
Name =  navdeep

The -v option tells awk that a variable assignment follows, in the form name=value.

'   ' Single quotes enclose the value assigned to the variable on the command line.

"   " Double quotes enclose a string to be printed inside the awk program.

  • $ awk -F: '{print "Name = ",$1,"Number = ",$2}' records 
    Name =  anuj singh tomar Number =  +919926604345 
    Name =  satyendra narwariya Number =  +919926547924 
    Name =  raj kishore gurjar Number =  +919977186990
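
As a rough illustration of how a -v variable and a quoted string differ inside a program, the sketch below uses a hypothetical variable named label and a hypothetical scratch file records.sample.

```shell
# records.sample is a hypothetical scratch file with two of the guide's records.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
raj kishore gurjar:+919977186990:08:07:1987
EOF

# label (unquoted) is a variable set with -v; "Field shown: " and ": " are string literals.
# A -v variable already has its value when the BEGIN block runs.
awk -F: -v label='Name' 'BEGIN {print "Field shown: " label} {print label ": " $1}' records.sample
```

This prints one header line from the BEGIN block, then one line per record that begins with the value of label.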

Selecting with regular expressions

$ awk '/anuj/ {print $0}' records

anuj singh tomar:+919926604345:06:02:1987
$ awk '/[Aa]nuj/ {print $0}' records

anuj singh tomar:+919926604345:06:02:1987 
Anuj singh tomar:+919926604345:06:02:1987

Specifying beginning of lines

  • All lines in the records file that start with a character v are displayed.
    $ awk -F: '/^v/ {print $1,$2}' records
    
    vidyasagar yadav +919349736772

  • The output will be all lines starting with a character other than a through m.

$ awk -F: '/^[^a-m]/ {print "Name = ", $1}' records

Name =  satyendra narwariya
Name =  raj kishore gurjar 
Name =  navdeep rajput 
Name =  ravi poddar
Name =  vidyasagar yadav
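  • The output will be all the lines starting with a through m. Whether the capitalized Anuj counts as part of [a-m] depends on the locale; in the C locale the range covers lowercase letters only, and that record would appear in the previous output instead.
$ awk -F: '/^[a-m]/ {print "Name = ", $1}' records

Name =  anuj singh tomar
Name =  Anuj singh tomar
Name =  madhuraj tomar
Name =  amit kumar gupta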
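
Because character ranges can behave differently from one locale to another, a reproducible way to try the anchored patterns is to force the C locale. This is a minimal sketch, assuming a hypothetical scratch file records.sample. Setting LC_ALL=C is an addition here, not something the commands above use.

```shell
# records.sample is a hypothetical scratch file with three of the guide's records.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
Anuj singh tomar:+919926604345:06:02:1987
raj kishore gurjar:+919977186990:08:07:1987
EOF

# With LC_ALL=C, [a-m] means exactly the lowercase letters a through m.
LC_ALL=C awk -F: '/^[a-m]/ {print $1}' records.sample

# The negated class selects every other first character, including capital A.
LC_ALL=C awk -F: '/^[^a-m]/ {print $1}' records.sample

# Listing both cases explicitly selects both spellings of the name.
LC_ALL=C awk -F: '/^[Aa]/ {print $1}' records.sample
```

In the C locale the first command selects only the lowercase anuj record, and the second selects the Anuj and raj records.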

Selecting records by specific database components

  • This command shows those records whose first field is exactly Anuj singh tomar.
    $ awk -F: '$1=="Anuj singh tomar" {print $0}' records
    Anuj singh tomar:+919926604345:06:02:1987

  • This command will display all those records which have +919926604345 in the 2nd field. Without quotes the value is compared as a number; enclosing it in double quotes, as in the last example of this section, compares it as a string.
    $ awk -F: '$2==+919926604345' records
    anuj singh tomar:+919926604345:06:02:1987
    Anuj singh tomar:+919926604345:06:02:1987

  • This command shows those records which contain either gurjar or tomar.
    $ awk '/gurjar/ || /tomar/' records
    anuj singh tomar:+919926604345:06:02:1987
    Anuj singh tomar:+919926604345:06:02:1987
    raj kishore gurjar:+919977186990:08:07:1987
    madhuraj tomar:+919977859602:04:01:1987

  • This command shows those records that satisfy both conditions: the first field is Anuj singh tomar and the second field is +919926604345.
    $ awk -F: '$1=="Anuj singh tomar" && $2=="+919926604345"' records
    Anuj singh tomar:+919926604345:06:02:1987
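
The difference between comparing a field as a string and as a number can be sketched as follows, again assuming a hypothetical scratch file records.sample. The third field of these records is written with a leading zero, which makes the difference visible.

```shell
# records.sample is a hypothetical scratch file with three of the guide's records.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
Anuj singh tomar:+919926604345:06:02:1987
raj kishore gurjar:+919977186990:08:07:1987
EOF

# String comparison is exact and case-sensitive: only the capitalized record matches.
awk -F: '$1 == "Anuj singh tomar" {print $1}' records.sample

# Numeric comparison: the field 06 is equal to the number 6, so two records match.
awk -F: '$3 == 6 {print $1}' records.sample

# String comparison: "06" is not the same string as "6", so nothing is printed.
awk -F: '$3 == "6" {print $1}' records.sample
```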

Finding patterns within the fields

  • This will search for 6 in the 3rd field.
    $ awk -F: '$3 ~ /6/' records
    anuj singh tomar:+919926604345:06:02:1987
    Anuj singh tomar:+919926604345:06:02:1987

  • It shows those lines which have a 6 anywhere in them.
    $ awk '/6/' records
    anuj singh tomar:+919926604345:06:02:1987
    Anuj singh tomar:+919926604345:06:02:1987
    satyendra narwariya:+919926547924:04:07:1987
    raj kishore gurjar:+919977186990:08:07:1987
    navdeep rajput:+919893110089:05:06:1987
    madhuraj tomar:+919977859602:04:01:1987
    vidyasagar yadav:+919349736772:05:12:1987

  • Shows those lines which have 5 fields. Every record in the file has 5 fields, so all of them are printed; only the first two are shown here.
    $ awk -F: '(NF == 5)' records
    anuj singh tomar:+919926604345:06:02:1987
    Anuj singh tomar:+919926604345:06:02:1987

  • This command produces no output, since no line has 4 fields.
    $ awk -F: '(NF == 4)' records
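
To see a field-level match, a whole-line match, and an NF test give different answers, the sketch below uses a hypothetical scratch file records.sample in which the last record has deliberately been shortened to four fields. That shortened record is an assumption made for the demonstration and is not in the original data.

```shell
# records.sample is hypothetical; the year was removed from the last record on purpose.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
satyendra narwariya:+919926547924:04:07:1987
navdeep rajput:+919893110089:05:06
EOF

# A 6 anywhere in the line: all three records match.
awk '/6/' records.sample

# A 6 in the third field only: just the first record matches.
awk -F: '$3 ~ /6/ {print $1}' records.sample

# Records with exactly four fields: only the shortened one.
awk -F: 'NF == 4 {print $1, NF}' records.sample
```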

Creating and Using awk command files

Placing complex awk commands in a separate file, and naming that file on the command line, reduces both complexity and the potential for errors.

Example

In a separate file, ex1.ak, place the following code.

/Anuj/ {print $1, $2}

Then enter the following on the command line.

$ awk -F: -f ex1.ak records
Anuj singh tomar +919926604345

The resulting output is the first and second fields of the records in the file records that contain the string Anuj. Both -F: and -f ex1.ak are options, so they must come before the data file name. The field separator can also be set inside the program file, as ex2.ak does next.

$ cat ex2.ak
BEGIN {
    FS=":"
    OFS="-----"
    ORS="\n"
}
{
    print "Record no. is " NR, $1, $2
}

$ awk -f ex2.ak records
Record no. is 1-----anuj singh tomar-----+919926604345 
Record no. is 2-----Anuj singh tomar-----+919926604345
Record no. is 3-----satyendra narwariya-----+919926547924
Record no. is 4-----raj kishore gurjar-----+919977186990
Record no. is 5-----navdeep rajput-----+919893110089 
Record no. is 6-----madhuraj tomar-----+919977859602

FS specifies the input field separator, OFS the output field separator, and ORS the output record separator.

The opening curly brace of a BEGIN block must be on the same line as BEGIN, not on a new line.
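
One detail worth trying out is that OFS appears only where print has commas. The sketch below assumes two hypothetical scratch files, records.sample for the data and ex_sep.awk for the program.

```shell
# records.sample and ex_sep.awk are hypothetical file names.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
raj kishore gurjar:+919977186990:08:07:1987
EOF

cat > ex_sep.awk <<'EOF'
BEGIN {
    FS = ":"      # split input fields on colons
    OFS = " | "   # inserted wherever print has a comma
}
{
    print NR, $1, $2      # commas: values are joined with OFS
    print NR $1 $2        # no commas: plain concatenation, OFS is not used
}
EOF

awk -f ex_sep.awk records.sample
```

Each record produces two lines: one with the values separated by " | ", and one with the same values run together.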

Employing Variables

User-defined variables are also supported by awk, and they help improve the readability of the code.

  • Create a file ex3.ak with the following awk code.
$ vi ex3.ak
BEGIN { FS=":" }
/Anuj/ {
    name=$1
    number=$2
    print name, number
}

$ awk -f ex3.ak records
Anuj singh tomar +919926604345

Using variable names as words

In awk, string literals are always enclosed in quotation marks; variable names, on the other hand, are not quoted. For example:

$ awk -F: '/Anuj/{name=$1; print "name",name}' records
name Anuj singh tomar

Performing arithmetic operations in awk

In addition to manipulating character strings, the awk utility can apply arithmetic operations to variables and data.

Example: show records 2 to 4 of the file records, subtracting one from the day of birth (the third field).

$ awk -F: 'NR==2,NR==4{ print NR,$1,$3-1}' records
2 Anuj singh tomar 5
3 satyendra narwariya 3
4 raj kishore gurjar 7
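
The range pattern and the arithmetic can be tried together in one self-contained run. This is a minimal sketch, assuming a hypothetical scratch file records.sample and a hypothetical reference year passed in as the variable ref.

```shell
# records.sample is a hypothetical scratch file with five of the guide's records.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
Anuj singh tomar:+919926604345:06:02:1987
satyendra narwariya:+919926547924:04:07:1987
raj kishore gurjar:+919977186990:08:07:1987
navdeep rajput:+919893110089:05:06:1987
EOF

# NR==2, NR==4 is a range pattern: it starts at record 2 and stops after record 4.
# $3 - 1 is evaluated numerically, so the leading zero of 06 disappears.
# ref is a hypothetical reference year; ref - $5 subtracts the year field from it.
awk -F: -v ref=2000 'NR==2, NR==4 {print NR, $1, $3 - 1, ref - $5}' records.sample
```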

Maintaining a running Total

The way in which awk creates and initializes variables can be used to maintain an updated or running total on items in a database.

$ cat ex4.ak
BEGIN { FS=":" }
{
    name=$1
    number=$2
    total=$3 * $4
    running=running+total
    print name,total,running
}
$ awk -f ex4.ak records
anuj singh tomar 12 12
Anuj singh tomar 12 24
satyendra narwariya 28 52
raj kishore gurjar 56 108
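
A running total is often finished off with an END block that reports the final figure. The sketch below is one way to do that, not the guide's own ex4.ak; the file names records.sample and running.awk are hypothetical.

```shell
# records.sample and running.awk are hypothetical file names.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
satyendra narwariya:+919926547924:04:07:1987
raj kishore gurjar:+919977186990:08:07:1987
EOF

cat > running.awk <<'EOF'
{
    total = $3 * $4        # the same product the guide computes
    running += total       # running is unset at first, which counts as 0
    print $1, total, running
}
END {
    # After the last record, running holds the grand total.
    print "records:", NR, "grand total:", running
}
EOF

awk -F: -f running.awk records.sample
```

With these three records the totals are 12, 28, and 56, so the last line reports a grand total of 96.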

Using the printf function to format output

The awk utility borrows some of its notation and functions from the C language, in which the utility is written. Perhaps Kernighan, who co-created awk and co-wrote the classic book on C, had something to do with it. The C function printf is commonly used in awk code to provide additional formatting capabilities over the basic print.

Left and Right justifying the output

In the previous example (ex4.ak), replace the print statement with the following printf statement:

printf "%-20s %10s %10s\n", name,total,running

The format specifiers -20 and 10 alter the appearance of the output. They create minimum field widths of 20, 10, and 10 characters. The minus sign left-justifies the first value (name); the other two values (total and running) are right-justified.

Output:

$ awk -f ex4.ak records
anuj singh tomar             12         12
Anuj singh tomar             12         24
satyendra narwariya          28         52
raj kishore gurjar           56        108
navdeep rajput               30        138

Aligning the decimal and truncating numbers

To align the decimal points of all numbers in the output, use a specifier such as the following:

%10.2f tells printf to right-align a floating point number, rather than a string, in a field 10 characters wide, rounded to two decimal places. This results in improved alignment.
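
A short sketch of %-20s and %10.2f working together, assuming a hypothetical scratch file records.sample. The division by 3 has no meaning for the data; it is there only to produce values with decimals, and the | characters mark the edges of each column.

```shell
# records.sample is a hypothetical scratch file with two of the guide's records.
cat > records.sample <<'EOF'
anuj singh tomar:+919926604345:06:02:1987
raj kishore gurjar:+919977186990:08:07:1987
EOF

# %-20s left-justifies the name in 20 columns.
# %10.2f right-justifies the number in 10 columns, rounded to two decimals.
awk -F: '{printf "%-20s|%10.2f|\n", $1, ($3 * $4) / 3}' records.sample
```

The two numbers printed are 4.00 and 18.67, and their decimal points line up in the same column.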

COMMAND SUMMARY

  • -Fcharacter

When used on command line the -F flag informs awk to use the specified character as the field separator.

  • -v variablename=value

Assigns the value to the variable before execution of the program. Such variable values are available to the BEGIN block of an awk program.

  • ;

Separates actions in a block

  • BEGIN

Instructs awk to perform the following block of actions before processing of the database.

  • END

Instructs awk to perform the following block of actions after processing of the database.

Summary of awk predefined variables

  • $#

The value of $# is the content of the #th field in the current record, where # is a number: $1, $2, and so on.

  • $0

The value of $0 is the entire current record.

  • NF

the value of NF is the number of fields in the current record.

  • NR

The value of NR is the Record number of the current record.

  • FS

The value of FS is the field separator. By default, fields are separated by one or more spaces or tabs.

  • OFS

The output field separator, a space by default.

  • RS

The value of RS is the record separator; the default separator is a newline character.

  • ORS

The output record separator, by default a newline.
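
The predefined variables can be seen working together on a tiny made-up input. In this sketch the input string and the " -> " separator are arbitrary choices for illustration. It also uses $NF, the last field of the record, which follows from the $# and NF entries above.

```shell
# Records are separated by ";" and fields by ","; the input is made up for the demo.
printf 'a,b,c;d,e;f' |
awk 'BEGIN {RS = ";"; FS = ","; OFS = " -> "} {print NR, NF, $1, $NF}'
```

Each output line shows the record number, the number of fields in that record, its first field, and its last field.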

Summary of awk printing commands

printf "string"

Prints the string enclosed by the double quotes.

printf "\tstring\n"

Prints the string enclosed by the double quotes, preceded by a tab and followed by a newline.

printf "string %s \n", variable

Prints the string in the double quotes, replacing %s with the value of variable and ending with a newline.

printf "%ns", variable

Prints variable as a string, right-justified in a field n characters wide.

printf "%-ns", variable

Prints variable as a string, left-justified in a field n characters wide.

printf "%nf", variable

Prints the value of variable as a floating point number, right-justified in a field n characters wide.

printf "%n.mf", variable

Prints the value of variable as a floating point number, rounded to m decimal places and right-justified in a field n characters wide.

Summary of operators

TYPE OF OPERATOR    OPERATORS     FUNCTION
Logical             a || b        True if either a or b is true.
                    a && b        True if both a and b are true.
                    ! a           True if a is not true.
Assignment          a = b         Assigns the value of b to a.
                    a += b        Same as a = a + b.
Arithmetic          +, -, *, /    Same meaning as the symbol.
Relational          a == b        True if a equals b.
                    a < b         True if a is less than b.
                    a > b         True if a is greater than b.
                    a ~ b         True if a matches the regular expression b.

You can download a PDF of this guide from the link below:

https://devops.egyan.space/awk-complete-guide-with-follow-along-examples/
