AWK(1)                  General Commands Manual                  AWK(1)
awk - pattern-directed scanning and processing language
awk [-F fs] [-v var=value] [-safe] [-d[N]] [prog | -f progfile] file ...
awk -version
awk is the Bell Labs implementation of the AWK programming language as
described in The AWK Programming Language by A.V. Aho, B.W. Kernighan,
and P.J. Weinberger.
awk scans each input file for lines that match any of a set of patterns
specified literally in prog or in one or more files specified as
-f progfile. With each pattern there can be an associated action that will
be performed when a line of a file matches the pattern. Each line is
matched against the pattern portion of every pattern-action statement; the
associated action is performed for each matched pattern. The file name -
means the standard input. Any file of the form var=value is treated as an
assignment, not a filename, and is executed at the time it would have been
opened if it were a filename. The option -v followed by var=value is an
assignment to be done before prog is executed; any number of -v options
may be present. The -F fs option defines the input field separator to be
the regular expression fs.
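The timing difference between the two kinds of assignment can be seen in a small sketch. The variable name n, the sample input and the shell function name assign_demo are made up for illustration.

```sh
# Hypothetical demo: "n", the input line and assign_demo are illustrative only.
assign_demo() {
    printf 'line\n' |
    # -v n=1 takes effect before the program starts, so BEGIN sees 1.
    # The operand n=2 is processed when awk reaches it in the argument
    # list, so the records read from "-" (standard input) see 2.
    awk -v n=1 'BEGIN { print "BEGIN:", n } { print "main:", n }' n=2 -
}
assign_demo
```

This should print "BEGIN: 1" followed by "main: 2".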
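The options are as follows:

-d[N]
        Set the debug level to the number N. If N is omitted, the debug
        level is set to 1.
-f filename
        Read the AWK program source from the file filename instead of from
        the first command line argument. Multiple -f options may be
        specified.
-F fs
        Set the input field separator FS to the regular expression fs.
-mr NNN, -mf NNN
        Obsolete options that are no longer needed. They set a limit on
        the maximum record size or number of fields.
-safe
        Potentially unsafe functions such as system() make the program
        abort (with a warning message).
-v var=value
        Assign the value value to the variable var before prog is
        executed. Any number of -v options may be present.
-version
        Print the awk version on standard output and exit.

An input line is normally made up of fields separated by white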
space, or by the regular expression the built-in variable FS is set to.
If FS is null, the input line is split into one field per character. The
fields are denoted $1, $2, ..., while $0 refers to the entire line.
Setting any other field causes the re-evaluation of $0. Assigning to $0
resets the values of all other fields and the NF built-in variable.
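A short sketch of how fields and $0 track each other. The input line and the shell function name field_demo are made up for illustration.

```sh
# Hypothetical demo: the input data and field_demo are illustrative only.
field_demo() {
    printf 'a b c\n' | awk '{
        $2 = "X"          # assigning a field re-evaluates $0
        print $0          # fields are rejoined with OFS, a blank by default
        $0 = "one two"    # assigning $0 re-splits the fields and resets NF
        print NF, $2
    }'
}
field_demo
```

This should print "a X c" and then "2 two".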
A pattern-action statement has the form

        pattern { action }

A missing { action } means print the line; a missing pattern always
matches. Pattern-action statements are separated by newlines or
semicolons.
An action is a sequence of statements. Statements are terminated by
semicolons, newlines or right braces. An empty expression-list stands for
$0. String constants are quoted "", with the usual C escapes recognized
within. Expressions take on string or numeric values as appropriate, and
are built using the Operators (see next subsection).
Variables may be scalars, array elements (denoted x[i]) or fields.
Variables are initialized to the null string. Array subscripts may be any
string, not necessarily numeric; this allows for a form of associative
memory. Multiple subscripts such as [i, j, k] are permitted; the
constituents are concatenated, separated by the value of SUBSEP.
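Multiple subscripts can be illustrated with a minimal sketch. The array name cell and the shell function name subsep_demo are made up for illustration.

```sh
# Hypothetical demo: "cell" and subsep_demo are illustrative names.
subsep_demo() {
    awk 'BEGIN {
        cell[1, 2] = "x"                  # stored under the key 1 SUBSEP 2
        if ((1, 2) in cell)
            print "found"
        for (k in cell) {
            n = split(k, part, SUBSEP)    # recover the constituents
            print n, part[1], part[2]
        }
    }'
}
subsep_demo
```

This should print "found" and then "2 1 2", showing that the compound subscript is a single string key that can be split again on SUBSEP.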
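awk operators, in order of decreasing precedence, are:

(...)
        Grouping.
$
        Field reference.
++ --
        Increment and decrement; either can be used as prefix or postfix.
^
        Exponentiation (the ** form is also supported, and **= for the
        assignment operator).
+ - !
        Unary plus, unary minus and logical negation.
* / %
        Multiplication, division and modulus.
+ -
        Addition and subtraction.
space
        String concatenation.
< > <= >= != ==
        Regular relational operators.
~ !~
        Regular expression match and not match.
in
        Array membership.
&&
        Logical AND.
||
        Logical OR.
?:
        C conditional expression, used as expr1 ? expr2 : expr3. If expr1
        is true, the result value is expr2, otherwise it is expr3. Only
        one of expr2 and expr3 is evaluated.
= += -= *= /= %= ^=
        Assignment and operator-assignment.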
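A few consequences of the precedence order, as a minimal sketch. The shell function name precedence_demo and the variable x are made up for illustration.

```sh
# Hypothetical demo: precedence_demo and "x" are illustrative names.
precedence_demo() {
    awk 'BEGIN {
        print -2 ^ 2          # ^ binds tighter than unary minus: -(2 ^ 2)
        print 1 " " 2 + 3     # + binds tighter than concatenation
        x = 5
        # Parentheses keep ">" from being read as an output redirection.
        print (x > 3 ? "big" : "small")
    }'
}
precedence_demo
```

This should print -4, then "1 5", then "big".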
The control statements are as follows:
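if (expression) statement [else statement]
while (expression) statement
for (expression; expression; expression) statement
for (var in array) statement
do statement while (expression)
break
continue
{ [statement ...] }
expression
        Commonly var = expression.
return [expression]
next
        Skip remaining patterns on this input line.
nextfile
        Skip the rest of this file, open the next, and start at the top.
delete array[expression]
        Delete an array element.
delete array
        Delete all elements of an array.
exit [expression]
        Exit immediately; the exit status is expression.

The input/output statements are as follows: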
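close(expr)
        Close the file or pipe expr. Returns zero on success and non-zero
        otherwise.
fflush(expr)
        Flush any buffered output for the file or pipe expr. Returns zero
        on success and non-zero otherwise.
getline [var]
        Set var (or $0 if var is not specified) to the next input record
        from the current input file. getline returns 1 for a successful
        input, 0 for end of file, and -1 for an error.
getline [var] < file
        Set var (or $0 if var is not specified) to the next input record
        from the specified file file.
expr | getline
        Pipe the output of expr into getline; each call of getline returns
        the next line of output from expr.
print [expr-list] [redirection]
        Print the arguments separated by the current output field
        separator OFS and terminated by the output record separator ORS.
printf format[, expr-list] [redirection]
        Format and print the expression list according to format. See
        printf(3) for the supported formats and their meaning.

Both print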
and printf statements write to standard output by default. The output is
written to the file or pipe specified by redirection if one is supplied,
as follows: > file, >> file, or | expr. Both file and expr may be literal
names or parenthesized expressions; identical string values in different
statements denote the same open file. For that purpose the file names
/dev/stdin, /dev/stdout, and /dev/stderr refer to the program's stdin,
stdout, and stderr respectively (and are unrelated to the fd(4) devices of
the same names).
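The getline return values and output redirection can be combined in a minimal sketch. The command string, the variable names and the shell function name io_demo are made up for illustration.

```sh
# Hypothetical demo: "cmd", "line", "n" and io_demo are illustrative names.
io_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        # getline returns 1 for each line read and 0 at end of output.
        while ((cmd | getline line) > 0)
            n++
        close(cmd)                          # same string names the same pipe
        print n, line                       # goes to standard output
        print "diagnostic" > "/dev/stderr"  # goes to standard error
    }'
}
io_demo
```

This should print "2 two" on standard output and "diagnostic" on standard error. The getline call is parenthesized because an unparenthesized "cmd | getline line > 0" is easy to misparse.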
AWK has the following mathematical and numerical functions built-in:
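atan2(x, y)
        Returns the arctangent of x/y in radians. See also atan2(3).
cos(expr)
        Computes the cosine of expr, measured in radians. See also cos(3).
exp(expr)
        Computes the exponential value of expr. See also exp(3).
int(expr)
        Truncates expr to an integer.
log(expr)
        Computes the natural logarithm of expr. See also log(3).
rand()
        Returns a random number between 0 and 1.
sin(expr)
        Computes the sine of expr, measured in radians. See also sin(3).
sqrt(expr)
        Computes the non-negative square root of expr. See also sqrt(3).
srand([expr])
        Sets the seed for the random number generator (rand()) and
        returns the previous seed.

AWK has the following string functions built-in: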
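gensub(r, s, h[, t])
        Search the target string t for matches of the regular expression
        r. If h is a string beginning with 'g' or 'G', then replace all
        matches of r with s. Otherwise, h is a number indicating which
        match of r to replace. If no t is supplied, $0 is used instead.
        Unlike sub() and gsub(), the modified string is returned as the
        result of the function, and the original target is not changed.
        Note that the '\n' sequences (backreferences) within the
        replacement string s, which GNU awk supports, are not supported
        at this moment.
gsub(r, s[, t])
        Same as sub() except that all occurrences of the regular
        expression are replaced; sub() and gsub() return the number of
        replacements.
index(s, t)
        The position in s where the string t occurs, or 0 if it does not.
length[([string])]
        The length of its argument taken as a string, or of $0 if no
        argument.
match(s, r)
        The position in s where the regular expression r occurs, or 0 if
        it does not. The variables RSTART and RLENGTH are set to the
        position and length of the matched string.
split(s, a[, fs])
        Splits the string s into array elements a[1], a[2], ..., a[n],
        and returns n. The separation is done with the regular expression
        fs or with the field separator FS if fs is not given. An empty
        string as field separator splits the string into one array
        element per character.
sprintf(fmt, expr, ...)
        Returns the string resulting from formatting expr according to
        the printf(3) format fmt.
sub(r, s[, t])
        Substitutes s for the first occurrence of the regular expression
        r in the string t. If t is not given, $0 is used.
substr(s, m[, n])
        Returns the at most n-character substring of s starting at
        position m, counted from 1.
tolower(str)
        Returns a copy of str with all upper-case characters translated
        to their corresponding lower-case equivalents.
toupper(str)
        Returns a copy of str with all lower-case characters translated
        to their corresponding upper-case equivalents.

This awk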
provides the following two
functions for obtaining time stamps and formatting them:
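systime()
        Returns the time in seconds since the start of the Unix Epoch
        (midnight, January 1, 1970, Coordinated Universal Time). See also
        time(3).
strftime([format[, timestamp]])
        Formats timestamp according to the string format. timestamp
        should be in the same form as the value returned by systime(). If
        timestamp is missing, current time is used. If format is missing,
        a default format equivalent to the output of date(1) would be
        used. See the specification of ANSI C strftime(3) for the format
        conversions which are supported.

system(cmd)
        Executes cmd and returns its exit status.

Patterns are arbitrary Boolean combinations (with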
!, || and &&) of regular expressions and relational expressions. Regular
expressions are as in egrep(1). Isolated regular expressions in a pattern
apply to the entire line. Regular expressions may also occur in relational
expressions, using the operators ~ and !~. /re/ is a constant regular
expression; any string (constant or variable) may be used as a regular
expression, except in the position of an isolated regular expression in a
pattern.
A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern through an occurrence of the second.
A relational expression is one of the following:
        expression matchop regular-expression
        expression relop expression
        expression in array-name
        (expr, expr, ...) in array-name

where a relop is any of the six relational operators in C, and a matchop
is either ~ (matches) or !~ (does not match). A conditional is an
arithmetic expression, a relational expression, or a Boolean combination
of these.
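The pattern forms described above can be sketched together in one program. The input data, the variable name re and the shell function name pattern_demo are made up for illustration.

```sh
# Hypothetical demo: the data, "re" and pattern_demo are illustrative only.
pattern_demo() {
    printf 'apple 3\nbanana 12\ncherry 7\n' |
    awk -v re='^b' '
        $1 ~ re           { print "match:", $1 }  # string used as a regex
        $2 > 5 && !/^b/   { print "large:", $1 }  # Boolean combination
        NR == 2, NR == 3  { print "range:", $1 }  # both end lines included
    '
}
pattern_demo
```

Every pattern is tested against each line in program order, so this should print "match: banana", "range: banana", "large: cherry" and "range: cherry".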
The special patterns BEGIN and END may be used to capture control before
the first input line is read and after the last. BEGIN and END do not
combine with other patterns.

If an awk program consists of only actions with the pattern BEGIN, and the
BEGIN action contains no getline statement, awk exits without reading its
input when the last statement in the last BEGIN action is executed. If an
awk program consists of only actions with the pattern END or only actions
with the patterns BEGIN and END, the input is read before the statements
in the END actions are executed.
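A minimal sketch of the two cases. The shell function name begin_end_demo and the sample input are made up for illustration.

```sh
# Hypothetical demo: begin_end_demo and the input lines are illustrative.
begin_end_demo() {
    # Only a BEGIN action and no getline: awk exits without reading input.
    awk 'BEGIN { print "begin only" }'

    # BEGIN and END only: the input is still read before END runs,
    # so NR holds the number of records seen.
    printf 'a\nb\n' | awk 'BEGIN { print "start" } END { print NR, "records" }'
}
begin_end_demo
```

This should print "begin only", "start" and "2 records".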
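Variable names with special meanings:

ARGC
        Argument count, assignable.
ARGV
        Argument array, assignable; non-null members are taken as
        filenames.
CONVFMT
        Conversion format used when converting numbers (default "%.6g").
ENVIRON
        Array of environment variables; subscripts are names.
FILENAME
        The name of the current input file.
FNR
        Ordinal number of the current record in the current file.
FS
        Regular expression used to separate fields; also settable by
        option -F fs.
NF
        Number of fields in the current record.
NR
        Ordinal number of the current record.
OFMT
        Output format for numbers (default "%.6g").
OFS
        Output field separator (default blank).
ORS
        Output record separator (default newline).
RS
        Input record separator (default newline).
RSTART
        The starting position of the string matched by match(); 0 if no
        match.
RLENGTH
        The length of the string matched by match(); -1 if no match.
SUBSEP
        Separates multiple subscripts (default 034).

Functions may be defined (at the position of a pattern-action statement) thus: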
function foo(a, b, c) { ...; return x }
Parameters are passed by value if scalar and by reference if array name; functions may be called recursively. Parameters are local to the function; all other variables are global. Thus local variables may be created by providing excess parameters in the function definition.
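The parameter rules can be sketched as follows. The function name total, its parameters and the shell function name function_demo are made up for illustration; the extra spacing before the local parameters is only a common convention.

```sh
# Hypothetical demo: total(), its parameters and function_demo are illustrative.
function_demo() {
    awk '
    # "i" and "sum" are excess parameters that serve as local variables.
    function total(arr, n,    i, sum) {
        for (i = 1; i <= n; i++)
            sum += arr[i]
        arr[0] = "seen"     # arrays are passed by reference
        n = 0               # scalars are passed by value
        return sum
    }
    BEGIN {
        n = split("3 4 5", v, " ")
        i = "global"        # not touched by the local "i" in total()
        t = total(v, n)
        print t, n, v[0], i
    }'
}
function_demo
```

This should print "12 3 seen global": the caller's n and i are unchanged, while the array element set inside the function is visible afterwards.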
Print lines longer than 72 characters. length() defaults to $0 and the
empty parens can also be omitted in this case:
length > 72
Print first two fields in opposite order:
{ print $2, $1 }
Same, with input fields separated by comma and/or blanks and tabs:
BEGIN { FS = ",[ \t]*|[ \t]+" } { print $2, $1 }
Add up first column, print sum and average:
{ s += $1 } END { print "sum is", s, "average is", s/NR }
Print all lines between start/stop pairs:
/start/, /stop/
Simulate echo(1):
BEGIN { for (i = 1; i < ARGC; ++i) printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ") }
Another way to do the same that demonstrates field assignment and $0
re-evaluation:
BEGIN { for (i = 1; i < ARGC; ++i)
$i = ARGV[i]; print }
Print an error message to standard error:
{ print "error!" > "/dev/stderr" }
egrep(1), lex(1), sed(1), atan2(3), cos(3), exp(3), log(3), sin(3), sqrt(3), strftime(3), time(3)
A.V. Aho, B.W. Kernighan, P.J. Weinberger, The AWK Programming Language, Addison-Wesley, 1988. ISBN 0-201-07981-X
AWK Language Programming, Edition 1.0, published by the Free Software Foundation, 1995
nawk has been the default system awk since NetBSD 2.0, replacing the
previously used GNU awk.
There are no explicit conversions between numbers and strings. To force an expression to be treated as a number add 0 to it; to force it to be treated as a string concatenate "" to it.
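The two idioms can be sketched as follows. The variable names and the shell function name coerce_demo are made up for illustration.

```sh
# Hypothetical demo: the variables and coerce_demo are illustrative names.
coerce_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"               # string constants
        print (a < b)                   # compared as strings: "10" < "9"
        print (a + 0 < b + 0)           # adding 0 forces a numeric comparison
        x = 10; y = 9                   # numbers
        print ((x "") < (y ""))         # concatenating "" forces strings
    }'
}
coerce_demo
```

This should print 1, 0 and 1, where 1 means the comparison was true.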
The scope rules for variables in functions are a botch; the syntax is worse.
Only eight-bit character sets are handled correctly.
July 5, 2022                                                NetBSD 10.99