Tuesday, March 22, 2016

Coreutils: cat, head, tail, sort, uniq and cut

cat
cat copies each file (‘-’ means standard input), or standard input if none are given, to standard output.
Synopsis:
cat [option] [file]…
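As a small illustration of the ‘-’ convention, the sketch below concatenates a file, standard input and a second file in that order. The scratch directory and the file names a.txt and b.txt are hypothetical, created only for this example.

```shell
# Hypothetical scratch files, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'first\n' > "$tmpdir/a.txt"
printf 'third\n' > "$tmpdir/b.txt"

# '-' stands for standard input, so the piped line lands between the two files.
cat_out=$(printf 'second\n' | cat "$tmpdir/a.txt" - "$tmpdir/b.txt")
printf '%s\n' "$cat_out"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

The three lines come out in argument order: first, second, third.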

head
head prints the first part (10 lines by default) of each file; it reads from standard input if no files are given or when given a file of ‘-’.
Synopsis:
head [option]… [file]…
# head /var/log/auth.log

tail
tail prints the last part (10 lines by default) of each file; it reads from standard input if no files are given or when given a file of ‘-’.
Synopsis:
tail [option]… [file]…
# tail /var/log/auth.log
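Both commands accept -n to change the default of 10 lines. A minimal sketch, using twenty numbered lines from seq as stand-in data rather than a real log file:

```shell
# Twenty numbered lines stand in for a log file (sample data, not from the post).
first_three=$(seq 1 20 | head -n 3)      # lines 1 to 3
last_two=$(seq 1 20 | tail -n 2)         # lines 19 and 20
default_count=$(seq 1 20 | head | wc -l) # with no -n, head stops after 10 lines

printf '%s\n' "$first_three"
printf '%s\n' "$last_two"
```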

sort
sort sorts, merges, or compares all the lines from the given files, or standard input if none are given or for a file of ‘-’. By default, sort writes the results to standard output.
Synopsis:
sort [option]… [file]…
Options:
‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of optional blanks, an optional ‘-’ sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. An empty number is treated as ‘0’. The LC_NUMERIC locale specifies the decimal-point character and thousands separator. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.
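The difference between the default ordering and -n is easiest to see on a few numbers. This is a sketch on made-up input; LC_ALL=C is set only to keep the plain sort predictable across systems.

```shell
# Four sample numbers, one per line (illustrative data).
nums='10
9
100
2'

# Plain sort compares character by character, so "100" lands before "2".
lexical=$(printf '%s\n' "$nums" | LC_ALL=C sort)

# -n compares the numeric value at the start of each line.
numeric=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)

printf 'lexical:\n%s\n' "$lexical"
printf 'numeric:\n%s\n' "$numeric"
```

The plain sort yields 10, 100, 2, 9, while sort -n yields 2, 9, 10, 100.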

uniq
uniq writes the unique lines in the given input, or standard input if nothing is given or for an input name of ‘-’. Only adjacent duplicate lines are detected, so the input is usually sorted first.
Synopsis:
uniq [option]… [input [output]]
Options:
‘-c’
‘--count’
Print the number of times each line occurred along with the line.
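Because uniq only compares neighbouring lines, -c gives different counts on sorted and unsorted input. A short sketch on made-up data; the awk step is only there to strip the padding that uniq puts in front of each count.

```shell
# Three sample lines with a non-adjacent duplicate (illustrative data).
lines='b
a
b'

# Unsorted: the two "b" lines are not neighbours, so each is counted once.
unsorted=$(printf '%s\n' "$lines" | uniq -c | awk '{print $1, $2}')

# Sorted first: duplicates become adjacent and are counted together.
sorted=$(printf '%s\n' "$lines" | sort | uniq -c | awk '{print $1, $2}')

printf 'unsorted:\n%s\n' "$unsorted"
printf 'sorted:\n%s\n' "$sorted"
```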

cut
cut writes to standard output selected parts of each line of each input file, or standard input if no files are given or for a file name of ‘-’.
Synopsis:
cut option… [file]…
Options:
‘-d input_delim_byte’
‘--delimiter=input_delim_byte’
With -f, use the first byte of input_delim_byte as the input fields separator (default is TAB).
‘-f field-list’
‘--fields=field-list’
Select for printing only the fields listed in field-list. Fields are separated by a TAB character by default. Also print any line that contains no delimiter character, unless the --only-delimited (-s) option is specified.
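A brief sketch of -d and -f together, plus the behaviour for lines without a delimiter. The colon-separated record is made up for the example.

```shell
# Sample colon-separated record (illustrative data, not from the post).
record='alice:x:1000:/home/alice'

# Fields 1 and 3, using ':' instead of the default TAB as the delimiter.
picked=$(printf '%s\n' "$record" | cut -d ':' -f 1,3)
printf '%s\n' "$picked"

# A line with no delimiter is passed through whole...
passed=$(printf 'no delimiter here\n' | cut -d ':' -f 1)

# ...unless -s (--only-delimited) suppresses it.
suppressed=$(printf 'no delimiter here\n' | cut -d ':' -f 1 -s)
```

The selected fields are joined with the input delimiter, so the first command prints alice:1000.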
Note that awk supports more sophisticated field processing, and by default will use (and discard) runs of blank characters to separate fields, and ignore leading and trailing blanks.
awk '{print $2}'      # print the second field
awk '{print $(NF-1)}' # print the penultimate field
awk '{print $2,$1}'   # reorder the first two fields
In the unlikely event that awk is unavailable, one can use the join command to process blank characters as awk does above.
join -a1 -o 1.2     - /dev/null # print the second field
join -a1 -o 1.2,1.1 - /dev/null # reorder the first two fields
Example: a quick way to see which IP addresses are most active is to sort by them:
# cat access.log | cut -d ' ' -f 1 | sort

UPDATE: even easier: the uniq command has a -c option that does most of this work automatically. It counts the occurrences of each unique line; a quick sort -n and a tail then show the biggest ones. Also, I tend to use "cut" as above, but one of the Dreamhost guys reminded me that awk may be a little more straightforward:
# cat /path/to/access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail
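To try the whole pipeline without a real server log, here is a sketch on a made-up six-line log. The addresses come from the documentation ranges and the log layout is an assumption; only the first space-separated field matters.

```shell
# Made-up access log: only the first field (client IP) matters here.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - "GET / HTTP/1.1" 200
198.51.100.7 - - "GET /a HTTP/1.1" 200
192.0.2.1 - - "GET /b HTTP/1.1" 200
192.0.2.2 - - "GET / HTTP/1.1" 404
198.51.100.7 - - "GET /c HTTP/1.1" 200
192.0.2.1 - - "GET /d HTTP/1.1" 200
EOF

# Extract the IPs, group identical ones, count them, order by count.
counts=$(cut -d ' ' -f 1 "$log" | sort | uniq -c | sort -n)
printf '%s\n' "$counts"

# The busiest address is the last line; awk strips the padding uniq adds.
busiest=$(printf '%s\n' "$counts" | tail -n 1 | awk '{print $1, $2}')
printf 'busiest: %s\n' "$busiest"

# Remove the temporary log.
rm "$log"
```

Here 192.0.2.1 appears three times, so it ends up on the last line of the sorted counts.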

References:

  • https://www.gnu.org/software/coreutils/manual/html_node/index.html
  • https://encodable.com/tech/blog/2008/12/17/Count_IP_Addresses_in_Access_Log_File_BASH_OneLiner
