Dedetok: My Experiences Notes: Core util: cat, head, tail, sort, uniq and cut

cat

cat copies each file (‘-’ means standard input), or standard input if none are given, to standard output.
Synopsis:

cat [option] [file]…
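
To make the ‘-’ operand concrete, here is a minimal sketch that concatenates a file, standard input, and the same file again. The temporary file and its contents are made up for the demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
tmpfile=$(mktemp)
printf 'from file\n' > "$tmpfile"

# '-' stands for standard input, so the piped line lands
# between the two copies of the file.
cat_demo=$(printf 'from stdin\n' | cat "$tmpfile" - "$tmpfile")
printf '%s\n' "$cat_demo"

rm -f "$tmpfile"
```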

head

head prints the first part (10 lines by default) of each file; it reads from standard input if no files are given or when given a file of ‘-’.
Synopsis:

head [option]… [file]…

# head /var/log/auth.log
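
A small sketch of the default versus an explicit line count. The input comes from seq, which here only stands in for a real log file; the -n option is not described above but is a standard head option.

```shell
# seq prints the numbers 1..20, one per line (stand-in for a log file).
head_default=$(seq 20 | head)       # default: first 10 lines
head_three=$(seq 20 | head -n 3)    # -n 3: first 3 lines only

printf '%s\n' "$head_three"
```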

tail

tail prints the last part (10 lines by default) of each file; it reads from standard input if no files are given or when given a file of ‘-’.
Synopsis:

tail [option]… [file]…

# tail /var/log/auth.log
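
The same kind of sketch for tail, again with seq standing in for a real log file and the standard -n option choosing how many lines to keep.

```shell
# seq prints the numbers 1..20, one per line (stand-in for a log file).
tail_default=$(seq 20 | tail)       # default: last 10 lines (11..20)
tail_three=$(seq 20 | tail -n 3)    # -n 3: last 3 lines only

printf '%s\n' "$tail_three"
```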

sort

sort sorts, merges, or compares all the lines from the given files, or standard input if none are given or for a file of ‘-’. By default, sort writes the results to standard output.
Synopsis:

sort [option]… [file]…

Options:

‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of optional blanks, an optional ‘-’ sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. An empty number is treated as ‘0’. The LC_NUMERIC locale specifies the decimal-point character and thousands separator. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.
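
The difference between the default comparison and -n is easiest to see side by side. This is only a sketch: the sample numbers are made up, and LC_ALL=C is set here as an assumption so that the non-numeric ordering is plain byte order on any system.

```shell
# Made-up sample values, one per line.
numbers=$(printf '10\n9\n100\n-2\n')

# Default sort compares lines as text: "100" sorts before "9".
sorted_text=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n compares by numeric value, including the leading '-' sign.
sorted_numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

printf 'text order:\n%s\n' "$sorted_text"
printf 'numeric order:\n%s\n' "$sorted_numeric"
```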

uniq

uniq writes the unique lines in the given input, or standard input if nothing is given or for an input name of ‘-’.
Synopsis:

uniq [option]… [input [output]]

Options:

‘-c’
‘--count’
Print the number of times each line occurred along with the line.
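
One thing worth keeping in mind with -c: uniq only compares adjacent lines, so the input normally has to be sorted first. A minimal sketch with made-up input (the padding in front of each count can differ between implementations):

```shell
# Made-up input where equal lines are not next to each other.
words=$(printf 'b\na\nb\na\nb\n')

# Without sorting, no two neighbouring lines match: every count is 1.
unsorted_counts=$(printf '%s\n' "$words" | uniq -c)

# After sorting, equal lines are adjacent and get counted together
# (a occurs 2 times, b occurs 3 times).
sorted_counts=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c)

printf 'unsorted:\n%s\n' "$unsorted_counts"
printf 'sorted:\n%s\n' "$sorted_counts"
```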

cut

cut writes to standard output selected parts of each line of each input file, or standard input if no files are given or for a file name of ‘-’.
Synopsis:

cut option… [file]…

Options:

‘-d input_delim_byte’
‘--delimiter=input_delim_byte’
With -f, use the first byte of input_delim_byte as the input fields separator (default is TAB).

‘-f field-list’
‘--fields=field-list’
Select for printing only the fields listed in field-list. Fields are separated by a TAB character by default. Also print any line that contains no delimiter character, unless the --only-delimited (-s) option is specified.
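
A short sketch of -d and -f together, including what happens to a line without the delimiter. The passwd-style sample line is invented for the example.

```shell
# Made-up passwd-style line with ':' as the field separator.
line='alice:x:1000:1000:Alice:/home/alice:/bin/sh'

first_field=$(printf '%s\n' "$line" | cut -d ':' -f 1)    # alice
two_fields=$(printf '%s\n' "$line" | cut -d ':' -f 1,7)   # alice:/bin/sh

# A line with no delimiter is printed whole...
no_delim=$(printf 'no delimiter here\n' | cut -d ':' -f 2)
# ...unless -s (--only-delimited) is given, which drops it.
suppressed=$(printf 'no delimiter here\n' | cut -s -d ':' -f 2)

printf '%s\n%s\n%s\n' "$first_field" "$two_fields" "$no_delim"
```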

Note awk supports more sophisticated field processing, and by default will use (and discard) runs of blank characters to separate fields, and ignore leading and trailing blanks.

awk '{print $2}' # print the second field
awk '{print $(NF-1)}' # print the penultimate field
awk '{print $2,$1}' # reorder the first two fields

In the unlikely event that awk is unavailable, one can use the join command to process blank characters as awk does above.

join -a1 -o 1.2 - /dev/null # print the second field
join -a1 -o 1.2,1.1 - /dev/null # reorder the first two fields

Example: a quick way to see which IP addresses are most active is to sort by them:

# cat access.log |cut -d ' ' -f 1 |sort

UPDATE: even easier: the uniq command has a -c argument that does most of this work automatically. It counts the occurrences of each unique line. Then a quick sort -n and a tail shows the big ones. Also, I tend to use "cut" as above, but one of the Dreamhost guys reminded me that awk may be a little more straightforward:

# cat /path/to/access.log |awk '{print $1}' |sort |uniq -c |sort -n |tail
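
To try the one-liner without a real server log, the sketch below builds a tiny made-up access log (the addresses, timestamps and requests are invented, using documentation-only IP ranges) and runs the same pipeline on it. The busiest address ends up on the last line.

```shell
# Hypothetical access log, created only for this demonstration.
access_log=$(mktemp)
cat > "$access_log" <<'EOF'
192.0.2.1 - - [22/Mar/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512
203.0.113.9 - - [22/Mar/2016:10:00:01 +0000] "GET /a HTTP/1.1" 200 128
192.0.2.1 - - [22/Mar/2016:10:00:02 +0000] "GET /b HTTP/1.1" 404 64
198.51.100.7 - - [22/Mar/2016:10:00:03 +0000] "GET / HTTP/1.1" 200 512
203.0.113.9 - - [22/Mar/2016:10:00:04 +0000] "GET /c HTTP/1.1" 200 256
192.0.2.1 - - [22/Mar/2016:10:00:05 +0000] "GET /d HTTP/1.1" 200 32
EOF

# First field = client address; sort so equal addresses are adjacent,
# count them, order by count, and keep the largest entries.
top_ips=$(awk '{print $1}' "$access_log" | sort | uniq -c | sort -n | tail -n 3)
printf '%s\n' "$top_ips"

rm -f "$access_log"
```

Adding -r to the final sort and replacing tail with head would list the busiest address first instead of last.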

References:

https://www.gnu.org/software/coreutils/manual/html_node/index.html
https://encodable.com/tech/blog/2008/12/17/Count_IP_Addresses_in_Access_Log_File_BASH_OneLiner

Tuesday, March 22, 2016
