


Which Text Stream Processing Command Filters Identical Lines?

sort

File sorter, often used as a filter in a pipe. This command sorts a text stream or file forwards or backwards, or according to various keys or character positions. Using the -m option, it merges presorted input files. The info page lists its many capabilities and options. See Example 10-9, Example 10-10, and Example A-9.
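A few of its more common invocations, as a quick sketch (datafile here is a hypothetical input file):

sort datafile              # Sort lines alphabetically.
sort -r datafile           # Sort in reverse order.
sort -n datafile           # Sort numerically, so that 10 sorts after 9.
sort -k 2 datafile         # Sort on the second whitespace-separated field.
sort -m sorted1 sorted2    # Merge two presorted files.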

tsort

Topological sort, reading in pairs of whitespace-separated strings and sorting according to input patterns.
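A brief sketch of how it works: each input pair "A B" asserts that A must precede B, and tsort prints a total ordering consistent with all the pairs.

echo "wake shower
shower dress
dress leave" | tsort
# Output:
# wake
# shower
# dress
# leave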

uniq

This filter removes duplicate lines from a sorted file. It is often seen in a pipe coupled with sort.

cat list-1 list-2 list-3 | sort | uniq > final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file.

The useful -c option prefixes each line of the input file with its number of occurrences.

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.


bash$ uniq -c testfile
      1 This line occurs only once.
      2 This line occurs twice.
      3 This line occurs three times.


bash$ sort testfile | uniq -c | sort -nr
      3 This line occurs three times.
      2 This line occurs twice.
      1 This line occurs only once.

The sort INPUTFILE | uniq -c | sort -nr command string produces a frequency of occurrence listing on the INPUTFILE file (the -nr options to sort cause a reverse numerical sort). This template finds use in analysis of log files and dictionary lists, and wherever the lexical structure of a document needs to be examined.

Example 12-11. Word Frequency Analysis

#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.


# Check for input file on command line.
ARGS=1
E_BADARGS=65
E_NOFILE=66

if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

if [ ! -f "$1" ]       # Check if file exists.
then
  echo "File \"$1\" does not exist."
  exit $E_NOFILE
fi


########################################################
# main ()
sed -e 's/\.//g'  -e 's/\,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
#                           =========================
#                            Frequency of occurrence

#  Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.
########################################################

# Exercises:
# ---------
# 1) Add 'sed' commands to filter out other punctuation,
#+   such as semicolons.
# 2) Modify to also filter out multiple spaces and other whitespace.
# 3) Add a secondary sort key, so that instances of equal occurrence
#+   are sorted alphabetically.

exit 0

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.


bash$ ./wf.sh testfile
      6 this
      6 occurs
      6 line
      3 times
      3 three
      2 twice
      1 only
      1 once

expand, unexpand

The expand filter converts tabs to spaces. It is often used in a pipe.

The unexpand filter converts spaces to tabs. This reverses the effect of expand.
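For example (a quick sketch; tabbed.txt is a hypothetical file containing tabs):

expand tabbed.txt > spaced.txt         # Each tab becomes the equivalent run of spaces.
unexpand -a spaced.txt > tabbed-2.txt  # The -a option converts all spaces, not just leading ones.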

cut

A tool for extracting fields from files. It is similar to the print $N command set in awk, but more limited. It may be simpler to use cut in a script than awk. Especially important are the -d (delimiter) and -f (field specifier) options.

Using cut to obtain a list of the mounted filesystems:

cat /etc/mtab | cut -d ' ' -f1,2

Using cut to list the OS and kernel version:

uname -a | cut -d" " -f1,3,11,12

Using cut to extract message headers from an e-mail folder:

bash$ grep '^Subject:' read-messages | cut -c10-80
Re: Linux suitable for mission-critical apps?
MAKE MILLIONS WORKING AT HOME!!!
Spam complaint
Re: Spam complaint

Using cut to parse a file:

# List all the users in /etc/passwd.

FILENAME=/etc/passwd

for user in $(cut -d: -f1 $FILENAME)
do
  echo $user
done

# Thanks, Oleg Philon for suggesting this.

cut -d ' ' -f2,3 filename is equivalent to awk -F'[ ]' '{ print $2, $3 }' filename

See also Example 12-37.

paste

Tool for merging together different files into a single, multi-column file. In combination with cut, useful for creating system log files.
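A minimal sketch of its operation (file1 and file2 are hypothetical single-column files):

# Assume file1 contains:  alpha     and file2 contains:  1
#                         beta                           2
paste file1 file2
# alpha   1
# beta    2
# Columns are joined with a tab by default; the -d option specifies another delimiter.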

join

Consider this a special-purpose cousin of paste. This powerful utility allows merging two files in a meaningful fashion, which essentially creates a simple version of a relational database.

The join command operates on exactly two files, but pastes together only those lines with a common tagged field (usually a numerical label), and writes the result to stdout. The files to be joined should be sorted according to the tagged field for the matchups to work properly.

File: 1.data

100 Shoes
200 Laces
300 Socks

File: 2.data

100 $40.00
200 $1.00
300 $2.00

bash$ join 1.data 2.data
File: 1.data 2.data

 100 Shoes $40.00
 200 Laces $1.00
 300 Socks $2.00

Note

The tagged field appears only once in the output.

head

lists the beginning of a file to stdout (the default is 10 lines, but this can be changed). It has a number of interesting options.

Example 12-12. Which files are scripts?

#!/bin/bash
# script-detector.sh: Detects scripts within a directory.

TESTCHARS=2    # Test first 2 characters.
SHABANG='#!'   # Scripts begin with a "sha-bang."

for file in *  # Traverse all the files in current directory.
do
  if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
  #      head -c2                      #!
  #  The '-c' option to "head" outputs a specified
  #+ number of characters, rather than lines (the default).
  then
    echo "File \"$file\" is a script."
  else
    echo "File \"$file\" is *not* a script."
  fi
done

exit 0

Example 12-13. Generating 10-digit random numbers

#!/bin/bash
# rnd.sh: Outputs a 10-digit random number

# Script by Stephane Chazelas.

head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'


# =================================================================== #

# Analysis
# --------

# head:
# -c4 option takes first 4 bytes.

# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.

# sed:
# -n option, in combination with "p" flag to the "s" command,
# outputs only matched lines.


# The author of this script explains the action of 'sed', as follows.

# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------> |

# Assume output up to "sed" --------> |
# is 0000000 1198195154\n

# sed begins reading characters: 0000000 1198195154\n.
# Here it finds a newline character,
# so it is ready to process the first line (0000000 1198195154).
# It looks at its <range><action>s. The first and only one is

#   range     action
#   1         s/.* //p

# The line number is in the range, so it executes the action:
# tries to substitute the longest string ending with a space in the line
# ("0000000 ") with nothing (//), and if it succeeds, prints the result
# ("p" is a flag to the "s" command here, this is different from the "p" command).

# sed is now ready to continue reading its input. (Note that before
# continuing, if -n option had not been passed, sed would have printed
# the line once again).

# Now, sed reads the remainder of the characters, and finds the end of the file.
# It is now ready to process its 2nd line (which is also numbered '$' as
# it's the last one).
# It sees it is not matched by any <range>, so its job is done.

# In a few words, this sed command means:
# "On the first line only, remove any character up to the right-most space,
# then print it."

# A better way to do this would have been:
#           sed -e 's/.* //;q'

# Here, two <range><action>s (could have been written
#           sed -e 's/.* //' -e q):

#   range                    action
#   nothing (matches line)   s/.* //
#   nothing (matches line)   q (quit)

# Here, sed only reads its first line of input.
# It performs both actions, and prints the line (substituted) before quitting
# (because of the "q" action) since the "-n" option is not passed.

# =================================================================== #

# A simpler alternative to the above one-line script would be:
#           head -c4 /dev/urandom | od -An -tu4

exit 0

See also Example 12-33.
tail

lists the end of a file to stdout (the default is 10 lines). Commonly used to keep track of changes to a system logfile, using the -f option, which outputs lines appended to the file.
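For instance, to watch new entries arrive in a logfile in real time (requires read permission on the file; press Control-C to quit):

tail -f /var/log/messages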

Example 12-14. Using tail to monitor the system log

#!/bin/bash

filename=sys.log

cat /dev/null > $filename; echo "Creating / cleaning out file."
#  Creates file if it does not already exist,
#+ and truncates it to zero length if it does.
#  : > filename   and   > filename also work.

tail /var/log/messages > $filename
# /var/log/messages must have world read permission for this to work.

echo "$filename contains tail end of system log."

exit 0

See also Example 12-5, Example 12-33 and Example 30-6.

grep

A multi-purpose file search tool that uses Regular Expressions. It was originally a command/filter in the venerable ed line editor: g/re/p -- global - regular expression - print.

grep pattern [file...]

Search the target file(s) for occurrences of pattern, where pattern may be literal text or a Regular Expression.

bash$ grep '[rst]ystem.$' osinfo.txt
The GPL governs the distribution of the Linux operating system.

If no target file(s) specified, grep works as a filter on stdout, as in a pipe.

bash$ ps ax | grep clock
765 tty1     S      0:00 xclock
901 pts/1    S      0:00 grep clock

The -i option causes a case-insensitive search.

The -w option matches only whole words.

The -l option lists only the files in which matches were found, but not the matching lines.

The -r (recursive) option searches files in the current working directory and all subdirectories below it.
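A quick sketch of these four options in action (osinfo.txt is the same sample file used in the examples below):

grep -i linux osinfo.txt    # Matches "Linux", "LINUX", "LiNuX" ...
grep -w Linux osinfo.txt    # Matches "Linux" only as a whole word.
grep -l Linux *.txt         # Prints just the names of the files containing a match.
grep -r Linux .             # Searches the current directory and everything below it.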

The -n option lists the matching lines, together with line numbers.

bash$ grep -n Linux osinfo.txt
2:This is a file containing information about Linux.
6:The GPL governs the distribution of the Linux operating system.

The -v (or --invert-match) option filters out matches.

grep pattern1 *.txt | grep -v pattern2

# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2".

The -c (--count) option gives a numerical count of matches, rather than actually listing the matches.

grep -c txt *.sgml   # (number of occurrences of "txt" in "*.sgml" files)


#   grep -cz .
#            ^ dot
# means count (-c) zero-separated (-z) items matching "."
# that is, non-empty ones (containing at least 1 character).
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz .     # 4
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '$'   # 5
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '^'   # 5
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -c '$'    # 9
# By default, newline chars (\n) separate items to match.

# Note that the -z option is GNU "grep" specific.

# Thanks, S.C.

When invoked with more than one target file given, grep specifies which file contains matches.

bash$ grep Linux osinfo.txt misc.txt
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.
misc.txt:The Linux operating system is steadily gaining in popularity.

Tip

To force grep to show the filename when searching only one target file, simply give /dev/null as the second file.

bash$ grep Linux osinfo.txt /dev/null
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.

If there is a successful match, grep returns an exit status of 0, which makes it useful in a condition test in a script, especially in combination with the -q option to suppress output.

SUCCESS=0                      # if grep lookup succeeds
word=Linux
filename=data.file

grep -q "$word" "$filename"    # The "-q" option causes nothing to echo to stdout.

if [ $? -eq $SUCCESS ]
# if grep -q "$word" "$filename"   can replace lines 5 - 7.
then
  echo "$word found in $filename"
else
  echo "$word not found in $filename"
fi

Example 30-6 demonstrates how to use grep to search for a word pattern in a system logfile.

Example 12-15. Emulating "grep" in a script

#!/bin/bash
# grp.sh: Very crude reimplementation of 'grep'.

E_BADARGS=65

if [ -z "$1" ]    # Check for argument to script.
then
  echo "Usage: `basename $0` pattern"
  exit $E_BADARGS
fi

echo

for file in *     # Traverse all files in $PWD.
do
  output=$(sed -n /"$1"/p $file)  # Command substitution.

  if [ ! -z "$output" ]           # What happens if "$output" is not quoted?
  then
    echo -n "$file: "
    echo $output
  fi              #  sed -ne "/$1/s|^|${file}: |p"  is equivalent to above.

  echo
done

echo

exit 0

# Exercises:
# ---------
# 1) Add newlines to output, if more than one match in any given file.
# 2) Add features.

Note

egrep (extended grep) is the same as grep -E. This uses a somewhat different, extended set of Regular Expressions, which can make the search a bit more flexible.

fgrep (fast grep) is the same as grep -F. It does a literal string search (no regular expressions), which allegedly speeds things up a bit.

agrep (approximate grep) extends the capabilities of grep to approximate matching. The search string may differ by a specified number of characters from the resulting matches. This utility is not part of the core Linux distribution.

Tip

To search compressed files, use zgrep, zegrep, or zfgrep. These also work on non-compressed files, though slower than plain grep, egrep, fgrep. They are handy for searching through a mixed set of files, some compressed, some not.

To search bzipped files, use bzgrep.

look

The command look works like grep, but does a lookup on a "dictionary," a sorted word list. By default, look searches for a match in /usr/dict/words, but a different dictionary file may be specified.

Example 12-16. Checking words in a list for validity

#!/bin/bash
# lookup: Does a dictionary lookup on each word in a data file.

file=words.data  # Data file from which to read words to test.

echo

while [ "$word" != end ]  # Last word in data file.
do
  read word      # From data file, because of redirection at end of loop.
  look $word > /dev/null  # Don't want to display lines in dictionary file.
  lookup=$?      # Exit status of 'look' command.

  if [ "$lookup" -eq 0 ]
  then
    echo "\"$word\" is valid."
  else
    echo "\"$word\" is invalid."
  fi

done <"$file"    # Redirects stdin to $file, so "reads" come from there.

echo

exit 0

# ----------------------------------------------------------------
# Code below line will not execute because of "exit" command above.

# Stephane Chazelas proposes the following, more concise alternative:

while read word && [[ $word != end ]]
do if look "$word" > /dev/null
   then echo "\"$word\" is valid."
   else echo "\"$word\" is invalid."
   fi
done <"$file"

exit 0

sed, awk

Scripting languages especially suited for parsing text files and command output. May be embedded singly or in combination in pipes and shell scripts.

sed

Non-interactive "stream editor", permits using many ex commands in batch mode. It finds many uses in shell scripts.

awk

Programmable file extractor and formatter, good for manipulating and/or extracting fields (columns) in structured text files. Its syntax is similar to C.
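A small taste of each (a sketch; filename is a hypothetical text file):

sed 's/old/new/g' filename               # Replace every "old" with "new".
sed -n '5,10p' filename                  # Print only lines 5 through 10.
awk -F: '{ print $1, $7 }' /etc/passwd   # Print login name and shell from the colon-delimited fields.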

wc

wc gives a "word count" on a file or I/O stream:

bash$ wc /usr/doc/sed-3.02/README
20     127     838 /usr/doc/sed-3.02/README

[20 lines  127 words  838 characters]

wc -w gives only the word count.

wc -l gives only the line count.

wc -c gives only the character count.

wc -L gives only the length of the longest line.

Using wc to count how many .txt files are in current working directory:

$ ls *.txt | wc -l
# Will work as long as none of the "*.txt" files have a linefeed in their name.

# Alternative ways of doing this are:
#      find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
#      (shopt -s nullglob; set -- *.txt; echo $#)

# Thanks, S.C.

Using wc to total up the size of all the files whose names begin with letters in the range d - h

bash$ wc [d-h]* | grep total | awk '{print $3}'
71832

Using wc to count the instances of the word "Linux" in the main source file for this book.

bash$ grep Linux abs-book.sgml | wc -l
50

Run into too Example 12-33 and Example 16-seven.

Certain commands include some of the functionality of wc as options.

... | grep foo | wc -l
# This frequently used construct can be more concisely rendered.

... | grep -c foo
# Just use the "-c" (or "--count") option of grep.

# Thanks, S.C.

tr

character translation filter.

Caution

Must use quoting and/or brackets, as appropriate. Quotes prevent the shell from reinterpreting the special characters in tr command sequences. Brackets should be quoted to prevent expansion by the shell.

Either tr "A-Z" "*" <filename or tr A-Z \* <filename changes all the uppercase letters in filename to asterisks (writes to stdout). On some systems this may not work, merely tr A-Z '[**]' will.

The -d option deletes a range of characters.

echo "abcdef"                 # abcdef echo "abcdef" | tr -d b-d     # aef   tr -d 0-9 <filename # Deletes all digits from the file "filename".

The --squeeze-repeats (or -s) option deletes all but the first instance of a string of consecutive characters. This option is useful for removing excess whitespace.

bash$ echo "XXXXX" | tr --squeeze-repeats 'X'
X

The -c "complement" option inverts the graphic symbol ready to match. With this option, tr acts only upon those characters not matching the specified set.

bash$ echo "acfdeb123" | tr -c b-d +
+c+d+b++++

Note that tr recognizes POSIX character classes. [1]

bash$ echo "abcd2ef1" | tr '[:alpha:]' -
----2--1

Example 12-17. toupper: Transforms a file to all uppercase.

#!/bin/bash
# Changes a file to all uppercase.

E_BADARGS=65

if [ -z "$1" ]  # Standard check for command line arg.
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

tr a-z A-Z <"$1"

# Same effect as above, but using POSIX character set notation:
#        tr '[:lower:]' '[:upper:]' <"$1"
# Thanks, S.C.

exit 0

Example 12-18. lowercase: Changes all filenames in working directory to lowercase.

#! /bin/bash
#
# Changes every filename in working directory to all lowercase.
#
# Inspired by a script of John Dubois,
# which was translated into Bash by Chet Ramey,
# and considerably simplified by Mendel Cooper, author of this document.


for filename in *                # Traverse all files in directory.
do
   fname=`basename $filename`
   n=`echo $fname | tr A-Z a-z`  # Change name to lowercase.
   if [ "$fname" != "$n" ]       # Rename only files not already lowercase.
   then
     mv $fname $n
   fi
done


exit 0


# Code below this line will not execute because of "exit".
#--------------------------------------------------------#
# To run it, delete script above line.

# The above script will not work on filenames containing blanks or newlines.

# Stephane Chazelas therefore suggests the following alternative:


for filename in *    # Not necessary to use basename,
                     # since "*" won't return any file containing "/".
do n=`echo "$filename/" | tr '[:upper:]' '[:lower:]'`
#                             POSIX char set notation.
#                    Slash added so that trailing newlines are not
#                    removed by command substitution.
   # Variable substitution:
   n=${n%/}          # Removes trailing slash, added above, from filename.
   [[ $filename == $n ]] || mv "$filename" "$n"
                     # Checks if filename already lowercase.
done

exit 0

Example 12-19. du: DOS to UNIX text file conversion.

#!/bin/bash
# Du.sh: DOS to UNIX text file converter.

E_WRONGARGS=65

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename-to-convert"
  exit $E_WRONGARGS
fi

NEWFILENAME=$1.unx

CR='\015'  # Carriage return.
           # 015 is octal ASCII code for CR.
           # Lines in a DOS text file end in a CR-LF.

tr -d $CR < $1 > $NEWFILENAME
# Delete CR's and write to new file.

echo "Original DOS text file is \"$1\"."
echo "Converted UNIX text file is \"$NEWFILENAME\"."

exit 0

# Exercise:
# --------
# Change the above script to convert from UNIX to DOS.

Example 12-20. rot13: rot13, ultra-weak encryption.

#!/bin/bash
# rot13.sh: Classic rot13 algorithm,
#           encryption that might fool a 3-year old.

# Usage: ./rot13.sh filename
# or     ./rot13.sh <filename
# or     ./rot13.sh and supply keyboard input (stdin)

cat "$@" | tr 'a-zA-Z' 'n-za-mN-ZA-M'   # "a" goes to "n", "b" to "o", etc.
#  The 'cat "$@"' construction
#+ permits getting input either from stdin or from files.

exit 0

Example 12-21. Generating "Crypto-Quote" Puzzles

#!/bin/bash
# crypto-quote.sh: Encrypt quotes

# Will encrypt famous quotes in a simple monoalphabetic substitution.
#  The result is similar to the "Crypto Quote" puzzles
#+ seen in the Op Ed pages of the Sunday paper.


key=ETAOINSHRDLUBCFGJMQPVWZYXK
# The "key" is nothing more than a scrambled alphabet.
# Changing the "key" changes the encryption.

# The 'cat "$@"' construction gets input either from stdin or from files.
# If using stdin, terminate input with a Control-D.
# Otherwise, specify filename as command-line parameter.

cat "$@" | tr "a-z" "A-Z" | tr "A-Z" "$key"
#        |  to uppercase  |     encrypt
# Will work on lowercase, uppercase, or mixed-case quotes.
# Passes non-alphabetic characters through unchanged.


# Try this script with something like
# "Nothing so needs reforming as other people's habits."
# --Mark Twain
#
# Output is:
# "CFPHRCS QF CIIOQ MINFMBRCS EQ FPHIM GIFGUI'Q HETRPQ."
# --BEML PZERC

# To reverse the encryption:
# cat "$@" | tr "$key" "A-Z"


#  This simple-minded cipher can be broken by an average 12-year old
#+ using only pencil and paper.

exit 0

fold

A filter that wraps lines of input to a specified width. This is especially useful with the -s option, which breaks lines at word spaces (see Example 12-22 and Example A-2).

fmt

Simple-minded file formatter, used as a filter in a pipe to "wrap" long lines of text output.

Example 12-22. Formatted file listing.

#!/bin/bash

WIDTH=40                    # 40 columns wide.

b=`ls /usr/local/bin`       # Get a file listing...

echo $b | fmt -w $WIDTH

# Could also have been done by
#    echo $b | fold - -s -w $WIDTH

exit 0

See also Example 12-5.

col

This deceptively named filter removes reverse line feeds from an input stream. It also attempts to replace whitespace with equivalent tabs. The chief use of col is in filtering the output from certain text processing utilities, such as groff and tbl.
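A representative use (a sketch; ls.1 stands in for any man page source file):

groff -Tascii -man ls.1 | col -b > ls.txt
# The -b option to col additionally strips backspaces, leaving plain text.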

column

Column formatter. This filter transforms list-type text output into a "pretty-printed" table by inserting tabs at appropriate places.

Example 12-23. Using column to format a directory listing

#!/bin/bash
# This is a slight modification of the example file in the "column" man page.


(printf "PERMISSIONS LINKS OWNER GROUP SIZE MONTH DAY HH:MM PROG-NAME\n" \
; ls -l | sed 1d) | column -t

#  The "sed 1d" in the pipe deletes the first line of output,
#+ which would be "total        N",
#+ where "N" is the total number of files found by "ls -l".

# The -t option to "column" pretty-prints a table.

exit 0

colrm

Column removal filter. This removes columns (characters) from a file and writes the file, lacking the range of specified columns, back to stdout. colrm 2 4 <filename removes the second through fourth characters from each line of the text file filename.

Warning

If the file contains tabs or nonprintable characters, this may cause unpredictable behavior. In such cases, consider using expand and unexpand in a pipe preceding colrm.

nl

Line numbering filter. nl filename lists filename to stdout, but inserts consecutive numbers at the beginning of each non-blank line. If filename omitted, operates on stdin.

The output of nl is very similar to cat -n, however, by default nl does not list blank lines.

Example 12-24. nl: A self-numbering script.

#!/bin/bash

# This script echoes itself twice to stdout with its lines numbered.

# 'nl' sees this as line 3 since it does not number blank lines.
# 'cat -n' sees the above line as number 5.

nl `basename $0`

echo; echo  # Now, let's try it with 'cat -n'

cat -n `basename $0`
# The difference is that 'cat -n' numbers the blank lines.
# Note that 'nl -ba' will also do so.

exit 0
# -----------------------------------------------------------------

pr

Print formatting filter. This will paginate files (or stdout) into sections suitable for hard copy printing or viewing on screen. Various options permit row and column manipulation, joining lines, setting margins, numbering lines, adding page headers, and merging files, among other things. The pr command combines much of the functionality of nl, paste, fold, column, and expand.

pr -o 5 --width=65 fileZZZ | more gives a nice paginated listing to screen of fileZZZ with margins set at 5 and 65.

A particularly useful option is -d, forcing double-spacing (same effect as sed -G).

gettext

The GNU gettext package is a set of utilities for localizing and translating the text output of programs into foreign languages. While originally intended for C programs, it now supports quite a number of programming and scripting languages.

The gettext program works on shell scripts. See the info page.
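A minimal sketch of gettext in a script. This assumes a message catalog has already been compiled and installed for the hypothetical "myscript" domain; with no catalog present, the original string is simply echoed unchanged.

#!/bin/bash
TEXTDOMAIN=myscript                # Name of the (hypothetical) message catalog.
TEXTDOMAINDIR=/usr/share/locale    # Where the catalog is installed.
export TEXTDOMAIN TEXTDOMAINDIR

echo "$(gettext 'Hello, world!')"  # Prints the translation, if one exists.

exit 0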

msgfmt

A program for generating binary message catalogs. It is used for localization.

iconv

A utility for converting file(s) to a different encoding (character set). Its chief use is for localization.
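For example (a sketch; old.txt is a hypothetical Latin-1 encoded file):

iconv -f ISO-8859-1 -t UTF-8 old.txt > new.txt
# -f ("from") gives the source encoding, -t ("to") the target.
# "iconv -l" lists all the encodings supported.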

recode

Consider this a fancier version of iconv, above. This very versatile utility for converting a file to a different encoding is not part of the standard Linux installation.

TeX, gs

TeX and Postscript are text markup languages used for preparing copy for printing or formatted video display.

TeX is Donald Knuth's elaborate typesetting system. It is often convenient to write a shell script encapsulating all the options and arguments passed to one of these markup languages.
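Such a wrapper might look like the following sketch (hypothetical; it assumes the tex executable is installed and hard-codes one commonly used option):

#!/bin/bash
# Hypothetical wrapper: keeps the boilerplate options out of the command line.

E_BADARGS=65
OPTIONS="-interaction=nonstopmode"   # Keep going instead of stopping at errors.

if [ -z "$1" ]
then
  echo "Usage: `basename $0` file.tex"
  exit $E_BADARGS
fi

tex $OPTIONS "$1"

exit 0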

Ghostscript (gs) is a GPL-ed Postscript interpreter.

groff, tbl, eqn

Yet another text markup and display formatting language is groff. This is the enhanced GNU version of the venerable UNIX roff/troff display and typesetting package. Manpages use groff (see Example A-1).

The tbl table processing utility is considered part of groff, as its function is to convert table markup into groff commands.

The eqn equation processing utility is likewise part of groff, and its function is to convert equation markup into groff commands.

lex, yacc

The lex lexical analyzer produces programs for pattern matching. This has been replaced by the nonproprietary flex on Linux systems.

The yacc utility creates a parser based on a set of specifications. This has been replaced by the nonproprietary bison on Linux systems.


Source: http://www.ing.iac.es/~docs/external/bash/abs-guide/textproc.html
