This manual documents version 2.0 of the GNU text utilities.
1. Introduction: Caveats, overview, and authors.
2. Common options
3. Output of entire files: cat, tac, nl, od
4. Formatting file contents: fmt, pr, fold
5. Output of parts of files: head, tail, split, csplit
6. Summarizing files: wc, sum, cksum, md5sum
7. Operating on sorted files: sort, uniq, comm, ptx, tsort
8. Operating on fields within a line: cut, paste, join
9. Operating on characters: tr, expand, unexpand
10. Opening the software toolbox: The software tools philosophy.
Index: General index.
--- The Detailed Node Listing ---
Output of entire files
3.1 cat: Concatenate and write files
3.2 tac: Concatenate and write files in reverse
3.3 nl: Number lines and write files
3.4 od: Write files in octal or other formats
Formatting file contents
4.1 fmt: Reformat paragraph text
4.2 pr: Paginate or columnate files for printing
4.3 fold: Wrap input lines to fit in specified width
Output of parts of files
5.1 head: Output the first part of files
5.2 tail: Output the last part of files
5.3 split: Split a file into fixed-size pieces
5.4 csplit: Split a file into context-determined pieces
Summarizing files
6.1 wc: Print byte, word, and line counts
6.2 sum: Print checksum and block counts
6.3 cksum: Print CRC checksum and byte counts
6.4 md5sum: Print or check message-digests
Operating on sorted files
7.1 sort: Sort text files
7.2 uniq: Uniquify files
7.3 comm: Compare two sorted files line by line
7.4 tsort: Topological sort
7.5 ptx: Produce permuted indexes
ptx: Produce permuted indexes
7.5.1 General options: Options which affect general program behavior.
7.5.2 Charset selection: Underlying character set considerations.
7.5.3 Word selection and input processing: Input fields, contexts, and keyword selection.
7.5.4 Output formatting: Types of output format, and sizing the fields.
7.5.5 The GNU extensions to ptx
Operating on fields within a line
8.1 cut: Print selected parts of lines
8.2 paste: Merge lines of files
8.3 join: Join lines on a common field
Operating on characters
9.1 tr: Translate, squeeze, and/or delete characters
9.2 expand: Convert tabs to spaces
9.3 unexpand: Convert spaces to tabs
tr: Translate, squeeze, and/or delete characters
9.1.1 Specifying sets of characters
9.1.2 Translating: Changing one character to another.
9.1.3 Squeezing repeats and deleting
9.1.4 Warning messages
Opening the software toolbox
Toolbox introduction
I/O redirection
The who command
The cut command
The sort command
The uniq command
Putting the tools together
1. Introduction
This manual is incomplete: no attempt is made to explain basic concepts in a way suitable for novices. If you are interested, please get involved in improving this manual. The entire GNU community will benefit.
The GNU text utilities are mostly compatible with the POSIX.2 standard.
Please report bugs to bug-textutils@gnu.org. Remember to include the version number, machine architecture, input files, and any other information needed to reproduce the bug: your input, what you expected, what you got, and why it is wrong. Diffs are welcome, but please include a description of the problem as well, since this is sometimes difficult to infer. See section `Bugs' in GNU CC.
This manual was originally derived from the Unix man pages in the
distribution, which were written by David MacKenzie and updated by Jim
Meyering. What you are reading now is the authoritative documentation
for these utilities; the man pages are no longer being maintained.
The original fmt man page was written by Ross Paterson.
François Pinard did the initial conversion to Texinfo format.
Karl Berry did the indexing, some reorganization, and editing of the results.
Richard Stallman contributed his usual invaluable insights to the
overall process.
2. Common options
Certain options are available in all of these programs. Rather than writing identical descriptions for each program, they are described here. (In fact, every GNU program accepts, or should accept, these options.)
`--help' Print a usage message listing all available options, then exit successfully.
`--version' Print the version number, then exit successfully.
A few of these programs take arbitrary strings as arguments. In those cases, `--help' and `--version' are recognized as these options only when the option is the sole command-line argument.
3. Output of entire files
These commands read and write entire files, possibly transforming them in some way.
3.1 cat: Concatenate and write files
3.2 tac: Concatenate and write files in reverse
3.3 nl: Number lines and write files
3.4 od: Write files in octal or other formats
cat: Concatenate and write files
cat copies each file (`-' means standard input), or
standard input if none are given, to standard output. Synopsis:
cat [option] [file]...
The program accepts the following options. Also see 2. Common options.
By default, cat on MS-DOS/MS-Windows uses binary mode only when standard output is redirected to a file or a pipe; an option is provided to override that. Binary file I/O is used so that the files retain their format (Unix text as opposed to DOS text and binary), because cat is frequently used as a file-copying program. Some options, however, cause cat to read and write files in text mode, because then the original file contents are not important (e.g., when cat numbers lines or marks line endings). This is so that these options work as DOS/Windows users would expect. For example, DOS-style text files end each line with a CR-LF pair, so a line containing only that pair is not treated as an empty line by `-b' unless the file is read in text mode.
On MS-DOS/MS-Windows, one option instead causes cat to read files and standard input in DOS binary mode, so the CR characters at the end of each line are also visible.
tac: Concatenate and write files in reverse
tac copies each file (`-' means standard input), or
standard input if none are given, to standard output, reversing the
records (lines by default) in each separately. Synopsis:
tac [option]... [file]...
Records are separated by instances of a string (newline by default). By default, this separator string is attached to the end of the record that it follows in the file.
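The record handling just described can be illustrated with a short C sketch. It is not GNU tac's implementation: it assumes the whole input is already in memory, uses a single-character separator, and the function name `reverse_records' is hypothetical. As in the text, each separator stays attached to the end of the record it follows.

```c
#include <assert.h>
#include <string.h>

/* Copy the records of IN (length N) into OUT in reverse order.
   OUT must have room for N bytes.  Each record ends with SEP, which
   stays attached to its record; a final record that lacks SEP is
   copied as-is. */
void reverse_records(const char *in, size_t n, char sep, char *out)
{
    size_t end = n;   /* one past the last byte of the current record */
    size_t o = 0;     /* next write position in OUT */

    assert(in != NULL && out != NULL);
    while (end > 0) {
        /* The byte at END - 1 belongs to this record (it is either its
           separator or its last byte); scan back to the previous SEP. */
        size_t start = end - 1;
        while (start > 0 && in[start - 1] != sep)
            start--;
        memcpy(out + o, in + start, end - start);
        o += end - start;
        end = start;
    }
}
```

Because the separator is attached to the preceding record, input `a\nb' with no final newline comes out as `ba\n'.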
The program accepts the following options. Also see 2. Common options.
Users of tac on MS-DOS/MS-Windows should note that, since tac reads files in
binary mode, each line of a text file might end with a CR/LF pair
instead of the Unix-style LF.
nl: Number lines and write files
nl writes each file (`-' means standard input), or
standard input if none are given, to standard output, with line numbers
added to some or all of the lines. Synopsis:
nl [option]... [file]...
nl decomposes its input into (logical) pages; by default, the
line number is reset to 1 at the top of each logical page. nl
treats all of the input files as a single document; it does not reset
line numbers or logical pages between files.
A logical page consists of three sections: header, body, and footer. Any of the sections can be empty. Each can be numbered in a different style from the others.
The beginnings of the sections of logical pages are indicated in the input file by a line containing exactly one of these delimiter strings:
`\:\:\:' start of header;
`\:\:' start of body;
`\:' start of footer.
The two characters from which these strings are made can be changed from `\' and `:' via options (see below), but the pattern and length of each string cannot be changed.
A section delimiter is replaced by an empty line on output. Any text
that comes before the first section delimiter string in the input file
is considered to be part of a body section, so nl treats a
file that contains no section delimiters as a single body section.
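A minimal C sketch of how a delimiter line might be recognized, assuming the default delimiter characters `\' and `:', so that the strings are `\:\:\:' (header), `\:\:' (body), and `\:' (footer). The enum and the function `classify_line' are hypothetical names, not GNU nl's code.

```c
#include <assert.h>
#include <string.h>

enum section { NOT_DELIM, HEADER, BODY, FOOTER };

/* Classify LINE (without its trailing newline).  A delimiter line
   consists of exactly one, two, or three repetitions of the pair
   D1 D2 and nothing else. */
enum section classify_line(const char *line, char d1, char d2)
{
    size_t len = strlen(line);

    if (len == 0 || len % 2 != 0 || len > 6)
        return NOT_DELIM;
    for (size_t i = 0; i < len; i += 2)
        if (line[i] != d1 || line[i + 1] != d2)
            return NOT_DELIM;
    switch (len / 2) {
    case 3:  return HEADER;
    case 2:  return BODY;
    default: return FOOTER;
    }
}
```

A caller would print an empty line in place of any line classified as a delimiter, and treat lines seen before the first delimiter as body text.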
The program accepts the following options. Also see 2. Common options.
od: Write files in octal or other formats
od writes an unambiguous representation of each file
(`-' means standard input), or standard input if none are given.
Synopsis:
od [option]... [file]...
od -C [file] [[+]offset [[+]label]]
Each line of output consists of the offset in the input, followed by
groups of data from the file. By default, od prints the offset in
octal, and each group of file data is two bytes of input printed as a
single octal number.
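To make the default format concrete, the following C sketch formats one output line as described: a 7-digit octal offset followed by two-byte groups printed as 6-digit octal numbers. GNU od uses the machine's native byte order; this sketch assumes little-endian order so the result is predictable, and it zero-pads an odd final byte. The function name `od_default_line' is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format up to 16 bytes of BUF (length N), which start at input offset
   OFFSET, in od's default style.  Returns the number of characters
   written to OUT, not counting the terminating NUL. */
int od_default_line(char *out, size_t outsz, unsigned long offset,
                    const unsigned char *buf, size_t n)
{
    int len;

    /* Room for a 64-bit offset in octal, 8 groups, and the NUL. */
    assert(n <= 16 && outsz >= 80);
    len = snprintf(out, outsz, "%07lo", offset);
    for (size_t i = 0; i < n; i += 2) {
        /* Low byte first (little-endian); a missing byte counts as 0. */
        unsigned word = buf[i] | (i + 1 < n ? buf[i + 1] << 8 : 0);
        len += snprintf(out + len, outsz - (size_t) len, " %06o", word);
    }
    return len;
}
```

For the two bytes `ab' at offset 0 this yields `0000000 061141', which should match the first line of `printf ab | od' on a little-endian machine.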
The program accepts the following options. Also see 2. Common options.
The default offset radix is octal.
Byte counts are interpreted as for the `-j' option.
When you specify more than one output type, od writes one copy
of each output line using each of the data types that you specified,
in the order that you specified.
Adding a trailing "z" to any type specification appends a display of the ASCII character representation of the printable characters to the output line generated by the type specification.
Type `a' outputs things like `sp' for space, `nl' for
newline, and `nul' for a null (zero) byte. Type `c' outputs
` ', `\n', and `\0', respectively.
Except for types `a' and `c', you can specify the number of bytes to use in interpreting each number in the given data type by following the type indicator character with a decimal integer. Alternatively, you can specify the size of one of the C compiler's built-in data types by following the type indicator character with one of the following characters. For integers (`d', `o', `u', `x'):
For floating point (f):
By default, when two or more consecutive output lines would be identical,
od outputs only the first line, and puts just an asterisk on the
following line to indicate the elision.
With `-w n', od dumps n input bytes per output line. This must be a
multiple of the least common multiple of the sizes associated with the
specified output types. If n is omitted, the default is 32. If this
option is not given at all, the default is 16.
The next several options map the old, pre-POSIX format specification
options to the corresponding POSIX format specs. GNU od accepts
any combination of old- and new-style options. Format specification
options accumulate.
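The `--traditional' option makes od recognize the pre-POSIX non-option
arguments that traditional od accepted, using the following syntax: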
od --traditional [file] [[+]offset[.][b] [[+]label[.][b]]]
can be used to specify at most one file and optional arguments specifying an offset and a pseudo-start address, label. By default, offset is interpreted as an octal number specifying how many input bytes to skip before formatting and writing. The optional trailing decimal point forces the interpretation of offset as a decimal number. If no decimal is specified and the offset begins with `0x' or `0X' it is interpreted as a hexadecimal number. If there is a trailing `b', the number of bytes skipped will be offset multiplied by 512. The label argument is interpreted just like offset, but it specifies an initial pseudo-address. The pseudo-addresses are displayed in parentheses following any normal address.
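The offset rules above can be expressed as a small C parser. This is a sketch under stated assumptions: the name `parse_old_offset' is hypothetical, overflow is not checked, and only the lowercase `b' suffix described here is handled. As in the text, a `.' anywhere selects decimal, otherwise a `0x' or `0X' prefix selects hexadecimal, and octal is the default. In hexadecimal, a trailing `b' is consumed as a hex digit rather than as the 512-byte multiplier.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse an offset of the form [+]offset[.][b].  On success, store the
   number of bytes in *RESULT and return 1; return 0 on a syntax error. */
int parse_old_offset(const char *s, unsigned long long *result)
{
    int base;
    char *end;
    unsigned long long value;

    if (*s == '+')
        s++;
    if (!isdigit((unsigned char) *s))
        return 0;
    if (strchr(s, '.') != NULL)
        base = 10;                          /* decimal point: decimal */
    else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        base = 16;
    else
        base = 8;

    value = strtoull(s, &end, base);
    if (*end == '.')
        end++;
    if (*end == 'b') {                      /* count 512-byte blocks */
        value *= 512;
        end++;
    }
    if (*end != '\0')
        return 0;
    *result = value;
    return 1;
}
```

The label argument can be parsed with the same function.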
4. Formatting file contents
These commands reformat the contents of files.
4.1 fmt: Reformat paragraph text
4.2 pr: Paginate or columnate files for printing
4.3 fold: Wrap input lines to fit in specified width
fmt: Reformat paragraph text
fmt fills and joins lines to produce output lines of (at most)
a given number of characters (75 by default). Synopsis:
fmt [option]... [file]...
fmt reads from the specified file arguments (or standard
input if none are given), and writes to standard output.
By default, blank lines, spaces between words, and indentation are preserved in the output; successive input lines with different indentation are not joined; tabs are expanded on input and introduced on output.
fmt prefers breaking lines at the end of a sentence, and tries to
avoid line breaks after the first word of a sentence or before the last
word of a sentence. A sentence break is defined as either the end
of a paragraph or a word ending in any of `.?!', followed by two
spaces or end of line, ignoring any intervening parentheses or quotes.
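The sentence-break rule can be sketched in C as follows. The function `ends_sentence' is hypothetical, it does not model the end-of-paragraph case, and it assumes that the "parentheses or quotes" to ignore are the closing characters `)', `]', `'', and `"'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if WORD ends a sentence: after ignoring trailing closing
   parentheses, brackets, and quotes, it must end in '.', '?' or '!',
   and it must be followed by at least two spaces or by the end of the
   line.  SPACES_AFTER counts the spaces after the word; AT_EOL is true
   when the word is the last one on its line. */
bool ends_sentence(const char *word, int spaces_after, bool at_eol)
{
    size_t len = strlen(word);

    while (len > 0 && strchr(")]'\"", word[len - 1]) != NULL)
        len--;
    if (len == 0 || strchr(".?!", word[len - 1]) == NULL)
        return false;
    return at_eol || spaces_after >= 2;
}
```

The two-space requirement is what keeps an abbreviation such as `Mr.' followed by a single space from being treated as a sentence end.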
Like TeX, fmt reads entire "paragraphs" before choosing line
breaks; the algorithm is a variant of that in "Breaking Paragraphs Into
Lines" (Donald E. Knuth and Michael F. Plass, Software--Practice
and Experience, 11 (1981), 1119--1184).
The program accepts the following options. Also see 2. Common options.
When filling to a given maximum width, fmt initially tries to make lines
about 7% shorter than that width, to give it room to balance line
lengths.
pr: Paginate or columnate files for printing
pr writes each file (`-' means standard input), or
standard input if none are given, to standard output, paginating and
optionally outputting in multicolumn format; optionally merges all
files, printing all in parallel, one per column. Synopsis:
pr [option]... [file]...
By default, a 5-line header is printed on each page: two blank lines; a line with the date, the file name, and the page number; and two more blank lines. A footer of five blank lines is also printed. With the `-F' option, a 3-line header is printed: the leading two blank lines are omitted, and no footer is used. The default page_length in both cases is 66 lines. The default number of text lines changes from 56 (without `-F') to 63 (with `-F'). The text line of the header takes up the full page_width in the form `yyyy-mm-dd HH:MM string Page nnnn', where string is a centered header string.
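The page arithmetic above can be checked with a small C sketch. The function names are hypothetical; the sketch considers only single-column output and ignores form feeds in the input, which would start new pages early.

```c
#include <assert.h>
#include <stdbool.h>

/* Text lines per page: the page length minus header and footer.
   Default: 5-line header and 5-line footer.
   With -F: 3-line header and no footer. */
int text_lines_per_page(int page_length, bool option_F)
{
    int header = option_F ? 3 : 5;
    int footer = option_F ? 0 : 5;

    assert(page_length > header + footer);
    return page_length - header - footer;
}

/* Pages needed for LINES lines of single-column input, rounded up. */
int pages_needed(int lines, int page_length, bool option_F)
{
    int per_page = text_lines_per_page(page_length, option_F);

    return (lines + per_page - 1) / per_page;
}
```

With the default 66-line page this gives the 56 and 63 text lines quoted above, so 100 input lines need two pages without `-F'.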
Form feeds in the input cause page breaks in the output. Multiple form feeds produce empty pages.
Columns are of equal width, separated by an optional string (default is `space'). For multicolumn output, lines are always truncated to page_width (default 72), unless you use the `-J' option. For single-column output, no line truncation occurs by default; use the `-W' option to truncate lines in that case.
As of version 1.22i:
Some lowercase options (`-s', `-w') have been redefined for better POSIX compliance. The output in some further cases has been adapted to match other Unix systems. Some loss of backward compatibility is unavoidable.
Some new uppercase options (`-J', `-S', `-W') have been introduced to turn off unexpected interactions with lowercase options. The `-N' option and the second argument last_page of `+FIRST_PAGE' offer more flexibility. Detailed handling of form feeds in the input files requires the `-T' option.
Uppercase options take precedence over lowercase ones.
Some of the option-arguments (compare `-s', `-S', `-e', `-i', `-n') cannot be specified as separate arguments from the preceding option letter (as the POSIX specification already states).
The program accepts the following options. Also see 2. Common options.
Without `-S' but with `-J', pr uses the default output separator, TAB.
Without `-S' or `-J', pr uses a space (same as `-S" "').
Using `-S' with no string is equivalent to `-S""'.
Note that for some of pr's options the single-letter option
character must be followed immediately by any corresponding argument;
there may not be any intervening white space.
`-S' and `-s' are among them: write `-S"STRING"', not `-S "STRING"'.
POSIX requires this.
fold: Wrap input lines to fit in specified width
fold writes each file (`-' means standard input), or
standard input if none are given, to standard output, breaking long
lines. Synopsis:
fold [option]... [file]...
By default, fold breaks lines wider than 80 columns. The output
is split into as many lines as necessary.
fold counts screen columns by default; thus, a tab may count more
than one column, backspace decreases the column count, and carriage
return sets the column to zero.
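The column-counting rules can be sketched in C. The function `screen_columns' is hypothetical; it assumes tab stops every 8 columns (the usual convention, not stated above) and a single line of input without its newline.

```c
#include <assert.h>
#include <stddef.h>

/* Return the screen column reached after displaying the N bytes of S,
   starting at column 0. */
size_t screen_columns(const char *s, size_t n)
{
    size_t column = 0;

    for (size_t i = 0; i < n; i++) {
        switch (s[i]) {
        case '\t':
            column += 8 - column % 8;   /* advance to the next tab stop */
            break;
        case '\b':
            if (column > 0)             /* backspace moves left one column */
                column--;
            break;
        case '\r':
            column = 0;                 /* carriage return: back to column 0 */
            break;
        default:
            column++;                   /* any other byte takes one column */
            break;
        }
    }
    return column;
}
```

A line is broken once the column count would exceed the width, so `a<TAB>b' already occupies nine columns, not three.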
The program accepts the following options. Also see 2. Common options.
5. Output of parts of files
These commands output pieces of the input.
5.1 head: Output the first part of files
5.2 tail: Output the last part of files
5.3 split: Split a file into fixed-size pieces
5.4 csplit: Split a file into context-determined pieces
head: Output the first part of files
head prints the first part (10 lines by default) of each
file; it reads from standard input if no files are given or
when given a file of `-'. Synopses:
head [option]... [file]...
head -number [option]... [file]...
If more than one file is specified, head prints a
one-line header consisting of
==> file name <==
before the output for each file.
head accepts two option formats: the new one, in which numbers
are arguments to the options (`-q -n 1'), and the old one, in which
the number precedes any option letters (`-1q').
The program accepts the following options. Also see 2. Common options.
In the old-style `-number' form, number may be followed by a unit letter
as for `-c', by `l' to mean count by lines, or by other option letters (`cqv').
tail: Output the last part of files
tail prints the last part (10 lines by default) of each
file; it reads from standard input if no files are given or
when given a file of `-'. Synopses:
tail [option]... [file]...
tail -number [option]... [file]...
tail +number [option]... [file]...
If more than one file is specified, tail prints a
one-line header consisting of
==> file name <==
before the output for each file.
GNU tail can output any amount of data (some other versions of
tail cannot). It also has no `-r' option (print in
reverse), since reversing a file is really a different job from printing
the end of a file; BSD tail (which is the one with -r) can
only reverse files that are at most as large as its buffer, which is
typically 32k. A more reliable and versatile way to reverse files is
the GNU tac command.
tail accepts two option formats: the new one, in which numbers
are arguments to the options (`-n 1'), and the old one, in which
the number precedes any option letters (`-1' or `+1').
If any option-argument is a number n starting with a `+',
tail begins printing with the nth item from the start of
each file, instead of from the end.
The program accepts the following options. Also see 2. Common options.
In the old-style `-number' form, number may be followed by a unit letter
as for `-c', by `l' to mean count by lines, or by other option letters (`cfqv').
When following more than one file with `-f', tail prints a header
whenever it gets output from a different file, to indicate which file
that output is from.
There are two ways to specify how you'd like to track files with this option, but that difference is noticeable only when a followed file is removed or renamed. If you'd like to continue to track the end of a growing file even after it has been unlinked, use `--follow=descriptor'. This is the default behavior, but it is not useful if you're tracking a log file that may be rotated (removed or renamed, then reopened). In that case, use `--follow=name' to track the named file by reopening it periodically to see if it has been removed and recreated by some other program.
No matter which method you use, if the tracked file is determined to have
shrunk, tail prints a message saying the file has been truncated
and resumes tracking the end of the file from the newly-determined endpoint.
When a file is removed, tail's behavior depends on whether it is
following the name or the descriptor. When following by name, tail can
detect that a file has been removed and gives a message to that effect,
and if `--retry' has been specified it will continue checking
periodically to see if the file reappears.
When following a descriptor, tail does not detect that the file has
been unlinked or renamed and issues no message; even though the file
may no longer be accessible via its original name, it may still be
growing.
The option values `descriptor' and `name' may be specified only with the long form of the option, not with `-f'.
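The `--pid=pid' option makes tail terminate shortly after process pid
terminates. For example, to save the output of a build in a file and
watch the file grow, invoke make and tail like this:
$ make >& makerr & tail --pid=$! -f makerr
Then the tail process will stop when your build completes. Without this
option, you would have had to kill the tail -f process yourself.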
If pid does not correspond to the process that is actually writing the
tailed files, tail may terminate long before any files stop growing, or
it may not terminate until long after the real writer has terminated.
Another option controls how long tail follows the descriptor of a file
that continues growing at a rapid pace even after it is deleted or
renamed. After detecting n consecutive size changes for a file (n being
the option's argument), tail opens and fstats the file to determine
whether that file name is still associated with the same
device/inode-number pair as before.
See the output of tail --help for the default value.
A related option likewise makes tail open and fstat the file to
determine whether that file name is still associated with the same
device/inode-number pair as before. When following a log file that is
rotated, its value is approximately the number of seconds between when
tail prints the last pre-rotation lines and when it prints the lines
that have accumulated in the new log file.
See the output of tail --help for the default value.
This option is meaningful only when following by name.
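The device/inode check described above can be sketched with POSIX stat. This is a simplification, not GNU tail's code: it calls stat on the name instead of open followed by fstat, and the function name `still_same_file' is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if NAME still refers to the file described by ORIG (same
   device and inode number); return 0 if NAME was removed or now names
   a different file, as after log rotation. */
int still_same_file(const char *name, const struct stat *orig)
{
    struct stat now;

    if (stat(name, &now) != 0)
        return 0;                   /* removed, or no longer accessible */
    return now.st_dev == orig->st_dev && now.st_ino == orig->st_ino;
}
```

A follow-by-name loop would remember the stat result when it first opens the file and, when this check fails, reopen the name and start reading the new file from its beginning.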
split: Split a file into fixed-size pieces
split creates output files containing consecutive sections of
input (standard input if none is given or input is
`-'). Synopsis:
split [option] [input [prefix]]
By default, split puts 1000 lines of input (or whatever is
left over for the last section) into each output file.
The output files' names consist of prefix (`x' by default)
followed by a group of letters `aa', `ab', and so on, such
that concatenating the output files in sorted order by file name produces
the original input file. (If more than 676 output files are required,
split uses `zaa', `zab', etc.)
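The two-letter naming scheme can be sketched in C. The function `split_file_name' is hypothetical and covers only the first 676 names; the behavior beyond that, described above, is not modeled. Because `aa' < `ab' < ... < `zz' sorts in the same order as the file numbers, concatenating the pieces in sorted name order restores the input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Store in BUF the name of output file FILE_NUMBER (0-based): PREFIX
   followed by "aa", "ab", ..., "az", "ba", ..., "zz". */
void split_file_name(char *buf, size_t bufsz, const char *prefix,
                     unsigned file_number)
{
    assert(file_number < 26 * 26);
    assert(strlen(prefix) + 3 <= bufsz);    /* prefix + 2 letters + NUL */
    snprintf(buf, bufsz, "%s%c%c", prefix,
             (int) ('a' + file_number / 26), (int) ('a' + file_number % 26));
}
```

With the default prefix, file 0 is `xaa' and file 27 is `xbb'.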
The program accepts the following options. Also see 2. Common options.
csplit: Split a file into context-determined pieces
csplit creates zero or more output files containing sections of
input (standard input if input is `-'). Synopsis:
csplit [option]... input pattern...
The contents of the output files are determined by the pattern arguments, as detailed below. An error occurs if a pattern argument refers to a nonexistent line of the input file (e.g., if no remaining line matches a given regular expression). After every pattern has been matched, any remaining input is copied into one last output file.
By default, csplit prints the number of bytes written to each
output file after it has been created.
The types of pattern arguments are:
The output files' names consist of a prefix (`xx' by default) followed by a suffix. By default, the suffix is an ascending sequence of two-digit decimal numbers from `00' up to `99'. In any case, concatenating the output files in sorted order by file name produces the original input file.
By default, if csplit encounters an error or receives a hangup,
interrupt, quit, or terminate signal, it removes any output files
that it has created so far before it exits.
The program accepts the following options. Also see 2. Common options.
With `--suffix-format=suffix', the suffix string may contain one
printf(3)-style conversion specification, possibly including
format specification flags, a field width, a precision specification,
or all of these kinds of modifiers. The format letter must convert a
binary integer argument to readable form; thus, only `d', `i',
`u', `o', `x', and `X' conversions are allowed. The
entire suffix is given (with the current output file number) to
sprintf(3) to form the file name suffixes for each of the
individual output files in turn. If this option is used, the
`--digits' option is ignored.
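The suffix-format rules can be sketched in C: a validator that accepts exactly one `d', `i', `u', `o', `x', or `X' conversion (with optional flags, width, and precision), and a helper that appends the formatted number to the prefix. Both function names are hypothetical. The validator also rejects length modifiers such as `l', since the file number is passed as a plain int, and it assumes the number is non-negative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if FMT contains exactly one allowed conversion, 0 otherwise.
   "%%" is accepted as a literal percent sign. */
int valid_suffix_format(const char *fmt)
{
    int conversions = 0;

    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {              /* literal "%%" */
            p++;
            continue;
        }
        p++;
        p += strspn(p, "-+ #0");        /* flags */
        p += strspn(p, "0123456789");   /* field width */
        if (*p == '.') {                /* precision */
            p++;
            p += strspn(p, "0123456789");
        }
        if (*p == '\0' || strchr("diouxX", *p) == NULL)
            return 0;
        conversions++;
    }
    return conversions == 1;
}

/* Store PREFIX followed by the suffix that FMT produces for output file
   number N.  The default csplit naming behaves like FMT "%02d". */
int csplit_file_name(char *buf, size_t bufsz, const char *prefix,
                     const char *fmt, int n)
{
    size_t len = strlen(prefix);

    assert(valid_suffix_format(fmt) && len < bufsz);
    memcpy(buf, prefix, len);
    return snprintf(buf + len, bufsz - len, fmt, n);
}
```

Validating the format before handing it to snprintf matters because an unchecked conversion such as `%s' would make snprintf read an argument that was never passed.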
6. Summarizing files
These commands generate just a few numbers representing entire contents of files.
6.1 wc: Print byte, word, and line counts
6.2 sum: Print checksum and block counts
6.3 cksum: Print CRC checksum and byte counts
6.4 md5sum: Print or check message-digests
wc: Print byte, word, and line counts
wc counts the number of bytes, whitespace-separated words, and
newlines in each given file, or standard input if none are given
or for a file of `-'. Synopsis:
wc [option]... [file]...
wc prints one line of counts for each file, and if the file was
given as an argument, it prints the file name following the counts. If
more than one file is given, wc prints a final line
containing the cumulative counts, with the file name `total'. The
counts are printed in this order: newlines, words, bytes.
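The three counts can be sketched in C for an in-memory buffer. The struct and function names are hypothetical, and the sketch assumes that "whitespace" means isspace in the C locale. Note that a final line without a trailing newline adds no line count, since wc counts newline characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct counts {
    unsigned long lines;    /* newline characters */
    unsigned long words;    /* maximal runs of non-whitespace bytes */
    unsigned long bytes;
};

/* Count newlines, whitespace-separated words, and bytes in BUF. */
struct counts count_buffer(const char *buf, size_t n)
{
    struct counts c = {0, 0, n};
    int in_word = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char) buf[i];
        if (ch == '\n')
            c.lines++;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;            /* first byte of a new word */
            c.words++;
        }
    }
    return c;
}
```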
By default, each count is output right-justified in a 7-byte field with
one space between fields so that the numbers and file names line up nicely
in columns. However, POSIX requires that there be exactly one space
separating columns. You can make wc use the POSIX-mandated
output format by setting the POSIXLY_CORRECT environment variable.
By default, wc prints all three counts. Options can specify
that only certain counts be printed. Options do not undo others
previously given, so
wc --bytes --words
prints both the byte counts and the word counts.
With the --max-line-length option, wc prints the length
of the longest line per file, and if there is more than one file it
prints the maximum (not the sum) of those lengths.
The program accepts the following options. Also see 2. Common options.
sum: Print checksum and block counts
sum computes a 16-bit checksum for each given file, or
standard input if none are given or for a file of `-'. Synopsis:
sum [option]... [file]...
sum prints the checksum for each file followed by the
number of blocks in the file (rounded up). If more than one file
is given, file names are also printed (by default). (With the
`--sysv' option, file names are printed when there is
at least one file argument.)
By default, GNU sum computes checksums using an algorithm
compatible with BSD sum and prints file sizes in units of
1024-byte blocks.
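For reference, here is a C sketch of the two checksum algorithms as they are commonly implemented: the BSD rotate-and-add checksum used by default, and the System V byte sum selected by `-s'. The function names are hypothetical, and the sketch works on an in-memory buffer rather than on a file.

```c
#include <assert.h>
#include <stddef.h>

/* BSD-style 16-bit checksum: rotate the running sum right by one bit,
   then add the next byte, keeping 16 bits. */
unsigned bsd_sum(const unsigned char *buf, size_t n)
{
    unsigned checksum = 0;

    for (size_t i = 0; i < n; i++) {
        checksum = (checksum >> 1) + ((checksum & 1) << 15);
        checksum = (checksum + buf[i]) & 0xffff;
    }
    return checksum;
}

/* System V-style checksum: add up all bytes, then fold the total
   into 16 bits. */
unsigned sysv_sum(const unsigned char *buf, size_t n)
{
    unsigned long s = 0;
    unsigned long r;

    for (size_t i = 0; i < n; i++)
        s += buf[i];
    r = (s & 0xffff) + ((s & 0xffffffffUL) >> 16);
    return (unsigned) ((r & 0xffff) + (r >> 16));
}

/* Block counts are rounded up: 1024-byte blocks by default,
   512-byte blocks with the System V algorithm. */
unsigned long block_count(unsigned long bytes, unsigned long block_size)
{
    return (bytes + block_size - 1) / block_size;
}
```

Both checksums are only 16 bits wide, which is one reason the text recommends cksum for new applications.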
The program accepts the following options. Also see 2. Common options.
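The `-r' option selects the default (BSD-compatible) algorithm; it is
included for compatibility with System V sum. Unless `-s' was also
given, it has no effect.
The `-s' (`--sysv') option computes checksums using an algorithm
compatible with System V sum's default, and prints file sizes in units
of 512-byte blocks.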
sum is provided for compatibility; the cksum program (see
next section) is preferable in new applications.
cksum: Print CRC checksum and byte counts
cksum computes a cyclic redundancy check (CRC) checksum for each
given file, or standard input if none are given or for a
file of `-'. Synopsis:
cksum [option]... [file]...
cksum prints the CRC checksum for each file along with the number
of bytes in the file, and the filename unless no arguments were given.
cksum is typically used to ensure that files
transferred by unreliable means (e.g., netnews) have not been corrupted,
by comparing the cksum output for the received files with the
cksum output for the original files (typically given in the
distribution).
The CRC algorithm is specified by the POSIX.2 standard. It is not
compatible with the BSD or System V sum algorithms (see the
previous section); it is more robust.
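A bit-at-a-time C sketch of the CRC that cksum prints. It uses the polynomial 0x04C11DB7, processes bits most significant first, feeds in the file length (least significant byte first, only as many bytes as needed), and complements the result. Real implementations use a lookup table for speed. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Feed N bytes of BUF into CRC, most significant bit first. */
uint32_t crc_update(uint32_t crc, const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        crc ^= (uint32_t) buf[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* The value cksum prints: the CRC of the data followed by its length,
   then complemented. */
uint32_t cksum_value(const unsigned char *buf, size_t n)
{
    uint32_t crc = crc_update(0, buf, n);

    for (uint64_t len = n; len != 0; len >>= 8) {
        unsigned char byte = (unsigned char) (len & 0xff);
        crc = crc_update(crc, &byte, 1);
    }
    return ~crc;
}
```

As a sanity check, empty input gives 4294967295, matching `cksum /dev/null'. The complemented CRC of `123456789' without the length bytes is 0x765E7680, the published check value for this CRC.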
The only options are `--help' and `--version'. See section 2. Common options.
md5sum: Print or check message-digests
md5sum computes a 128-bit checksum (or fingerprint or
message-digest) for each specified file.
If a file is specified as `-' or if no files are given
md5sum computes the checksum for the standard input.
md5sum can also determine whether a file and checksum are
consistent. Synopses:
md5sum [option]... [file]...
md5sum [option]... --check [file]
For each file, `md5sum' outputs the MD5 checksum, a flag indicating a binary or text input file, and the filename. If file is omitted or specified as `-', standard input is read.
The program accepts the following options. Also see 2. Common options.
The input to md5sum's `--check' mode is usually the output of
a prior, checksum-generating run of `md5sum'.
Each valid line of input consists of an MD5 checksum, a binary/text
flag, and then a filename.
Binary files are marked with `*', text with ` '.
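A C sketch of parsing one line in the format just described: 32 hexadecimal digits, a space, a flag (` ' for text, `*' for binary), and a file name. The function `parse_check_line' is hypothetical. GNU md5sum may accept additional forms that this sketch rejects.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parse LINE (without its newline).  On success, copy the digest into
   DIGEST (33 bytes including the NUL), set *BINARY, point *NAME at the
   file name inside LINE, and return 1; otherwise return 0. */
int parse_check_line(const char *line, char digest[33], int *binary,
                     const char **name)
{
    for (int i = 0; i < 32; i++)
        if (!isxdigit((unsigned char) line[i]))
            return 0;               /* too short, or not hexadecimal */
    if (line[32] != ' ' || (line[33] != ' ' && line[33] != '*'))
        return 0;
    if (line[34] == '\0')
        return 0;                   /* file name is missing */
    memcpy(digest, line, 32);
    digest[32] = '\0';
    *binary = line[33] == '*';
    *name = line + 34;
    return 1;
}
```

The checker would then compute the MD5 digest of *NAME and compare it with DIGEST. For reference, d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of empty input.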
For each such line, md5sum reads the named file and computes its
MD5 checksum. Then, if the computed message digest does not match the
one on the line with the filename, the file is noted as having
failed the test. Otherwise, the file passes the test.
By default, for each valid line, one line is written to standard
output indicating whether the named file passed the test.
After all checks have been performed, if there were any failures,
a warning is issued to standard error.
Use the `--status' option to inhibit that output.
If any listed file cannot be opened or read, if any valid line has
an MD5 checksum inconsistent with the associated file, or if no valid
line is found, md5sum exits with nonzero status. Otherwise,
it exits successfully.
7. Operating on sorted files
These commands work with (or produce) sorted files.
7.1 sort: Sort text files
7.2 uniq: Uniquify files
7.3 comm: Compare two sorted files line by line
7.4 tsort: Topological sort
7.5 ptx: Produce permuted indexes
sort: Sort text files
sort sorts, merges, or compares all the lines from the given
files, or standard input if none are given or for a file of
`-'. By default, sort writes the results to standard
output. Synopsis:
sort [option]... [file]...
sort has three modes of operation: sort (the default), merge,
and check for sortedness. The following options change the operation
mode:
A pair of lines is compared as follows: if any key fields have been
specified, sort compares each pair of fields, in the order
specified on the command line, according to the associated ordering
options, until a difference is found or no fields are left.
Unless otherwise specified, all comparisons use the character
collating sequence specified by the LC_COLLATE locale.
If any of the global options `Mbdfinr' are given but no key fields
are specified, sort compares the entire lines according to the
global options.
Finally, as a last resort when all keys compare equal (or if no
ordering options were specified at all), sort compares the entire
lines. The last resort comparison
honors the `-r' global option. The `-s' (stable) option
disables this last-resort comparison so that lines in which all fields
compare equal are left in their original relative order. If no fields
or global options are specified, `-s' has no effect.
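The last-resort comparison and the effect of `-s' can be seen with two lines whose only key compares equal. This is a small sketch assuming GNU sort; LC_ALL=C makes the byte order predictable.

```shell
# Two lines whose only key (field 2) compares equal.
input='b 1
a 1'
# Default: the last-resort whole-line comparison puts "a 1" first.
default=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)
# The last-resort comparison honors -r, so "b 1" comes first here.
reversed=$(printf '%s\n' "$input" | LC_ALL=C sort -r -k 2,2)
# With -s the tie is left alone, so the original input order survives.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 2,2)
printf 'default:\n%s\nreversed:\n%s\nstable:\n%s\n' "$default" "$reversed" "$stable"
```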
GNU sort (as specified for all GNU utilities) has no limits on
input line length or restrictions on bytes allowed within lines. In
addition, if the final byte of an input file is not a newline, GNU
sort silently supplies one. A line's trailing newline is part of
the line for comparison purposes; for example, with no options in an
ASCII locale, a line starting with a tab sorts before an empty line
because tab precedes newline in the ASCII collating sequence.
Upon any error, sort exits with a status of `2'.
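A short sketch of two of the behaviors above, assuming GNU sort: a missing final newline is supplied, and an error yields exit status 2. The input path is hypothetical and assumed not to exist.

```shell
# The final line "a" has no trailing newline; sort supplies one,
# so the output is "a\nb\n" (4 bytes).
bytes=$(printf 'b\na' | LC_ALL=C sort | wc -c | tr -d ' ')
echo "output bytes: $bytes"

# Any error, such as an unreadable input file, yields exit status 2.
status=0
LC_ALL=C sort /nonexistent/input.txt 2>/dev/null || status=$?
echo "exit status: $status"
```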
If the environment variable TMPDIR is set, sort uses its
value as the directory for temporary files instead of `/tmp'. The
`-T tempdir' option in turn overrides the environment
variable.
The following options affect the ordering of output lines. They may be
specified globally or as part of a specific key field. If no key
fields are specified, global options apply to comparison of entire
lines; otherwise the global options are inherited by key fields that do
not specify any special options of their own. The `-b', `-d',
`-f' and `-i' options classify characters according to
the LC_CTYPE locale.
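To illustrate how global options are inherited by key fields, here is a sketch assuming GNU sort in the C locale, where `B' (0x42) precedes `a' (0x61) in byte order.

```shell
input='1,B
2,a'
# Key 2 with no options: plain byte order, so "B" sorts before "a".
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -t , -k 2,2)
# Global -f is inherited by a key that has no options of its own.
global=$(printf '%s\n' "$input" | LC_ALL=C sort -f -t , -k 2,2)
# The same case folding requested on the key itself.
per_key=$(printf '%s\n' "$input" | LC_ALL=C sort -t , -k 2,2f)
printf '%s\n--\n%s\n--\n%s\n' "$plain" "$global" "$per_key"
```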
With `-g', sort numerically, using the standard C function strtod to convert
a prefix of each line to a double-precision floating point number.
This allows floating point numbers to be specified in scientific notation,
like 1.0e-34 and 10e100.
Do not report overflow, underflow, or conversion errors.
Use the following collating sequence:
Use this option only if there is no alternative; it is much slower than `-n' and it can lose information when converting to floating point.
With `-M' (sort by month name), the LC_TIME locale
determines the month spellings.
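A brief sketch of month sorting with `-M', assuming GNU sort in the C locale, where the English three-letter abbreviations are used and case is ignored.

```shell
# -M compares a leading month abbreviation case-insensitively,
# ordering JAN < FEB < ... < DEC.
months=$(printf 'Mar\njan\nFEB\n' | LC_ALL=C sort -M)
printf '%s\n' "$months"
```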
With `-n' (sort numerically), the LC_NUMERIC
locale specifies the radix character and thousands separator.
sort -n uses what might be considered an unconventional method
to compare strings representing floating point numbers. Rather than
first converting each string to the C double type and then
comparing those values, sort aligns the radix characters in the two
strings and compares the strings a character at a time. One benefit
of using this approach is its speed. In practice this is much more
efficient than performing the two corresponding string-to-double (or even
string-to-integer) conversions and then comparing doubles. In addition,
there is no corresponding loss of precision. Converting each string to
double before comparison would limit precision to about 16 digits
on most systems.
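The effect of aligning radix characters and comparing digit by digit can be observed directly. This sketch assumes GNU sort in the C locale, where the radix character is `.'.

```shell
# Radix characters are aligned, so 9.75 sorts before 10.5 ...
decimals=$(printf '10.5\n9.75\n' | LC_ALL=C sort -n)
# ... and integers with more digits than a double holds exactly are still
# ordered correctly, because no conversion to floating point takes place.
big=$(printf '12345678901234567890123\n12345678901234567890122\n' | LC_ALL=C sort -n)
printf '%s\n%s\n' "$decimals" "$big"
```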
Neither a leading `+' nor exponential notation is recognized. To compare such strings numerically, use the `-g' option.
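The difference between `-n' and `-g' on such input can be sketched as follows, assuming GNU sort in the C locale. Under `-n', a line that does not begin with a recognized number compares as zero.

```shell
input='1e3
+5
20'
# -n reads "1e3" as 1 and finds no number in "+5", so "+5" compares as 0.
numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n)
# -g converts to floating point, so "+5" is 5 and "1e3" is 1000.
general=$(printf '%s\n' "$input" | LC_ALL=C sort -g)
printf '%s\n--\n%s\n' "$numeric" "$general"
```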
Other options are:
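The `-o output-file' option writes output to output-file instead of
standard output. If output-file is also one of the input files, sort copies
it to a temporary file before sorting and writing the output to
output-file.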
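Because of this, `-o' can sort a file in place. A minimal sketch, assuming GNU sort and mktemp; the scratch file is hypothetical.

```shell
# Hypothetical scratch file.
file=$(mktemp)
printf 'c\na\nb\n' > "$file"
# Safe: sort notices that the output file is also an input file.
LC_ALL=C sort -o "$file" "$file"
# By contrast, `sort "$file" > "$file"' would let the shell truncate
# the file before sort ever read it.
result=$(cat "$file")
rm -f "$file"
printf '%s\n' "$result"
```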
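The `-t separator' option uses the character separator as the field
separator when finding the sort keys in each line. By default, fields are
separated by the empty string between a non-whitespace character and a
whitespace character; that is, given the input line ` foo bar', sort breaks it
into fields ` foo' and ` bar'. The field separator is
not considered to be part of either the field preceding or the field
following.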
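A sketch of why the default field splitting matters, assuming GNU sort in the C locale: a field keeps the blanks that precede it unless `b' is used.

```shell
# With default splitting, field 2 is "  b" in the first line, " a" in the second.
input='x  b
y a'
# Without b, the extra blank in "  b" sorts before the letter in " a".
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)
# Adding b to the field start skips the leading blanks, so "a" < "b".
skipped=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2b,2)
# With -t :, the colon separates fields and belongs to neither of them.
colon=$(printf 'a:2\nb:10\n' | LC_ALL=C sort -t : -k 2,2n)
printf '%s\n--\n%s\n--\n%s\n' "$plain" "$skipped" "$colon"
```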
In addition, when GNU sort is invoked with exactly one argument,
options `--help' and `--version' are recognized. See section 2. Common options.
Historical (BSD and System V) implementations of sort have
differed in their interpretation of some options, particularly
`-b', `-f', and `-n'. GNU sort follows the POSIX
behavior, which is usually (but not always!) like the System V behavior.
According to POSIX, `-n' no longer implies `-b'. For
consistency, `-M' has been changed in the same way. This may
affect the meaning of character positions in field specifications in
obscure cases. The only fix is to add an explicit `-b'.
A position in a sort field specified with the `-k' or `+' option has the form `f.c', where f is the number of the field to use and c is the number of the first character from the beginning of the field (for `+pos') or from the end of the previous field (for `-pos'). If the `.c' is omitted, it is taken to be the first character in the field. If the `-b' option was specified, the `.c' part of a field specification is counted from the first nonblank character of the field (for `+pos') or from the first nonblank character following the previous field (for `-pos').
A sort key option may also have any of the option letters `Mbdfinr' appended to it, in which case the global ordering options are not used for that particular field. The `-b' option may be independently attached to either or both of the `+pos' and `-pos' parts of a field specification, and if it is inherited from the global options it will be attached to both. Keys may span multiple fields.
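Character positions within fields, and the way `-b' changes where counting starts, can be sketched as follows, assuming GNU sort in the C locale and the `-k' syntax.

```shell
# Characters 3-4 of field 2, with `:' as the separator: "21" versus "12".
chars=$(printf 'a:xy21\nb:zw12\n' | LC_ALL=C sort -t : -k 2.3,2.4)
# Default splitting: field 2 of "1  ab" is "  ab", of "2 ba" is " ba".
# Without -b, character 2 is a blank in the first line, `b' in the second.
no_b=$(printf '1  ab\n2 ba\n' | LC_ALL=C sort -k 2.2,2.2)
# A global -b is attached to both ends of the key, so counting starts at
# the first nonblank: `b' in "ab" versus `a' in "ba".
with_b=$(printf '1  ab\n2 ba\n' | LC_ALL=C sort -b -k 2.2,2.2)
printf '%s\n--\n%s\n--\n%s\n' "$chars" "$no_b" "$with_b"
```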
Here are some examples to illustrate various combinations of options. In them, the POSIX `-k' option is used to specify sort keys rather than the obsolete `+pos1-pos2' syntax.
Sort in descending (reverse) numeric order.
sort -nr
Sort alphabetically, omitting the first and second fields. This uses a single key composed of the characters beginning at the start of field three and extending to the end of each line.
sort -k3
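The two commands above can be tried on small inputs; this sketch assumes GNU sort in the C locale.

```shell
# Descending numeric order.
desc=$(printf '5\n10\n2\n' | LC_ALL=C sort -nr)
# Alphabetical on everything from field three to the end of the line.
third=$(printf '1 2 c\n3 4 a\n5 6 b\n' | LC_ALL=C sort -k3)
printf '%s\n--\n%s\n' "$desc" "$third"
```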
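Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use `:' as the field delimiter.
sort -t : -k 2,2n -k 5.3,5.4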
Note that if you had written `-k 2' instead of `-k 2,2', `sort' would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
Also note that the `n' modifier was applied to the field-end specifier for the first key. It would have been equivalent to specify `-k 2n,2' or `-k 2n,2n'. All modifiers except `b' apply to the associated field, regardless of whether the modifier character is attached to the field-start and/or the field-end part of the key specifier.
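Applied to a few hypothetical colon-separated records, the command above orders by field 2 numerically and breaks the tie between the two `2' records on characters 3-4 of field 5. The sketch assumes GNU sort in the C locale, and also checks that attaching `n' to the field start gives the same result.

```shell
# Hypothetical colon-separated records.
input='a:10:q:q:..Bz
b:2:q:q:..Cy
c:2:q:q:..Ax'
# Numeric on field 2 only; ties broken by characters 3-4 of field 5.
out=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4)
# Attaching n to the field start instead of the field end.
alt=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2n,2 -k 5.3,5.4)
printf '%s\n' "$out"
```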
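Sort the password file on the fifth field and ignore any leading white space. Sort lines with equal values in field five on the numeric user ID in field three.
sort -t : -k 5b,5 -k 3,3n /etc/passwd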
An alternative is to use the global numeric modifier `-n'.
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
Generate a tags file in case-insensitive sorted order.
find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append
The use of `-print0', `-z', and `-0' in this case means that pathnames that contain Line Feed characters will not get broken up by the sort operation.
Finally, to ignore both leading and trailing white space, you could have applied the `b' modifier to the field-end specifier for the first key,
sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
or used the global `-b' modifier instead of `-n', with an explicit `n' on the second key specifier.
sort -t : -b -k 5,5 -k 3,3n /etc/passwd
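The four password-file commands above should produce the same ordering. This sketch runs them on hypothetical passwd-style records instead of /etc/passwd, assuming GNU sort in the C locale; one record has a leading blank in field 5.

```shell
# Hypothetical passwd-style records; alice's field 5 has a leading blank.
input='alice:x:1002:100: Smith
bob:x:1001:100:Jones
carol:x:1000:100:Smith'
k1=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 5b,5 -k 3,3n)
# Global -n is inherited only by the key without options of its own (3,3).
k2=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -n -k 5b,5 -k 3,3)
k3=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -n -k 5b,5b -k 3,3)
# Global -b is inherited by key 5,5 and attached to both of its ends.
k4=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -b -k 5,5 -k 3,3n)
printf '%s\n' "$k1"
```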