tar: The standard UNIX archiving utility. [1] Originally a Tape ARchiving program, it has developed into a general-purpose package that can handle all manner of archiving with all types of destination devices, ranging from tape drives to regular files to even stdout.
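Some useful tar options:
-c create (a new archive)
-x extract (files from an existing archive)
--delete delete (files from an existing archive)
This option will not work on magnetic tape devices.
-r append (files to an existing archive)
-A append (tar files to an existing archive)
-t list (the contents of an existing archive)
-u update the archive
-d compare the archive with the specified filesystem
--after-date only process files with a date stamp after the specified date
-z gzip the archive (compress or uncompress, depending on whether combined with the -c or -x option)
-j bzip2 the archive
It may be difficult to recover data from a corrupted gzipped tar archive. When archiving important files, make multiple backups.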
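A minimal sketch of a create/list/extract round trip using the tar options -c, -t, -x, and -z, assuming GNU tar (or any tar that accepts -z and -C). The function name tar_roundtrip and the scratch directories are hypothetical, for illustration only.

```bash
#!/bin/bash
# Hypothetical helper: archive a directory with tar, list the archive,
#+ extract it elsewhere, and compare the two trees.
tar_roundtrip ()
{
  local src="$1" work status
  work=$(mktemp -d) || return 1          # Scratch area for archive and copy.

  # -c create, -z gzip, -f name the archive file.
  # -C "$src" changes into $src first, so stored paths are relative (./...).
  tar -czf "$work/backup.tar.gz" -C "$src" . || return 1

  tar -tzf "$work/backup.tar.gz"         # -t lists the archive contents.

  mkdir "$work/restore"
  tar -xzf "$work/backup.tar.gz" -C "$work/restore" || return 1   # -x extract.

  diff -r "$src" "$work/restore" > /dev/null   # 0 if the trees match.
  status=$?
  rm -rf "$work"
  return $status
}

# Demo on a small throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
echo "hello" > "$demo/a.txt"
echo "world" > "$demo/sub/b.txt"
tar_roundtrip "$demo" && echo "Archive round trip OK."
rm -rf "$demo"
```

Writing the archive outside the directory being archived keeps tar from trying to include the archive in itself.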
shar: Shell archiving utility. The files in a shell archive are concatenated without compression, and the resultant archive is essentially a shell script, complete with a #!/bin/sh header and all the necessary unarchiving commands. Shar archives still show up in Usenet newsgroups, but otherwise shar has been pretty well replaced by tar/gzip. The unshar command unpacks shar archives.
ar: Creation and manipulation utility for archives, mainly used for binary object file libraries.
The Red Hat Package Manager, or rpm utility, provides a wrapper for source or binary archives. It includes commands for installing and checking the integrity of packages, among other things.
A simple rpm -i package_name.rpm usually suffices to install a package, though there are many more options available.
cpio: This specialized archiving copy command (copy input and output) is rarely seen anymore, having been supplanted by tar/gzip. It still has its uses, such as moving a directory tree. With an appropriate block size (for copying) specified, it can be appreciably faster than tar.
Example 15-30. Using cpio to move a directory tree
#!/bin/bash

# Copying a directory tree using cpio.

# Advantages of using 'cpio':
#   Speed of copying. It's faster than 'tar' with pipes.
#   Well suited for copying special files (named pipes, etc.)
#+  that 'cp' may choke on.

ARGS=2
E_BADARGS=65

if [ $# -ne "$ARGS" ]
then
  echo "Usage: `basename $0` source destination"
  exit $E_BADARGS
fi

source="$1"
destination="$2"

###################################################################
find "$source" -depth | cpio -admvp "$destination"
#               ^^^^^         ^^^^^
#  Read the 'find' and 'cpio' info pages to decipher these options.
#  The above works only relative to $PWD (current directory) . . .
#+ full pathnames are specified.
###################################################################

# Exercise:
# --------
#  Add code to check the exit status ($?) of the 'find | cpio' pipe
#+ and output appropriate error messages if anything went wrong.

exit $?
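The example's comments compare cpio with "'tar' with pipes." For reference, here is a minimal sketch of that tar-pipe idiom: one tar writes an archive to stdout (-f -) and a second tar, running in a subshell inside the destination, reads it from stdin. The function name copy_tree_tar is hypothetical, and no claim about relative speed is made here.

```bash
#!/bin/bash
# Hypothetical helper: copy a directory tree with a pair of piped tar processes.
copy_tree_tar ()
{
  local source="$1" destination="$2"
  mkdir -p "$destination" || return 1
  # First tar: archive the contents of $source to stdout ("-f -").
  # Second tar: extract from stdin ("-f -") inside $destination;
  #+ -p asks it to preserve the original permissions.
  tar -cf - -C "$source" . | ( cd "$destination" && tar -xpf - )
}
```

The exit status of the function is that of the extracting tar, the last command in the pipe.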
rpm2cpio: This command extracts a cpio archive from an rpm one.
Example 15-31. Unpacking an rpm archive
#!/bin/bash
# de-rpm.sh: Unpack an 'rpm' archive
: ${1?"Usage: `basename $0` target-file"}
# Must specify 'rpm' archive name as an argument.
TEMPFILE=$$.cpio # Tempfile with "unique" name.
# $$ is process ID of script.
rpm2cpio < $1 > $TEMPFILE # Converts rpm archive into
#+ cpio archive.
cpio --make-directories -F $TEMPFILE -i # Unpacks cpio archive.
rm -f $TEMPFILE # Deletes cpio archive.
exit 0
# Exercise:
# Add check for whether 1) "target-file" exists and
#+ 2) it is an rpm archive.
# Hint: Parse output of 'file' command.
gzip: The standard GNU/UNIX compression utility, replacing the inferior and proprietary compress. The corresponding decompression command is gunzip, which is the equivalent of gzip -d.
The zcat filter decompresses a gzipped file to stdout.
On some commercial UNIX systems, zcat is a synonym for uncompress -c, and will not work on gzipped files.
See also Example 7-7.
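A minimal sketch of gzip and zcat working together in a pipe, assuming GNU gzip (on systems where zcat is really uncompress -c, gzip -dc can stand in for it). The function name gzip_roundtrip_ok is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: check that gzip + zcat reproduce a file exactly.
gzip_roundtrip_ok ()
{
  local file="$1"
  # gzip -c writes the compressed data to stdout and leaves $file alone.
  # zcat decompresses that stream back to stdout.
  # cmp -s - compares stdin with the original file, silently.
  gzip -c "$file" | zcat | cmp -s - "$file"
}

demo=$(mktemp)
printf 'line one\nline two\n' > "$demo"
gzip_roundtrip_ok "$demo" && echo "gzip/zcat round trip OK."
rm -f "$demo"
```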
bzip2: An alternate compression utility, usually more efficient (but slower) than gzip, especially on large files. The corresponding decompression command is bunzip2.
Newer versions of tar have been patched with bzip2 support.
compress, uncompress: This is an older, proprietary compression utility found in commercial UNIX distributions. The more efficient gzip has largely replaced it. Linux distributions generally include a compress workalike for compatibility, although gunzip can decompress files treated with compress.
The znew command transforms compressed files into gzipped ones.
Yet another compression (squeeze) utility, a filter that works only on sorted ASCII word lists. It uses the standard invocation syntax for a filter, sq < input-file > output-file. Fast, but not nearly as efficient as gzip. The corresponding uncompression filter is unsq, invoked like sq.
The output of sq may be piped to gzip for further compression.
zip, unzip: Cross-platform file archiving and compression utility compatible with DOS pkzip.exe. "Zipped" archives seem to be a more common medium of file exchange on the Internet than "tarballs."
unarc, unarj, unrar: These Linux utilities permit unpacking archives compressed with the DOS arc.exe, arj.exe, and rar.exe programs.
lzma: Highly efficient Lempel-Ziv-Markov compression. The syntax of lzma is similar to that of gzip. The 7-zip website has more information.
file: A utility for identifying file types. Running file on a filename reports the type of that file, such as ASCII text or data.
# Find sh and Bash scripts in a given directory:

DIRECTORY=/usr/local/bin
KEYWORD=Bourne
# Bourne and Bourne-Again shell scripts

file $DIRECTORY/* | grep -F "$KEYWORD"   # grep -F: the modern spelling of fgrep.

# Output:

# /usr/local/bin/burn-cd: Bourne-Again shell script text executable
# /usr/local/bin/burnit: Bourne-Again shell script text executable
# /usr/local/bin/cassette.sh: Bourne shell script text executable
# /usr/local/bin/copy-cd: Bourne-Again shell script text executable
# . . .
Example 15-32. Stripping comments from C program files
#!/bin/bash
# strip-comment.sh: Strips out the comments (/* COMMENT */) in a C program.
E_NOARGS=0
E_ARGERROR=66
E_WRONG_FILE_TYPE=67
if [ $# -eq "$E_NOARGS" ]
then
echo "Usage: `basename $0` C-program-file" >&2 # Error message to stderr.
exit $E_ARGERROR
fi
# Test for correct file type.
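type=`file "$1" | awk '{ print $2, $3, $4, $5 }'`
# "file $1" echoes file type . . .
# Then awk removes the first field, the filename . . .
# Then the result is fed into the variable "type."
#  Note: newer versions of 'file' describe C source as
#+ "C source, ASCII text", so this comparison may need adjusting.
correct_type="ASCII C program text"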
if [ "$type" != "$correct_type" ]
then
echo
echo "This script works on C program files only."
echo
exit $E_WRONG_FILE_TYPE
fi
# Rather cryptic sed script:
#--------
sed '
/^\/\*/d
/.*\*\//d
' "$1"
#--------
# Easy to understand if you take several hours to learn sed fundamentals.
# Need to add one more line to the sed script to deal with
#+ case where line of code has a comment following it on same line.
# This is left as a non-trivial exercise.
# Also, the above code deletes non-comment lines with a "*/" . . .
#+ not a desirable result.
exit 0
# ----------------------------------------------------------------
# Code below this line will not execute because of 'exit 0' above.
# Stephane Chazelas suggests the following alternative:
usage() {
echo "Usage: `basename $0` C-program-file" >&2
exit 1
}
WEIRD=`echo -n -e '\377'` # or WEIRD=$'\377'
[[ $# -eq 1 ]] || usage
case `file "$1"` in
*"C program text"*) sed -e "s%/\*%${WEIRD}%g;s%\*/%${WEIRD}%g" "$1" \
| tr '\377\n' '\n\377' \
| sed -ne 'p;n' \
| tr -d '\n' | tr '\377' '\n';;
*) usage;;
esac
# This is still fooled by things like:
# printf("/*");
# or
# /* /* buggy embedded comment */
#
# To handle all special cases (comments in strings, comments in string
#+ where there is a \", \\" ...),
#+ the only way is to write a C parser (using lex or yacc perhaps?).
exit 0
which command gives the full path to "command." This is useful for finding out whether a particular command or utility is installed on the system.
For an interesting use of this command, see Example 33-14.
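Since which is itself an external program that is not present on every system, scripts often make the same "is this utility installed?" check with the POSIX shell builtin command -v instead. This is a swapped-in technique, not which; a minimal sketch, with have_cmd as a hypothetical function name.

```bash
#!/bin/bash
# Hypothetical helper: succeed if $1 is an available command.
# 'command -v' is a shell builtin, so it works even where 'which' is missing;
#+ on success it prints the path (or just the name, for builtins).
have_cmd ()
{
  command -v "$1" > /dev/null 2>&1
}

for utility in gzip bzip2 lzma
do
  if have_cmd "$utility"
  then
    echo "$utility: $(command -v "$utility")"
  else
    echo "$utility: not installed"
  fi
done
```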
Similar to which, above, whereis command gives the full path to "command," but also to its manpage.
whatis command looks up "command" in the whatis database.
Example 15-33. Exploring /usr/X11R6/bin
#!/bin/bash

# What are all those mysterious binaries in /usr/X11R6/bin?

DIRECTORY="/usr/X11R6/bin"
# Try also "/bin", "/usr/bin", "/usr/local/bin", etc.

for file in $DIRECTORY/*
do
  whatis `basename $file`   # Echoes info about the binary.
done

exit 0

#  You may wish to redirect output of this script, like so:
#  ./what.sh >>whatis.db
#  or view it a page at a time on stdout,
#  ./what.sh | less
See also Example 10-3.
vdir: Show a detailed directory listing. The effect is similar to ls -lb.
This is one of the GNU fileutils.
The locate command searches for files using a database stored for just that purpose. The slocate command is the secure version of locate (which may be aliased to slocate).
readlink: Disclose the file that a symbolic link points to.
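A minimal sketch of readlink on a freshly made symbolic link. The -f option (resolve to a full absolute path) is assumed to be the GNU coreutils or BusyBox one; older BSD readlink lacks it. The scratch file names are hypothetical.

```bash
#!/bin/bash
# Make a relative symbolic link in a scratch directory, then query it.
work=$(mktemp -d)
touch "$work/target.txt"
ln -s target.txt "$work/link.txt"

target=$(readlink "$work/link.txt")        # The link text as stored: target.txt
resolved=$(readlink -f "$work/link.txt")   # Fully resolved absolute path.
echo "link.txt -> $target"
echo "resolves to $resolved"

rm -rf "$work"
```

Plain readlink returns the link text exactly as it was stored, so a relative link yields a relative name.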
Use the strings command to find printable strings in a binary or data file. It will list sequences of printable characters found in the target file. This might be handy for a quick 'n dirty examination of a core dump or for looking at an unknown graphic image file.
Example 15-34. An "improved" strings command
#!/bin/bash
# wstrings.sh: "word-strings" (enhanced "strings" command)
#
# This script filters the output of "strings" by checking it
#+ against a standard word list file.
# This effectively eliminates gibberish and noise,
#+ and outputs only recognized words.
# ===========================================================
# Standard Check for Script Argument(s)
ARGS=1
E_BADARGS=65
E_NOFILE=66
if [ $# -ne $ARGS ]
then
echo "Usage: `basename $0` filename"
exit $E_BADARGS
fi
if [ ! -f "$1" ] # Check if file exists.
then
echo "File \"$1\" does not exist."
exit $E_NOFILE
fi
# ===========================================================
MINSTRLEN=3 # Minimum string length.
WORDFILE=/usr/share/dict/linux.words # Dictionary file.
# May specify a different word list file
#+ of one-word-per-line format.
# For example, the "yawl" word-list package,
# http://personal.riverusers.com/~thegrendel/yawl-0.3.2.tar.gz
wlist=`strings "$1" | tr A-Z a-z | tr '[:space:]' Z | \
tr -cs '[:alpha:]' Z | tr -s '\173-\377' Z | tr Z ' '`
# Translate output of 'strings' command with multiple passes of 'tr'.
# "tr A-Z a-z" converts to lowercase.
# "tr '[:space:]'" converts whitespace characters to Z's.
# "tr -cs '[:alpha:]' Z" converts non-alphabetic characters to Z's,
#+ and squeezes multiple consecutive Z's.
# "tr -s '\173-\377' Z" converts all characters past 'z' to Z's
#+ and squeezes multiple consecutive Z's,
#+ which gets rid of all the weird characters that the previous
#+ translation failed to deal with.
# Finally, "tr Z ' '" converts all those Z's to whitespace,
#+ which will be seen as word separators in the loop below.
# ****************************************************************
# Note the technique of feeding the output of 'tr' back to itself,
#+ but with different arguments and/or options on each pass.
# ****************************************************************
for word in $wlist # Important:
# $wlist must not be quoted here.
# "$wlist" does not work.
# Why not?
do
strlen=${#word} # String length.
if [ "$strlen" -lt "$MINSTRLEN" ] # Skip over short strings.
then
continue
fi
grep -Fw $word "$WORDFILE" # Match whole words only.
# ^^^ # "Fixed strings" and
#+ "whole words" options.
done
exit $?
diff: flexible file comparison utility. It compares the target files line-by-line sequentially. In some applications, such as comparing word dictionaries, it may be helpful to filter the files through sort and uniq before piping them to diff.
Various fancy frontends for diff are available, such as sdiff, wdiff, xdiff, and mgdiff.
The diff command returns an exit status of 0 if the compared files are identical, and 1 if they differ (an exit status of 2 signals trouble, such as a missing file). This permits use of diff in a test construct within a shell script (see below).
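A minimal sketch of diff's exit status driving a test construct, assuming GNU diffutils or BusyBox diff (both accept -q). The function name same_contents and the scratch files are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed if two files have identical contents.
# diff exit status: 0 = same, 1 = different, 2 = trouble.
same_contents ()
{
  diff -q "$1" "$2" > /dev/null 2>&1
}

work=$(mktemp -d)
printf 'alpha\nbeta\n'  > "$work/one"
printf 'alpha\nbeta\n'  > "$work/two"
printf 'alpha\ngamma\n' > "$work/three"

same_contents "$work/one" "$work/two"   && echo "one and two match."
same_contents "$work/one" "$work/three" || echo "one and three differ."

diff "$work/one" "$work/three" > /dev/null
echo "diff exit status for one vs. three: $?"   # 1
rm -rf "$work"
```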
A common use for diff is generating difference files to be used with patch.
patch: flexible versioning utility. Given a difference file generated by diff, patch can upgrade a previous version of a package to a newer version. It is much more convenient to distribute a relatively small "diff" file than the entire body of a newly revised package. Kernel "patches" have become the preferred method of distributing the frequent releases of the Linux kernel.
patch -p1 <patch-file
# Takes all the changes listed in 'patch-file'
# and applies them to the files referenced therein.
# This upgrades to a newer version of the package.
Patching the kernel:
cd /usr/src
gzip -cd patchXX.gz | patch -p0
# Upgrading kernel source using 'patch'.
# From the Linux kernel docs "README",
# by anonymous author (Alan Cox?).
The diff command can also recursively compare directories (for the filenames present).
diff3: An extended version of diff that compares three files at a time. This command returns an exit value of 0 upon successful execution, but unfortunately this gives no information about the results of the comparison.
The merge (3-way file merge) command is an interesting adjunct to diff3. Its syntax is merge file1 file2 file3: it incorporates into file1 all the changes that lead from file2 to file3.
sdiff: Compare and/or edit two files in order to merge them into an output file. Because of its interactive nature, this command would find little use in a script.
The cmp command is a simpler version of diff, above. Whereas diff reports the differences between two files, cmp merely shows at what point they differ.
Like diff, cmp returns an exit status of 0 if the compared files are identical, and 1 if they differ. This permits use in a test construct within a shell script.
Example 15-35. Using cmp to compare two files within a script.
#!/bin/bash
ARGS=2 # Two args to script expected.
E_BADARGS=65
E_UNREADABLE=66
if [ $# -ne "$ARGS" ]
then
echo "Usage: `basename $0` file1 file2"
exit $E_BADARGS
fi
if [[ ! -r "$1" || ! -r "$2" ]]
then
echo "Both files to be compared must exist and be readable."
exit $E_UNREADABLE
fi
cmp "$1" "$2" &> /dev/null  # /dev/null buries the output of the "cmp" command.
# cmp -s $1 $2 has same result ("-s" silent flag to "cmp")
# Thank you Anders Gustavsson for pointing this out.
#
# Also works with 'diff', i.e., diff $1 $2 &> /dev/null
if [ $? -eq 0 ] # Test exit status of "cmp" command.
then
echo "File \"$1\" is identical to file \"$2\"."
else
echo "File \"$1\" differs from file \"$2\"."
fi
exit 0
Use zcmp on gzipped files.
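comm: Versatile file comparison utility. The files must be sorted for this to be useful.
comm -options first-file second-file
comm first-file second-file outputs three columns:
column 1 = lines unique to first-file
column 2 = lines unique to second-file
column 3 = lines common to both.
The options allow suppressing output of one or more columns: -1 suppresses column 1, -2 suppresses column 2, -3 suppresses column 3, and -12 suppresses both columns 1 and 2.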
This command is useful for comparing "dictionaries" or word lists -- sorted text files with one word per line.
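A minimal sketch of comparing two word lists with comm. It relies on Bash process substitution and here-strings, and sorts each list first because comm expects sorted input. The function names common_lines and only_in_first are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers built on comm; each argument is a newline-separated list.
common_lines ()     # Column 3 only: -12 suppresses columns 1 and 2.
{
  comm -12 <(sort <<< "$1") <(sort <<< "$2")
}

only_in_first ()    # Column 1 only: -23 suppresses columns 2 and 3.
{
  comm -23 <(sort <<< "$1") <(sort <<< "$2")
}

list1=$(printf '%s\n' pear apple fig)
list2=$(printf '%s\n' fig kiwi apple)

echo "Common to both:";  common_lines  "$list1" "$list2"   # apple, fig
echo "Only in list1:";   only_in_first "$list1" "$list2"   # pear
```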
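basename: Strips the path information from a file name, printing only the file name. The construction basename $0 lets a script know its own name, that is, the name it was invoked by. This can be used for "usage" messages if, for example, a script is called with missing arguments:
echo "Usage: `basename $0` arg1 arg2 ... argn"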
dirname: Strips the basename from a filename, printing only the path information.
basename and dirname can operate on any arbitrary string. The argument does not need to refer to an existing file, or even be a filename for that matter (see Example A-7).
Example 15-36. basename and dirname
#!/bin/bash

a=/home/bozo/daily-journal.txt

echo "Basename of /home/bozo/daily-journal.txt = `basename $a`"
echo "Dirname of /home/bozo/daily-journal.txt = `dirname $a`"
echo
echo "My own home is `basename ~/`."         # `basename ~` also works.
echo "The home of my home is `dirname ~/`."  # `dirname ~` also works.

exit 0
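For ordinary paths like the one above, Bash parameter expansion can do the same job without spawning an external process. This is a swapped-in technique, shown as a sketch: it is not a full replacement, since the two approaches differ on edge cases such as trailing slashes or a name with no slash at all (where dirname prints ".").

```bash
#!/bin/bash
a=/home/bozo/daily-journal.txt

# ${var##*/} deletes the longest match of "*/" from the front: the basename.
# ${var%/*}  deletes the shortest match of "/*" from the back: the dirname.
base=${a##*/}
dir=${a%/*}

echo "Basename: $base (basename says: $(basename "$a"))"
echo "Dirname:  $dir (dirname says: $(dirname "$a"))"

# basename can also strip a suffix; ${var%.txt} does the same here.
echo "Without suffix: $(basename "$a" .txt) / ${base%.txt}"
```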
split, csplit: These are utilities for splitting a file into smaller chunks. Their usual use is for splitting up large files in order to back them up on floppies or in preparation for e-mailing or uploading them.
The csplit command splits a file according to context, the split occurring where patterns are matched.
Example 15-37. A script that copies itself in sections
#!/bin/bash
# splitcopy.sh
# A script that splits itself into chunks,
#+ then reassembles the chunks into an exact copy
#+ of the original script.
CHUNKSIZE=4 # Size of first chunk of split files.
OUTPREFIX=xx # csplit prefixes, by default,
#+ files with "xx" ...
csplit "$0" "$CHUNKSIZE"
# Some comment lines for padding . . .
# Line 15
# Line 16
# Line 17
# Line 18
# Line 19
# Line 20
cat "$OUTPREFIX"* > "$0.copy" # Concatenate the chunks.
rm "$OUTPREFIX"* # Get rid of the chunks.
exit $?
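The example above uses csplit; a minimal sketch of the plain split command doing the same chunk-and-reassemble job follows. It uses -l (lines per piece); split -b instead cuts pieces of a given byte size, which suits binary files. The function name split_and_rejoin is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: split a file into pieces, rejoin them, and verify.
split_and_rejoin ()
{
  local file="$1" work status
  work=$(mktemp -d) || return 1
  split -l 2 "$file" "$work/part."      # 2 lines per piece: part.aa, part.ab, ...
  cat "$work"/part.* > "$work/rejoined" # The glob expands in sorted order,
                                        #+ so the pieces come back in sequence.
  cmp -s "$file" "$work/rejoined"       # 0 if the copy is identical.
  status=$?
  rm -rf "$work"
  return $status
}

demo=$(mktemp)
seq 1 7 > "$demo"
split_and_rejoin "$demo" && echo "Reassembled copy is identical."
rm -f "$demo"
```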
sum, cksum, md5sum, sha1sum: These are utilities for generating checksums. A checksum is a number mathematically calculated from the contents of a file, for the purpose of checking its integrity. A script might refer to a list of checksums for security purposes, such as ensuring that the contents of key system files have not been altered or corrupted. For security applications, use the md5sum (message digest 5 checksum) command, or better yet, the newer sha1sum (Secure Hash Algorithm).
The cksum command shows the size, in bytes, of its target, whether a file or stdin.
The md5sum and sha1sum commands display a dash when they receive their input from stdin.
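A minimal sketch of the stdin behavior just described, assuming GNU coreutils md5sum and cksum. The input "abc" is used because its MD5 digest is a published test vector.

```bash
#!/bin/bash
# The same three bytes checksummed from a file and from stdin.
work=$(mktemp -d)
printf 'abc' > "$work/abc.txt"

md5sum "$work/abc.txt"      # checksum, then the filename
printf 'abc' | md5sum       # checksum, then "-" (input came from stdin)
cksum < "$work/abc.txt"     # CRC and byte count (3), no filename for stdin

stdin_line=$(printf 'abc' | md5sum)
echo "$stdin_line"          # 900150983cd24fb0d6963f7d28e17f72  -
rm -rf "$work"
```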
Example 15-38. Checking file integrity
#!/bin/bash
# file-integrity.sh: Checking whether files in a given directory
# have been tampered with.
E_DIR_NOMATCH=70
E_BAD_DBFILE=71
dbfile=File_record.md5
# Filename for storing records (database file).
set_up_database ()
{
  echo "$directory" > "$dbfile"
# Write directory name to first line of file.
md5sum "$directory"/* >> "$dbfile"
# Append md5 checksums and filenames.
}
check_database ()
{
local n=0
local filename
local checksum
# ------------------------------------------- #
# This file check should be unnecessary,
#+ but better safe than sorry.
if [ ! -r "$dbfile" ]
then
echo "Unable to read checksum database file!"
exit $E_BAD_DBFILE
fi
# ------------------------------------------- #
while read record[n]
do
directory_checked="${record[0]}"
if [ "$directory_checked" != "$directory" ]
then
echo "Directories do not match up!"
# Tried to use file for a different directory.
exit $E_DIR_NOMATCH
fi
if [ "$n" -gt 0 ] # Not directory name.
then
filename[n]=$( echo ${record[$n]} | awk '{ print $2 }' )
# md5sum writes records backwards,
#+ checksum first, then filename.
checksum[n]=$( md5sum "${filename[n]}" )
if [ "${record[n]}" = "${checksum[n]}" ]
then
echo "${filename[n]} unchanged."
elif [ "`basename ${filename[n]}`" != "$dbfile" ]
# Skip over checksum database file,
#+ as it will change with each invocation of script.
# ---
# This unfortunately means that when running
#+ this script on $PWD, tampering with the
#+ checksum database file will not be detected.
# Exercise: Fix this.
then
echo "${filename[n]} : CHECKSUM ERROR!"
# File has been changed since last checked.
fi
fi
let "n+=1"
done <"$dbfile" # Read from checksum database file.
}
# =================================================== #
# main ()
if [ -z "$1" ]
then
directory="$PWD" # If not specified,
else #+ use current working directory.
directory="$1"
fi
clear # Clear screen.
echo " Running file integrity check on $directory"
echo
# ------------------------------------------------------------------ #
if [ ! -r "$dbfile" ] # Need to create database file?
then
echo "Setting up database file, \""$directory"/"$dbfile"\"."; echo
set_up_database
fi
# ------------------------------------------------------------------ #
check_database # Do the actual work.
echo
# You may wish to redirect the stdout of this script to a file,
#+ especially if the directory checked has many files in it.
exit 0
# For a much more thorough file integrity check,
#+ consider the "Tripwire" package,
#+ http://sourceforge.net/projects/tripwire/.
Also see Example A-20, Example 33-14, and Example 9-11 for creative uses of the md5sum command.
The 128-bit MD5 algorithm behind md5sum is known to be vulnerable to collision attacks, so the more secure 160-bit sha1sum is a welcome addition to the checksum toolkit.
Researchers have since demonstrated practical collisions for SHA-1 as well. Fortunately, newer Linux distributions include the longer-bit-length sha224sum, sha256sum, sha384sum, and sha512sum commands.
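A minimal sketch of the record-then-verify pattern of Example 15-38, using sha256sum and its -c (check) option instead of hand-comparing lines. It assumes GNU coreutils (or BusyBox) sha256sum; the manifest name SHA256SUMS is a common convention, but arbitrary here.

```bash
#!/bin/bash
# Record SHA-256 checksums for two files, then verify them before and after
#+ one file is modified.
work=$(mktemp -d)
printf 'config v1\n' > "$work/app.conf"
printf 'data\n'      > "$work/data.txt"

# Relative names keep the manifest valid wherever the directory is checked from.
( cd "$work" && sha256sum app.conf data.txt > SHA256SUMS )

( cd "$work" && sha256sum -c SHA256SUMS )    # Reports "OK" for each file.
before=$?

printf 'config v2\n' > "$work/app.conf"      # Simulate tampering.
( cd "$work" && sha256sum -c SHA256SUMS )    # Reports app.conf: FAILED.
after=$?

echo "exit status before: $before, after: $after"
rm -rf "$work"
```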
shred: Securely erase a file by overwriting it multiple times with random bit patterns before deleting it (the deletion step requires the -u option). This command has the same effect as Example 15-60, but does it in a more thorough and elegant manner.
This is one of the GNU fileutils.
Advanced forensic technology may still be able to recover the contents of a file, even after application of shred.
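A minimal sketch of a typical shred invocation, assuming GNU coreutils shred. As its documentation warns, overwriting in place may not reach the original data on journaling, log-structured, or copy-on-write filesystems, or on flash storage, which is one more reason for the caution above.

```bash
#!/bin/bash
work=$(mktemp -d)
printf 'secret\n' > "$work/secret.txt"

# -n 3: overwrite three times with random data.
# -z:   add a final pass of zeros to hide the shredding.
# -u:   truncate and remove the file afterward.
shred -n 3 -z -u "$work/secret.txt"

if [ -e "$work/secret.txt" ]
then
  gone=no
else
  gone=yes
  echo "secret.txt is gone."
fi
rmdir "$work"
```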
uuencode: This utility encodes binary files (images, sound files, compressed files, etc.) into ASCII characters, making them suitable for transmission in the body of an e-mail message or in a newsgroup posting. This is especially useful where MIME (multimedia) encoding is not available.
uudecode: This reverses the encoding, decoding uuencoded files back into the original binaries.
Example 15-39. Uudecoding encoded files
#!/bin/bash
# Uudecodes all uuencoded files in current working directory.
lines=35 # Allow 35 lines for the header (very generous).
for File in * # Test all the files in $PWD.
do
  search1=`head -n $lines "$File" | grep begin | wc -w`
  search2=`tail -n $lines "$File" | grep end | wc -w`
# Uuencoded files have a "begin" near the beginning,
#+ and an "end" near the end.
if [ "$search1" -gt 0 ]
then
if [ "$search2" -gt 0 ]
then
echo "uudecoding - $File -"
      uudecode "$File"
fi
fi
done
# Note that running this script upon itself fools it
#+ into thinking it is a uuencoded file,
#+ because it contains both "begin" and "end".
# Exercise:
# --------
# Modify this script to check each file for a newsgroup header,
#+ and skip to next if not found.
exit 0
The fold -s command may be useful (possibly in a pipe) to process long uudecoded text messages downloaded from Usenet newsgroups.
The mimencode and mmencode commands process multimedia-encoded e-mail attachments. Although mail user agents (such as pine or kmail) normally handle this automatically, these particular utilities permit manipulating such attachments manually from the command line or in batch processing mode by means of a shell script.
At one time, this was the standard UNIX file encryption utility. [3] Politically motivated government regulations prohibiting the export of encryption software resulted in the disappearance of crypt from much of the UNIX world, and it is still missing from most Linux distributions. Fortunately, programmers have come up with a number of decent alternatives to it, among them the author's very own cruft (see Example A-4).
mktemp: Create a temporary file [4] with a "unique" filename. When invoked from the command line without additional arguments, it creates a zero-length file in the /tmp directory.
PREFIX=filename
tempfile=`mktemp $PREFIX.XXXXXX`
#                        ^^^^^^ Need at least 6 placeholders
#+                              in the filename template.
#   If no filename template supplied,
#+  "tmp.XXXXXXXXXX" is the default.

echo "tempfile name = $tempfile"
# tempfile name = filename.QA2ZpY
#                 or something similar...

#  Creates a file of that name in the current working directory
#+ with 600 file permissions.
#  A "umask 177" is therefore unnecessary,
#+ but it's good programming practice anyhow.
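A minimal sketch of a common pattern built on mktemp: a private scratch directory (mktemp -d) that a trap removes however the script exits. It assumes GNU coreutils mktemp and stat (stat -c %a prints the octal permission bits); the variable names are hypothetical.

```bash
#!/bin/bash
umask 177                                  # Good practice, as noted above.
tmpdir=$(mktemp -d) || exit 1              # -d: create a directory, not a file.
trap 'rm -rf "$tmpdir"' EXIT               # Clean up when the script ends.

scratch=$(mktemp "$tmpdir/scratch.XXXXXX") || exit 1
echo "working data" > "$scratch"

perms=$(stat -c %a "$scratch")             # Octal mode of the new file.
echo "Created $scratch with mode $perms"   # mode 600
```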
make: Utility for building and compiling binary packages. This can also be used for any set of operations that is triggered by incremental changes in source files.
The make command checks a Makefile, a list of file dependencies and operations to be carried out.
The make utility is, in effect, a powerful scripting language similar in many ways to Bash, but with the capability of recognizing dependencies. For in-depth coverage of this useful tool set, see the GNU software documentation site.
install: Special-purpose file copying command, similar to cp, but capable of setting permissions and attributes of the copied files. This command seems tailor-made for installing software packages, and as such it shows up frequently in Makefiles.
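A minimal sketch of install doing in one step what would otherwise take mkdir, cp, and chmod, assuming GNU coreutils install and stat. The scratch paths are hypothetical.

```bash
#!/bin/bash
work=$(mktemp -d)
printf '#!/bin/sh\necho hi\n' > "$work/hello.sh"

# -d: create the directory (and any missing parents); -m sets its mode.
install -d -m 755 "$work/prefix/bin"

# Copy the script and set its mode in the same step.
install -m 755 "$work/hello.sh" "$work/prefix/bin/hello"

mode=$(stat -c %a "$work/prefix/bin/hello")
echo "Installed with mode $mode"   # 755
rm -rf "$work"
```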
dos2unix: This utility, written by Benjamin Lin and collaborators, converts DOS-formatted text files (lines terminated by CR-LF) to UNIX format (lines terminated by LF only), and vice-versa.
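Where dos2unix is not installed, a rough stand-in for the DOS-to-UNIX direction is to delete carriage returns with tr. This is a swapped-in technique, not dos2unix itself: it removes every CR, not only those before a line feed, and performs none of dos2unix's other checks. The function name crlf_to_lf is hypothetical.

```bash
#!/bin/bash
# Hypothetical filter: CR-LF to LF by deleting all carriage returns.
# Reads stdin and writes stdout, so it fits in a pipe.
crlf_to_lf ()
{
  tr -d '\r'
}

dos_text=$(printf 'line one\r\nline two\r\n')
unix_text=$(printf '%s' "$dos_text" | crlf_to_lf)
printf '%s\n' "$unix_text"
```

With GNU sed, sed 's/\r$//' is a narrower alternative that strips a CR only at the end of a line.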
The ptx [targetfile] command outputs a permuted index (cross-reference list) of the targetfile. This may be further filtered and formatted in a pipe, if necessary.
more, less: Pagers that display a text file or stream to stdout, one screenful at a time.
An interesting application of more is to "test drive" a command sequence, to forestall potentially unpleasant consequences.
ls /home/bozo | awk '{print "rm -rf " $1}' | more
# ^^^^
# Testing the effect of the following (disastrous) command line:
# ls /home/bozo | awk '{print "rm -rf " $1}' | sh
# Hand off to the shell to execute . . .       ^^
The less pager has the interesting property of doing a formatted display of man page source. See Example A-41.
[1] An archive, in the sense discussed here, is simply a set of related files stored in a single location.
[3] This is a symmetric block cipher, used to encrypt files on a single system or local network, as opposed to the public key cipher class, of which pgp is a well-known example.
[4] Creates a temporary directory when invoked with the -d option.