The following table is a brief reference for how documents are summarized, by type. These actions can be customized: some summarizers are implemented as UNIX programs, while others are expressed as regular expressions. See Section 4.4.4 or Appendix C.4 for more information about how to write a summarizer.
Type                    Summarizer Function
----------------------------------------------------------------------------
Audio                   Extract file name
Bibliographic           Extract author and titles
Binary                  Extract meaningful strings and manual page summary
C, CHeader              Extract procedure names, included file names, and
                        comments
Dvi                     Invoke the Text summarizer on extracted ASCII text
FAQ, FullText, README   Extract all words in file
Framemaker              Up-convert to SGML and pass through SGML summarizer
Font                    Extract comments
HTML                    Extract anchors, hypertext links, and selected
                        fields (see SGML)
LaTex                   Parse selected LaTeX fields (author, title, etc.)
Mail                    Extract certain header fields
Makefile                Extract comments and target names
ManPage                 Extract synopsis, author, title, etc., based on
                        "-man" macros
News                    Extract certain header fields
Object                  Extract symbol table
Patch                   Extract patched file names
Perl                    Extract procedure names and comments
PostScript              Extract text in word processor-specific fashion,
                        and pass through Text summarizer
RCS, SCCS               Extract revision control summary
RTF                     Up-convert to SGML and pass through SGML summarizer
SGML                    Extract fields named in extraction table (see the
                        section on SGML)
ShellScript             Extract comments
SourceDistribution      Extract full text of README file and comments from
                        Makefile and source code files, and summarize any
                        manual pages
SymbolicLink            Extract file name, owner, and date created
Tex                     Invoke the Text summarizer on extracted ASCII text
Text                    Extract first 100 lines plus first sentence of each
                        remaining paragraph
Troff                   Extract author, title, etc., based on "-man",
                        "-ms", "-me" macro packages, or extract section
                        headers and topic sentences
Unrecognized            Extract file name, owner, and date created
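
As an illustration of the Text entry in the table, the following is a minimal Python sketch of the rule "first 100 lines plus first sentence of each remaining paragraph". It is not the summarizer shipped with the system. The function name summarize_text and its head_lines parameter are hypothetical, and the sketch assumes that paragraphs are separated by blank lines and that a sentence ends at the first '.', '!' or '?' followed by whitespace or the end of the paragraph.

```python
import re


def summarize_text(text, head_lines=100):
    """Return the first `head_lines` lines of `text`, followed by the
    first sentence of each paragraph that comes after those lines.

    Hypothetical helper for illustration only; the boundary rules
    below are assumptions, not the behavior of any real summarizer.
    """
    lines = text.splitlines()
    head = lines[:head_lines]
    rest = "\n".join(lines[head_lines:])

    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", rest) if p.strip()]

    first_sentences = []
    for para in paragraphs:
        # Collapse line breaks so a sentence may span several lines.
        flat = " ".join(para.split())
        # Assumption: a sentence ends at the first '.', '!' or '?'
        # that is followed by whitespace or the end of the paragraph.
        match = re.search(r"[.!?](?=\s|$)", flat)
        first_sentences.append(flat[:match.end()] if match else flat)

    return head + first_sentences
```

If the line limit falls in the middle of a paragraph, this sketch treats the remainder of that paragraph as a paragraph of its own. A summarizer of this kind could be run as a standalone UNIX program by reading the document from a file and printing the returned lines.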