Library of Practical Abstractions

Version 1.3.0

Eric Sven Ristad and Peter N. Yianilos


Table of Contents


1 License, Copyright Notice, and Disclaimer

Copyright (C) 1998 by Eric Sven Ristad and Peter N. Yianilos. All Rights Reserved.

Permission to use, copy, and distribute this software and its documentation without fee for not-for-profit academic or research purposes is hereby granted, provided that the above copyright notice appears in all copies, and that both the copyright notice and this permission notice and warranty disclaimer appear in supporting documentation, and that use of this software for any purpose is explicitly acknowledged and cited in all relevant reports and publications via the following citation form:

 Eric Sven Ristad and Peter N. Yianilos.
 Library of practical abstractions, release 1.3.
 http://www.mnemonic.com/software/libpa
 February 1998.

Permission to modify the software and documentation is hereby granted, but only to the extent necessary to cause the software to function in a hardware or software environment not explicitly supported as part of the distribution, and subject to the condition that the original authors retain all legal rights to any modifications or improvements of this software.

The authors disclaim all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall the authors or any entities or institutions to which he/she is related be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software.

2 Overview

The library of practical abstractions (LIBPA) provides efficient implementations of conceptually simple abstractions, in the C programming language. We believe that the best library code is conceptually simple so that it will be easily understood by the application programmer; parameterized by type so that it enjoys wide applicability; and at least as efficient as a straightforward special-purpose implementation. You will find that our software satisfies the highest standards of software design, implementation, testing, and benchmarking.

The current LIBPA release is a source code distribution only. It consists of modules for portable memory management, one dimensional arrays of arbitrary types, compact symbol tables, hash tables for arbitrary types, a trie module for length-delimited strings over arbitrary alphabets, single precision floating point numbers with extended exponents, and logarithmic representations of probability values using either fixed or floating point numbers. Additional modules are provided to efficiently store the frequencies of symbols and fixed-length substrings.

We have used LIBPA to implement a wide range of statistical models for both continuous and discrete domains. The time and space efficiency of LIBPA has allowed us to build larger statistical models than previously possible, and to investigate more computationally-intensive techniques. We have found LIBPA to be indispensable in our own research, and hope that you will find it useful in yours. If you find LIBPA useful, please let us know!

LIBPA is not guaranteed to be backward compatible. We strive to provide the best libraries for each release, and do not hesitate to completely redesign a module to improve clarity or performance. As a result, upgrading your LIBPA installation to the latest release may require you to rewrite portions of your code. We will not change the semantics of a function without also changing the prototype for that function. This policy should help you quickly upgrade your software using the latest LIBPA release.

2.1 Installation

2.1.1 Licensing

Every module has its own license, written by the module's authors. Before you install or use a LIBPA module, you must agree to the terms of that module's license. Typically, authors will ask that you use LIBPA for not-for-profit research purposes only, and that you follow standard academic practice in acknowledging your use of LIBPA in any relevant reports or publications.

2.1.2 Getting the Source Code

The source code may be perused online from http://www.mnemonic.com/software/libpa/src and the entire distribution may be obtained from http://www.mnemonic.com/software/libpa. Please use GNU tar -zxf to unpack the `.tar.gz' file.

LIBPA releases are numbered according to the following convention. Each release has a number x.y.z. A change in the first number x indicates a major redesign of LIBPA and its central memory management modules. A change in the second number y indicates a change in functionality, either a module revision or the addition of new modules. A change in the third number z indicates support for a new operating environment, changes to the installation routines, or minor bug fixes. If you are able to install LIBPA, then you need only download the latest version when the first or second numbers change.

2.1.3 Environment Variables

Once you have ftped and unpacked the distribution, you must initialize the following environment variables.

LIBPA_GROUP
group with read/write permission in LIBPA installation
LIBPA_ARCH
current machine architecture (cpu/OS combination)
SCC
safe compiler to produce debuggable object code
SLD
safe loader to produce debuggable binaries
UCC
unsafe compiler to produce optimized object code
ULD
unsafe loader to produce optimized binaries
RANLIB
ranlib
MAKE
make or gmake

The LIBPA_GROUP environment variable should be set to the name of the group which will have read/write permission in the LIBPA installation. We source the `libpa-1.3.0/etc/cshrc' or `libpa-1.3.0/etc/bashrc' scripts on our system to set these environment variables properly. If you want to source these scripts on your system, you will need to edit them first. At the very least, you will need to change the LIBPA_GROUP value. To verify that the LIBPA environment variables are set, you may execute the libpa-1.3.0/bin/checkenv.sh shell script.

Unless it is the only compiler available, we do not recommend using gcc to compile the modules because gcc has a proprietary implementation of the assert() macro. Once you compile a module using gcc, all executables based on that module must link `libgcc.a'. This is particularly annoying if you are trying to track down a compiler bug.

2.1.4 Creating Object Code

After you have set the LIBPA environment variables, run make install from `libpa-1.3.0/src' to install the libraries, and then run make tests from `libpa-1.3.0/src' in order to verify our implementations in your operating environment. When you are satisfied in the outcome of these tests, run make clean in the same directory to delete extraneous object code files and archives in the source directories. You must run make clean if you want to install LIBPA on multiple architectures.

As explained below, we provide safe and unsafe versions of all object code. The safe versions contain unoptimized object code with symbol tables for debugging purposes. The unsafe versions contain optimized object code without symbol tables. We recommend using safe object code on a routine basis, and only using unsafe object code when performance matters. The `libpa.a' and `libpa_u.a' archives contain the safe and unsafe versions of all object code, respectively.

2.1.5 Supported Environments

All modules are supported on the following operating environments:

alpha
Digital Alpha with Digital UNIX 4.0
i586
Intel Pentium with Linux 2.0
sun5
Sun SPARC with SunOS 5.5 (Solaris 2.5)

Most modules will work in any UNIX operating environment, and should work in any environment with an ANSI-C compiler.

2.2 Mailing Lists

2.2.1 Staying Informed

We encourage you to subscribe to libpa-announce@cs.princeton.edu, to learn about upgrades and new modules as they are published. Send mail to majordomo@cs.princeton.edu with the single line:

subscribe libpa-announce

2.2.2 Getting and Giving Help

If you are having trouble installing or using LIBPA, or have other questions about the library of practical abstractions, send mail to libpa-help@cs.princeton.edu. If you are a skilled user of LIBPA, then we encourage you to subscribe to this majordomo mailing list and to help field the questions that are posted.

2.2.3 Bug Reports

We are unable to track down bugs in your code or in your operating environment (compiler, operating system). We have spent a significant amount of time and care writing the LIBPA modules, and have published the source code and test suites. So if you think you have found a bug in our code, please help us by sending a thoroughly tested bug fix along with your bug report. When you have finished testing your bug fix, we would be grateful if you would please send it to libpa-bugs@cs.princeton.edu along with a detailed explanation.

2.3 Acknowledgments

2.3.1 Affiliations

Eric Sven Ristad ristad@mnemonic.com is with Mnemonic Technology, Inc.

Peter N. Yianilos pny@research.nj.nec.com is with the NEC Research Institute, 4 Independence Way, Princeton, NJ 08540.

2.3.2 Thanks

We are grateful to David Hanson drh@microsoft.com for his feedback, and for providing us with so many well-taught students while he was with us at Princeton. Robert G. Thomas rgt@cs.princeton.edu provided invaluable input in the design and redesign and reredesign of the LIBPA modules. Kirk Kanzelberger kirk@research.nj.nec.com assisted with the initial implementation of the vector_t module in Summer 1993.

Documentation was created in GNU texinfo format, from which info and postscript formats were derived directly. html format was derived using texi2html from Lionel Cons.

3 Library Modules

Currently we publish modules for basic memory management, tables of arbitrary types, and extremal numerical values:

memory_t
efficient portable memory management
vector_t
one dimensional arrays of arbitrary types
table_t
compact symbol table for <key,datum> pairs.
trie_t
trie for mapping strings to integers
hash_table_t
hash table toolkit, to build your own high-performance hash tables.
symbol_table_t
hash table for symbols
string_table_t
hash table for NULL-terminated character strings
unigram_t
compact table of symbol frequencies with dynamic counter resizing
balanced_t
single precision floating point numbers with extended exponents
L_t
fixed point logarithmic representation of probability values
logpr_t
floating point logarithmic representation of probability values
pr_t
generic interface to probability value libraries

3.1 Memory Management

The memory_t module is the foundation of our library. All memory management is performed in a portable manner using the memory_t module, which helps identify memory leaks without using expensive and nonportable third party applications, such as purify. The memory_create() function replaces malloc(), while the memory_destroy() function replaces free(). Other functions are provided to replace the standard C memory functions.

The vector_t module is also widely used, and is equally indispensable. A vector_t is a length-delimited sequence of objects of arbitrary type. The objects are parameterized by the size of their representations, via the sizeof() primitive. The vector_t module supports dynamic resizing, concatenation, insertion, sorting, and portable input/output. Indeed, we often use vector_t where LISP programmers would use lists. Since the vector_t objects aren't boxed and are arranged linearly in memory, vector_t operations are at least twice as fast as lists and require significantly less storage than lists (eg., up to 16 times less storage for one byte objects on a machine with 8 byte pointers).

3.2 Tables

LIBPA provides three ways to store arbitrary information in tables: table_t, trie_t, and hash_table_t.

The table_t module is a compact table for arbitrary <symbol,datum> pairs. Lookups are performed in O(log n) using binary search, while insertions and deletions are O(n) for a table containing n entries. The table_t module is typically used as a building block for other modules, such as the trie_t module, where memory usage is the single most important performance desiderata.

The trie_t module maps length-delimited strings over arbitrary alphabets to consecutive unsigned integers. It also provides the inverse map, from indices to length-delimited strings.

The hash_table_t module supports the easy construction of compact hash tables for arbitrary <key,datum> pairs with open addressing, double hashing, and dynamic resizing. We used the hash_table_t module to implement two useful modules: the symbol_hash_t for symbols of arbitrary size and the string_hash_t for NULL-terminated character strings. Again, we strove to minimize the space requirements of the implementation so that many hash_table_t objects can be used in even the most memory-intensive computations.

3.3 Statistics

LIBPA also provides data structures to efficiently accumulate the statistics of arbitrary symbols and strings: unigram_t and suffix_array_t, respectively.

The unigram_t module provides a space-efficient representation for the frequencies of symbols drawn from an arbitrary alphabet. Our unigram_t implementation uses dynamic counter resizing to minimize the amount of storage required to store the symbol frequencies. Thus, it may be used to efficiently store the state transition frequencies in a very large Markov model.

The suffix_array_t module provides a compact representation for the frequencies of all fixed-length substrings of a given string. Strings are represented using only sizeof(symbol_t) + sizeof(unsigned) bytes per symbol in the strings. The data structure may be constructed in O(n log n) time and subsequent lookups require O(log n) time using a binary search.

3.4 Numerical Values

LIBPA provides three ways to represent extremely small numerical values, such as are commonly encountered in statistical modeling applications: balanced_t, L_t, and logpr_t. The pr_t module provides a generic macro interface to all three modules. The three modules offer a wide range of performance/accuracy tradeoffs. L_t is the fastest and requires the least memory. logpr_t is the most accurate but the slowest. balanced_t is nearly as accurate as logpr_t and can be significantly faster on systems with a slow `libm.a' math library, such as SunOS and HP-UX.

The balanced_t type represents single precision floating point numbers with extended 32-bit exponents. As a result, the balanced_t module supports floating point arithmetic for extremely small and extremely large numerical values -- values which would cause overflow or underflow in double precision floating point arithmetic. Depending on your machine, balanced_t arithmetic is from 10 to 20 times slower than double precision floating point arithmetic and is only slightly less accurate that single precision floating point arithmetic. Unlike the other LIBPA numerical types, the balanced_t type can represent a large range of numbers, such as very large positive numbers and negative numbers.

The L_t type represents probability values using fixed point numbers -- char, short, int, and long --- where larger types are capable of representing ever-smaller probability values. Depending on your machine, L_t arithmetic is from 1.2 to 1.8 times slower than double precision floating point arithmetic. It is also considerably less accurate. balanced_t arithmetic is more accurate than L_t arithmetic, but is more than five times slower and typically requires at least twice as much memory to represent each probability value.

Finally, we also provide a standard logpr_t type for the logarithmic representation of probability values using double-precision floating point numbers. Although this is the most accurate representation of probability values, it is also the slowest. Depending on the math libraries provided by your operating system, logpr_t arithmetic can be 13 to 60 times slower than double precision floating point arithmetic.

The following table reports timing results of the `pr.bench' benchmark on four different machines. All numbers are user times in seconds on an unloaded machine, as reported by /usr/bin/time or /usr/bin/timex (lower is better).

                        double   logpr_t   balanced_t   L_t
                        ------   -------   ----------  -----
AlphaStation 500/500    121.8     1619.8    1292.7     207.2
Sun Ultra2 1200         301.2    10166.4    5037.2     541.9
SGI O2 180mhz           371.4     8570.8    4830.1     546.6
HP 9000/755             369.2    19643.6    7497.1     638.0
Dell GXM 5166           805.5    12010.6    5118.6     933.3

4 Library Organization

4.1 Directory Structure

The library of practical abstractions is organized as follows:

`libpa-1.3.0/include'
files containing public typedefs and prototypes
`libpa-1.3.0/lib.ARCH'
object code archives for ARCH
`libpa-1.3.0/src/module'
source code for module
`libpa-1.3.0/bin'
architecture-independent shell scripts
`libpa-1.3.0/bin/ARCH'
architecture-dependent executables for ARCH
`libpa-1.3.0/etc'
scripts to initialize LIBPA environment
`libpa-1.3.0/info'
documentation in GNU info format
`libpa-1.3.0/html'
documentation in html format

This allows you to compile with -I`libpa-1.3.0/include' and link with -L`libpa-1.3.0/lib.ARCH'. You may also want to put `libpa-1.3.0/bin' and `libpa-1.3.0/bin/ARCH' on your PATH environment variable.

4.2 File Naming Conventions

A LIBPA module named module is implemented in the following files:

`module.h'
public prototypes and typedefs for module
`module.c'
source code for module
`module.test.c'
source code for module test suite
`module.test'
safe executable for module test suite
`module.test_u'
unsafe executable for module test suite
`Makefile'
Makefile for module
`license'
Copyright notice, license, and disclaimer

The implementation may include other files, named according to a similar naming convention, such as:

`module.bench.c'
source code for module comparative benchmark
`module.bench'
executable for module comparative benchmark
`module.bench'
executable for module comparative benchmark (safe)
`module.bench_u'
executable for module comparative benchmark (unsafe)
`module.pure'
purified test suite for module (safe)
`module.pure_u'
purified test suite for module (unsafe)

Special module functions may be provided a separate interface. For example, the memory_t library includes a `memory.spartan.h' header file for the special-purpose spartan memory_t functions. As explained below, the object code for module is placed in an archive named libmodule.a in the `libpa-1.3.0/lib.ARCH' directory.

4.3 Test Suites (required)

Comprehensive test suites are provided for all code. A test suite is a certificate of correctness. It must convince an aggressive skeptic of the correctness of the implementation. The best way to establish correctness is to verify the behavior of the implementation against an independent implementation. The independent implementation should be so simple that it is plausibly correct. Even the weakest test suite must exercise all functions for a wide range of inputs.

A test suite should run to completion in a reasonable amount of time. It should describe the tests being performed as they are executed, but should not display too much information either, rarely more than a page. It should end by announcing the successful completion of the test or by dumping core via an abort() call or an assert() failure.

4.4 Benchmarks (optional)

A benchmark is a comparative performance analysis. The goal of a benchmark is to reveal the performance tradeoffs made in your implementation. Accordingly, the ideal benchmark compares the actual time/space requirements of your implementation with a range of alternative implementations in a realistic application. In practice, a benchmark should compare your implementation to at least one alternative implementation in a realistic sequence of function calls.

4.5 Object Code and Archives

The library of practical abstractions provides two versions of all object code: safe and unsafe. We recommend the routine use of safe object code. Safe object code is intended for everyday use. Safe object code has been compiled with no optimizations, with all assert() macros, and with full symbol tables for debugging. Safe libraries are named `libmodule.a'. The `libpa.a' archive includes all safe object code in the current release.

Unsafe object code is intended for more demanding applications. Unsafe object code has been compiled with maximum optimizations, the NDEBUG macro defined, and without any symbol tables. We discourage the routine use of unsafe object code, which is why we gave it such a frightful name. Under no conditions should unsafe object code be used during debugging, because optimizations are a major source of compiler errors and erroneous debugger information as well. Unsafe libraries are named `libmodule_u.a'. The `libpa_u.a' archive includes all unsafe object code in the current release.

4.6 Makefiles

Our Makefiles include the targets "install", "uninstall", "tests", and "clean". Use `libpa-1.3.0/src/Makefile' to perform these operations on all modules.

make install
Installs the header files in `libpa-1.3.0/include' and then installs the safe and unsafe object code archives in `libpa-1.3.0/lib.ARCH'.
make uninstall
Deletes header files in `libpa-1.3.0/include' and removes the safe and unsafe object code archives in `libpa-1.3.0/lib.ARCH'.
make tests
Creates and runs safe and unsafe versions of the current module's test suite.
make clean
Deletes all object code and archives in the current directory.

5 Programming Style

The central goal of module implementation is correctness. Performance is important, but secondary to correctness. By correctness, we mean that all specified functions are implemented strictly according to the module specification, without any memory leaks or segmentation faults. Aside from trivial cases, there will be no proof of correctness, only degrees of belief in correctness. And so the astute reader has already realized that "correctness" really means "belief in correctness".

Our belief in the correctness of an implementation is principally influenced by clarity of the implementation. It is much easier to develop confidence in code that is easy to understand than in code that is difficult to understand. Clarity is determined by degree of conformance to a style guide, appropriate use of abstractions, appropriate symbol names (that obey reasonable naming conventions), useful but sparing comments (when essential to understanding), and consistent formatting.

As we hope you will see, our software places an unusual premium on clarity, both directly and indirectly. Our belief in the correctness of an implementation is also affected by the rigor of the programmer's test suite and the abundance of assertions. Accordingly, our software always includes a comprehensive test suite as well as aggressive use of assertions.

5.1 Desiderata

We evaluate our code based on the three criteria of correctness, clarity, and performance.

5.1.0.1 Procedure Specification

The following keywords are used to specify the semantics of a procedure. They typically appear as comments immediately below the procedure's prototype in the module's .h file. The absence of a keyword indicates that the procedure's behavior in that respect conforms to widely-held expectations.

Arguments:
Describes semantics of procedure arguments.
Requires:
Describes preconditions required for correct operation. Required when these preconditions are not evident from the names and types of the arguments. Unless it is unusually costly to do so, the procedure should verify its preconditions in safe compilations.
Effect(s):
Describes side-effect(s) of the procedure call. Required when the procedure mutates one of its arguments or a global variable.
Returns:
Describes the meaning of the value returned by the procedure.
Error:
Describes conditions under which the procedure will raise a fatal error. Required when the procedure call raises an error in both safe and unsafe compilations.
Cost:
States the computational cost of the procedure call (time and space). This keyword is required when the procedure call is unusually costly with respect to theoretical lower bounds.
Warning:
Warns user about any common misunderstanding regarding the procedure's semantics.
Note:
Miscellaneous comment not included by one of the more specific keywords above.

5.2 Identifier Naming Conventions

Clarity is of the utmost importance in naming your identifiers. Your code should read like a great novel. Replace generic names, such as count, with maximally specific names, such as vertex_count or edge_count. Refrain from using meaningless names, such as temp, tmp, or variants of the dreaded x, xx, xxx series. Single character names are reserved for integer iteration variables.

Vowels are obligatory! Even standard abbreviations (eg., defn, dir) should not be used unless they improve clarity. If your typing skills are inadequate to type long names quickly, then take a typing class, but don't burden others with your sloth.

5.2.1 Constants and Variables

Constants created using #define should be all upper case. Macros created using #define should be either initial letter upper case or all lower case (to allow replacement by procedures) when they take arguments. Global variables should be first character upper case. Procedures and all local variables should be all lower case.

Compound names are formed with underscore separators, eg., CONSTANT, nifty_macro(), Global_Variable, local_variable.

Don't reuse variable names. A single variable name should have a single meaning in the widest possible scope (minimally within the body of a procedure, preferably within an entire library). Iteration variables should always iterate over the same domain.

5.2.2 Pointers and Handles

When it improves clarity, variables that are pointers to a single object should be named with the _p suffix. Similarly, a pointer to a pointer to an object (also called a "handle") should be named with the _pp suffix when it improves clarity.

type_t    *name_p, **name_pp;

This convention might be used, for example, to distinguish a one dimensional array of pointers (void **name_p or void *name_p[]) from a two dimensional array of objects (void **name or void name[][]), or when an object is passed directly (ie., by value) and indirectly (ie., by reference) in the same piece of code. When an object is always passed by reference, adding the _p suffix does not improve clarity. Note that arrays are NOT conceptually considered pointers, and therefore they should not be named according to the _p or _pp suffix convention (unless the arrays are arrays of pointers or arrays of pointers to pointers, respectively).

5.2.3 Procedures

When possible, procedure names begin with their module name and their prototypes obey the following naming conventions (see module design).

5.2.3.1 Type Definitions

typedef ... type_t;
typedef for objects of type type_t.
type_tag
internal tag for union or struct typedef.

5.2.3.2 Constructors and Destructors

type_t *type_create(...);
Creates a new object of type type_t and returns its address.
void type_destroy(type_t *object);
Permanently destroys an object of type type_t; all subsequent operations on that object are undefined.
type_t *type_copy(const type_t *object);
Creates an exact copy of an object of type type_t and returns its address.
void type_mirror(const type_t *source, type_t *target);
Given two valid objects of type type_t, makes target become an exact copy of source.
void type_initialize(type_t *address, ...);
Given the address of sizeof(type_t) bytes, initializes that memory location to contain a valid object of type type_t.
void type_finalize(type_t *object);
Given a valid object of type type_t, finalizes that object so that the subsequent freeing of sizeof(type_t) bytes at the given address will completely destroy the given object.

5.2.3.3 Predicates, Selectors, and Mutators

boolean_t type_is_predicate(const type_t *object);
Returns TRUE if and only if the Boolean predicate holds for a given object of type type_t.
value type_attribute(const type_t *object);
Returns value of given attribute for given object of type type_t, or the address of that value.
void type_set_attribute(type_t *object, ...);
Change an object of type type_t so that the desired attribute now holds for that object.
unsigned type_bytes(const type_t *object);
Returns the number of bytes that would be reclaimed by calling type_destroy(object), as measured by memory_total_bytes().
void *type_workspace_p(const type_t *object);
Returns address of caller workspace in object of type type_t, which caller is allowed to modify.

5.2.3.4 Type Coercion

typeB_t *typeA2typeB(const typeA_t *object);
Given an object of type typeA_t, creates an equivalent object of type typeB_t and returns it.

5.2.3.5 File Input/Output

boolean_t type_write(const type_t *object, FILE *stream);
Writes objects of type type_t to the binary file stream in self-delimiting format and returns TRUE if and only if the write succeeds.
type_t *type_read(FILE *stream);
Reads object of type type_t from the binary file stream in self-delimiting format and returns it.
boolean_t type_fprintf(const type_t *object, FILE *stream);
Formats object of type type_t on ASCII file stream in self-delimiting human-readable format.
type_t *type_fscanf(FILE *stream);
Reads object of type type_t from ASCII file stream in self-delimiting human-readable format and returns it.
boolean_t type_backup(const type_t *object, FILE *stream);
Writes properties of an object of type type_t to the binary file stream in self-delimiting format and returns TRUE if and only if the write succeeds. This is used when
boolean_t type_restore(type_t *object, FILE *stream);
Reads properties of an object of type type_t from the binary file stream in self-delimiting format and returns TRUE if and only if the read succeeds and the objects properties are updated from the file stream.

5.2.3.6 Argument Order

The first argument to an function should always be an object of the type defined in the function's module, eg.,

void   type_write(const type_t *, FILE *);
type_t *type_read(FILE *);

5.3 Safety and Clarity

Avoid macros! Macros are a major source of bugs. Use procedures instead. Once your code is working, and you find yourself with extra time on your hands, then you can replace some procedures with macros for minor performance gains, although we still do not recommend wasting your time this way.

All source code should make abundant use of the assert() macro from the `<assert.h>' library for both clarity and correctness. Correctness includes checking that procedures are passed the proper arguments and that they return proper values. Clarity includes asserting all known invariants and assumptions. As a rough guide, around 50% of your code should be asserts, as high as 75% when performing pointer arithmetic. No kidding. The source code must work properly even if NDEBUG is defined and all the assert() macros vanish. That means no side-effects may occur in the expressions passed to assert().

Replace compound asserts with primitive asserts in order to make debugging easier. For example, the compound assertion

assert(exp1 && exp2);

should be replaced with the two primitive assertions

assert(exp1); assert(exp2);

Do not mutate a variable inside a complex expression or when it appears as an argument to a procedure call. If this restriction makes you whine about typing effort, then it's time to take another typing class. Thus, f(i++); should be f(i); i++.

All blocks must be explicitly delimited by curly brackets, especially if the blocks contain exactly one statement, eg.,

if (predicate) { statement; }

Liberal use of parentheses is encouraged, particularly within macro definitions. Programming is not a test to see whether you know all the most compact legal expressions of C. Rather, it is a test to see if you can write correct code of breathtaking clarity.

Cutting and pasting code is likely to introduce errors. It is also an early warning sign of a bad modularization. So don't do it! If you are unable to break your cut-and-paste habit, then you must review newly pasted code with extraordinary diligence.

5.4 Type Declarations

Whenever possible, arguments passed by reference (ie., pointers) should also be declared const. Not only does this improve the clarity and safety of your code, but it allows other programmers to use your code. A procedure argument that is not declared const can never be const in any code that uses that procedure.

Do not mix declarations and initializations. That is, do not combine the type declaration of automatic variables with their initialization in the body of a procedure. For example, replace int i=0; with the declaration int i; in the preamble followed by the actual initialization i=0; in the body of the procedure.

Declare variables in the narrowest possible scope. If you only use a variable inside one of the branches of a conditional statement, then it should be declared only in that branch. This reduces the risk of improperly initialized variables and inadvertent name conflicts. It also makes the code easier to understand.

In general, typedefs should be to the object itself, rather than to a pointer to the object. This makes memory management easier in most cases, and improves clarity because the caller better understands the semantics of procedure arguments.

Objects requiring more space than two machine words should typically be passed by reference for the purposes of efficiency. However, objects that are used solely to return multiple values from a procedure call should typically be returned by value in order to simplify memory management.

If at all possible, modules should be parameterized by type. Type parameterization allows you to design efficient, reusable modules. In C, types are parameterized by the size of their objects (in bytes) along with a minimal set of procedures necessary to operate on those objects.

5.5 Additional Guidelines

5.5.1 Efficiency

No operating system calls, such as memory allocation or file i/o, should occur in time-critical inner loops. This may require design changes.

Minimize the number of pointers in your data structures. Each pointer reference requires a potentially-nonlocal memory reference, resulting in a cache miss, and can take up to 8 bytes of storage on some machines.

5.5.2 Memory Management

Even a small memory leak is unacceptable.

All LIBPA code must use the memory_create() and memory_destroy() procedures from the memory_t library instead of the malloc() and free() provided by the standard stdlib.

Every test suite must conclude with a memory leak check.

assert(memory_total_bytes() == 0);
assert(memory_total_blocks() == 0);

5.5.3 Formatting

Use GNU emacs C mode. In most cases, curly braces should be on their own line, in order to more clearly delineate the scope of the block.

Index

Jump to: a - b - c - h - k - l - m - p - s - t - u - v

a

  • announcements
  • b

  • balanced_t
  • benchmarks
  • bug reports
  • c

  • Cons, Lionel
  • h

  • Hanson, David
  • hash_table_t
  • help
  • k

  • Kanzelberger, Kirk
  • l

  • L_t
  • logpr_t
  • m

  • memory_t
  • p

  • pr_t
  • pr_t benchmarks
  • s

  • string_hash_t
  • suffix_array_t
  • symbol_hash_t
  • t

  • table_t
  • Thomas, Robert G.
  • trie_t
  • u

  • unigram_t
  • v

  • vector_t

  • This document was generated on 9 Febuary 1998 using the texi2html translator version 1.52.