NAME

     pr_t - abstraction for probability values


SYNOPSIS

     #include "common.h"
     #include "pr.h"

     pr_t    pr_double2pr(double x);

     double  pr_pr2double(pr_t px);

     pr_t    pr_nats2pr(double x);

     double  pr_pr2nats(pr_t px);

     pr_t    pr_bits2pr(double x);

     double  pr_pr2bits(pr_t px);

     pr_t    pr_zero(void);

     pr_t    pr_unity(void);

     pr_t    pr_epsilon(void);

     BOOLEAN pr_is_zero(pr_t px);

     BOOLEAN pr_is_unity(pr_t px);

     BOOLEAN pr_is_valid(pr_t px, pr_t epsilon);

     int     pr_compare(pr_t px, pr_t py,

     int     pr_exact_compare(pr_t px, pr_t py);

     pr_t    pr_difference(pr_t px, pr_t py);

     pr_t    pr_add(pr_t px, pr_t py);

     pr_t    pr_multiply(pr_t px, pr_t py);

     pr_t    pr_divide(pr_t px, pr_t py);

     pr_t    pr_power(pr_t px, double n);

     int     pr_compare_ptr(pr_t *px_p, pr_t *py_p,

     int     pr_exact_compare_ptr(pr_t *px_p, pr_t *py_p);

     int     pr_difference_ptr(pr_t *px_p, pr_t *py_p,


     void    pr_add_ptr(pr_t *px_p, pr_t *py_p,

     void    pr_multiply_ptr(pr_t *px_p, pr_t *py_p,

     void    pr_divide_ptr(pr_t *px_p, pr_t *py_p,

     void    pr_power_ptr(pr_t *px_p, double n,

     void    pr_fprintf(pr_t px, FILE *stream);

     void    pr_printf(pr_t px);

     BOOLEAN pr_is_balanced(pr_t px);


INTRODUCTION

     The pr_t package provides  an  abstraction  for  probability
     values.   An object of type pr_t represents a real number in
     the closed interval [0,1].  The principal benefit  of  using
     this abstraction is to avoid underflow in time series model-
     ing.  In these settings, one computes and manipulates proba-
     bilities  of  long state sequences.  One quickly reaches the
     exponent range limit of double representation,  as  this  is
     typically only +300/-300.

     This package provides a direct and elegant solution to  this
     problem.  All work necessary to support a sufficiently large
     exponent range is swept into the  routines  associated  with
     the  pr_t  data type.  As a result, statistical applications
     are significantly easier to write and  maintain,  while  the
     overall  time/space  penalty  compares favorably with tradi-
     tional numerical methods, such  as  scaling  and  logarthmic
     representation.  Our current implementation of the pr_t type
     requires 1.5 times as  much  space  as  a  double  precision
     floating  point  number while our arithmetic operations take
     roughly ten times longer than the  standard  floating  point
     add/multiply.  This approach is both faster and simpler than
     the logarithmic approach and considerably easier to use than
     the scaling approach.

     The pr_t function  library  provides  operations  to  coerce
     types,  to  compare values, and to perform arithmetic.  Each
     operation is available in two forms,  one  defined  on  pr_t
     objects and the other defined on pointers to pr_t objects.


DATA TYPE

     The pr_t data type is implemented as follows:

             typedef struct pr_tag
             {
                 double  sig;
                 int     exp;
             } pr_t;

     Thus, probablities are represented by a pair of  numbers:  a
     floating  point  number for the significand sig, and a fixed
     point number for the exponent exp.  This is a floating point
     format  supporting  a  huge  exponent  range.  An arithmetic
     operation on the [[pr_t]] type will in general consist of  a
     fixed  point  operation on the exponent and a floating point
     operation on the significand.  For this reason our implemen-
     tation  is  particularly  well-suited to today's superscalar
     architectures, which are  able  to  simultaneously  dispatch
     fixed and floating point instructions.

     The following constants are provided in the implementation:

         PR_RADIX        exponent radix
         PR_PRECISION    open upper bound on number of bits
                         in mantissa
         PR_MANT_LOW     closed lower bound on value of
                         balanced mantissa
         PR_MANT_HIGH    open upper bound on value of
                         balanced mantissa

     Let p be an object of type pr_t.  Then p represents the real
     number  p.sig  *  2^p.exp for radix 2.  In order to maintain
     precision, we will attempt to keep the  significand  in  the
     interval  [0.5,1)  (assuming a radix of 2).  The PR_MANT_LOW
     and PR_MANT_HIGH constants store the lower and upper  bounds
     of this interval.  These constants are currently implemented
     as follows:

             #define PR_RADIX        2.0
             #define PR_PRECISION    (53+1)
             #define PR_MANT_LOW   (1/PR_RADIX)
             #define PR_MANT_HIGH    1.0


PRIMITIVE OPERATIONS

     The following primitive operations are defined  directly  on
     objects of type pr_t.

     Initial   probability   values   are    created    by    the
     pr_double2pr(),  pr_nats2pr(),  and pr_bits2pr() procedures,
     which  convert  probablity  values  from   three   different
     representations to pr_t.  These representations are, respec-
     tively: double-precision floating point, negative log likel-
     ihoods  in  base  e  (i.e.,  codelengths),  and negative log
     likelihoods in base 2 (i.e., binary codelengths).  The  pro-
     cedures  pr_pr2double(), pr_pr2nats(), and pr_pr2bits() per-
     form the inverse conversions.  For user convenience, we also
     provide  the  constructors  pr_zero() and pr_unity() for the
     distinguished probability values of zero and unity,  respec-
     tively.  pr_epsilon() returns the smallest possible mantissa
     increment, with zero exponent.  Note  that  pr_epsilon()  is
     significantly greater than the smallest non-zero pr_t value.
     The pr_is_zero() and pr_is_unity()  predicates  return  TRUE
     iff  their  arguments  are  exactly equal to zero and unity,
     respectively. pr_is_valid() returns TRUE iff its px argument
     falls  in  the closed interval [0-epsilon, 1+epsilon], for a
     user-supplied epsilon parameter.  This predicate is particu-
     larly useful in catching incorrect memory references.

     pr_compare() returns a negative quantity if px is less  than
     py,  a  positive quantity if px is greater than py, and zero
     if they are approximately equal, as determined by the  user-
     supplied    tolerance    argument.   pr_exact_compare()   is
     equivalent to pr_compare(), except  that  equality  must  be
     exact.   These  two procedures may be implemented as macros.
     pr_difference() returns the absolute value of the difference
     between its two arguments.

     The  procedures  pr_add(),  pr_multiply(),  and  pr_divide()
     implement the standard binary arithmetic operations of addi-
     tion, multiplication, and division for objects of type pr_t.

     The pr_power() procedure returns the result of  raising  the
     px  argument  to the nth power.  A domain error occurs if px
     equals 0 and n is less than or equal to 0.  Injudicious  use
     of  this  procedure  can  easily give rise to floating point
     exceptions.


POINTER OPERATIONS

     In addition to the primitive operations specified above,  we
     provide  an equivalent set of pointer-based operations, that
     are passed the memory locations of their arguments and store
     their  return  values  at a caller-supplied memory location.
     These operations are provided to support efficient manipula-
     tion  of  probability  vectors.   They may be implemented as
     macros.

     pr_compare_ptr() and pr_exact_compare_ptr()  are  equivalent
     to   the   basic  pr_compare()  and  pr_exact_compare_ptr(),
     respectively. pr_difference_ptr() places  the  magnitude  of
     the difference between its first and second arguments in the
     location specified by the third argument,  and  returns  the
     sign of their difference.

     The   functions   pr_add_ptr(),    pr_multiply_ptr(),    and
     pr_divide_ptr()  always  read their arguments from locations
     px_p and py_p and place their result in location  pz_p.   It
     is  safe  to  use  either  *px_p or *py_p as an accumulator,
     e.g.,
             pr_add_ptr(&a, &b, &b)

     pr_power_ptr() raises the  probability  located  at  address
     px_p  to  the  nth  power  and stores the result in location
     pz_p.


INPUT/OUTPUT

     The procedures pr_fprintf() and pr_printf() print the  value
     of  a  pr_t  object in the format <sig>b-<exp> to remind the
     reader that the exponent radix is 2 (i.e., binary).

     The predicate pr_is_balanced() returns TRUE iff its argument
     satisfies  the  implementation-specific  constraint that the
     significand  falls  within  the  interval  [0.5,1),  or  the
     exponent  is the smallest representable value.  Note that it
     is not possible to balance  the  significand  for  extremely
     small  probability  values,  and so pr_is_balanced() returns
     TRUE iff either the px argument is balanced  or  it  is  not
     possible to further balance the argument.


FILES

     ~pnylab/lib/1.0/lib.${ARCH}/libpa.a
     ~pnylab/lib/1.0/lib.${ARCH}/libpa_r.a (reentrant)
     ~pnylab/lib/1.0/lib.${ARCH}/libpa_o.a (optimized)
     ~pnylab/lib/1.0/lib.${ARCH}/libpa_or.a   (optimized,   reen-
     trant)


SEE ALSO

     common(3)

     Princeton Research Report CS-TR-172-94, October 1994


AUTHORS

     Peter N. Yianilos, NEC Research Institute, Inc.
     Eric Sven Ristad, Princeton University