- Introduction

        bls2: sparse svd via hybrid block Lanczos procedure for
              eigensystems of the form A'A.

	bls2.c is an ANSI-C code designed to compute singular values
	and singular vectors of a large sparse matrix A.  This is a     
	modified version of the block Lanczos algorithm first
        published by Golub, Luk, and Overton (ACM TOMS 7(2):149-169, 1981).
        This particular implementation is discussed in "Multiprocessor
        Sparse SVD Algorithms and Applications", Ph.D. Thesis by M. Berry,
        University of Illinois at Urbana-Champaign, October 1990.

        The singular values (and singular vectors) of A are determined
        by the eigenvalues (and eigenvectors) of the matrix B, where

                              B =  A'A.

        The eigenvalues of B are the squares of the singular values of
        A, the eigenvectors correspond to the right singular vectors only.
        The left singular vectors of A are then determined by

                          u = 1/sigma A*v,

        where {u,sigma,v} is a singular triplet of A.


        This "hybrid" block Lanczos procedure consists of five phases:

        phase 1: Block Lanczos outer iteration to yield a symmetric
                 block tridiagonal matrix S which shares the same 
                 eigenvalues of the matrix B=A'A.  Total or
                 complete re-orthogonalization is used here.

        phase 2: Lanczos method for tridiagonalizing the S matrix from
                 phase 1 to yield the tridiagonal matrix T which preserves
                 the same eigenvalues.  Complete or total re-orthogonal-
                 ization is used for this Lanczos recursion.  A point
                 Lanczos method is used if a blocksize (nb) of 1 is encountered.
                     
        phase 3: Apply an appropriate QL iteration to diagonalize T and 
                 hence produce approximate singular values (array alpha)
                 of the original matrix A.

        phase 4: Convergence test using a user-supplied residual tolerance
                 (tol).

        phase 5: Iteration restart with orthogonal projection with respect
                 to any (all) converged eigenvectors of B=A'A.
	
- Calling sequence

	The calling sequence for procedure blklan2 is

        long   blklan2(FILE *fp, long nnzero, long m, long n, long ik, 
               double *v, double *eig, long ic, long ib, 
               double tol, double *res, long maxit, long *iko, long *ico,
               long *ibo, long *memory)

	The user specifies as part of the parameter list:
	 
        fp              ... a pointer to output file {FILE *}.
	nnzero          ... number of nonzeros in matrix A {long}.
        m               ... row dimension of the sparse matrix A whose SVD
                            is sought {long}.
	n		... column dimension of the sparse matrix a whose SVD
                            is sought {long}.
	ik     	        ... number of singular triplets desired {long}.
	ib     	        ... initial block size for outer iteration {long}.
	ic     	        ... upper bound for dimension of Krylov subspace 
                            generated via outer iteration {long}.  ic is the
                            maximum dimension for the block upper bidiagonal
                            matrix S generated in phase 1 above.
	tol        	... user-specified tolerance for approximate singular
                            triplets {double}.
	maxit           ... maximum number of outer iterations allowed
                            {long}.
	 
	blklan2 returns via its parameter list the following items:
	 
        ik0             ... number of singular triplets approximated {long}.
	ic0    	        ... last bound for dimension of Krylov subspace 
                            used within outer iteration {long}.
	ib0    	        ... final block size used in outer iteration {long}.
	eig             ... one-dimensional array containing the ik0
                            approximate singular values {double}.
	v	        ... two-dimensional array containing the ik0
                            approximate right singular vectors corresponding
                            to the approximate singular values in array
                            eig {double}.
	res             ... one-dimensional array containing the ik0
                            residuals of the approximate eigenpairs of
                            B=A'A {double}.
	memory          ... memory needed in bytes {long}.

- User-supplied routines

        For bls2.c, the user must specify multiplication by matrices
        A and B separately.  Subroutines opb and opm are used when
        multiplying B=A'A by a single vector and a block of vectors,
        respectively.  Subroutine opa is used for multiplying a
        single vector by the matrix A.

        The specification of opa should look something like

           void opa(long m, long n, double *x, double *y)

        so that opa takes an n by 1 vector x and returns the m by 1
        vector y = A*x, where A is m by n (m >> n).

        The specification of opb should look something like

           void opb(long m, long n, double *x, double *y)

        so that opb takes an n by 1 vector x and returns the n by 1
        vector y = A'A*x, where A is m by n (m >> n).

        The specification of opm should look something like

          void opb(long m, long n, long nc, double **x, double **y)

        so that opm takes an n by nc block of vectors X and returns 
        the n by nc block of vectors Y = A'A*X, where A is m by n (m >> n).

        This version of bls2.c is designed to approximate the ik-largest
        singular triplets of A.  Users interested in the ik-smallest
        singular triplets need only sort the alpha array in increasing
        (as opposed to the default ascending order) following the line

                  if (tql2(nn, alpha, beta, qqp)) break;

        in phase 3 of bls2.c.  The columns of the two-dimensional array
        qqp which are used to obtain approximate eigenpairs of the matrix
        A'A correspond to the elements of alpha (in ascending order)
        which are approximate eigenvalues of A'A (and hence squares of the
        singular values of the matrix A).  The columns of array qqp would
        have to be reordered to reflect a one-to-one correspondence
        with the newly sorted elements of alpha.

- Information 

        Please address all questions, comments, or corrections to:

        M. W. Berry
        Department of Computer Science
        University of Tennessee
        107 Ayres Hall
        Knoxville, TN  37996-1301
        email: berry@cs.utk.edu
        phone: (615) 974-5067

-File descriptions

        bls2.c requires the include file bls2.h for compilation.
        The local parameters defined in bls2.h are:

         k        remaining # of desired triplets (done when = 0)
         k0       count of triplets found in current iteration
         nb       current block size
         nc       size of current subspace
         ns       number of blocks in current iteration

        The input and output files associated with bls2.c are
        listed below.

             Code           Input         Output
            ------      ------------    ---------
            bls2.c      blp2, matrix    blo2,blv2


       The binary output file blv2 (which contains the 
       approximate right singular vectors followed by the ap-
       proximate left singular vectors) will be created by
       bls2.c if it does not already exist.  If you are
       running on a Unix-based workstation you should uncomment
       the line

                 /*   #define  UNIX_CREAT */

       in the declarations prior to main() in bls2.c.
       UNIX_CREAT specifies the use of the UNIX "creat" system 
       routine with the permissions defined by the PERMS constant

                  #define PERMS 0664

       You may adjust PERMS for the desired permissions on the
       blv2 file (default is Read/Write for user and group,
       and Read for others).  Subsequent runs will be able to
       open and overwrite these files with the default permissions.

       bls2.c obtains its parameters specifying the
       sparse SVD problem to be solved from the input file
       blp2. This parameter file contains the single line

	 <name>   maxit   nc   nb   nums   tol   vtf

       where 

        <name>     is the name of the data set containing the nonzeros of A.
        maxit      is an integer specifying maximum number of (outer)
                   block Lanczos iterations allowed.
        nc         is an integer specifying the upper bound for the
                   Krylov subspace generated via the outer iteration.
        nb         is an integer specifying the initial block size for 
                   the outer iteration.
        nums       is an integer specifying the number of singular triplets
                   desired.
        tol        is a double specifying the residual tolerance for
                   approximated singular triplets.
        vtf        contains the string TRUE or FALSE to indicate when 
                   singular triplets are needed (TRUE) and when only 
                   singular values are needed (FALSE);  If vtf is TRUE,
                   the unformatted output file blv2 will contain the 
                   approximate singular vectors written in the order 

                      u[1], v[1], u[2], v[2], ..., u[ik0], v[ik0]. 

                   Here u[i] and v[i] denote the left and right
                   singular vectors, respectively, corresponding to the 
                   i-th approximate singular value, sing[i].

- Sparse matrix format

        bls2.c is designed to read input matrices that are stored
        in the Harwell-Boeing sparse matrix format.  The nonzeros
        of such matrices are stored in a compressed column-oriented
        format.  The row indices and corresponding nonzero values
        are stored by columns with a column start index array
        whose entries contain pointers to the nonzero starting each
        column.  bls2.c reads the sparse matrix data from the input
        file called "matrix".

        Each input file "matrix" should begin with a four-line header
        record followed by three more records containing, in order, 
        the column-start pointers, the row indices, and the nonzero
        numerical values.

        The first line of the header consists of a 72-character title
        and an 8-character key by which the matrices are referenced.
        The second line can be used for comments or to indicate record
        length for each index or value array.  Although this line is 
        generally ignored, A CHARACTER MUST BE PLACED ON THAT LINE.
        The third line contains a three-character string denoting the
        matrix type and the three integers specifying the number of rows,
        columns, and nonzeros.  The fourth line which usually contains
        input format for Fortran-77 I/O is ignored by our ANSI-C code.
        The exact format is

		"%72c %*s %*s %*s %d %d %d %*d"

	for the first three lines of the header,

		line 1      <title>         <key>
		 	(col.  1 - 72) (col. 73 - 80)

		line 2   <string>

		line 3   <matrix type> nrow ncol nnzero 

	and 

		"%*s %*s %*s %*s"

	for the last line of the header.

		line 4   <string1> <string2> <string3> <string4>

        Even though only the title and the integers specifying the
        number of rows, columns, and nonzero elements are read, other
        strings of input must be present in indicated positions.
        Otherwise, the format of the "fscanf" statements must be 
        changed accordingly.
