TridiagLU  1.0
Scalable, parallel solver for tridiagonal system of equations
tridiagLU.h File Reference

Header file for TridiagLU.


Data Structures

struct  TridiagLU
 

Macros

#define _TRIDIAG_JACOBI_   "jacobi"
 
#define _TRIDIAG_GS_   "gather-and-solve"
 

Functions

int tridiagLU (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagLUGS (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagIterJacobi (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagLUInit (void *, void *)
 
int blocktridiagLU (double *, double *, double *, double *, int, int, int, void *, void *)
 
int blocktridiagIterJacobi (double *, double *, double *, double *, int, int, int, void *, void *)
 
int tridiagScaLPK (double *, double *, double *, double *, int, int, void *, void *)
 

Detailed Description

Header file for TridiagLU.

Author
Debojyoti Ghosh

Definition in file tridiagLU.h.


Data Structure Documentation

struct TridiagLU

Definition at line 81 of file tridiagLU.h.

Data Fields
char reducedsolvetype[50]

Choice of solver for solving the reduced system. May be _TRIDIAG_JACOBI_ or _TRIDIAG_GS_.

int evaluate_norm

calculate norm at each iteration? (relevant only for iterative solvers)

int maxiter

maximum number of iterations (relevant only for iterative solvers)

double atol

absolute tolerance (relevant only for iterative solvers)

double rtol

relative tolerance (relevant only for iterative solvers)

int exititer

number of iterations it ran for (relevant only for iterative solvers)

double exitnorm

error norm at exit (relevant only for iterative solvers)

int verbose

print iterations and norms (relevant only for iterative solvers)

double total_time

Total wall time in seconds

double stage1_time

Wall time (in seconds) for stage 1 of tridiagLU() or blocktridiagLU()

double stage2_time

Wall time (in seconds) for stage 2 of tridiagLU() or blocktridiagLU()

double stage3_time

Wall time (in seconds) for stage 3 of tridiagLU() or blocktridiagLU()

double stage4_time

Wall time (in seconds) for stage 4 of tridiagLU() or blocktridiagLU()

int blacs_ctxt

Context variable for ScaLAPACK (relevant only if compiled with ScaLAPACK support, i.e., with -Dwith_scalapack)

See also
tridiagScaLPK
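
A minimal usage sketch (not part of the original documentation; it assumes MPI has been initialized and that the coefficient arrays follow the layout described for tridiagLU() below):

    #include <stdio.h>
    #include <mpi.h>
    #include "tridiagLU.h"

    /* Sketch: initialize a TridiagLU object, solve, and inspect the
       timers and convergence information stored in the object. */
    void example_solve(double *a, double *b, double *c, double *x,
                       int n, int ns, MPI_Comm comm)
    {
      TridiagLU params;
      if (tridiagLUInit(&params, &comm)) return;  /* defaults, overridden by lusolver.inp if present */

      if (tridiagLU(a, b, c, x, n, ns, &params, &comm)) {
        fprintf(stderr, "tridiagLU() failed.\n");
        return;
      }

      /* wall times for the four stages and the total */
      printf("total %g s (stages: %g %g %g %g)\n",
             params.total_time, params.stage1_time, params.stage2_time,
             params.stage3_time, params.stage4_time);
      /* convergence of the (iterative) reduced-system solve */
      printf("reduced solve: %d iterations, exit norm %e\n",
             params.exititer, params.exitnorm);
    }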

Macro Definition Documentation

#define _TRIDIAG_JACOBI_   "jacobi"

Jacobi method

See also
tridiagIterJacobi(), blocktridiagIterJacobi()

Definition at line 71 of file tridiagLU.h.

#define _TRIDIAG_GS_   "gather-and-solve"

"Gather-and-solve" method

See also
tridiagLUGS

Definition at line 73 of file tridiagLU.h.

Function Documentation

int tridiagLU ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using parallel LU decomposition: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. This function uses an iterative-substructuring method that can be briefly described through the following four stages:

  • Stage 1: Parallel elimination of the tridiagonal blocks on each processor, comprising all points of the subdomain except the 1st point (unless it is the 1st global point, i.e., a physical boundary)
  • Stage 2: Elimination of the 1st row on each processor (except the 1st processor) using the last row of the previous processor.
  • Stage 3: Solution of the reduced tridiagonal system that represents the coupling of the system across the processors, using tridiagLUGS() or tridiagIterJacobi() in this implementation (depending on TridiagLU::reducedsolvetype).
  • Stage 4: Backward-solve to obtain the final solution

Specific details of the method implemented here are available in:

More references on this class of parallel tridiagonal solvers:

  • E. Polizzi and A. H. Sameh, "A parallel hybrid banded system solver: The SPIKE algorithm", Parallel Comput., 32 (2006), pp. 177–194.
  • E. Polizzi and A. H. Sameh, "SPIKE: A parallel environment for solving banded linear systems", Comput. & Fluids, 36 (2007), pp. 113–120.

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.
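
As an illustration, the following sketch (not part of the original documentation) fills such interleaved arrays for a simple constant-coefficient system (coefficients -1, 2, -1 with a unit right-hand side, chosen purely for illustration) and solves it:

    #include <stdlib.h>
    #include <mpi.h>
    #include "tridiagLU.h"

    /* Sketch: pack ns independent tridiagonal systems of local size n into the
       interleaved layout described above and solve them with tridiagLU(). */
    int pack_and_solve(int n, int ns, TridiagLU *params, MPI_Comm *comm)
    {
      int rank = 0, nproc = 1;
      if (comm) { MPI_Comm_rank(*comm, &rank); MPI_Comm_size(*comm, &nproc); }

      double *a = (double*) calloc(n*ns, sizeof(double));
      double *b = (double*) calloc(n*ns, sizeof(double));
      double *c = (double*) calloc(n*ns, sizeof(double));
      double *x = (double*) calloc(n*ns, sizeof(double));

      for (int i = 0; i < n; i++) {      /* local row index */
        for (int d = 0; d < ns; d++) {   /* system index */
          a[i*ns+d] = -1.0;              /* sub-diagonal element   a_i^d */
          b[i*ns+d] =  2.0;              /* diagonal element       b_i^d */
          c[i*ns+d] = -1.0;              /* super-diagonal element c_i^d */
          x[i*ns+d] =  1.0;              /* right-hand side r_i^d; overwritten by the solution */
        }
      }
      /* non-periodic system: no sub-diagonal in the first global row,
         no super-diagonal in the last global row */
      if (rank == 0)       for (int d = 0; d < ns; d++) a[d]          = 0.0;
      if (rank == nproc-1) for (int d = 0; d < ns; d++) c[(n-1)*ns+d] = 0.0;

      int ierr = tridiagLU(a, b, c, x, n, ns, params, comm);

      free(a); free(b); free(c); free(x);
      return ierr;
    }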

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 83 of file tridiagLU.c.

93 {
94  TridiagLU *params = (TridiagLU*) r;
95  int d,i,istart,iend;
96  int rank,nproc;
97  struct timeval start,stage1,stage2,stage3,stage4;
98 
99 #ifdef serial
100  rank = 0;
101  nproc = 1;
102 #else
103  MPI_Comm *comm = (MPI_Comm*) m;
104  int ierr = 0;
105  const int nvar = 4;
106 
107  if (comm) {
108  MPI_Comm_size(*comm,&nproc);
109  MPI_Comm_rank(*comm,&rank);
110  } else {
111  rank = 0;
112  nproc = 1;
113  }
114 #endif
115 
116  if (!params) {
117  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
118  return(1);
119  }
120 
121  /* start */
122  gettimeofday(&start,NULL);
123 
124  if ((ns == 0) || (n == 0)) return(0);
125  double *xs1, *xp1;
126  xs1 = (double*) calloc (ns, sizeof(double));
127  xp1 = (double*) calloc (ns, sizeof(double));
128  for (i=0; i<ns; i++) xs1[i] = xp1[i] = 0;
129 
130  /* Stage 1 - Parallel elimination of subdiagonal entries */
131  istart = (rank == 0 ? 1 : 2);
132  iend = n;
133  for (i = istart; i < iend; i++) {
134  for (d = 0; d < ns; d++) {
135  if (b[(i-1)*ns+d] == 0) return(-1);
136  double factor = a[i*ns+d] / b[(i-1)*ns+d];
137  b[i*ns+d] -= factor * c[(i-1)*ns+d];
138  a[i*ns+d] = -factor * a[(i-1)*ns+d];
139  x[i*ns+d] -= factor * x[(i-1)*ns+d];
140  if (rank) {
141  double factor = c[d] / b[(i-1)*ns+d];
142  c[d] = -factor * c[(i-1)*ns+d];
143  b[d] -= factor * a[(i-1)*ns+d];
144  x[d] -= factor * x[(i-1)*ns+d];
145  }
146  }
147  }
148 
149  /* end of stage 1 */
150  gettimeofday(&stage1,NULL);
151 
152  /* Stage 2 - Eliminate the first sub- & super-diagonal entries */
153  /* This needs the last (a,b,c,x) from the previous process */
154 #ifndef serial
155  double *sendbuf, *recvbuf;
156  sendbuf = (double*) calloc (ns*nvar, sizeof(double));
157  recvbuf = (double*) calloc (ns*nvar, sizeof(double));
158  for (d=0; d<ns; d++) {
159  sendbuf[d*nvar+0] = a[(n-1)*ns+d];
160  sendbuf[d*nvar+1] = b[(n-1)*ns+d];
161  sendbuf[d*nvar+2] = c[(n-1)*ns+d];
162  sendbuf[d*nvar+3] = x[(n-1)*ns+d];
163  }
164  if (nproc > 1) {
165  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
166  if (rank) MPI_Irecv(recvbuf,nvar*ns,MPI_DOUBLE,rank-1,1436,*comm,&req[0]);
167  if (rank != nproc-1) MPI_Isend(sendbuf,nvar*ns,MPI_DOUBLE,rank+1,1436,*comm,&req[1]);
168  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
169  }
170  /* The first process sits this one out */
171  if (rank) {
172  for (d = 0; d < ns; d++) {
173  double am1, bm1, cm1, xm1;
174  am1 = recvbuf[d*nvar+0];
175  bm1 = recvbuf[d*nvar+1];
176  cm1 = recvbuf[d*nvar+2];
177  xm1 = recvbuf[d*nvar+3];
178  double factor;
179  if (bm1 == 0) return(-1);
180  factor = a[d] / bm1;
181  b[d] -= factor * cm1;
182  a[d] = -factor * am1;
183  x[d] -= factor * xm1;
184  if (b[(n-1)*ns+d] == 0) return(-1);
185  factor = c[d] / b[(n-1)*ns+d];
186  b[d] -= factor * a[(n-1)*ns+d];
187  c[d] = -factor * c[(n-1)*ns+d];
188  x[d] -= factor * x[(n-1)*ns+d];
189  }
190  }
191  free(sendbuf);
192  free(recvbuf);
193 #endif
194 
195  /* end of stage 2 */
196  gettimeofday(&stage2,NULL);
197 
198  /* Stage 3 - Solve the reduced (nproc-1) X (nproc-1) tridiagonal system */
199 #ifndef serial
200  if (nproc > 1) {
201  double *zero, *one;
202  zero = (double*) calloc (ns, sizeof(double));
203  one = (double*) calloc (ns, sizeof(double));
204  for (d=0; d<ns; d++) {
205  zero[d] = 0.0;
206  one [d] = 1.0;
207  }
208  if (!strcmp(params->reducedsolvetype,_TRIDIAG_GS_)) {
209  /* Solving the reduced system by gather-and-solve algorithm */
210  if (rank) ierr = tridiagLUGS(a,b,c,x,1,ns,params,comm);
211  else ierr = tridiagLUGS(zero,one,zero,zero,1,ns,params,comm);
212  if (ierr) return(ierr);
213  } else if (!strcmp(params->reducedsolvetype,_TRIDIAG_JACOBI_)) {
214  /* Solving the reduced system iteratively with the Jacobi method */
215  if (rank) ierr = tridiagIterJacobi(a,b,c,x,1,ns,params,comm);
216  else ierr = tridiagIterJacobi(zero,one,zero,zero,1,ns,params,comm);
217  }
218  free(zero);
219  free(one);
220 
221  /* Each process, get the first x of the next process */
222  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
223  for (d=0; d<ns; d++) xs1[d] = x[d];
224  if (rank+1 < nproc) MPI_Irecv(xp1,ns,MPI_DOUBLE,rank+1,1323,*comm,&req[0]);
225  if (rank) MPI_Isend(xs1,ns,MPI_DOUBLE,rank-1,1323,*comm,&req[1]);
226  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
227  }
228 #else
229  if (nproc > 1) {
230  fprintf(stderr,"Error: nproc > 1 for a serial run!\n");
231  return(1);
232  }
233 #endif /* if not serial */
234  /* end of stage 3 */
235  gettimeofday(&stage3,NULL);
236 
237  /* Stage 4 - Parallel back-substitution to get the solution */
238  istart = n-1;
239  iend = (rank == 0 ? 0 : 1);
240 
241  for (d = 0; d < ns; d++) {
242  if (b[istart*ns+d] == 0) return(-1);
243  x[istart*ns+d] = (x[istart*ns+d]-a[istart*ns+d]*x[d]-c[istart*ns+d]*xp1[d]) / b[istart*ns+d];
244  }
245  for (i = istart-1; i > iend-1; i--) {
246  for (d = 0; d < ns; d++) {
247  if (b[i*ns+d] == 0) return(-1);
248  x[i*ns+d] = (x[i*ns+d]-c[i*ns+d]*x[(i+1)*ns+d]-a[i*ns+d]*x[d]) / b[i*ns+d];
249  }
250  }
251 
252  /* end of stage 4 */
253  gettimeofday(&stage4,NULL);
254 
255  /* Done - now x contains the solution */
256  free(xs1);
257  free(xp1);
258 
259  /* save runtimes if needed */
260  long long walltime;
261  walltime = ((stage1.tv_sec * 1000000 + stage1.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
262  params->stage1_time = (double) walltime / 1000000.0;
263  walltime = ((stage2.tv_sec * 1000000 + stage2.tv_usec) - (stage1.tv_sec * 1000000 + stage1.tv_usec));
264  params->stage2_time = (double) walltime / 1000000.0;
265  walltime = ((stage3.tv_sec * 1000000 + stage3.tv_usec) - (stage2.tv_sec * 1000000 + stage2.tv_usec));
266  params->stage3_time = (double) walltime / 1000000.0;
267  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (stage3.tv_sec * 1000000 + stage3.tv_usec));
268  params->stage4_time = (double) walltime / 1000000.0;
269  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
270  params->total_time = (double) walltime / 1000000.0;
271  return(0);
272 }
int tridiagLUGS ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using the gather-and-solve approach: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The "gather-and-solve" approach gathers each tridiagonal system onto one processor and solves it using tridiagLU() (passing NULL as the MPI communicator to indicate that a serial solution is desired). Given multiple tridiagonal systems (ns > 1), this function distributes the systems among the processors, so that each processor gathers and solves some of them. After the systems are solved, the solutions are scattered back to the original processors.
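
The distribution of the ns systems among the nproc processes follows the simple rule used to compute ns_local[] in the listing below; a one-function sketch of that rule:

    /* Sketch: number of the ns systems that process p gathers and solves,
       mirroring the computation of ns_local[] in the listing below.
       Example: ns = 10 systems on nproc = 4 processes -> 3, 3, 2, 2. */
    static int ns_on_rank(int ns, int nproc, int p)
    {
      return ns/nproc + (p < ns%nproc ? 1 : 0);
    }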

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 64 of file tridiagLUGS.c.

74 {
75  TridiagLU *context = (TridiagLU*) r;
76  if (!context) {
77  fprintf(stderr,"Error in tridiagLUGS(): NULL pointer passed for parameters.\n");
78  return(-1);
79  }
80 #ifdef serial
81 
82  /* Serial compilation */
83  return(tridiagLU(a,b,c,x,n,ns,context,m));
84 
85 #else
86 
87  int d,i,ierr = 0,dstart,istart,p,q;
88  const int nvar = 4;
89  double *sendbuf,*recvbuf;
90  int rank,nproc;
91 
92  /* Parallel compilation */
93  MPI_Comm *comm = (MPI_Comm*) m;
94  if (!comm) return(tridiagLU(a,b,c,x,n,ns,context,NULL));
95  MPI_Comm_size(*comm,&nproc);
96  MPI_Comm_rank(*comm,&rank);
97 
98  if ((ns == 0) || (n == 0)) return(0);
99 
100  /*
101  each process needs to know the local sizes of every other process
102  and total size of the system
103  */
104  int *N = (int*) calloc (nproc, sizeof(int));
105  MPI_Allgather(&n,1,MPI_INT,N,1,MPI_INT,*comm);
106  int NT = 0; for (i=0; i<nproc; i++) NT += N[i];
107 
108  /* counts and displacements for gather and scatter operations */
109  int *counts = (int*) calloc (nproc, sizeof(int));
110  int *displ = (int*) calloc (nproc, sizeof(int));
111 
112  /* on all processes, calculate the number of systems each */
113  /* process has to solve */
114  int *ns_local = (int*) calloc (nproc,sizeof(int));
115  for (p=0; p<nproc; p++) ns_local[p] = ns / nproc;
116  for (p=0; p<ns%nproc; p++) ns_local[p]++;
117 
118  /* allocate the arrays for the gathered tridiagonal systems */
119  double *ra=NULL,*rb=NULL,*rc=NULL,*rx=NULL;
120  if (ns_local[rank] > 0) {
121  ra = (double*) calloc (ns_local[rank]*NT,sizeof(double));
122  rb = (double*) calloc (ns_local[rank]*NT,sizeof(double));
123  rc = (double*) calloc (ns_local[rank]*NT,sizeof(double));
124  rx = (double*) calloc (ns_local[rank]*NT,sizeof(double));
125  }
126 
127  /* Gather the systems on each process */
128  /* allocate receive buffer */
129  if (ns_local[rank] > 0)
130  recvbuf = (double*) calloc (ns_local[rank]*nvar*NT,sizeof(double));
131  else recvbuf = NULL;
132  dstart = 0;
133  for (p = 0; p < nproc; p++) {
134  if (ns_local[p] > 0) {
135  /* allocate send buffer and form the send packet of data */
136  sendbuf = (double*) calloc (nvar*n*ns_local[p],sizeof(double));
137  for (d = 0; d < ns_local[p]; d++) {
138  for (i = 0; i < n; i++) {
139  sendbuf[n*nvar*d+n*0+i] = a[d+dstart+ns*i];
140  sendbuf[n*nvar*d+n*1+i] = b[d+dstart+ns*i];
141  sendbuf[n*nvar*d+n*2+i] = c[d+dstart+ns*i];
142  sendbuf[n*nvar*d+n*3+i] = x[d+dstart+ns*i];
143  }
144  }
145  dstart += ns_local[p];
146 
147  /* gather these reduced systems on process with rank = p */
148  if (rank == p) {
149  for (q = 0; q < nproc; q++) {
150  counts[q] = nvar*N[q]*ns_local[p];
151  displ [q] = (q == 0 ? 0 : displ[q-1]+counts[q-1]);
152  }
153  }
154  MPI_Gatherv(sendbuf,nvar*n*ns_local[p],MPI_DOUBLE,
155  recvbuf,counts,displ,MPI_DOUBLE,p,*comm);
156 
157  /* deallocate send buffer */
158  free(sendbuf);
159  }
160  }
161  /* extract the data from the recvbuf and solve */
162  istart = 0;
163  for (q = 0; q < nproc; q++) {
164  for (d = 0; d < ns_local[rank]; d++) {
165  for (i = 0; i < N[q]; i++) {
166  ra[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+0*N[q]+i];
167  rb[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+1*N[q]+i];
168  rc[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+2*N[q]+i];
169  rx[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+3*N[q]+i];
170  }
171  }
172  istart += N[q];
173  }
174  /* deallocate receive buffer */
175  if (recvbuf) free(recvbuf);
176 
177  /* solve the gathered systems in serial */
178  ierr = tridiagLU(ra,rb,rc,rx,NT,ns_local[rank],context,NULL);
179  if (ierr) return(ierr);
180 
181  /* allocate send buffer and save the data to send */
182  if (ns_local[rank] > 0)
183  sendbuf = (double*) calloc (ns_local[rank]*NT,sizeof(double));
184  else sendbuf = NULL;
185  istart = 0;
186  for (q = 0; q < nproc; q++) {
187  for (i = 0; i < N[q]; i++) {
188  for (d = 0; d < ns_local[rank]; d++) {
189  sendbuf[istart*ns_local[rank]+d*N[q]+i] = rx[d+ns_local[rank]*(istart+i)];
190  }
191  }
192  istart += N[q];
193  }
194  dstart = 0;
195  for (p = 0; p < nproc; p++) {
196  if (ns_local[p] > 0) {
197 
198  /* allocate receive buffer */
199  recvbuf = (double*) calloc (ns_local[p]*n, sizeof(double));
200 
201  /* scatter the solution back */
202  for (q = 0; q < nproc; q++) {
203  counts[q] = ns_local[p]*N[q];
204  displ[q] = (q == 0 ? 0 : displ[q-1]+counts[q-1]);
205  }
206  MPI_Scatterv(sendbuf,counts,displ,MPI_DOUBLE,
207  recvbuf,ns_local[p]*n,MPI_DOUBLE,
208  p,*comm);
209  /* save the solution on all root processes */
210  for (d = 0; d < ns_local[p]; d++) {
211  for (i = 0; i < n; i++) {
212  x[d+dstart+ns*i] = recvbuf[d*n+i];
213  }
214  }
215  dstart += ns_local[p];
216  /* deallocate receive buffer */
217  free(recvbuf);
218  }
219  }
220  /* deallocate send buffer */
221  if (sendbuf) free(sendbuf);
222 
223  /* clean up */
224  if (ns_local[rank] > 0) {
225  free(ra);
226  free(rb);
227  free(rc);
228  free(rx);
229  }
230  free(ns_local);
231  free(N);
232  free(displ);
233  free(counts);
234 
235  return(0);
236 #endif
237 }
int tridiagIterJacobi ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using point Jacobi iterations: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The initial guess is taken as the solution of

\begin{equation} {\rm diag}\left[{\bf b}\right]{\bf x} = {\bf r} \end{equation}

where \({\bf b}\) represents the diagonal elements of the tridiagonal system, and \({\bf r}\) is the right-hand-side, stored in \({\bf x}\) at the start of this function.
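
Each subsequent iteration applies the standard point-Jacobi update (written out here for reference; it corresponds to the correction step in the listing below), where the values \(x_{-1}\) and \(x_n\) are received from the neighboring processes:

\begin{equation} x_i^{(m+1)} = \frac{1}{b_i}\left( r_i - a_i x_{i-1}^{(m)} - c_i x_{i+1}^{(m)} \right), \quad i = 0,\cdots,n-1 \end{equation}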

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 64 of file tridiagIterJacobi.c.

74 {
75  TridiagLU *context = (TridiagLU*) r;
76  int iter,d,i,NT;
77  double norm=0,norm0=0,global_norm=0;
78 
79 #ifndef serial
80  MPI_Comm *comm = (MPI_Comm*) m;
81  int rank,nproc;
82 
83  if (comm) {
84  MPI_Comm_size(*comm,&nproc);
85  MPI_Comm_rank(*comm,&rank);
86  } else {
87  rank = 0;
88  nproc = 1;
89  }
90 #endif
91 
92  if (!context) {
93  fprintf(stderr,"Error in tridiagIterJacobi(): NULL pointer passed for parameters!\n");
94  return(-1);
95  }
96 
97  /* check for zero along the diagonal */
98  for (i=0; i<n; i++) {
99  for (d=0; d<ns; d++) {
100  if (b[i*ns+d]*b[i*ns+d] < context->atol*context->atol) {
101  fprintf(stderr,"Error in tridiagIterJacobi(): Encountered zero on main diagonal!\n");
102  return(1);
103  }
104  }
105  }
106 
107  double *rhs = (double*) calloc (ns*n, sizeof(double));
108  for (i=0; i<n; i++) {
109  for (d=0; d<ns; d++) {
110  rhs[i*ns+d] = x[i*ns+d]; /* save a copy of the rhs */
111  x[i*ns+d] /= b[i*ns+d]; /* initial guess */
112  }
113  }
114 
115  double *recvbufL, *recvbufR, *sendbufL, *sendbufR;
116  recvbufL = (double*) calloc (ns, sizeof(double));
117  recvbufR = (double*) calloc (ns, sizeof(double));
118  sendbufL = (double*) calloc (ns, sizeof(double));
119  sendbufR = (double*) calloc (ns, sizeof(double));
120 
121  /* total number of points */
122 #ifdef serial
123  if (context->evaluate_norm) NT = n;
124  else NT = 0;
125 #else
126  if (context->evaluate_norm) MPI_Allreduce(&n,&NT,1,MPI_INT,MPI_SUM,*comm);
127  else NT = 0;
128 #endif
129 
130 #ifdef serial
131  if (context->verbose) printf("\n");
132 #else
133  if (context->verbose && (!rank)) printf("\n");
134 #endif
135 
136  iter = 0;
137  while(1) {
138 
139  /* evaluate break conditions */
140  if ( (iter >= context->maxiter)
141  || (iter && context->evaluate_norm && (global_norm < context->atol))
142  || (iter && context->evaluate_norm && (global_norm/norm0 < context->rtol)) ) {
143  break;
144  }
145 
146  /* Communicate the boundary x values between processors */
147  for (d=0; d<ns; d++) recvbufL[d] = recvbufR[d] = 0;
148 #ifndef serial
149  MPI_Request req[4] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL};
150  if (rank) MPI_Irecv(recvbufL,ns,MPI_DOUBLE,rank-1,2,*comm,&req[0]);
151  if (rank != nproc-1) MPI_Irecv(recvbufR,ns,MPI_DOUBLE,rank+1,3,*comm,&req[1]);
152  for (d=0; d<ns; d++) { sendbufL[d] = x[d]; sendbufR[d] = x[(n-1)*ns+d]; }
153  if (rank) MPI_Isend(sendbufL,ns,MPI_DOUBLE,rank-1,3,*comm,&req[2]);
154  if (rank != nproc-1) MPI_Isend(sendbufR,ns,MPI_DOUBLE,rank+1,2,*comm,&req[3]);
155 #endif
156 
157  /* calculate error norm - interior */
158  if (context->evaluate_norm) {
159  norm = 0;
160  for (i=1; i<n-1; i++) {
161  for (d=0; d<ns; d++) {
162  norm += ( (a[i*ns+d]*x[(i-1)*ns+d] + b[i*ns+d]*x[i*ns+d] + c[i*ns+d]*x[(i+1)*ns+d] - rhs[i*ns+d])
163  * (a[i*ns+d]*x[(i-1)*ns+d] + b[i*ns+d]*x[i*ns+d] + c[i*ns+d]*x[(i+1)*ns+d] - rhs[i*ns+d]) );
164  }
165  }
166  }
167  /* calculate error norm - boundary */
168 #ifndef serial
169  MPI_Waitall(4,req,MPI_STATUS_IGNORE);
170 #endif
171  if (context->evaluate_norm) {
172  if (n > 1) {
173  for (d=0; d<ns; d++) {
174  norm += ( (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*x[d+ns*1]- rhs[d])
175  * (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*x[d+ns*1]- rhs[d]) );
176  }
177  for (d=0; d<ns; d++) {
178  norm += ( (a[d+ns*(n-1)]*x[d+ns*(n-2)] + b[d+ns*(n-1)]*x[d+ns*(n-1)] + c[d+ns*(n-1)]*recvbufR[d] - rhs[d+ns*(n-1)])
179  * (a[d+ns*(n-1)]*x[d+ns*(n-2)] + b[d+ns*(n-1)]*x[d+ns*(n-1)] + c[d+ns*(n-1)]*recvbufR[d] - rhs[d+ns*(n-1)]) );
180  }
181  } else {
182  for (d=0; d<ns; d++) {
183  norm += ( (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*recvbufR[d] - rhs[d])
184  * (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*recvbufR[d] - rhs[d]) );
185  }
186  }
187  /* sum over all processes */
188 #ifdef serial
189  global_norm = norm;
190 #else
191  MPI_Allreduce(&norm,&global_norm,1,MPI_DOUBLE,MPI_SUM,*comm);
192 #endif
193  global_norm = sqrt(global_norm/NT);
194  if (!iter) norm0 = global_norm;
195  } else {
196  norm = -1.0;
197  global_norm = -1.0;
198  }
199 
200 #ifdef serial
201  if (context->verbose)
202 #else
203  if (context->verbose && (!rank))
204 #endif
205  printf("\t\titer: %d, norm: %1.16E\n",iter,global_norm);
206 
207  /* correct the solution for this iteration */
208  if (n > 1) {
209  for (d=0; d<ns; d++) {
210  i = 0; x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*recvbufL[d] - c[i*ns+d]*x[d+ns*(i+1)] ) / b[i*ns+d];
211  i = n-1; x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*x[d+ns*(i-1)] - c[i*ns+d]*recvbufR[d]) / b[i*ns+d];
212  }
213  for (i=1; i<n-1; i++) {
214  for (d=0; d<ns; d++) {
215  x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*x[d+ns*(i-1)] - c[i*ns+d]*x[d+ns*(i+1)]) / b[i*ns+d];
216  }
217  }
218  } else {
219  for (d=0; d<ns; d++) x[d] = (rhs[d] - a[d]*recvbufL[d] - c[d]*recvbufR[d]) / b[d];
220  }
221 
222  /* finished with this iteration */
223  iter++;
224  }
225 
226  /* save convergence information */
227  context->exitnorm = (context->evaluate_norm ? global_norm : -1.0);
228  context->exititer = iter;
229 
230  free(rhs);
231  free(sendbufL);
232  free(sendbufR);
233  free(recvbufL);
234  free(recvbufR);
235 
236  return(0);
237 }
int tridiagLUInit ( void *  r,
void *  c 
)

Initialize the tridiagLU solver by setting the various parameters in TridiagLU to their default values. If the file lusolver.inp is available, read it and set the parameters.

The file lusolver.inp must be in the following format:

    begin
        <keyword>   <value>
        <keyword>   <value>
        <keyword>   <value>
        ...
        <keyword>   <value>
    end

where the recognized keywords are:

Keyword name       Type     Variable                      Default value
evaluate_norm      int      TridiagLU::evaluate_norm      1
maxiter            int      TridiagLU::maxiter            10
atol               double   TridiagLU::atol               1e-12
rtol               double   TridiagLU::rtol               1e-10
verbose            int      TridiagLU::verbose            0
reducedsolvetype   char[]   TridiagLU::reducedsolvetype   _TRIDIAG_JACOBI_
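
For example, a lusolver.inp file with illustrative (non-default) values could look like this:

    begin
        evaluate_norm     1
        maxiter           50
        atol              1e-14
        rtol              1e-12
        verbose           1
        reducedsolvetype  jacobi
    end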
Parameters
r    Object of type TridiagLU
c    MPI communicator

Definition at line 39 of file tridiagLUInit.c.

43 {
44  TridiagLU *t = (TridiagLU*) r;
45  int rank,ierr;
46 #ifdef serial
47  rank = 0;
48 #else
49  MPI_Comm *comm = (MPI_Comm*) c;
50  if (!comm) rank = 0;
51  else MPI_Comm_rank(*comm,&rank);
52 #endif
53 
54  /* default values */
55  strcpy(t->reducedsolvetype,_TRIDIAG_JACOBI_);
56  t->evaluate_norm = 1;
57  t->maxiter = 10;
58  t->atol = 1e-12;
59  t->rtol = 1e-10;
60  t->verbose = 0;
61 
62  /* read from file, if available */
63  if (!rank) {
64  FILE *in;
65  in = fopen("lusolver.inp","r");
66  if (!in) {
67  printf("tridiagLUInit: File \"lusolver.inp\" not found. Using default values.\n");
68  } else {
69  char word[100];
70  ierr = fscanf(in,"%s",word); if (ierr != 1) return(1);
71  if (!strcmp(word, "begin")) {
72  while (strcmp(word, "end")) {
73  ierr = fscanf(in,"%s",word); if (ierr != 1) return(1);
74  if (!strcmp(word, "evaluate_norm" )) ierr = fscanf(in,"%d" ,&t->evaluate_norm );
75  else if (!strcmp(word, "maxiter" )) ierr = fscanf(in,"%d" ,&t->maxiter );
76  else if (!strcmp(word, "atol" )) ierr = fscanf(in,"%lf",&t->atol );
77  else if (!strcmp(word, "rtol" )) ierr = fscanf(in,"%lf",&t->rtol );
78  else if (!strcmp(word, "verbose" )) ierr = fscanf(in,"%d" ,&t->verbose );
79  else if (!strcmp(word, "reducedsolvetype")) ierr = fscanf(in,"%s" ,t->reducedsolvetype);
80  else if (strcmp(word,"end")) {
81  char useless[100];
82  ierr = fscanf(in,"%s",useless);
83  printf("Warning: keyword %s in file \"lusolver.inp\" with value %s not recognized or extraneous. Ignoring.\n",word,useless);
84  }
85  if (ierr != 1) return(1);
86  }
87  } else {
88  fprintf(stderr,"Error: Illegal format in file \"lusolver.inp\".\n");
89  return(1);
90  }
91  fclose(in);
92  }
93  }
94 
95  /* broadcast to all processes */
96 #ifndef serial
97  if (comm) {
98  MPI_Bcast(t->reducedsolvetype,50,MPI_CHAR,0,*comm);
99  MPI_Bcast(&t->evaluate_norm,1,MPI_INT,0,*comm);
100  MPI_Bcast(&t->maxiter,1,MPI_INT,0,*comm);
101  MPI_Bcast(&t->verbose,1,MPI_INT,0,*comm);
102  MPI_Bcast(&t->atol,1,MPI_DOUBLE,0,*comm);
103  MPI_Bcast(&t->rtol,1,MPI_DOUBLE,0,*comm);
104  }
105 #endif
106 
107  return(0);
108 }
int blocktridiagLU ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
int  bs,
void *  r,
void *  m 
)

Solve block tridiagonal (non-periodic) systems of equations using parallel LU decomposition: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. This function uses an iterative-substructuring method that can be briefly described through the following four stages:

  • Stage 1: Parallel elimination of the tridiagonal blocks on each processor, comprising all points of the subdomain except the 1st point (unless it is the 1st global point, i.e., a physical boundary)
  • Stage 2: Elimination of the 1st row on each processor (except the 1st processor) using the last row of the previous processor.
  • Stage 3: Solution of the reduced tridiagonal system that represents the coupling of the system across the processors, using blocktridiagIterJacobi() in this implementation.
  • Stage 4: Backward-solve to obtain the final solution

Specific details of the method implemented here are available in:

More references on this class of parallel tridiagonal solvers:

  • E. Polizzi and A. H. Sameh, "A parallel hybrid banded system solver: The SPIKE algorithm", Parallel Comput., 32 (2006), pp. 177–194.
  • E. Polizzi and A. H. Sameh, "SPIKE: A parallel environment for solving banded linear systems", Comput. & Fluids, 36 (2007), pp. 113–120.

Array layout: The arguments a, b, and c are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, and superdiagonal) of size (n X ns X bs^2), and x is a local 1D array (containing this processor's part of the right-hand-side, and will contain the solution on exit) of size (n X ns X bs), where n is the local size of the system, ns is the number of independent systems to solve, and bs is the block size. The ordering of the elements in these arrays is as follows:

  • Each block is stored in the row-major format.
  • Blocks of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} B_0^k & C_0^k & & & \\ A_1^k & B_1^k & C_1^k & & \\ & A_2^k & B_2^k & C_2^k & \\ & & A_3^k & B_3^k & C_3^k \\ & & & A_4^k & B_4^k \\ \end{array}\right] \left[\begin{array}{c} X_0^k \\ X_1^k \\ X_2^k \\ X_3^k \\ X_4^k \end{array}\right] = \left[\begin{array}{c} R_0^k \\ R_1^k \\ R_2^k \\ R_3^k \\ R_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

where \(A\), \(B\), and \(C\) are matrices of size bs = 2 (say), and let \( ns = 3\). In the equation above, we have

\begin{equation} B_i^k = \left[\begin{array}{cc} b_{00,i}^k & b_{01,i}^k \\ b_{10,i}^k & b_{11,i}^k \end{array}\right], X_i^k = \left[\begin{array}{c} x_{0,i}^k \\ x_{1,i}^k \end{array} \right], R_i^k = \left[\begin{array}{c} r_{0,i}^k \\ r_{1,i}^k \end{array} \right] \end{equation}

Note that in the code, \(X\) and \(R\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_{00,0}^0, b_{01,0}^0, b_{10,0}^0, b_{11,0}^0, b_{00,0}^1, b_{01,0}^1, b_{10,0}^1, b_{11,0}^1, b_{00,0}^2, b_{01,0}^2, b_{10,0}^2, b_{11,0}^2,
b_{00,1}^0, b_{01,1}^0, b_{10,1}^0, b_{11,1}^0, b_{00,1}^1, b_{01,1}^1, b_{10,1}^1, b_{11,1}^1, b_{00,1}^2, b_{01,1}^2, b_{10,1}^2, b_{11,1}^2,
...,
b_{00,n-1}^0, b_{01,n-1}^0, b_{10,n-1}^0, b_{11,n-1}^0, b_{00,n-1}^1, b_{01,n-1}^1, b_{10,n-1}^1, b_{11,n-1}^1, b_{00,n-1}^2, b_{01,n-1}^2, b_{10,n-1}^2, b_{11,n-1}^2
]
The arrays a and c are stored similarly.

The array corresponding to a vector (the solution and the right-hand-side x) must be a 1D array with the following layout of elements:
[
x_{0,0}^0, x_{1,0}^0, x_{0,0}^1, x_{1,0}^1,x_{0,0}^2, x_{1,0}^2,
x_{0,1}^0, x_{1,1}^0, x_{0,1}^1, x_{1,1}^1,x_{0,1}^2, x_{1,1}^2,
...,
x_{0,n-1}^0, x_{1,n-1}^0, x_{0,n-1}^1, x_{1,n-1}^1,x_{0,n-1}^2, x_{1,n-1}^2
]
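
Put differently, entry (p,q) of the i-th block of system k in the arrays a, b, or c, and entry p of the i-th block of system k in the array x, are located at the flattened indices computed in the following sketch (these expressions match the (i*ns+d)*bs2 and (i*ns+d)*bs offsets used in the listing below):

    /* Sketch: flattened indices for the block layout described above.
       Blocks are bs x bs and stored row-major. */
    static int block_entry(int i, int k, int p, int q, int ns, int bs)
    {
      return ((i*ns + k)*bs + p)*bs + q;   /* entry (p,q) of block i of system k */
    }
    static int vector_entry(int i, int k, int p, int ns, int bs)
    {
      return (i*ns + k)*bs + p;            /* entry p of block i of system k */
    }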

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor (not multiplied by the block size)
ns    Number of systems to solve
bs    Block size
r     Object of type TridiagLU (contains wall times at exit)
m     MPI communicator

Definition at line 107 of file blocktridiagLU.c.

119 {
120  TridiagLU *params = (TridiagLU*) r;
121  int d,i,j,istart,iend,size;
122  int rank,nproc,bs2=bs*bs,nsbs=ns*bs;
123  struct timeval start,stage1,stage2,stage3,stage4;
124 
125 #ifdef serial
126  rank = 0;
127  nproc = 1;
128 #else
129  MPI_Comm *comm = (MPI_Comm*) m;
130  const int nvar = 4;
131  int ierr = 0;
132 
133  if (comm) {
134  MPI_Comm_size(*comm,&nproc);
135  MPI_Comm_rank(*comm,&rank);
136  } else {
137  rank = 0;
138  nproc = 1;
139  }
140 #endif
141 
142  if (!params) {
143  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
144  return(1);
145  }
146 
147  /* start */
148  gettimeofday(&start,NULL);
149 
150  if ((ns == 0) || (n == 0) || (bs == 0)) return(0);
151  double *xs1, *xp1;
152  xs1 = (double*) calloc (nsbs, sizeof(double));
153  xp1 = (double*) calloc (nsbs, sizeof(double));
154  for (i=0; i<nsbs; i++) xs1[i] = xp1[i] = 0;
155 
156  /* Stage 1 - Parallel elimination of subdiagonal entries */
157  istart = (rank == 0 ? 1 : 2);
158  iend = n;
159  for (i = istart; i < iend; i++) {
160  double binv[bs2], factor[bs2];
161  for (d = 0; d < ns; d++) {
162  _MatrixInvert_ (b+((i-1)*ns+d)*bs2,binv,bs);
163  _MatrixMultiply_ (a+(i*ns+d)*bs2,binv,factor,bs);
164  _MatrixMultiplySubtract_ (b+(i*ns+d)*bs2,factor,c+((i-1)*ns+d)*bs2,bs);
165  _MatrixZero_ (a+(i*ns+d)*bs2,bs);
166  _MatrixMultiplySubtract_ (a+(i*ns+d)*bs2,factor,a+((i-1)*ns+d)*bs2,bs);
167  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,factor,x+((i-1)*ns+d)*bs ,bs);
168  if (rank) {
169  _MatrixMultiply_ (c+d*bs2,binv,factor,bs);
170  _MatrixZero_ (c+d*bs2,bs);
171  _MatrixMultiplySubtract_ (c+d*bs2,factor,c+((i-1)*ns+d)*bs2,bs);
172  _MatrixMultiplySubtract_ (b+d*bs2,factor,a+((i-1)*ns+d)*bs2,bs);
173  _MatVecMultiplySubtract_ (x+d*bs ,factor,x+((i-1)*ns+d)*bs ,bs);
174  }
175  }
176  }
177 
178  /* end of stage 1 */
179  gettimeofday(&stage1,NULL);
180 
181  /* Stage 2 - Eliminate the first sub- & super-diagonal entries */
182  /* This needs the last (a,b,c,x) from the previous process */
183 #ifndef serial
184  double *sendbuf, *recvbuf;
185  size = ns*bs2*(nvar-1)+nsbs;
186  sendbuf = (double*) calloc (size, sizeof(double));
187  recvbuf = (double*) calloc (size, sizeof(double));
188  for (d=0; d<ns; d++) {
189  for (i=0; i<bs2; i++) {
190  sendbuf[(0*ns+d)*bs2+i] = a[((n-1)*ns+d)*bs2+i];
191  sendbuf[(1*ns+d)*bs2+i] = b[((n-1)*ns+d)*bs2+i];
192  sendbuf[(2*ns+d)*bs2+i] = c[((n-1)*ns+d)*bs2+i];
193  }
194  for (i=0; i<bs; i++) sendbuf[3*ns*bs2+d*bs+i] = x[((n-1)*ns+d)*bs+i];
195  }
196  if (nproc > 1) {
197  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
198  if (rank) MPI_Irecv(recvbuf,size,MPI_DOUBLE,rank-1,1436,*comm,&req[0]);
199  if (rank != nproc-1) MPI_Isend(sendbuf,size,MPI_DOUBLE,rank+1,1436,*comm,&req[1]);
200  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
201  }
202  /* The first process sits this one out */
203  if (rank) {
204  for (d = 0; d < ns; d++) {
205  double am1[bs2], bm1[bs2], cm1[bs2], xm1[bs];
206  for (i=0; i<bs2; i++) {
207  am1[i] = recvbuf[(0*ns+d)*bs2+i];
208  bm1[i] = recvbuf[(1*ns+d)*bs2+i];
209  cm1[i] = recvbuf[(2*ns+d)*bs2+i];
210  }
211  for (i=0; i<bs; i++) xm1[i] = recvbuf[3*ns*bs2+d*bs+i];
212  double factor[bs2], binv[bs2];
213  _MatrixInvert_ (bm1,binv,bs);
214  _MatrixMultiply_ (a+d*bs2,binv,factor,bs);
215  _MatrixMultiplySubtract_ (b+d*bs2,factor,cm1,bs);
216  _MatrixZero_ (a+d*bs2,bs);
217  _MatrixMultiplySubtract_ (a+d*bs2,factor,am1,bs);
218  _MatVecMultiplySubtract_ (x+d*bs ,factor,xm1,bs);
219 
220  _MatrixInvert_ (b+((n-1)*ns+d)*bs2,binv,bs); if (ierr) return(ierr);
221  _MatrixMultiply_ (c+d*bs2,binv,factor,bs);
222  _MatrixMultiplySubtract_ (b+d*bs2,factor,a+((n-1)*ns+d)*bs2,bs);
223  _MatrixZero_ (c+d*bs2,bs);
224  _MatrixMultiplySubtract_ (c+d*bs2,factor,c+((n-1)*ns+d)*bs2,bs);
225  _MatVecMultiplySubtract_ (x+d*bs ,factor,x+((n-1)*ns+d)*bs ,bs);
226  }
227  }
228  free(sendbuf);
229  free(recvbuf);
230 #endif
231 
232  /* end of stage 2 */
233  gettimeofday(&stage2,NULL);
234 
235  /* Stage 3 - Solve the reduced (nproc-1) X (nproc-1) tridiagonal system */
236 #ifndef serial
237  if (nproc > 1) {
238  double *zero, *eye;
239  zero = (double*) calloc (ns*bs2, sizeof(double));
240  eye = (double*) calloc (ns*bs2, sizeof(double));
241  for (d=0; d<ns*bs2; d++) zero[d] = eye[d] = 0.0;
242  for (d=0; d<ns; d++) {
243  for (i=0; i<bs; i++) eye[d*bs2+(i*bs+i)] = 1.0;
244  }
245 
246  if (!strcmp(params->reducedsolvetype,_TRIDIAG_GS_)) {
247  /* not supported */
248  fprintf(stderr,"Error in blocktridiagLU(): Gather-and-solve for reduced system not available.\n");
249  return(1);
250  } else if (!strcmp(params->reducedsolvetype,_TRIDIAG_JACOBI_)) {
251  /* Solving the reduced system iteratively with the Jacobi method */
252  if (rank) ierr = blocktridiagIterJacobi(a,b,c,x,1,ns,bs,params,comm);
253  else ierr = blocktridiagIterJacobi(zero,eye,zero,zero,1,ns,bs,params,comm);
254  }
255  free(zero);
256  free(eye);
257 
258  /* Each process, get the first x of the next process */
259  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
260  for (d=0; d<nsbs; d++) xs1[d] = x[d];
261  if (rank+1 < nproc) MPI_Irecv(xp1,nsbs,MPI_DOUBLE,rank+1,1323,*comm,&req[0]);
262  if (rank) MPI_Isend(xs1,nsbs,MPI_DOUBLE,rank-1,1323,*comm,&req[1]);
263  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
264  }
265 #else
266  if (nproc > 1) {
267  fprintf(stderr,"Error: nproc > 1 for a serial run!\n");
268  return(1);
269  }
270 #endif /* if not serial */
271  /* end of stage 3 */
272  gettimeofday(&stage3,NULL);
273 
274  /* Stage 4 - Parallel back-substitution to get the solution */
275  istart = n-1;
276  iend = (rank == 0 ? 0 : 1);
277 
278  for (d = 0; d < ns; d++) {
279  double binv[bs2],xt[bs];
280  _MatrixInvert_ (b+(istart*ns+d)*bs2,binv,bs);
281  _MatVecMultiplySubtract_ (x+(istart*ns+d)*bs ,a+(istart*ns+d)*bs2,x +d*bs,bs);
282  _MatVecMultiplySubtract_ (x+(istart*ns+d)*bs ,c+(istart*ns+d)*bs2,xp1+d*bs,bs);
283  _MatVecMultiply_ (binv,x+(istart*ns+d)*bs,xt,bs);
284  for (j=0; j<bs; j++) x[(istart*ns+d)*bs+j]=xt[j];
285  }
286  for (i = istart-1; i > iend-1; i--) {
287  for (d = 0; d < ns; d++) {
288  double binv[bs2],xt[bs];
289  _MatrixInvert_ (b+(i*ns+d)*bs2,binv,bs);
290  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,c+(i*ns+d)*bs2,x+((i+1)*ns+d)*bs,bs);
291  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,a+(i*ns+d)*bs2,x+d*bs,bs);
292  _MatVecMultiply_ (binv,x+(i*ns+d)*bs,xt,bs);
293  for (j=0; j<bs; j++) x[(i*ns+d)*bs+j] = xt[j];
294  }
295  }
296 
297  /* end of stage 4 */
298  gettimeofday(&stage4,NULL);
299 
300  /* Done - now x contains the solution */
301  free(xs1);
302  free(xp1);
303 
304  /* save runtimes if needed */
305  long long walltime;
306  walltime = ((stage1.tv_sec * 1000000 + stage1.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
307  params->stage1_time = (double) walltime / 1000000.0;
308  walltime = ((stage2.tv_sec * 1000000 + stage2.tv_usec) - (stage1.tv_sec * 1000000 + stage1.tv_usec));
309  params->stage2_time = (double) walltime / 1000000.0;
310  walltime = ((stage3.tv_sec * 1000000 + stage3.tv_usec) - (stage2.tv_sec * 1000000 + stage2.tv_usec));
311  params->stage3_time = (double) walltime / 1000000.0;
312  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (stage3.tv_sec * 1000000 + stage3.tv_usec));
313  params->stage4_time = (double) walltime / 1000000.0;
314  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
315  params->total_time = (double) walltime / 1000000.0;
316  return(0);
317 }
int blocktridiagIterJacobi ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
int  bs,
void *  r,
void *  m 
)

Solve block tridiagonal (non-periodic) systems of equations using point Jacobi iterations: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The initial guess is taken as the solution of

\begin{equation} {\rm diag}\left[{\bf b}\right]{\bf x} = {\bf r} \end{equation}

where \({\bf b}\) represents the diagonal elements of the tridiagonal system, and \({\bf r}\) is the right-hand-side, stored in \({\bf x}\) at the start of this function.
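
Each subsequent iteration applies the block-Jacobi update (written out here for reference; it corresponds to the correction step in the listing below), where \(X_{-1}\) and \(X_n\) are received from the neighboring processes:

\begin{equation} X_i^{(m+1)} = \left(B_i\right)^{-1}\left( R_i - A_i X_{i-1}^{(m)} - C_i X_{i+1}^{(m)} \right), \quad i = 0,\cdots,n-1 \end{equation}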

Array layout: The arguments a, b, and c are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, and superdiagonal) of size (n X ns X bs^2), and x is a local 1D array (containing this processor's part of the right-hand-side, and will contain the solution on exit) of size (n X ns X bs), where n is the local size of the system, ns is the number of independent systems to solve, and bs is the block size. The ordering of the elements in these arrays is as follows:

  • Each block is stored in the row-major format.
  • Blocks of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} B_0^k & C_0^k & & & \\ A_1^k & B_1^k & C_1^k & & \\ & A_2^k & B_2^k & C_2^k & \\ & & A_3^k & B_3^k & C_3^k \\ & & & A_4^k & B_4^k \\ \end{array}\right] \left[\begin{array}{c} X_0^k \\ X_1^k \\ X_2^k \\ X_3^k \\ X_4^k \end{array}\right] = \left[\begin{array}{c} R_0^k \\ R_1^k \\ R_2^k \\ R_3^k \\ R_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

where \(A\), \(B\), and \(C\) are matrices of size bs = 2 (say), and let \( ns = 3\). In the equation above, we have

\begin{equation} B_i^k = \left[\begin{array}{cc} b_{00,i}^k & b_{01,i}^k \\ b_{10,i}^k & b_{11,i}^k \end{array}\right], X_i^k = \left[\begin{array}{c} x_{0,i}^k \\ x_{1,i}^k \end{array} \right], R_i^k = \left[\begin{array}{c} r_{0,i}^k \\ r_{1,i}^k \end{array} \right] \end{equation}

Note that in the code, \(X\) and \(R\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_{00,0}^0, b_{01,0}^0, b_{10,0}^0, b_{11,0}^0, b_{00,0}^1, b_{01,0}^1, b_{10,0}^1, b_{11,0}^1, b_{00,0}^2, b_{01,0}^2, b_{10,0}^2, b_{11,0}^2,
b_{00,1}^0, b_{01,1}^0, b_{10,1}^0, b_{11,1}^0, b_{00,1}^1, b_{01,1}^1, b_{10,1}^1, b_{11,1}^1, b_{00,1}^2, b_{01,1}^2, b_{10,1}^2, b_{11,1}^2,
...,
b_{00,n-1}^0, b_{01,n-1}^0, b_{10,n-1}^0, b_{11,n-1}^0, b_{00,n-1}^1, b_{01,n-1}^1, b_{10,n-1}^1, b_{11,n-1}^1, b_{00,n-1}^2, b_{01,n-1}^2, b_{10,n-1}^2, b_{11,n-1}^2
]
The arrays a and c are stored similarly.

The array corresponding to a vector (the solution and the right-hand-side x) must be a 1D array with the following layout of elements:
[
x_{0,0}^0, x_{1,0}^0, x_{0,0}^1, x_{1,0}^1,x_{0,0}^2, x_{1,0}^2,
x_{0,1}^0, x_{1,1}^0, x_{0,1}^1, x_{1,1}^1,x_{0,1}^2, x_{1,1}^2,
...,
x_{0,n-1}^0, x_{1,n-1}^0, x_{0,n-1}^1, x_{1,n-1}^1,x_{0,n-1}^2, x_{1,n-1}^2
]

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor (not multiplied by the block size)
ns    Number of systems to solve
bs    Block size
r     Object of type TridiagLU
m     MPI communicator

Definition at line 88 of file blocktridiagIterJacobi.c.

100 {
101  TridiagLU *context = (TridiagLU*) r;
102  int iter,d,i,j,NT,bs2=bs*bs,nsbs=ns*bs;
103  double norm=0,norm0=0,global_norm=0;
104 
105 #ifndef serial
106  MPI_Comm *comm = (MPI_Comm*) m;
107  int rank,nproc;
108 
109  if (comm) {
110  MPI_Comm_size(*comm,&nproc);
111  MPI_Comm_rank(*comm,&rank);
112  } else {
113  rank = 0;
114  nproc = 1;
115  }
116 #endif
117 
118  if (!context) {
119  fprintf(stderr,"Error in tridiagIterJacobi(): NULL pointer passed for parameters!\n");
120  return(-1);
121  }
122 
123  double *rhs = (double*) calloc (ns*n*bs, sizeof(double));
124  for (i=0; i<ns*n*bs; i++) rhs[i] = x[i]; /* save a copy of the rhs */
125  /* initial guess */
126  for (i=0; i<n; i++) {
127  for (d=0; d<ns; d++) {
128  double binv[bs2];
129  _MatrixInvert_ (b+(i*ns+d)*bs2,binv,bs);
130  _MatVecMultiply_ (binv,rhs+(i*ns+d)*bs,x+(i*ns+d)*bs,bs);
131  }
132  }
133 
134  double *recvbufL, *recvbufR, *sendbufL, *sendbufR;
135  recvbufL = (double*) calloc (nsbs, sizeof(double));
136  recvbufR = (double*) calloc (nsbs, sizeof(double));
137  sendbufL = (double*) calloc (nsbs, sizeof(double));
138  sendbufR = (double*) calloc (nsbs, sizeof(double));
139 
140  /* total number of points */
141 #ifdef serial
142  if (context->evaluate_norm) NT = n;
143  else NT = 0;
144 #else
145  if (context->evaluate_norm) MPI_Allreduce(&n,&NT,1,MPI_INT,MPI_SUM,*comm);
146  else NT = 0;
147 #endif
148 
149 #ifdef serial
150  if (context->verbose) printf("\n");
151 #else
152  if (context->verbose && (!rank)) printf("\n");
153 #endif
154 
155  iter = 0;
156  while(1) {
157 
158  /* evaluate break conditions */
159  if ( (iter >= context->maxiter)
160  || (iter && context->evaluate_norm && (global_norm < context->atol))
161  || (iter && context->evaluate_norm && (global_norm/norm0 < context->rtol)) ) {
162  break;
163  }
164 
165  /* Communicate the boundary x values between processors */
166  for (d=0; d<nsbs; d++) recvbufL[d] = recvbufR[d] = 0;
167 #ifndef serial
168  MPI_Request req[4] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL};
169  if (rank) MPI_Irecv(recvbufL,nsbs,MPI_DOUBLE,rank-1,2,*comm,&req[0]);
170  if (rank != nproc-1) MPI_Irecv(recvbufR,nsbs,MPI_DOUBLE,rank+1,3,*comm,&req[1]);
171  for (d=0; d<nsbs; d++) {
172  sendbufL[d] = x[d];
173  sendbufR[d] = x[(n-1)*nsbs+d];
174  }
175  if (rank) MPI_Isend(sendbufL,nsbs,MPI_DOUBLE,rank-1,3,*comm,&req[2]);
176  if (rank != nproc-1) MPI_Isend(sendbufR,nsbs,MPI_DOUBLE,rank+1,2,*comm,&req[3]);
177 #endif
178 
179  /* calculate error norm - interior */
180  if (context->evaluate_norm) {
181  norm = 0;
182  for (i=1; i<n-1; i++) {
183  for (d=0; d<ns; d++) {
184  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[(i*ns+d)*bs+j];
185  _MatVecMultiplySubtract_(err,a+(i*ns+d)*bs2,x+((i-1)*ns+d)*bs,bs);
186  _MatVecMultiplySubtract_(err,b+(i*ns+d)*bs2,x+((i )*ns+d)*bs,bs);
187  _MatVecMultiplySubtract_(err,c+(i*ns+d)*bs2,x+((i+1)*ns+d)*bs,bs);
188  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
189  }
190  }
191  }
192  /* calculate error norm - boundary */
193 #ifndef serial
194  MPI_Waitall(4,req,MPI_STATUS_IGNORE);
195 #endif
196  if (context->evaluate_norm) {
197  if (n > 1) {
198  for (d=0; d<ns; d++) {
199  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[d*bs+j];
200  _MatVecMultiplySubtract_(err,a+d*bs2,recvbufL+d*bs,bs);
201  _MatVecMultiplySubtract_(err,b+d*bs2,x+d*bs,bs);
202  _MatVecMultiplySubtract_(err,c+d*bs2,x+(ns+d)*bs,bs);
203  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
204  }
205  for (d=0; d<ns; d++) {
206  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[(d+ns*(n-1))*bs+j];
207  _MatVecMultiplySubtract_(err,a+(d+ns*(n-1))*bs2,x+(d+ns*(n-2))*bs,bs);
208  _MatVecMultiplySubtract_(err,b+(d+ns*(n-1))*bs2,x+(d+ns*(n-1))*bs,bs);
209  _MatVecMultiplySubtract_(err,c+(d+ns*(n-1))*bs2,recvbufR+d*bs,bs);
210  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
211  }
212  } else {
213  for (d=0; d<ns; d++) {
214  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[d*bs+j];
215  _MatVecMultiplySubtract_(err,a+d*bs2,recvbufL+d*bs,bs);
216  _MatVecMultiplySubtract_(err,b+d*bs2,x+d*bs,bs);
217  _MatVecMultiplySubtract_(err,c+d*bs2,recvbufR+d*bs,bs);
218  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
219  }
220  }
221  /* sum over all processes */
222 #ifdef serial
223  global_norm = norm;
224 #else
225  MPI_Allreduce(&norm,&global_norm,1,MPI_DOUBLE,MPI_SUM,*comm);
226 #endif
227  global_norm = sqrt(global_norm/NT);
228  if (!iter) norm0 = global_norm;
229  } else {
230  norm = -1.0;
231  global_norm = -1.0;
232  }
233 
234 #ifdef serial
235  if (context->verbose)
236 #else
237  if (context->verbose && (!rank))
238 #endif
239  printf("\t\titer: %d, norm: %1.16E\n",iter,global_norm);
240 
241  /* correct the solution for this iteration */
242  if (n > 1) {
243  for (d=0; d<ns; d++) {
244  double xt[bs],binv[bs2];
245 
246  i = 0;
247  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
248  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,recvbufL+d*bs,bs);
249  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,x+(d+ns*(i+1))*bs,bs);
250  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
251  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
252 
253  i = n-1;
254  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
255  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,x+(d+ns*(i-1))*bs,bs);
256  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,recvbufR+d*bs,bs);
257  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
258  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
259  }
260  for (i=1; i<n-1; i++) {
261  for (d=0; d<ns; d++) {
262  double xt[bs],binv[bs2];
263  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
264  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,x+(d+ns*(i-1))*bs,bs);
265  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,x+(d+ns*(i+1))*bs,bs);
266  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
267  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
268  }
269  }
270  } else {
271  for (d=0; d<ns; d++) {
272  double xt[bs],binv[bs2];
273  for (j=0; j<bs; j++) xt[j] = rhs[d*bs+j];
274  _MatVecMultiplySubtract_(xt,a+d*bs2,recvbufL+d*bs,bs);
275  _MatVecMultiplySubtract_(xt,c+d*bs2,recvbufR+d*bs,bs);
276  _MatrixInvert_(b+d*bs2,binv,bs);
277  _MatVecMultiply_(binv,xt,x+d*bs,bs);
278  }
279  }
280 
281  /* finished with this iteration */
282  iter++;
283  }
284 
285  /* save convergence information */
286  context->exitnorm = (context->evaluate_norm ? global_norm : -1.0);
287  context->exititer = iter;
288 
289  free(rhs);
290  free(sendbufL);
291  free(sendbufR);
292  free(recvbufL);
293  free(recvbufR);
294 
295  return(0);
296 }
int tridiagScaLPK ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using ScaLAPACK's pddtsv: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides.

  • This function is compiled only if the compilation flag "-Dwith_scalapack" is specified (a usage sketch follows this list).
  • This function calls the ScaLAPACK tridiagonal solver separately for each system, and thus may not be efficient.
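
A minimal sketch of guarded usage (an illustrative assumption, not part of the original documentation; TridiagLU::blacs_ctxt must already hold a valid BLACS context when the ScaLAPACK path is taken, and the BLACS setup itself is not shown):

    #include <mpi.h>
    #include "tridiagLU.h"

    /* Sketch: call the ScaLAPACK wrapper only when it was compiled in
       (-Dwith_scalapack); otherwise fall back to tridiagLU(). */
    int solve_tridiag(double *a, double *b, double *c, double *x,
                      int n, int ns, TridiagLU *params, MPI_Comm *comm)
    {
    #ifdef with_scalapack
      return tridiagScaLPK(a, b, c, x, n, ns, params, comm);
    #else
      return tridiagLU(a, b, c, x, n, ns, params, comm);
    #endif
    }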

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 71 of file tridiagScaLPK.c.

81 {
82  TridiagLU *params = (TridiagLU*) r;
83  int rank,nproc,nglobal,nrhs,i,s,ia,ib,desca[9],descb[9],ierr,
84  lwork;
85  double *dl,*d,*du,*rhs,*work;
86  struct timeval start,end;
87 
88 #ifdef serial
89  rank = 0;
90  nproc = 1;
91  nglobal=n;
92 #else
93  MPI_Comm *comm = (MPI_Comm*) m;
94 
95  if (comm) {
96  MPI_Comm_size(*comm,&nproc);
97  MPI_Comm_rank(*comm,&rank);
98  } else {
99  rank = 0;
100  nproc = 1;
101  }
102  MPI_Allreduce(&n,&nglobal,1,MPI_INT,MPI_SUM,*comm);
103 #endif
104 
105  /* check */
106  if (nglobal%n != 0) {
107  if (!rank) {
108  fprintf(stderr,"Error: The ScaLAPACK wrapper can only handle cases where the global ");
109  fprintf(stderr,"size of system is an integer multiple of no. of processes.\n");
110  }
111  return(1);
112  }
113 
114  if (!params) {
115  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
116  return(1);
117  }
118 
119 
120  nrhs = 1;
121  ia = 1;
122  ib = 1;
123 
124  lwork = (12*nproc+3*n) + ( (8*nproc) > (10*nproc+4*nrhs) ? (8*nproc) : (10*nproc+4*nrhs) );
125 
126  desca[0] = 501;
127  desca[1] = params->blacs_ctxt;
128  desca[2] = nglobal;
129  desca[3] = n;
130  desca[4] = 0;
131  desca[5] = 0;
132  desca[6] = 0;
133  desca[7] = 0;
134  desca[8] = 0;
135 
136  descb[0] = 502;
137  descb[1] = params->blacs_ctxt;
138  descb[2] = nglobal;
139  descb[3] = n;
140  descb[4] = 0;
141  descb[5] = n;
142  descb[6] = 0;
143  descb[7] = 0;
144  descb[8] = 0;
145 
146  dl = (double*) calloc (n,sizeof(double));
147  d = (double*) calloc (n,sizeof(double));
148  du = (double*) calloc (n,sizeof(double));
149  rhs = (double*) calloc (n,sizeof(double));
150  work = (double*) calloc(lwork,sizeof(double));
151 
152  params->total_time = 0.0;
153  params->stage1_time = 0.0;
154  params->stage2_time = 0.0;
155  params->stage3_time = 0.0;
156  params->stage4_time = 0.0;
157  for (s=0; s<ns; s++) {
158 
159  for (i=0; i<n; i++) {
160  dl[i] = a[i*ns+s];
161  d [i] = b[i*ns+s];
162  du[i] = c[i*ns+s];
163  rhs[i]= x[i*ns+s];
164  }
165 
166  /* call the ScaLAPACK function */
167  gettimeofday(&start,NULL);
168  pddtsv_(&nglobal,&nrhs,dl,d,du,&ia,desca,rhs,&ib,descb,work,&lwork,&ierr);
169  gettimeofday(&end,NULL);
170  if (ierr) return(ierr);
171 
172  for (i=0; i<n; i++) x[i*ns+s] = rhs[i];
173 
174  long long walltime;
175  walltime = ((end.tv_sec * 1000000 + end.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
176  params->total_time += (double) walltime / 1000000.0;
177  }
178 
179 
180  free(dl);
181  free(d);
182  free(du);
183  free(rhs);
184  free(work);
185 
186  return(0);
187 }