TridiagLU  1.0
Scalable, parallel solver for tridiagonal system of equations
tridiagLU.h File Reference

Header file for TridiagLU.


Data Structures

struct  TridiagLU
 

Macros

#define _TRIDIAG_JACOBI_   "jacobi"
 
#define _TRIDIAG_GS_   "gather-and-solve"
 

Functions

int tridiagLU (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagLUGS (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagIterJacobi (double *, double *, double *, double *, int, int, void *, void *)
 
int tridiagLUInit (void *, void *)
 
int blocktridiagLU (double *, double *, double *, double *, int, int, int, void *, void *)
 
int blocktridiagIterJacobi (double *, double *, double *, double *, int, int, int, void *, void *)
 
int tridiagScaLPK (double *, double *, double *, double *, int, int, void *, void *)
 

Detailed Description

Header file for TridiagLU.

Author
Debojyoti Ghosh

Definition in file tridiagLU.h.


Data Structure Documentation

struct TridiagLU

Definition at line 81 of file tridiagLU.h.

Data Fields
char reducedsolvetype[50]

Choice of solver for solving the reduced system. May be _TRIDIAG_JACOBI_ or _TRIDIAG_GS_.

int evaluate_norm

calculate norm at each iteration? (relevant only for iterative solvers)

int maxiter

maximum number of iterations (relevant only for iterative solvers)

double atol

absolute tolerance (relevant only for iterative solvers)

double rtol

relative tolerance (relevant only for iterative solvers)

int exititer

number of iterations it ran for (relevant only for iterative solvers)

double exitnorm

error norm at exit (relevant only for iterative solvers)

int verbose

print iterations and norms (relevant only for iterative solvers)

double total_time

Total wall time in seconds

double stage1_time

Wall time (in seconds) for stage 1 of tridiagLU() or blocktridiagLU()

double stage2_time

Wall time (in seconds) for stage 2 of tridiagLU() or blocktridiagLU()

double stage3_time

Wall time (in seconds) for stage 3 of tridiagLU() or blocktridiagLU()

double stage4_time

Wall time (in seconds) for stage 4 of tridiagLU() or blocktridiagLU()

int blacs_ctxt

Context variable for ScaLAPACK (relevant only if compiled with ScaLAPACK support, i.e., with -Dwith_scalapack)

See also
tridiagScaLPK
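
A minimal usage sketch (not part of the original documentation; it assumes MPI has been initialized and that the coefficient arrays follow the layout described for tridiagLU() below):

    #include <stdio.h>
    #include <mpi.h>
    #include "tridiagLU.h"

    /* Sketch: initialize a TridiagLU object, solve, and inspect the
       timers and convergence information stored in the object. */
    void example_solve(double *a, double *b, double *c, double *x,
                       int n, int ns, MPI_Comm comm)
    {
      TridiagLU params;
      if (tridiagLUInit(&params, &comm)) return;  /* defaults, overridden by lusolver.inp if present */

      if (tridiagLU(a, b, c, x, n, ns, &params, &comm)) {
        fprintf(stderr, "tridiagLU() failed.\n");
        return;
      }

      /* wall times for the four stages and the total */
      printf("total %g s (stages: %g %g %g %g)\n",
             params.total_time, params.stage1_time, params.stage2_time,
             params.stage3_time, params.stage4_time);
      /* convergence of the (iterative) reduced-system solve */
      printf("reduced solve: %d iterations, exit norm %e\n",
             params.exititer, params.exitnorm);
    }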

Macro Definition Documentation

#define _TRIDIAG_JACOBI_   "jacobi"

Jacobi method

See also
tridiagIterJacobi(), blocktridiagIterJacobi()

Definition at line 71 of file tridiagLU.h.

#define _TRIDIAG_GS_   "gather-and-solve"

"Gather-and-solve" method

See also
tridiagLUGS

Definition at line 73 of file tridiagLU.h.

Function Documentation

int tridiagLU ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using parallel LU decomposition: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. This function uses an iterative-substructuring method that can be briefly described through the following four stages:

  • Stage 1: Parallel elimination of the tridiagonal blocks on each processor, comprising all points of the subdomain except the 1st point (unless it is the 1st global point, i.e., a physical boundary)
  • Stage 2: Elimination of the 1st row on each processor (except the 1st processor) using the last row of the previous processor.
  • Stage 3: Solution of the reduced tridiagonal system that represents the coupling of the system across the processors, using tridiagLUGS() or tridiagIterJacobi() in this implementation (depending on TridiagLU::reducedsolvetype).
  • Stage 4: Backward-solve to obtain the final solution

Specific details of the method implemented here are available in:

More references on this class of parallel tridiagonal solvers:

  • E. Polizzi and A. H. Sameh, "A parallel hybrid banded system solver: The SPIKE algorithm", Parallel Comput., 32 (2006), pp. 177–194.
  • E. Polizzi and A. H. Sameh, "SPIKE: A parallel environment for solving banded linear systems", Comput. & Fluids, 36 (2007), pp. 113–120.

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.
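
As an illustration, the following sketch (not part of the original documentation) fills such interleaved arrays for a simple constant-coefficient system (coefficients -1, 2, -1 with a unit right-hand side, chosen purely for illustration) and solves it:

    #include <stdlib.h>
    #include <mpi.h>
    #include "tridiagLU.h"

    /* Sketch: pack ns independent tridiagonal systems of local size n into the
       interleaved layout described above and solve them with tridiagLU(). */
    int pack_and_solve(int n, int ns, TridiagLU *params, MPI_Comm *comm)
    {
      int rank = 0, nproc = 1;
      if (comm) { MPI_Comm_rank(*comm, &rank); MPI_Comm_size(*comm, &nproc); }

      double *a = (double*) calloc(n*ns, sizeof(double));
      double *b = (double*) calloc(n*ns, sizeof(double));
      double *c = (double*) calloc(n*ns, sizeof(double));
      double *x = (double*) calloc(n*ns, sizeof(double));

      for (int i = 0; i < n; i++) {      /* local row index */
        for (int d = 0; d < ns; d++) {   /* system index */
          a[i*ns+d] = -1.0;              /* sub-diagonal element   a_i^d */
          b[i*ns+d] =  2.0;              /* diagonal element       b_i^d */
          c[i*ns+d] = -1.0;              /* super-diagonal element c_i^d */
          x[i*ns+d] =  1.0;              /* right-hand side r_i^d; overwritten by the solution */
        }
      }
      /* non-periodic system: no sub-diagonal in the first global row,
         no super-diagonal in the last global row */
      if (rank == 0)       for (int d = 0; d < ns; d++) a[d]          = 0.0;
      if (rank == nproc-1) for (int d = 0; d < ns; d++) c[(n-1)*ns+d] = 0.0;

      int ierr = tridiagLU(a, b, c, x, n, ns, params, comm);

      free(a); free(b); free(c); free(x);
      return ierr;
    }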

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 83 of file tridiagLU.c.

93 {
94  TridiagLU *params = (TridiagLU*) r;
95  int d,i,istart,iend;
96  int rank,nproc;
97  struct timeval start,stage1,stage2,stage3,stage4;
98 
99 #ifdef serial
100  rank = 0;
101  nproc = 1;
102 #else
103  MPI_Comm *comm = (MPI_Comm*) m;
104  int ierr = 0;
105  const int nvar = 4;
106 
107  if (comm) {
108  MPI_Comm_size(*comm,&nproc);
109  MPI_Comm_rank(*comm,&rank);
110  } else {
111  rank = 0;
112  nproc = 1;
113  }
114 #endif
115 
116  if (!params) {
117  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
118  return(1);
119  }
120 
121  /* start */
122  gettimeofday(&start,NULL);
123 
124  if ((ns == 0) || (n == 0)) return(0);
125  double *xs1, *xp1;
126  xs1 = (double*) calloc (ns, sizeof(double));
127  xp1 = (double*) calloc (ns, sizeof(double));
128  for (i=0; i<ns; i++) xs1[i] = xp1[i] = 0;
129 
130  /* Stage 1 - Parallel elimination of subdiagonal entries */
131  istart = (rank == 0 ? 1 : 2);
132  iend = n;
133  for (i = istart; i < iend; i++) {
134  for (d = 0; d < ns; d++) {
135  if (b[(i-1)*ns+d] == 0) return(-1);
136  double factor = a[i*ns+d] / b[(i-1)*ns+d];
137  b[i*ns+d] -= factor * c[(i-1)*ns+d];
138  a[i*ns+d] = -factor * a[(i-1)*ns+d];
139  x[i*ns+d] -= factor * x[(i-1)*ns+d];
140  if (rank) {
141  double factor = c[d] / b[(i-1)*ns+d];
142  c[d] = -factor * c[(i-1)*ns+d];
143  b[d] -= factor * a[(i-1)*ns+d];
144  x[d] -= factor * x[(i-1)*ns+d];
145  }
146  }
147  }
148 
149  /* end of stage 1 */
150  gettimeofday(&stage1,NULL);
151 
152  /* Stage 2 - Eliminate the first sub- & super-diagonal entries */
153  /* This needs the last (a,b,c,x) from the previous process */
154 #ifndef serial
155  double *sendbuf, *recvbuf;
156  sendbuf = (double*) calloc (ns*nvar, sizeof(double));
157  recvbuf = (double*) calloc (ns*nvar, sizeof(double));
158  for (d=0; d<ns; d++) {
159  sendbuf[d*nvar+0] = a[(n-1)*ns+d];
160  sendbuf[d*nvar+1] = b[(n-1)*ns+d];
161  sendbuf[d*nvar+2] = c[(n-1)*ns+d];
162  sendbuf[d*nvar+3] = x[(n-1)*ns+d];
163  }
164  if (nproc > 1) {
165  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
166  if (rank) MPI_Irecv(recvbuf,nvar*ns,MPI_DOUBLE,rank-1,1436,*comm,&req[0]);
167  if (rank != nproc-1) MPI_Isend(sendbuf,nvar*ns,MPI_DOUBLE,rank+1,1436,*comm,&req[1]);
168  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
169  }
170  /* The first process sits this one out */
171  if (rank) {
172  for (d = 0; d < ns; d++) {
173  double am1, bm1, cm1, xm1;
174  am1 = recvbuf[d*nvar+0];
175  bm1 = recvbuf[d*nvar+1];
176  cm1 = recvbuf[d*nvar+2];
177  xm1 = recvbuf[d*nvar+3];
178  double factor;
179  if (bm1 == 0) return(-1);
180  factor = a[d] / bm1;
181  b[d] -= factor * cm1;
182  a[d] = -factor * am1;
183  x[d] -= factor * xm1;
184  if (b[(n-1)*ns+d] == 0) return(-1);
185  factor = c[d] / b[(n-1)*ns+d];
186  b[d] -= factor * a[(n-1)*ns+d];
187  c[d] = -factor * c[(n-1)*ns+d];
188  x[d] -= factor * x[(n-1)*ns+d];
189  }
190  }
191  free(sendbuf);
192  free(recvbuf);
193 #endif
194 
195  /* end of stage 2 */
196  gettimeofday(&stage2,NULL);
197 
198  /* Stage 3 - Solve the reduced (nproc-1) X (nproc-1) tridiagonal system */
199 #ifndef serial
200  if (nproc > 1) {
201  double *zero, *one;
202  zero = (double*) calloc (ns, sizeof(double));
203  one = (double*) calloc (ns, sizeof(double));
204  for (d=0; d<ns; d++) {
205  zero[d] = 0.0;
206  one [d] = 1.0;
207  }
208  if (!strcmp(params->reducedsolvetype,_TRIDIAG_GS_)) {
209  /* Solving the reduced system by gather-and-solve algorithm */
210  if (rank) ierr = tridiagLUGS(a,b,c,x,1,ns,params,comm);
211  else ierr = tridiagLUGS(zero,one,zero,zero,1,ns,params,comm);
212  if (ierr) return(ierr);
213  } else if (!strcmp(params->reducedsolvetype,_TRIDIAG_JACOBI_)) {
214  /* Solving the reduced system iteratively with the Jacobi method */
215  if (rank) ierr = tridiagIterJacobi(a,b,c,x,1,ns,params,comm);
216  else ierr = tridiagIterJacobi(zero,one,zero,zero,1,ns,params,comm);
217  }
218  free(zero);
219  free(one);
220 
221  /* Each process, get the first x of the next process */
222  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
223  for (d=0; d<ns; d++) xs1[d] = x[d];
224  if (rank+1 < nproc) MPI_Irecv(xp1,ns,MPI_DOUBLE,rank+1,1323,*comm,&req[0]);
225  if (rank) MPI_Isend(xs1,ns,MPI_DOUBLE,rank-1,1323,*comm,&req[1]);
226  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
227  }
228 #else
229  if (nproc > 1) {
230  fprintf(stderr,"Error: nproc > 1 for a serial run!\n");
231  return(1);
232  }
233 #endif /* if not serial */
234  /* end of stage 3 */
235  gettimeofday(&stage3,NULL);
236 
237  /* Stage 4 - Parallel back-substitution to get the solution */
238  istart = n-1;
239  iend = (rank == 0 ? 0 : 1);
240 
241  for (d = 0; d < ns; d++) {
242  if (b[istart*ns+d] == 0) return(-1);
243  x[istart*ns+d] = (x[istart*ns+d]-a[istart*ns+d]*x[d]-c[istart*ns+d]*xp1[d]) / b[istart*ns+d];
244  }
245  for (i = istart-1; i > iend-1; i--) {
246  for (d = 0; d < ns; d++) {
247  if (b[i*ns+d] == 0) return(-1);
248  x[i*ns+d] = (x[i*ns+d]-c[i*ns+d]*x[(i+1)*ns+d]-a[i*ns+d]*x[d]) / b[i*ns+d];
249  }
250  }
251 
252  /* end of stage 4 */
253  gettimeofday(&stage4,NULL);
254 
255  /* Done - now x contains the solution */
256  free(xs1);
257  free(xp1);
258 
259  /* save runtimes if needed */
260  long long walltime;
261  walltime = ((stage1.tv_sec * 1000000 + stage1.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
262  params->stage1_time = (double) walltime / 1000000.0;
263  walltime = ((stage2.tv_sec * 1000000 + stage2.tv_usec) - (stage1.tv_sec * 1000000 + stage1.tv_usec));
264  params->stage2_time = (double) walltime / 1000000.0;
265  walltime = ((stage3.tv_sec * 1000000 + stage3.tv_usec) - (stage2.tv_sec * 1000000 + stage2.tv_usec));
266  params->stage3_time = (double) walltime / 1000000.0;
267  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (stage3.tv_sec * 1000000 + stage3.tv_usec));
268  params->stage4_time = (double) walltime / 1000000.0;
269  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
270  params->total_time = (double) walltime / 1000000.0;
271  return(0);
272 }
int tridiagLUGS ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using the gather-and-solve approach: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The "gather-and-solve" approach gathers each tridiagonal system onto one processor and solves it using tridiagLU() (passing NULL as the MPI communicator to indicate that a serial solution is desired). Given multiple tridiagonal systems (ns > 1), this function distributes the systems among the processors, so that each processor gathers and solves some of them. After the systems are solved, the solutions are scattered back to the original processors.
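
The distribution of the ns systems among the nproc processes follows the simple rule used to compute ns_local[] in the listing below; a one-function sketch of that rule:

    /* Sketch: number of the ns systems that process p gathers and solves,
       mirroring the computation of ns_local[] in the listing below.
       Example: ns = 10 systems on nproc = 4 processes -> 3, 3, 2, 2. */
    static int ns_on_rank(int ns, int nproc, int p)
    {
      return ns/nproc + (p < ns%nproc ? 1 : 0);
    }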

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 64 of file tridiagLUGS.c.

74 {
75  TridiagLU *context = (TridiagLU*) r;
76  if (!context) {
77  fprintf(stderr,"Error in tridiagLUGS(): NULL pointer passed for parameters.\n");
78  return(-1);
79  }
80 #ifdef serial
81 
82  /* Serial compilation */
83  return(tridiagLU(a,b,c,x,n,ns,context,m));
84 
85 #else
86 
87  int d,i,ierr = 0,dstart,istart,p,q;
88  const int nvar = 4;
89  double *sendbuf,*recvbuf;
90  int rank,nproc;
91 
92  /* Parallel compilation */
93  MPI_Comm *comm = (MPI_Comm*) m;
94  if (!comm) return(tridiagLU(a,b,c,x,n,ns,context,NULL));
95  MPI_Comm_size(*comm,&nproc);
96  MPI_Comm_rank(*comm,&rank);
97 
98  if ((ns == 0) || (n == 0)) return(0);
99 
100  /*
101  each process needs to know the local sizes of every other process
102  and total size of the system
103  */
104  int *N = (int*) calloc (nproc, sizeof(int));
105  MPI_Allgather(&n,1,MPI_INT,N,1,MPI_INT,*comm);
106  int NT = 0; for (i=0; i<nproc; i++) NT += N[i];
107 
108  /* counts and displacements for gather and scatter operations */
109  int *counts = (int*) calloc (nproc, sizeof(int));
110  int *displ = (int*) calloc (nproc, sizeof(int));
111 
112  /* on all processes, calculate the number of systems each */
113  /* process has to solve */
114  int *ns_local = (int*) calloc (nproc,sizeof(int));
115  for (p=0; p<nproc; p++) ns_local[p] = ns / nproc;
116  for (p=0; p<ns%nproc; p++) ns_local[p]++;
117 
118  /* allocate the arrays for the gathered tridiagonal systems */
119  double *ra=NULL,*rb=NULL,*rc=NULL,*rx=NULL;
120  if (ns_local[rank] > 0) {
121  ra = (double*) calloc (ns_local[rank]*NT,sizeof(double));
122  rb = (double*) calloc (ns_local[rank]*NT,sizeof(double));
123  rc = (double*) calloc (ns_local[rank]*NT,sizeof(double));
124  rx = (double*) calloc (ns_local[rank]*NT,sizeof(double));
125  }
126 
127  /* Gather the systems on each process */
128  /* allocate receive buffer */
129  if (ns_local[rank] > 0)
130  recvbuf = (double*) calloc (ns_local[rank]*nvar*NT,sizeof(double));
131  else recvbuf = NULL;
132  dstart = 0;
133  for (p = 0; p < nproc; p++) {
134  if (ns_local[p] > 0) {
135  /* allocate send buffer and form the send packet of data */
136  sendbuf = (double*) calloc (nvar*n*ns_local[p],sizeof(double));
137  for (d = 0; d < ns_local[p]; d++) {
138  for (i = 0; i < n; i++) {
139  sendbuf[n*nvar*d+n*0+i] = a[d+dstart+ns*i];
140  sendbuf[n*nvar*d+n*1+i] = b[d+dstart+ns*i];
141  sendbuf[n*nvar*d+n*2+i] = c[d+dstart+ns*i];
142  sendbuf[n*nvar*d+n*3+i] = x[d+dstart+ns*i];
143  }
144  }
145  dstart += ns_local[p];
146 
147  /* gather these reduced systems on process with rank = p */
148  if (rank == p) {
149  for (q = 0; q < nproc; q++) {
150  counts[q] = nvar*N[q]*ns_local[p];
151  displ [q] = (q == 0 ? 0 : displ[q-1]+counts[q-1]);
152  }
153  }
154  MPI_Gatherv(sendbuf,nvar*n*ns_local[p],MPI_DOUBLE,
155  recvbuf,counts,displ,MPI_DOUBLE,p,*comm);
156 
157  /* deallocate send buffer */
158  free(sendbuf);
159  }
160  }
161  /* extract the data from the recvbuf and solve */
162  istart = 0;
163  for (q = 0; q < nproc; q++) {
164  for (d = 0; d < ns_local[rank]; d++) {
165  for (i = 0; i < N[q]; i++) {
166  ra[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+0*N[q]+i];
167  rb[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+1*N[q]+i];
168  rc[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+2*N[q]+i];
169  rx[d+ns_local[rank]*(istart+i)] = recvbuf[istart*nvar*ns_local[rank]+d*nvar*N[q]+3*N[q]+i];
170  }
171  }
172  istart += N[q];
173  }
174  /* deallocate receive buffer */
175  if (recvbuf) free(recvbuf);
176 
177  /* solve the gathered systems in serial */
178  ierr = tridiagLU(ra,rb,rc,rx,NT,ns_local[rank],context,NULL);
179  if (ierr) return(ierr);
180 
181  /* allocate send buffer and save the data to send */
182  if (ns_local[rank] > 0)
183  sendbuf = (double*) calloc (ns_local[rank]*NT,sizeof(double));
184  else sendbuf = NULL;
185  istart = 0;
186  for (q = 0; q < nproc; q++) {
187  for (i = 0; i < N[q]; i++) {
188  for (d = 0; d < ns_local[rank]; d++) {
189  sendbuf[istart*ns_local[rank]+d*N[q]+i] = rx[d+ns_local[rank]*(istart+i)];
190  }
191  }
192  istart += N[q];
193  }
194  dstart = 0;
195  for (p = 0; p < nproc; p++) {
196  if (ns_local[p] > 0) {
197 
198  /* allocate receive buffer */
199  recvbuf = (double*) calloc (ns_local[p]*n, sizeof(double));
200 
201  /* scatter the solution back */
202  for (q = 0; q < nproc; q++) {
203  counts[q] = ns_local[p]*N[q];
204  displ[q] = (q == 0 ? 0 : displ[q-1]+counts[q-1]);
205  }
206  MPI_Scatterv(sendbuf,counts,displ,MPI_DOUBLE,
207  recvbuf,ns_local[p]*n,MPI_DOUBLE,
208  p,*comm);
209  /* save the solution on all root processes */
210  for (d = 0; d < ns_local[p]; d++) {
211  for (i = 0; i < n; i++) {
212  x[d+dstart+ns*i] = recvbuf[d*n+i];
213  }
214  }
215  dstart += ns_local[p];
216  /* deallocate receive buffer */
217  free(recvbuf);
218  }
219  }
220  /* deallocate send buffer */
221  if (sendbuf) free(sendbuf);
222 
223  /* clean up */
224  if (ns_local[rank] > 0) {
225  free(ra);
226  free(rb);
227  free(rc);
228  free(rx);
229  }
230  free(ns_local);
231  free(N);
232  free(displ);
233  free(counts);
234 
235  return(0);
236 #endif
237 }
int tridiagIterJacobi ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using point Jacobi iterations: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The initial guess is taken as the solution of

\begin{equation} {\rm diag}\left[{\bf b}\right]{\bf x} = {\bf r} \end{equation}

where \({\bf b}\) represents the diagonal elements of the tridiagonal system, and \({\bf r}\) is the right-hand-side, stored in \({\bf x}\) at the start of this function.
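
Each subsequent iteration applies the standard point-Jacobi update (written out here for reference; it corresponds to the correction step in the listing below), where the values \(x_{-1}\) and \(x_n\) are received from the neighboring processes:

\begin{equation} x_i^{(m+1)} = \frac{1}{b_i}\left( r_i - a_i x_{i-1}^{(m)} - c_i x_{i+1}^{(m)} \right), \quad i = 0,\cdots,n-1 \end{equation}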

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 64 of file tridiagIterJacobi.c.

74 {
75  TridiagLU *context = (TridiagLU*) r;
76  int iter,d,i,NT;
77  double norm=0,norm0=0,global_norm=0;
78 
79 #ifndef serial
80  MPI_Comm *comm = (MPI_Comm*) m;
81  int rank,nproc;
82 
83  if (comm) {
84  MPI_Comm_size(*comm,&nproc);
85  MPI_Comm_rank(*comm,&rank);
86  } else {
87  rank = 0;
88  nproc = 1;
89  }
90 #endif
91 
92  if (!context) {
93  fprintf(stderr,"Error in tridiagIterJacobi(): NULL pointer passed for parameters!\n");
94  return(-1);
95  }
96 
97  /* check for zero along the diagonal */
98  for (i=0; i<n; i++) {
99  for (d=0; d<ns; d++) {
100  if (b[i*ns+d]*b[i*ns+d] < context->atol*context->atol) {
101  fprintf(stderr,"Error in tridiagIterJacobi(): Encountered zero on main diagonal!\n");
102  return(1);
103  }
104  }
105  }
106 
107  double *rhs = (double*) calloc (ns*n, sizeof(double));
108  for (i=0; i<n; i++) {
109  for (d=0; d<ns; d++) {
110  rhs[i*ns+d] = x[i*ns+d]; /* save a copy of the rhs */
111  x[i*ns+d] /= b[i*ns+d]; /* initial guess */
112  }
113  }
114 
115  double *recvbufL, *recvbufR, *sendbufL, *sendbufR;
116  recvbufL = (double*) calloc (ns, sizeof(double));
117  recvbufR = (double*) calloc (ns, sizeof(double));
118  sendbufL = (double*) calloc (ns, sizeof(double));
119  sendbufR = (double*) calloc (ns, sizeof(double));
120 
121  /* total number of points */
122 #ifdef serial
123  if (context->evaluate_norm) NT = n;
124  else NT = 0;
125 #else
126  if (context->evaluate_norm) MPI_Allreduce(&n,&NT,1,MPI_INT,MPI_SUM,*comm);
127  else NT = 0;
128 #endif
129 
130 #ifdef serial
131  if (context->verbose) printf("\n");
132 #else
133  if (context->verbose && (!rank)) printf("\n");
134 #endif
135 
136  iter = 0;
137  while(1) {
138 
139  /* evaluate break conditions */
140  if ( (iter >= context->maxiter)
141  || (iter && context->evaluate_norm && (global_norm < context->atol))
142  || (iter && context->evaluate_norm && (global_norm/norm0 < context->rtol)) ) {
143  break;
144  }
145 
146  /* Communicate the boundary x values between processors */
147  for (d=0; d<ns; d++) recvbufL[d] = recvbufR[d] = 0;
148 #ifndef serial
149  MPI_Request req[4] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL};
150  if (rank) MPI_Irecv(recvbufL,ns,MPI_DOUBLE,rank-1,2,*comm,&req[0]);
151  if (rank != nproc-1) MPI_Irecv(recvbufR,ns,MPI_DOUBLE,rank+1,3,*comm,&req[1]);
152  for (d=0; d<ns; d++) { sendbufL[d] = x[d]; sendbufR[d] = x[(n-1)*ns+d]; }
153  if (rank) MPI_Isend(sendbufL,ns,MPI_DOUBLE,rank-1,3,*comm,&req[2]);
154  if (rank != nproc-1) MPI_Isend(sendbufR,ns,MPI_DOUBLE,rank+1,2,*comm,&req[3]);
155 #endif
156 
157  /* calculate error norm - interior */
158  if (context->evaluate_norm) {
159  norm = 0;
160  for (i=1; i<n-1; i++) {
161  for (d=0; d<ns; d++) {
162  norm += ( (a[i*ns+d]*x[(i-1)*ns+d] + b[i*ns+d]*x[i*ns+d] + c[i*ns+d]*x[(i+1)*ns+d] - rhs[i*ns+d])
163  * (a[i*ns+d]*x[(i-1)*ns+d] + b[i*ns+d]*x[i*ns+d] + c[i*ns+d]*x[(i+1)*ns+d] - rhs[i*ns+d]) );
164  }
165  }
166  }
167  /* calculate error norm - boundary */
168 #ifndef serial
169  MPI_Waitall(4,req,MPI_STATUS_IGNORE);
170 #endif
171  if (context->evaluate_norm) {
172  if (n > 1) {
173  for (d=0; d<ns; d++) {
174  norm += ( (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*x[d+ns*1]- rhs[d])
175  * (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*x[d+ns*1]- rhs[d]) );
176  }
177  for (d=0; d<ns; d++) {
178  norm += ( (a[d+ns*(n-1)]*x[d+ns*(n-2)] + b[d+ns*(n-1)]*x[d+ns*(n-1)] + c[d+ns*(n-1)]*recvbufR[d] - rhs[d+ns*(n-1)])
179  * (a[d+ns*(n-1)]*x[d+ns*(n-2)] + b[d+ns*(n-1)]*x[d+ns*(n-1)] + c[d+ns*(n-1)]*recvbufR[d] - rhs[d+ns*(n-1)]) );
180  }
181  } else {
182  for (d=0; d<ns; d++) {
183  norm += ( (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*recvbufR[d] - rhs[d])
184  * (a[d]*recvbufL[d] + b[d]*x[d] + c[d]*recvbufR[d] - rhs[d]) );
185  }
186  }
187  /* sum over all processes */
188 #ifdef serial
189  global_norm = norm;
190 #else
191  MPI_Allreduce(&norm,&global_norm,1,MPI_DOUBLE,MPI_SUM,*comm);
192 #endif
193  global_norm = sqrt(global_norm/NT);
194  if (!iter) norm0 = global_norm;
195  } else {
196  norm = -1.0;
197  global_norm = -1.0;
198  }
199 
200 #ifdef serial
201  if (context->verbose)
202 #else
203  if (context->verbose && (!rank))
204 #endif
205  printf("\t\titer: %d, norm: %1.16E\n",iter,global_norm);
206 
207  /* correct the solution for this iteration */
208  if (n > 1) {
209  for (d=0; d<ns; d++) {
210  i = 0; x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*recvbufL[d] - c[i*ns+d]*x[d+ns*(i+1)] ) / b[i*ns+d];
211  i = n-1; x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*x[d+ns*(i-1)] - c[i*ns+d]*recvbufR[d]) / b[i*ns+d];
212  }
213  for (i=1; i<n-1; i++) {
214  for (d=0; d<ns; d++) {
215  x[i*ns+d] = (rhs[i*ns+d] - a[i*ns+d]*x[d+ns*(i-1)] - c[i*ns+d]*x[d+ns*(i+1)]) / b[i*ns+d];
216  }
217  }
218  } else {
219  for (d=0; d<ns; d++) x[d] = (rhs[d] - a[d]*recvbufL[d] - c[d]*recvbufR[d]) / b[d];
220  }
221 
222  /* finished with this iteration */
223  iter++;
224  }
225 
226  /* save convergence information */
227  context->exitnorm = (context->evaluate_norm ? global_norm : -1.0);
228  context->exititer = iter;
229 
230  free(rhs);
231  free(sendbufL);
232  free(sendbufR);
233  free(recvbufL);
234  free(recvbufR);
235 
236  return(0);
237 }
int tridiagLUInit ( void *  r,
void *  c 
)

Initialize the tridiagLU solver by setting the various parameters in TridiagLU to their default values. If the file lusolver.inp is available, read it and set the parameters.

The file lusolver.inp must be in the following format:

    begin
        <keyword>   <value>
        <keyword>   <value>
        <keyword>   <value>
        ...
        <keyword>   <value>
    end

where the recognized keywords are:

Keyword name       Type     Variable                      Default value
evaluate_norm      int      TridiagLU::evaluate_norm      1
maxiter            int      TridiagLU::maxiter            10
atol               double   TridiagLU::atol               1e-12
rtol               double   TridiagLU::rtol               1e-10
verbose            int      TridiagLU::verbose            0
reducedsolvetype   char[]   TridiagLU::reducedsolvetype   _TRIDIAG_JACOBI_
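
For example, a lusolver.inp file with illustrative (non-default) values could look like this:

    begin
        evaluate_norm     1
        maxiter           50
        atol              1e-14
        rtol              1e-12
        verbose           1
        reducedsolvetype  jacobi
    end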
Parameters
r    Object of type TridiagLU
c    MPI communicator

Definition at line 39 of file tridiagLUInit.c.

43 {
44  TridiagLU *t = (TridiagLU*) r;
45  int rank,ierr;
46 #ifdef serial
47  rank = 0;
48 #else
49  MPI_Comm *comm = (MPI_Comm*) c;
50  if (!comm) rank = 0;
51  else MPI_Comm_rank(*comm,&rank);
52 #endif
53 
54  /* default values */
55  strcpy(t->reducedsolvetype,_TRIDIAG_JACOBI_);
56  t->evaluate_norm = 1;
57  t->maxiter = 10;
58  t->atol = 1e-12;
59  t->rtol = 1e-10;
60  t->verbose = 0;
61 
62  /* read from file, if available */
63  if (!rank) {
64  FILE *in;
65  in = fopen("lusolver.inp","r");
66  if (!in) {
67  printf("tridiagLUInit: File \"lusolver.inp\" not found. Using default values.\n");
68  } else {
69  char word[100];
70  ierr = fscanf(in,"%s",word); if (ierr != 1) return(1);
71  if (!strcmp(word, "begin")) {
72  while (strcmp(word, "end")) {
73  ierr = fscanf(in,"%s",word); if (ierr != 1) return(1);
74  if (!strcmp(word, "evaluate_norm" )) ierr = fscanf(in,"%d" ,&t->evaluate_norm );
75  else if (!strcmp(word, "maxiter" )) ierr = fscanf(in,"%d" ,&t->maxiter );
76  else if (!strcmp(word, "atol" )) ierr = fscanf(in,"%lf",&t->atol );
77  else if (!strcmp(word, "rtol" )) ierr = fscanf(in,"%lf",&t->rtol );
78  else if (!strcmp(word, "verbose" )) ierr = fscanf(in,"%d" ,&t->verbose );
79  else if (!strcmp(word, "reducedsolvetype")) ierr = fscanf(in,"%s" ,t->reducedsolvetype);
80  else if (strcmp(word,"end")) {
81  char useless[100];
82  ierr = fscanf(in,"%s",useless);
83  printf("Warning: keyword %s in file \"lusolver.inp\" with value %s not recognized or extraneous. Ignoring.\n",word,useless);
84  }
85  if (ierr != 1) return(1);
86  }
87  } else {
88  fprintf(stderr,"Error: Illegal format in file \"lusolver.inp\".\n");
89  return(1);
90  }
91  fclose(in);
92  }
93  }
94 
95  /* broadcast to all processes */
96 #ifndef serial
97  if (comm) {
98  MPI_Bcast(t->reducedsolvetype,50,MPI_CHAR,0,*comm);
99  MPI_Bcast(&t->evaluate_norm,1,MPI_INT,0,*comm);
100  MPI_Bcast(&t->maxiter,1,MPI_INT,0,*comm);
101  MPI_Bcast(&t->verbose,1,MPI_INT,0,*comm);
102  MPI_Bcast(&t->atol,1,MPI_DOUBLE,0,*comm);
103  MPI_Bcast(&t->rtol,1,MPI_DOUBLE,0,*comm);
104  }
105 #endif
106 
107  return(0);
108 }
int blocktridiagLU ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
int  bs,
void *  r,
void *  m 
)

Solve block tridiagonal (non-periodic) systems of equations using parallel LU decomposition: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. This function uses an iterative-substructuring method that can be briefly described through the following four stages:

  • Stage 1: Parallel elimination of the tridiagonal blocks on each processor, comprising all points of the subdomain except the 1st point (unless it is the 1st global point, i.e., a physical boundary)
  • Stage 2: Elimination of the 1st row on each processor (except the 1st processor) using the last row of the previous processor.
  • Stage 3: Solution of the reduced tridiagonal system that represents the coupling of the system across the processors, using blocktridiagIterJacobi() in this implementation.
  • Stage 4: Backward-solve to obtain the final solution

Specific details of the method implemented here are available in:

More references on this class of parallel tridiagonal solvers:

  • E. Polizzi and A. H. Sameh, "A parallel hybrid banded system solver: The SPIKE algorithm", Parallel Comput., 32 (2006), pp. 177–194.
  • E. Polizzi and A. H. Sameh, "SPIKE: A parallel environment for solving banded linear systems", Comput. & Fluids, 36 (2007), pp. 113–120.

Array layout: The arguments a, b, and c are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, and superdiagonal) of size (n X ns X bs^2), and x is a local 1D array (containing this processor's part of the right-hand-side, and will contain the solution on exit) of size (n X ns X bs), where n is the local size of the system, ns is the number of independent systems to solve, and bs is the block size. The ordering of the elements in these arrays is as follows:

  • Each block is stored in the row-major format.
  • Blocks of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} B_0^k & C_0^k & & & \\ A_1^k & B_1^k & C_1^k & & \\ & A_2^k & B_2^k & C_2^k & \\ & & A_3^k & B_3^k & C_3^k \\ & & & A_4^k & B_4^k \\ \end{array}\right] \left[\begin{array}{c} X_0^k \\ X_1^k \\ X_2^k \\ X_3^k \\ X_4^k \end{array}\right] = \left[\begin{array}{c} R_0^k \\ R_1^k \\ R_2^k \\ R_3^k \\ R_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

where \(A\), \(B\), and \(C\) are matrices of size bs = 2 (say), and let \( ns = 3\). In the equation above, we have

\begin{equation} B_i^k = \left[\begin{array}{cc} b_{00,i}^k & b_{01,i}^k \\ b_{10,i}^k & b_{11,i}^k \end{array}\right], X_i^k = \left[\begin{array}{c} x_{0,i}^k \\ x_{1,i}^k \end{array} \right], R_i^k = \left[\begin{array}{c} r_{0,i}^k \\ r_{1,i}^k \end{array} \right] \end{equation}

Note that in the code, \(X\) and \(R\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_{00,0}^0, b_{01,0}^0, b_{10,0}^0, b_{11,0}^0, b_{00,0}^1, b_{01,0}^1, b_{10,0}^1, b_{11,0}^1, b_{00,0}^2, b_{01,0}^2, b_{10,0}^2, b_{11,0}^2,
b_{00,1}^0, b_{01,1}^0, b_{10,1}^0, b_{11,1}^0, b_{00,1}^1, b_{01,1}^1, b_{10,1}^1, b_{11,1}^1, b_{00,1}^2, b_{01,1}^2, b_{10,1}^2, b_{11,1}^2,
...,
b_{00,n-1}^0, b_{01,n-1}^0, b_{10,n-1}^0, b_{11,n-1}^0, b_{00,n-1}^1, b_{01,n-1}^1, b_{10,n-1}^1, b_{11,n-1}^1, b_{00,n-1}^2, b_{01,n-1}^2, b_{10,n-1}^2, b_{11,n-1}^2
]
The arrays a and c are stored similarly.

The array corresponding to a vector (the solution and the right-hand-side x) must be a 1D array with the following layout of elements:
[
x_{0,0}^0, x_{1,0}^0, x_{0,0}^1, x_{1,0}^1,x_{0,0}^2, x_{1,0}^2,
x_{0,1}^0, x_{1,1}^0, x_{0,1}^1, x_{1,1}^1,x_{0,1}^2, x_{1,1}^2,
...,
x_{0,n-1}^0, x_{1,n-1}^0, x_{0,n-1}^1, x_{1,n-1}^1,x_{0,n-1}^2, x_{1,n-1}^2
]
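
Put differently, entry (p,q) of the i-th block of system k in the arrays a, b, or c, and entry p of the i-th block of system k in the array x, are located at the flattened indices computed in the following sketch (these expressions match the (i*ns+d)*bs2 and (i*ns+d)*bs offsets used in the listing below):

    /* Sketch: flattened indices for the block layout described above.
       Blocks are bs x bs and stored row-major. */
    static int block_entry(int i, int k, int p, int q, int ns, int bs)
    {
      return ((i*ns + k)*bs + p)*bs + q;   /* entry (p,q) of block i of system k */
    }
    static int vector_entry(int i, int k, int p, int ns, int bs)
    {
      return (i*ns + k)*bs + p;            /* entry p of block i of system k */
    }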

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor (not multiplied by the block size)
ns    Number of systems to solve
bs    Block size
r     Object of type TridiagLU (contains wall times at exit)
m     MPI communicator

Definition at line 107 of file blocktridiagLU.c.

119 {
120  TridiagLU *params = (TridiagLU*) r;
121  int d,i,j,istart,iend,size;
122  int rank,nproc,bs2=bs*bs,nsbs=ns*bs;
123  struct timeval start,stage1,stage2,stage3,stage4;
124 
125 #ifdef serial
126  rank = 0;
127  nproc = 1;
128 #else
129  MPI_Comm *comm = (MPI_Comm*) m;
130  const int nvar = 4;
131  int ierr = 0;
132 
133  if (comm) {
134  MPI_Comm_size(*comm,&nproc);
135  MPI_Comm_rank(*comm,&rank);
136  } else {
137  rank = 0;
138  nproc = 1;
139  }
140 #endif
141 
142  if (!params) {
143  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
144  return(1);
145  }
146 
147  /* start */
148  gettimeofday(&start,NULL);
149 
150  if ((ns == 0) || (n == 0) || (bs == 0)) return(0);
151  double *xs1, *xp1;
152  xs1 = (double*) calloc (nsbs, sizeof(double));
153  xp1 = (double*) calloc (nsbs, sizeof(double));
154  for (i=0; i<nsbs; i++) xs1[i] = xp1[i] = 0;
155 
156  /* Stage 1 - Parallel elimination of subdiagonal entries */
157  istart = (rank == 0 ? 1 : 2);
158  iend = n;
159  for (i = istart; i < iend; i++) {
160  double binv[bs2], factor[bs2];
161  for (d = 0; d < ns; d++) {
162  _MatrixInvert_ (b+((i-1)*ns+d)*bs2,binv,bs);
163  _MatrixMultiply_ (a+(i*ns+d)*bs2,binv,factor,bs);
164  _MatrixMultiplySubtract_ (b+(i*ns+d)*bs2,factor,c+((i-1)*ns+d)*bs2,bs);
165  _MatrixZero_ (a+(i*ns+d)*bs2,bs);
166  _MatrixMultiplySubtract_ (a+(i*ns+d)*bs2,factor,a+((i-1)*ns+d)*bs2,bs);
167  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,factor,x+((i-1)*ns+d)*bs ,bs);
168  if (rank) {
169  _MatrixMultiply_ (c+d*bs2,binv,factor,bs);
170  _MatrixZero_ (c+d*bs2,bs);
171  _MatrixMultiplySubtract_ (c+d*bs2,factor,c+((i-1)*ns+d)*bs2,bs);
172  _MatrixMultiplySubtract_ (b+d*bs2,factor,a+((i-1)*ns+d)*bs2,bs);
173  _MatVecMultiplySubtract_ (x+d*bs ,factor,x+((i-1)*ns+d)*bs ,bs);
174  }
175  }
176  }
177 
178  /* end of stage 1 */
179  gettimeofday(&stage1,NULL);
180 
181  /* Stage 2 - Eliminate the first sub- & super-diagonal entries */
182  /* This needs the last (a,b,c,x) from the previous process */
183 #ifndef serial
184  double *sendbuf, *recvbuf;
185  size = ns*bs2*(nvar-1)+nsbs;
186  sendbuf = (double*) calloc (size, sizeof(double));
187  recvbuf = (double*) calloc (size, sizeof(double));
188  for (d=0; d<ns; d++) {
189  for (i=0; i<bs2; i++) {
190  sendbuf[(0*ns+d)*bs2+i] = a[((n-1)*ns+d)*bs2+i];
191  sendbuf[(1*ns+d)*bs2+i] = b[((n-1)*ns+d)*bs2+i];
192  sendbuf[(2*ns+d)*bs2+i] = c[((n-1)*ns+d)*bs2+i];
193  }
194  for (i=0; i<bs; i++) sendbuf[3*ns*bs2+d*bs+i] = x[((n-1)*ns+d)*bs+i];
195  }
196  if (nproc > 1) {
197  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
198  if (rank) MPI_Irecv(recvbuf,size,MPI_DOUBLE,rank-1,1436,*comm,&req[0]);
199  if (rank != nproc-1) MPI_Isend(sendbuf,size,MPI_DOUBLE,rank+1,1436,*comm,&req[1]);
200  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
201  }
202  /* The first process sits this one out */
203  if (rank) {
204  for (d = 0; d < ns; d++) {
205  double am1[bs2], bm1[bs2], cm1[bs2], xm1[bs];
206  for (i=0; i<bs2; i++) {
207  am1[i] = recvbuf[(0*ns+d)*bs2+i];
208  bm1[i] = recvbuf[(1*ns+d)*bs2+i];
209  cm1[i] = recvbuf[(2*ns+d)*bs2+i];
210  }
211  for (i=0; i<bs; i++) xm1[i] = recvbuf[3*ns*bs2+d*bs+i];
212  double factor[bs2], binv[bs2];
213  _MatrixInvert_ (bm1,binv,bs);
214  _MatrixMultiply_ (a+d*bs2,binv,factor,bs);
215  _MatrixMultiplySubtract_ (b+d*bs2,factor,cm1,bs);
216  _MatrixZero_ (a+d*bs2,bs);
217  _MatrixMultiplySubtract_ (a+d*bs2,factor,am1,bs);
218  _MatVecMultiplySubtract_ (x+d*bs ,factor,xm1,bs);
219 
220  _MatrixInvert_ (b+((n-1)*ns+d)*bs2,binv,bs); if (ierr) return(ierr);
221  _MatrixMultiply_ (c+d*bs2,binv,factor,bs);
222  _MatrixMultiplySubtract_ (b+d*bs2,factor,a+((n-1)*ns+d)*bs2,bs);
223  _MatrixZero_ (c+d*bs2,bs);
224  _MatrixMultiplySubtract_ (c+d*bs2,factor,c+((n-1)*ns+d)*bs2,bs);
225  _MatVecMultiplySubtract_ (x+d*bs ,factor,x+((n-1)*ns+d)*bs ,bs);
226  }
227  }
228  free(sendbuf);
229  free(recvbuf);
230 #endif
231 
232  /* end of stage 2 */
233  gettimeofday(&stage2,NULL);
234 
235  /* Stage 3 - Solve the reduced (nproc-1) X (nproc-1) tridiagonal system */
236 #ifndef serial
237  if (nproc > 1) {
238  double *zero, *eye;
239  zero = (double*) calloc (ns*bs2, sizeof(double));
240  eye = (double*) calloc (ns*bs2, sizeof(double));
241  for (d=0; d<ns*bs2; d++) zero[d] = eye[d] = 0.0;
242  for (d=0; d<ns; d++) {
243  for (i=0; i<bs; i++) eye[d*bs2+(i*bs+i)] = 1.0;
244  }
245 
246  if (!strcmp(params->reducedsolvetype,_TRIDIAG_GS_)) {
247  /* not supported */
248  fprintf(stderr,"Error in blocktridiagLU(): Gather-and-solve for reduced system not available.\n");
249  return(1);
250  } else if (!strcmp(params->reducedsolvetype,_TRIDIAG_JACOBI_)) {
251  /* Solving the reduced system iteratively with the Jacobi method */
252  if (rank) ierr = blocktridiagIterJacobi(a,b,c,x,1,ns,bs,params,comm);
253  else ierr = blocktridiagIterJacobi(zero,eye,zero,zero,1,ns,bs,params,comm);
254  }
255  free(zero);
256  free(eye);
257 
258  /* Each process, get the first x of the next process */
259  MPI_Request req[2] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL};
260  for (d=0; d<nsbs; d++) xs1[d] = x[d];
261  if (rank+1 < nproc) MPI_Irecv(xp1,nsbs,MPI_DOUBLE,rank+1,1323,*comm,&req[0]);
262  if (rank) MPI_Isend(xs1,nsbs,MPI_DOUBLE,rank-1,1323,*comm,&req[1]);
263  MPI_Waitall(2,&req[0],MPI_STATUS_IGNORE);
264  }
265 #else
266  if (nproc > 1) {
267  fprintf(stderr,"Error: nproc > 1 for a serial run!\n");
268  return(1);
269  }
270 #endif /* if not serial */
271  /* end of stage 3 */
272  gettimeofday(&stage3,NULL);
273 
274  /* Stage 4 - Parallel back-substitution to get the solution */
275  istart = n-1;
276  iend = (rank == 0 ? 0 : 1);
277 
278  for (d = 0; d < ns; d++) {
279  double binv[bs2],xt[bs];
280  _MatrixInvert_ (b+(istart*ns+d)*bs2,binv,bs);
281  _MatVecMultiplySubtract_ (x+(istart*ns+d)*bs ,a+(istart*ns+d)*bs2,x +d*bs,bs);
282  _MatVecMultiplySubtract_ (x+(istart*ns+d)*bs ,c+(istart*ns+d)*bs2,xp1+d*bs,bs);
283  _MatVecMultiply_ (binv,x+(istart*ns+d)*bs,xt,bs);
284  for (j=0; j<bs; j++) x[(istart*ns+d)*bs+j]=xt[j];
285  }
286  for (i = istart-1; i > iend-1; i--) {
287  for (d = 0; d < ns; d++) {
288  double binv[bs2],xt[bs];
289  _MatrixInvert_ (b+(i*ns+d)*bs2,binv,bs);
290  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,c+(i*ns+d)*bs2,x+((i+1)*ns+d)*bs,bs);
291  _MatVecMultiplySubtract_ (x+(i*ns+d)*bs ,a+(i*ns+d)*bs2,x+d*bs,bs);
292  _MatVecMultiply_ (binv,x+(i*ns+d)*bs,xt,bs);
293  for (j=0; j<bs; j++) x[(i*ns+d)*bs+j] = xt[j];
294  }
295  }
296 
297  /* end of stage 4 */
298  gettimeofday(&stage4,NULL);
299 
300  /* Done - now x contains the solution */
301  free(xs1);
302  free(xp1);
303 
304  /* save runtimes if needed */
305  long long walltime;
306  walltime = ((stage1.tv_sec * 1000000 + stage1.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
307  params->stage1_time = (double) walltime / 1000000.0;
308  walltime = ((stage2.tv_sec * 1000000 + stage2.tv_usec) - (stage1.tv_sec * 1000000 + stage1.tv_usec));
309  params->stage2_time = (double) walltime / 1000000.0;
310  walltime = ((stage3.tv_sec * 1000000 + stage3.tv_usec) - (stage2.tv_sec * 1000000 + stage2.tv_usec));
311  params->stage3_time = (double) walltime / 1000000.0;
312  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (stage3.tv_sec * 1000000 + stage3.tv_usec));
313  params->stage4_time = (double) walltime / 1000000.0;
314  walltime = ((stage4.tv_sec * 1000000 + stage4.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
315  params->total_time = (double) walltime / 1000000.0;
316  return(0);
317 }
int blocktridiagIterJacobi ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
int  bs,
void *  r,
void *  m 
)

Solve block tridiagonal (non-periodic) systems of equations using point Jacobi iterations: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides. The initial guess is taken as the solution of

\begin{equation} {\rm diag}\left[{\bf b}\right]{\bf x} = {\bf r} \end{equation}

where \({\bf b}\) represents the diagonal elements of the tridiagonal system, and \({\bf r}\) is the right-hand-side, stored in \({\bf x}\) at the start of this function.
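
Each subsequent iteration applies the block-Jacobi update (written out here for reference; it corresponds to the correction step in the listing below), where \(X_{-1}\) and \(X_n\) are received from the neighboring processes:

\begin{equation} X_i^{(m+1)} = \left(B_i\right)^{-1}\left( R_i - A_i X_{i-1}^{(m)} - C_i X_{i+1}^{(m)} \right), \quad i = 0,\cdots,n-1 \end{equation}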

Array layout: The arguments a, b, and c are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, and superdiagonal) of size (n X ns X bs^2), and x is a local 1D array (containing this processor's part of the right-hand-side, and will contain the solution on exit) of size (n X ns X bs), where n is the local size of the system, ns is the number of independent systems to solve, and bs is the block size. The ordering of the elements in these arrays is as follows:

  • Each block is stored in the row-major format.
  • Blocks of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} B_0^k & C_0^k & & & \\ A_1^k & B_1^k & C_1^k & & \\ & A_2^k & B_2^k & C_2^k & \\ & & A_3^k & B_3^k & C_3^k \\ & & & A_4^k & B_4^k \\ \end{array}\right] \left[\begin{array}{c} X_0^k \\ X_1^k \\ X_2^k \\ X_3^k \\ X_4^k \end{array}\right] = \left[\begin{array}{c} R_0^k \\ R_1^k \\ R_2^k \\ R_3^k \\ R_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

where \(A\), \(B\), and \(C\) are matrices of size bs = 2 (say), and let \( ns = 3\). In the equation above, we have

\begin{equation} B_i^k = \left[\begin{array}{cc} b_{00,i}^k & b_{01,i}^k \\ b_{10,i}^k & b_{11,i}^k \end{array}\right], X_i^k = \left[\begin{array}{c} x_{0,i}^k \\ x_{1,i}^k \end{array} \right], R_i^k = \left[\begin{array}{c} r_{0,i}^k \\ r_{1,i}^k \end{array} \right] \end{equation}

Note that in the code, \(X\) and \(R\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_{00,0}^0, b_{01,0}^0, b_{10,0}^0, b_{11,0}^0, b_{00,0}^1, b_{01,0}^1, b_{10,0}^1, b_{11,0}^1, b_{00,0}^2, b_{01,0}^2, b_{10,0}^2, b_{11,0}^2,
b_{00,1}^0, b_{01,1}^0, b_{10,1}^0, b_{11,1}^0, b_{00,1}^1, b_{01,1}^1, b_{10,1}^1, b_{11,1}^1, b_{00,1}^2, b_{01,1}^2, b_{10,1}^2, b_{11,1}^2,
...,
b_{00,n-1}^0, b_{01,n-1}^0, b_{10,n-1}^0, b_{11,n-1}^0, b_{00,n-1}^1, b_{01,n-1}^1, b_{10,n-1}^1, b_{11,n-1}^1, b_{00,n-1}^2, b_{01,n-1}^2, b_{10,n-1}^2, b_{11,n-1}^2
]
The arrays a and c are stored similarly.

The array corresponding to a vector (the solution and the right-hand-side x) must be a 1D array with the following layout of elements:
[
x_{0,0}^0, x_{1,0}^0, x_{0,0}^1, x_{1,0}^1,x_{0,0}^2, x_{1,0}^2,
x_{0,1}^0, x_{1,1}^0, x_{0,1}^1, x_{1,1}^1,x_{0,1}^2, x_{1,1}^2,
...,
x_{0,n-1}^0, x_{1,n-1}^0, x_{0,n-1}^1, x_{1,n-1}^1,x_{0,n-1}^2, x_{1,n-1}^2
]

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor (not multiplied by the block size)
ns    Number of systems to solve
bs    Block size
r     Object of type TridiagLU
m     MPI communicator

Definition at line 88 of file blocktridiagIterJacobi.c.

100 {
101  TridiagLU *context = (TridiagLU*) r;
102  int iter,d,i,j,NT,bs2=bs*bs,nsbs=ns*bs;
103  double norm=0,norm0=0,global_norm=0;
104 
105 #ifndef serial
106  MPI_Comm *comm = (MPI_Comm*) m;
107  int rank,nproc;
108 
109  if (comm) {
110  MPI_Comm_size(*comm,&nproc);
111  MPI_Comm_rank(*comm,&rank);
112  } else {
113  rank = 0;
114  nproc = 1;
115  }
116 #endif
117 
118  if (!context) {
119  fprintf(stderr,"Error in tridiagIterJacobi(): NULL pointer passed for parameters!\n");
120  return(-1);
121  }
122 
123  double *rhs = (double*) calloc (ns*n*bs, sizeof(double));
124  for (i=0; i<ns*n*bs; i++) rhs[i] = x[i]; /* save a copy of the rhs */
125  /* initial guess */
126  for (i=0; i<n; i++) {
127  for (d=0; d<ns; d++) {
128  double binv[bs2];
129  _MatrixInvert_ (b+(i*ns+d)*bs2,binv,bs);
130  _MatVecMultiply_ (binv,rhs+(i*ns+d)*bs,x+(i*ns+d)*bs,bs);
131  }
132  }
133 
134  double *recvbufL, *recvbufR, *sendbufL, *sendbufR;
135  recvbufL = (double*) calloc (nsbs, sizeof(double));
136  recvbufR = (double*) calloc (nsbs, sizeof(double));
137  sendbufL = (double*) calloc (nsbs, sizeof(double));
138  sendbufR = (double*) calloc (nsbs, sizeof(double));
139 
140  /* total number of points */
141 #ifdef serial
142  if (context->evaluate_norm) NT = n;
143  else NT = 0;
144 #else
145  if (context->evaluate_norm) MPI_Allreduce(&n,&NT,1,MPI_INT,MPI_SUM,*comm);
146  else NT = 0;
147 #endif
148 
149 #ifdef serial
150  if (context->verbose) printf("\n");
151 #else
152  if (context->verbose && (!rank)) printf("\n");
153 #endif
154 
155  iter = 0;
156  while(1) {
157 
158  /* evaluate break conditions */
159  if ( (iter >= context->maxiter)
160  || (iter && context->evaluate_norm && (global_norm < context->atol))
161  || (iter && context->evaluate_norm && (global_norm/norm0 < context->rtol)) ) {
162  break;
163  }
164 
165  /* Communicate the boundary x values between processors */
166  for (d=0; d<nsbs; d++) recvbufL[d] = recvbufR[d] = 0;
167 #ifndef serial
168  MPI_Request req[4] = {MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL,MPI_REQUEST_NULL};
169  if (rank) MPI_Irecv(recvbufL,nsbs,MPI_DOUBLE,rank-1,2,*comm,&req[0]);
170  if (rank != nproc-1) MPI_Irecv(recvbufR,nsbs,MPI_DOUBLE,rank+1,3,*comm,&req[1]);
171  for (d=0; d<nsbs; d++) {
172  sendbufL[d] = x[d];
173  sendbufR[d] = x[(n-1)*nsbs+d];
174  }
175  if (rank) MPI_Isend(sendbufL,nsbs,MPI_DOUBLE,rank-1,3,*comm,&req[2]);
176  if (rank != nproc-1) MPI_Isend(sendbufR,nsbs,MPI_DOUBLE,rank+1,2,*comm,&req[3]);
177 #endif
178 
179  /* calculate error norm - interior */
180  if (context->evaluate_norm) {
181  norm = 0;
182  for (i=1; i<n-1; i++) {
183  for (d=0; d<ns; d++) {
184  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[(i*ns+d)*bs+j];
185  _MatVecMultiplySubtract_(err,a+(i*ns+d)*bs2,x+((i-1)*ns+d)*bs,bs);
186  _MatVecMultiplySubtract_(err,b+(i*ns+d)*bs2,x+((i )*ns+d)*bs,bs);
187  _MatVecMultiplySubtract_(err,c+(i*ns+d)*bs2,x+((i+1)*ns+d)*bs,bs);
188  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
189  }
190  }
191  }
192  /* calculate error norm - boundary */
193 #ifndef serial
194  MPI_Waitall(4,req,MPI_STATUS_IGNORE);
195 #endif
196  if (context->evaluate_norm) {
197  if (n > 1) {
198  for (d=0; d<ns; d++) {
199  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[d*bs+j];
200  _MatVecMultiplySubtract_(err,a+d*bs2,recvbufL+d*bs,bs);
201  _MatVecMultiplySubtract_(err,b+d*bs2,x+d*bs,bs);
202  _MatVecMultiplySubtract_(err,c+d*bs2,x+(ns+d)*bs,bs);
203  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
204  }
205  for (d=0; d<ns; d++) {
206  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[(d+ns*(n-1))*bs+j];
207  _MatVecMultiplySubtract_(err,a+(d+ns*(n-1))*bs2,x+(d+ns*(n-2))*bs,bs);
208  _MatVecMultiplySubtract_(err,b+(d+ns*(n-1))*bs2,x+(d+ns*(n-1))*bs,bs);
209  _MatVecMultiplySubtract_(err,c+(d+ns*(n-1))*bs2,recvbufR+d*bs,bs);
210  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
211  }
212  } else {
213  for (d=0; d<ns; d++) {
214  double err[bs]; for (j=0; j<bs; j++) err[j] = rhs[d*bs+j];
215  _MatVecMultiplySubtract_(err,a+d*bs2,recvbufL+d*bs,bs);
216  _MatVecMultiplySubtract_(err,b+d*bs2,x+d*bs,bs);
217  _MatVecMultiplySubtract_(err,c+d*bs2,recvbufR+d*bs,bs);
218  for (j=0; j<bs; j++) norm += (err[j]*err[j]);
219  }
220  }
221  /* sum over all processes */
222 #ifdef serial
223  global_norm = norm;
224 #else
225  MPI_Allreduce(&norm,&global_norm,1,MPI_DOUBLE,MPI_SUM,*comm);
226 #endif
227  global_norm = sqrt(global_norm/NT);
228  if (!iter) norm0 = global_norm;
229  } else {
230  norm = -1.0;
231  global_norm = -1.0;
232  }
233 
234 #ifdef serial
235  if (context->verbose)
236 #else
237  if (context->verbose && (!rank))
238 #endif
239  printf("\t\titer: %d, norm: %1.16E\n",iter,global_norm);
240 
241  /* correct the solution for this iteration */
242  if (n > 1) {
243  for (d=0; d<ns; d++) {
244  double xt[bs],binv[bs2];
245 
246  i = 0;
247  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
248  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,recvbufL+d*bs,bs);
249  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,x+(d+ns*(i+1))*bs,bs);
250  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
251  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
252 
253  i = n-1;
254  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
255  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,x+(d+ns*(i-1))*bs,bs);
256  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,recvbufR+d*bs,bs);
257  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
258  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
259  }
260  for (i=1; i<n-1; i++) {
261  for (d=0; d<ns; d++) {
262  double xt[bs],binv[bs2];
263  for (j=0; j<bs; j++) xt[j] = rhs[(i*ns+d)*bs+j];
264  _MatVecMultiplySubtract_(xt,a+(i*ns+d)*bs2,x+(d+ns*(i-1))*bs,bs);
265  _MatVecMultiplySubtract_(xt,c+(i*ns+d)*bs2,x+(d+ns*(i+1))*bs,bs);
266  _MatrixInvert_(b+(i*ns+d)*bs2,binv,bs);
267  _MatVecMultiply_(binv,xt,x+(i*ns+d)*bs,bs);
268  }
269  }
270  } else {
271  for (d=0; d<ns; d++) {
272  double xt[bs],binv[bs2];
273  for (j=0; j<bs; j++) xt[j] = rhs[d*bs+j];
274  _MatVecMultiplySubtract_(xt,a+d*bs2,recvbufL+d*bs,bs);
275  _MatVecMultiplySubtract_(xt,c+d*bs2,recvbufR+d*bs,bs);
276  _MatrixInvert_(b+d*bs2,binv,bs);
277  _MatVecMultiply_(binv,xt,x+d*bs,bs);
278  }
279  }
280 
281  /* finished with this iteration */
282  iter++;
283  }
284 
285  /* save convergence information */
286  context->exitnorm = (context->evaluate_norm ? global_norm : -1.0);
287  context->exititer = iter;
288 
289  free(rhs);
290  free(sendbufL);
291  free(sendbufR);
292  free(recvbufL);
293  free(recvbufR);
294 
295  return(0);
296 }
int tridiagScaLPK ( double *  a,
double *  b,
double *  c,
double *  x,
int  n,
int  ns,
void *  r,
void *  m 
)

Solve tridiagonal (non-periodic) systems of equations using ScaLAPACK's pddtsv: This function can solve multiple independent systems with one call. The systems need not share the same left- or right-hand-sides.

  • This function is compiled only if the compilation flag "-Dwith_scalapack" is specified (a usage sketch follows this list).
  • This function calls the ScaLAPACK tridiagonal solver separately for each system, and thus may not be efficient.
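
A minimal sketch of guarded usage (an illustrative assumption, not part of the original documentation; TridiagLU::blacs_ctxt must already hold a valid BLACS context when the ScaLAPACK path is taken, and the BLACS setup itself is not shown):

    #include <mpi.h>
    #include "tridiagLU.h"

    /* Sketch: call the ScaLAPACK wrapper only when it was compiled in
       (-Dwith_scalapack); otherwise fall back to tridiagLU(). */
    int solve_tridiag(double *a, double *b, double *c, double *x,
                      int n, int ns, TridiagLU *params, MPI_Comm *comm)
    {
    #ifdef with_scalapack
      return tridiagScaLPK(a, b, c, x, n, ns, params, comm);
    #else
      return tridiagLU(a, b, c, x, n, ns, params, comm);
    #endif
    }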

Array layout: The arguments a, b, c, and x are local 1D arrays (containing this processor's part of the subdiagonal, diagonal, superdiagonal, and right-hand-side) of size (n X ns), where n is the local size of the system, and ns is the number of independent systems to solve. The ordering of the elements in these arrays is as follows:

  • Elements of the same row for each of the independent systems are stored adjacent to each other.

For example, consider the following systems:

\begin{equation} \left[\begin{array}{ccccc} b_0^k & c_0^k & & & \\ a_1^k & b_1^k & c_1^k & & \\ & a_2^k & b_2^k & c_2^k & \\ & & a_3^k & b_3^k & c_3^k \\ & & & a_4^k & b_4^k \\ \end{array}\right] \left[\begin{array}{c} x_0^k \\ x_1^k \\ x_2^k \\ x_3^k \\ x_4^k \end{array}\right] = \left[\begin{array}{c} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \\ r_4^k \end{array}\right]; \ \ k= 1,\cdots,ns \end{equation}

and let \( ns = 3\). Note that in the code, \(x\) and \(r\) are the same array x.

Then, the array b must be a 1D array with the following layout of elements:
[
b_0^0, b_0^1, b_0^2, (diagonal element of the first row in each system)
b_1^0, b_1^1, b_1^2, (diagonal element of the second row in each system)
...,
b_{n-1}^0, b_{n-1}^1, b_{n-1}^2 (diagonal element of the last row in each system)
]
The arrays a, c, and x are stored similarly.

Notes:

  • This function does not preserve the sub-diagonal, diagonal, super-diagonal elements and the right-hand-sides.
  • The input array x contains the right-hand-side on entering the function, and the solution on exiting it.
Parameters
a     Array containing the sub-diagonal elements
b     Array containing the diagonal elements
c     Array containing the super-diagonal elements
x     Right-hand side; will contain the solution on exit
n     Local size of the system on this processor
ns    Number of systems to solve
r     Object of type TridiagLU
m     MPI communicator

Definition at line 71 of file tridiagScaLPK.c.

81 {
82  TridiagLU *params = (TridiagLU*) r;
83  int rank,nproc,nglobal,nrhs,i,s,ia,ib,desca[9],descb[9],ierr,
84  lwork;
85  double *dl,*d,*du,*rhs,*work;
86  struct timeval start,end;
87 
88 #ifdef serial
89  rank = 0;
90  nproc = 1;
91  nglobal=n;
92 #else
93  MPI_Comm *comm = (MPI_Comm*) m;
94 
95  if (comm) {
96  MPI_Comm_size(*comm,&nproc);
97  MPI_Comm_rank(*comm,&rank);
98  } else {
99  rank = 0;
100  nproc = 1;
101  }
102  MPI_Allreduce(&n,&nglobal,1,MPI_INT,MPI_SUM,*comm);
103 #endif
104 
105  /* check */
106  if (nglobal%n != 0) {
107  if (!rank) {
108  fprintf(stderr,"Error: The ScaLAPACK wrapper can only handle cases where the global ");
109  fprintf(stderr,"size of system is an integer multiple of no. of processes.\n");
110  }
111  return(1);
112  }
113 
114  if (!params) {
115  fprintf(stderr,"Error in tridiagLU(): NULL pointer passed for parameters.\n");
116  return(1);
117  }
118 
119 
120  nrhs = 1;
121  ia = 1;
122  ib = 1;
123 
124  lwork = (12*nproc+3*n) + ( (8*nproc) > (10*nproc+4*nrhs) ? (8*nproc) : (10*nproc+4*nrhs) );
125 
126  desca[0] = 501;
127  desca[1] = params->blacs_ctxt;
128  desca[2] = nglobal;
129  desca[3] = n;
130  desca[4] = 0;
131  desca[5] = 0;
132  desca[6] = 0;
133  desca[7] = 0;
134  desca[8] = 0;
135 
136  descb[0] = 502;
137  descb[1] = params->blacs_ctxt;
138  descb[2] = nglobal;
139  descb[3] = n;
140  descb[4] = 0;
141  descb[5] = n;
142  descb[6] = 0;
143  descb[7] = 0;
144  descb[8] = 0;
145 
146  dl = (double*) calloc (n,sizeof(double));
147  d = (double*) calloc (n,sizeof(double));
148  du = (double*) calloc (n,sizeof(double));
149  rhs = (double*) calloc (n,sizeof(double));
150  work = (double*) calloc(lwork,sizeof(double));
151 
152  params->total_time = 0.0;
153  params->stage1_time = 0.0;
154  params->stage2_time = 0.0;
155  params->stage3_time = 0.0;
156  params->stage4_time = 0.0;
157  for (s=0; s<ns; s++) {
158 
159  for (i=0; i<n; i++) {
160  dl[i] = a[i*ns+s];
161  d [i] = b[i*ns+s];
162  du[i] = c[i*ns+s];
163  rhs[i]= x[i*ns+s];
164  }
165 
166  /* call the ScaLAPACK function */
167  gettimeofday(&start,NULL);
168  pddtsv_(&nglobal,&nrhs,dl,d,du,&ia,desca,rhs,&ib,descb,work,&lwork,&ierr);
169  gettimeofday(&end,NULL);
170  if (ierr) return(ierr);
171 
172  for (i=0; i<n; i++) x[i*ns+s] = rhs[i];
173 
174  long long walltime;
175  walltime = ((end.tv_sec * 1000000 + end.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec));
176  params->total_time += (double) walltime / 1000000.0;
177  }
178 
179 
180  free(dl);
181  free(d);
182  free(du);
183  free(rhs);
184  free(work);
185 
186  return(0);
187 }