TridiagLU  1.0
Scalable, parallel solver for tridiagonal systems of equations
Scalability

The following strong-scaling results were obtained on Vesta (Blue Gene/Q) at the Argonne Leadership Computing Facility, using 2,048 nodes. Each node has a 1600 MHz PowerPC A2 processor with 16 cores, for a total of 32,768 cores.

Note: These results are most likely close to the best achievable, since they were obtained when very few other users were on the cluster.
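
For readers unfamiliar with the term, strong scaling keeps the total problem size fixed while the number of processes grows. The sketch below shows one common way to turn wall times into speedup and parallel efficiency. The function name and the timings are hypothetical and are not measurements from Vesta.

```python
def strong_scaling(walltimes):
    """Compute speedup and efficiency for a fixed-size problem.

    walltimes: dict mapping process count -> wall time in seconds.
    The smallest process count is used as the baseline.
    Returns a list of (nproc, speedup, efficiency) tuples.
    """
    procs = sorted(walltimes)
    p0 = procs[0]
    t0 = walltimes[p0]
    rows = []
    for p in procs:
        speedup = t0 / walltimes[p]        # how much faster than the baseline
        efficiency = speedup * p0 / p      # 1.0 means ideal strong scaling
        rows.append((p, speedup, efficiency))
    return rows


# Hypothetical timings for illustration only (NOT measured data).
timings = {16: 8.0, 32: 4.0, 64: 2.5}
for p, s, e in strong_scaling(timings):
    print(f"{p:6d} procs  speedup {s:5.2f}  efficiency {e:4.2f}")
```

An efficiency that stays near 1.0 as the process count doubles corresponds to the "scalable up to" ranges quoted on this page. The point where it drops off marks the scalability limit.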

See the documentation of tridiagLU() for a full description of the stages referred to in the plots below. Briefly:

  • Stage 1: Independent elimination of interior points (perfectly scalable)
  • Stage 2: Elimination of the non-zeros created by stage 1, which leads to the formation of the reduced system (requires a single one-way communication, followed by independent computations)
  • Stage 3: Solution of the reduced system. The reduced system is of size nproc, and each processor holds one element of it, so this stage couples the processors tightly if solved directly.
  • Stage 4: Independent back-substitution at the interior points (perfectly scalable)
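
To illustrate the trade-off in stage 3, here is a minimal serial sketch of two ways to solve a small tridiagonal system. One is a direct solve with the Thomas algorithm, which is roughly what a gather-and-solve approach would do once the system sits on one process. The other is Jacobi iteration, where each unknown is updated from its two neighbours only. This is not the code of tridiagLUGS() or tridiagIterJacobi(); the function names, the test system and the tolerance are assumptions made for illustration.

```python
def thomas_solve(a, b, c, d):
    """Direct solve; a, b, c are the sub-, main and super-diagonals.

    a[0] and c[-1] are not part of the system and are ignored.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # back-substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def jacobi_solve(a, b, c, d, tol=1e-12, max_iter=1000):
    """Jacobi iteration; converges for diagonally dominant systems."""
    n = len(b)
    x = [d[i] / b[i] for i in range(n)]    # initial guess
    for it in range(1, max_iter + 1):
        x_new = []
        for i in range(n):
            # Only the two neighbouring values are needed for row i.
            left = a[i] * x[i - 1] if i > 0 else 0.0
            right = c[i] * x[i + 1] if i < n - 1 else 0.0
            x_new.append((d[i] - left - right) / b[i])
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter


# Hypothetical diagonally dominant system with 8 unknowns ("8 processes").
n = 8
a, b, c, d = [-1.0] * n, [4.0] * n, [-1.0] * n, [1.0] * n
x_direct = thomas_solve(a, b, c, d)
x_jacobi, iters = jacobi_solve(a, b, c, d)
print("Jacobi iterations:", iters)
print("max difference:", max(abs(p - q) for p, q in zip(x_direct, x_jacobi)))
```

If each process owns one row, a Jacobi sweep needs only nearest-neighbour exchanges, whereas the direct solve is a sequential recurrence over all rows. This is one plausible reading of why the two reduced-system solvers scale differently in the results on this page.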

Example: tridiagLU() with tridiagIterJacobi() (Jacobi method) for reduced systems

  • Total size of system: 1,048,576
  • Number of systems: 1
  • Number of solves for walltime measurement: 1
  • Scalable up to: ~512 processors

    tridiaglu_scalability_001m_jac.png

  • Total size of system: 16,777,216
  • Number of systems: 1
  • Number of solves for walltime measurement: 1
  • Scalable up to: ~8,192 processors

    tridiaglu_scalability_016m_jac.png

  • Total size of system: 268,435,456
  • Number of systems: 1
  • Number of solves for walltime measurement: 1
  • Scalable up to: >16,384 processors

    tridiaglu_scalability_260m_jac.png

Example: tridiagLU() with tridiagLUGS() (Gather-and-Solve method) for reduced systems

  • Total size of system: 1,048,576
  • Number of systems: 1
  • Number of solves for walltime measurement: 1
  • Scalable up to: ~64 processors

    tridiaglu_scalability_001m_gs.png

  • Total size of system: 16,777,216
  • Number of systems: 1
  • Number of solves for walltime measurement: 1
  • Scalable up to: ~1,024 processors

    tridiaglu_scalability_016m_gs.png

  • Total size of system: 268,435,456
  • Number of systems: 1
  • Number of solves for walltime measurement: 1
  • Scalable up to: ~2,048 processors

    tridiaglu_scalability_260m_gs.png
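
One way to compare the two sets of results is to convert each scalability limit into the number of grid points each processor holds at that limit. The short script below does this arithmetic using only the system sizes and approximate processor counts listed above. The variable and function names are illustrative. The Jacobi limit for the largest system is quoted as ">16,384", so the corresponding figure is an upper bound.

```python
# Approximate scalability limits quoted above: {method: {system size: processors}}
limits = {
    "Jacobi": {1_048_576: 512, 16_777_216: 8_192, 268_435_456: 16_384},
    "Gather-and-Solve": {1_048_576: 64, 16_777_216: 1_024, 268_435_456: 2_048},
}


def points_per_process(total_size, nproc):
    """Local subdomain size when total_size points are split over nproc."""
    return total_size // nproc


for method, table in limits.items():
    for total_size, nproc in table.items():
        local = points_per_process(total_size, nproc)
        print(f"{method:17s} N={total_size:>11,d}  P={nproc:>6,d}  N/P={local:>7,d}")
```

With these numbers, the Jacobi variant reaches its limit at about 2,048 points per processor for the two smaller systems (and at no more than 16,384 for the largest). The Gather-and-Solve variant reaches its limit at about 16,384 points per processor for the two smaller systems and 131,072 for the largest.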