Keywords: Volterra integro-differential equation; parallel algorithm
Direct methods for solving nonlinear Volterra integral and integro-differential equations are inherently serial and have therefore received little attention for use on parallel computers. It is possible, however, to make significant gains in speedup by applying some novel approaches to existing methods. It can be observed that, when approximating the integral term, all approximations up to and including the kth step in the interval can be evaluated at the same time. The resulting algorithm is described here and a numerical example illustrates the results. With the four-processor multicomputer utilized, speedups of 3 to 4 were realized.
INTRODUCTION

Consider a nonlinear Volterra integro-differential equation (VIDE) of the form

    y'(x) = F(x, y(x), z(x)),   y(0) = y_0,   0 <= x <= a,   (1)

where

    z(x) = ∫_0^x K(x, t, y(t)) dt

and y(x) is the unknown function. Equations of this type occur in a number of areas such as wave power hydraulics [6], population dynamics [5,12] and reactor dynamics [11]. By setting z(x) = 0, equation (1) reduces to a first order ordinary differential equation (ODE) of the form

    y'(x) = F(x, y(x), 0),   y(0) = y_0.   (2)
In general, a VIDE cannot be reduced to an ODE through differentiation. There have been numerous papers on parallel methods for solving ODEs [2,3,8,9,16,17,18,19,20], but work on Volterra-type equations has so far been limited to a few authors [4,13,14,15,16,20]. This is due to the inherently serial nature of obtaining a numerical approximation to the integral term in (1). Papers by Crisci, van der Houwen, Russo and Vecchio [4] and Vecchio [21] concentrated on the stability aspects of parallel iteration of Volterra-Runge-Kutta (VRK) methods for solving Volterra integral equations on parallel computers. VRK methods are step-by-step methods and can take advantage of parallel architectures. A paper by Sommeijer, Couzy and van der Houwen [16] covered the stability of parallel block methods for ODEs and included equations of the integro-differential type in the discussion. The work in this paper concentrates on modifying the algorithmic side of the numerical solution process for (1) for use on a parallel processor, while consciously utilizing methods that are known to be stable.
To define a class of problems of the form (1), let R1 and R2 be two regions defined by

    R1 = {(x, t, y) : 0 <= t <= x <= a, |y| < ∞},
    R2 = {(x, y, z) : 0 <= x <= a, |y| < ∞, |z| < ∞}.

If

i) K and F are uniformly continuous and bounded for all (x,t,y) in R1 and (x,y,z) in R2, respectively, and

ii) the following Lipschitz conditions are satisfied:

    |K(x, t, y) - K(x, t, y*)| <= L1 |y - y*|,
    |F(x, y, z) - F(x, y, z*)| <= L2 |z - z*|,
    |F(x, y, z) - F(x, y*, z)| <= L3 |y - y*|,

where L1, L2 and L3 are the Lipschitz constants, then it is known [7] that equation (1) has a unique solution.
To find a numerical solution for equations of the type (1) we will use the fourth-order Adams-Bashforth-Moulton (ABM) predictor-corrector (PC) method, with the fourth-order Runge-Kutta method used to obtain the starting values. Newton-Gregory quadrature will be used to approximate the integral term in equation (1), with the fourth-order Runge-Kutta method again supplying the starting values. It is this integral term that leads to the problem of adapting numerical methods for VIDEs for use on parallel processors. This paper suggests a unique approach to this problem that allows the resulting algorithm to be parallelized. The algorithm was coded on a four-processor Pentium Pro system and the numerical example shows actual speedups of 3 to 4. Preliminary work on a 16-processor supercomputer shows similar speedups.
NUMERICAL METHODS

Let IN be a partition of I = [0,a], where IN = {x_n = nh, n = 0(1)N, Nh = a}. The problem is to find approximations y_n to the solution y(x_n) of equation (1) for each x_n in IN. A multistep method for an integro-differential equation is given by

    y_{n+k} = y_{n+k-1} + h Σ_{j=0}^{k} w_j F(x_{n+j}, y_{n+j}, z_{n+j}),

where

    z_{n+j} = h Σ_{i=0}^{n+j} c_{n+j,i} K(x_{n+j}, x_i, y_i),   y_0 = η,

where η is a constant, the weights {w_j} depend on the k-step method selected and the weights {c_{n+j,i}} are those of a Newton-Cotes or Newton-Gregory quadrature rule. The fourth order Adams-Bashforth predictor and Adams-Moulton corrector are the multistep methods used in this paper. On uniprocessor computers multistep PC methods of the Adams type are the most popular [19]. It has been suggested by van der Houwen, Sommeijer and de Swart [19] that the ABM method is one of several efficient parallel PC methods that can be used to solve initial value problems of the form (2). The integral term in equation (1) is solved by Newton-Gregory quadrature.
For the integration of equation (1) over [x_{k-3}, x_{k+1}], the PC method is given by

    predictor:  y*_{k+1} = y_k + (h/24)[55 F_k - 59 F_{k-1} + 37 F_{k-2} - 9 F_{k-3}],
    corrector:  y_{k+1} = y_k + (h/24)[9 F*_{k+1} + 19 F_k - 5 F_{k-1} + F_{k-2}],

where F_i = F(x_i, y_i, z_i), F*_{k+1} denotes F evaluated at the predicted value, and for each z_i the Newton-Gregory quadrature weights are given in Table 2.1. The fourth order Runge-Kutta method is used to obtain starting values for both the ABM PC multistep method and the Newton-Gregory quadrature rules.
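As a sketch of one such PC step (Python for readability; the study's code is Fortran, and the function name abm4_pc_step is ours), assume the arrays x, y and z hold the mesh points, the solution approximations and the quadrature approximations to the integral term, with z_next the approximation to z at x[k+1]:

```python
def abm4_pc_step(F, x, y, z, z_next, k, h):
    """One fourth-order Adams-Bashforth-Moulton PC step: given values at
    x[k-3..k], return the corrected approximation at x[k+1]."""
    # Adams-Bashforth predictor (order 4)
    f0 = F(x[k],     y[k],     z[k])
    f1 = F(x[k - 1], y[k - 1], z[k - 1])
    f2 = F(x[k - 2], y[k - 2], z[k - 2])
    f3 = F(x[k - 3], y[k - 3], z[k - 3])
    y_pred = y[k] + h / 24.0 * (55 * f0 - 59 * f1 + 37 * f2 - 9 * f3)
    # Adams-Moulton corrector (order 4), evaluated at the predicted value
    fp = F(x[k + 1], y_pred, z_next)
    return y[k] + h / 24.0 * (9 * fp + 19 * f0 - 5 * f1 + f2)
```

For a VIDE, z_next would itself be formed by Newton-Gregory quadrature using the predicted value; for a pure ODE of the form (2) the z arguments are simply ignored by F.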
Table 2.1. Scaled Newton-Gregory quadrature weights 24 c_{n,i}: the weight applied to K(x_n, x_i, y_i) is h/24 times the tabulated entry.

| n | i=0 | i=1 | i=2 | i=3 | i=4 | i=5 | i=6 |
| 2 | 8 | 32 | 8 | 0 | 0 | 0 | 0 |
| 3 | 9 | 27 | 27 | 9 | 0 | 0 | 0 |
| 4 | 9 | 28 | 22 | 28 | 9 | 0 | 0 |
| 5 | 9 | 28 | 23 | 23 | 28 | 9 | 0 |
| 6 | 9 | 28 | 23 | 24 | 23 | 28 | 9 |
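The rows of Table 2.1 can be generated programmatically rather than stored. As an illustration (the study's code is Fortran 77; this Python sketch and the name ng_weights are ours), each row for n >= 4 is a run of interior weights 24 with the Gregory end-correction pattern 9, 28, 23 applied at each boundary, while n = 2 and n = 3 fall back to Simpson's rules:

```python
def ng_weights(n):
    """Return the scaled Newton-Gregory weights 24*c_{n,i}, i = 0..n,
    for integrating over [x_0, x_n]; the true quadrature weight is
    h/24 times each entry.  Valid for n >= 2."""
    if n == 2:
        return [8, 32, 8]            # Simpson's 1/3 rule, times 24
    if n == 3:
        return [9, 27, 27, 9]        # Simpson's 3/8 rule, times 24
    w = [24] * (n + 1)               # interior weights
    for end, step in ((0, 1), (n, -1)):
        # end corrections turning 24 into 9, 28, 23 at each boundary
        for offset, corr in enumerate((-15, 4, -1)):
            w[end + step * offset] += corr
    return w
```

Note that for n = 4 the two corrections of -1 overlap on the centre point, producing the entry 22 seen in the table.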
When solving VIDEs, the integral term in equation (1) must be considered before attempting to convert the algorithm for parallel processing. Any direct method of solution requires that the approximation to each z_i involve all the points x_0, ..., x_{i-1} in the interval of integration. This is obviously a serial process and not a good candidate for parallelization. It can be observed, however, that once the starting values are obtained and the value y_k for the k-step method is known, the contribution of that value to every approximation z_i, i = k(1)N, can be evaluated at the same time. As each subsequent value y_{k+1}, ..., y_N becomes available, the corresponding contributions to the remaining z values can likewise be added simultaneously. This results in an algorithm that can take advantage of a parallel processing computer.
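A sketch of this observation in Python (illustrative only; the actual implementation uses Fortran with OpenMP-style directives, and the helper name update_tails is ours): once y[k] is available, its contribution to every partial sum z[i] with i >= k involves no other unknowns, so the loop body is independent across i and can be distributed over the processors.

```python
def update_tails(z, c, K, x, y, k, h):
    """Add the newly available term involving y[k] to every partial
    quadrature sum z[i], i >= k.  Each iteration is independent of the
    others, so this loop parallelizes (e.g. an OpenMP parallel do)."""
    for i in range(k, len(z)):          # independent iterations
        z[i] += h * c[i][k] * K(x[i], x[k], y[k])
```

Repeating this update as each new y value is produced accumulates the full quadrature sums without ever re-traversing [x_0, x_{i-1}] serially.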
To accommodate this approach for solving equation (1), a matrix of weights can be assigned to correspond to the quadrature rule employed. The matrix that results from the Newton-Gregory rules is given by Table 2.1. Note that, since the fourth order ABM method requires four starting values, z4 is the first integral term where the modification to allow for parallel processing can be utilized. Using the convention of Akl [1], where braces indicate comments, gives the following algorithm:
ALGORITHM

1) {Set up matrix of weights for quadrature rule}
2) {Set up N+1 mesh points}
3) {Find starting values}
4) {Use a k-step method to approximate equation (1) over [xk,xN]}
It should be noted that setting up the matrix of weights can also be a parallel procedure. This, combined with the unique approach used to approximate the integral term in step 4 of the algorithm, results in an algorithm that gives speedups of up to 4 as compared with no speedup in the original algorithm.
NUMERICAL EXAMPLES

In this section, a nonlinear problem is used to illustrate the method described. The code is run in both scalar and parallel mode for different values of N, where N is the number of mesh points. The computer system used is a four-processor Pentium Pro running Linux. Portland Group's pgf77 Fortran compiler is used with OpenMP style directives. The maximum error in the approximation over the interval [0,1] and the mean CPU times in milliseconds are given in Table 4.1 for the multistep method described. The speedups in Table 4.1 for successively larger values of N illustrate the relationship between speedup and number of mesh points. The scaled efficiency of the processors, a better measure of the quality of the parallel algorithm, is included as well and illustrates up to a 96% efficiency for large values of N.
Example:

    y'(x) = x e^{1-y(x)} - 1/(1+x)^2 - x - ∫_0^x [x/(1+t)^2] e^{1-y(t)} dt,   0 <= x <= 1.
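In the notation of equation (1), the example's right-hand side F and kernel K can be coded directly. The following is a Python transcription for clarity (the study's implementation is Fortran; the initial condition is not reproduced in this excerpt, so none is coded here):

```python
import math

def K(x, t, y):
    """Kernel of the integral term: (x / (1+t)^2) * exp(1 - y)."""
    return x / (1.0 + t) ** 2 * math.exp(1.0 - y)

def F(x, y, z):
    """Right-hand side F(x, y, z) with z the value of the integral term."""
    return x * math.exp(1.0 - y) - 1.0 / (1.0 + x) ** 2 - x - z
```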
Table 4.1. Maximum error, mean CPU times, speedup and scaled efficiency for the example.

| N | max |y(x_i) - y_i| | Mean scalar CPU time (ms) | Mean parallel CPU time (ms) | Speedup | Scaled efficiency |
| 160 | 8.007×10⁻⁹ | 0.033 | 0.012 | 2.75 | 68.75% |
| 320 | 5.165×10⁻¹⁰ | 0.120 | 0.043 | 2.79 | 69.75% |
| 640 | 3.265×10⁻¹¹ | 0.478 | 0.145 | 3.30 | 82.50% |
| 1280 | 2.053×10⁻¹² | 1.890 | 0.520 | 3.63 | 90.87% |
| 2560 | 1.287×10⁻¹³ | 7.560 | 2.000 | 3.78 | 94.50% |
| 5120 | 8.157×10⁻¹⁵ | 30.290 | 7.850 | 3.86 | 96.46% |
For the multistep method described, the results indicate that as the number of mesh points increases the speedup becomes more significant. It can also be noted that, for larger values of N, the numerical approximation to the solution of (1) becomes more accurate. In addition, the scaled efficiency of the processors reaches about 96% for large N. This indicates that processor utilization for this algorithm is very high.
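The derived columns of Table 4.1 follow directly from the timing columns: speedup is the ratio of scalar to parallel CPU time, and scaled efficiency divides the speedup by the number of processors (p = 4 here). A quick check in Python (illustrative; the function names are ours):

```python
def speedup(t_scalar, t_parallel):
    """Speedup = scalar time / parallel time."""
    return t_scalar / t_parallel

def scaled_efficiency(t_scalar, t_parallel, p):
    """Scaled efficiency (per cent) = 100 * speedup / number of processors."""
    return 100.0 * speedup(t_scalar, t_parallel) / p

# e.g. the N = 5120 row of Table 4.1:
#   speedup(30.290, 7.850)               -> about 3.86
#   scaled_efficiency(30.290, 7.850, 4)  -> about 96.46
```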
ACKNOWLEDGEMENT

This work is supported by NSERC. The author would also like to thank Danny Lizotte, a 1999 NSERC student, for his programming efforts.