Exercises and examples of Chapter 3 in P. Arbenz and W. Petersen,
Introduction to Parallel Computing, Oxford Univ. Press, 2004.

EXERCISES (Uebungen):

Exercise 3.1 (Uebung 3.1): variants of matrix-matrix multiply.
Exercise 3.2 (Uebung 3.2): SSUM and SAXPY implemented using Intel SSE intrinsics. See the Level-1
BLAS routines.
Exercise 3.1 (Uebung 3.1) solution: variants of the matrix-matrix multiply problem.
Exercise 3.2 (Uebung 3.2) solution: SSUM and SAXPY implemented using Intel SSE intrinsics. See the Level-1
BLAS routines.
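
For orientation, the two operations named in Exercise 3.2 can be sketched as follows. This is a minimal
sketch, not the book's solution: the function names saxpy_sketch and ssum_sketch are hypothetical, SSUM is
assumed to mean the plain sum of the vector elements, unaligned loads are used so that no alignment is
required of the caller, and a scalar loop handles the remainder (and the whole vector when SSE is
unavailable).

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* Intel SSE intrinsics */
#endif

/* y[i] += alpha * x[i] for 0 <= i < n, unit stride (hypothetical name). */
void saxpy_sketch(size_t n, float alpha, const float *x, float *y)
{
    size_t i = 0;
    assert(x != NULL && y != NULL);
#if defined(__SSE__)
    {
        __m128 va = _mm_set1_ps(alpha);          /* alpha in all four lanes */
        for (; i + 4 <= n; i += 4) {
            __m128 vx = _mm_loadu_ps(x + i);     /* unaligned loads */
            __m128 vy = _mm_loadu_ps(y + i);
            _mm_storeu_ps(y + i, _mm_add_ps(vy, _mm_mul_ps(va, vx)));
        }
    }
#endif
    for (; i < n; i++)                           /* scalar remainder */
        y[i] += alpha * x[i];
}

/* Returns x[0] + ... + x[n-1], unit stride (hypothetical name). */
float ssum_sketch(size_t n, const float *x)
{
    size_t i = 0;
    float s = 0.0f;
    assert(x != NULL);
#if defined(__SSE__)
    {
        __m128 acc = _mm_setzero_ps();           /* four partial sums */
        float lanes[4];
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
        _mm_storeu_ps(lanes, acc);
        s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    }
#endif
    for (; i < n; i++)                           /* scalar remainder */
        s += x[i];
    return s;
}
```

Note that the SSE path of ssum_sketch adds the elements in a different order than a plain loop, so for
general data the two results may differ in the last few bits.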

TEST PROGRAMS/routines:

Altivec FFT in-line: binary radix FFT using a workspace and Apple Altivec intrinsics. This version expands "step"
in-line; otherwise, it is similar to "genericfft.c" below (Section 3.6).
Altivec dot product: unit stride sdot for Apple Altivec (Section 3.5.5). See the Level-1 BLAS routines.
Altivec isamax: unit stride isamax0 for Apple Altivec (Section 3.5.7). See the Level-1 BLAS routines.
Altivec FFT: binary radix FFT using a workspace and Apple Altivec intrinsics. It is
similar to "genericfft.c" below (Section 3.6).
Generic FFT: generic binary radix FFT using a workspace but no SSE or Altivec intrinsics (Section 3.6).
Multiple tridiagonal: subroutine for the multiple right-hand side solution of tridiagonal systems via a simple
one-step recurrence formula, after Forsythe and Moler (see Section 3.5.2).
SGEFA: tests variants of the simple parallel version of sgefa in Section 3.4.2. There is a README
file in this gzipped tar file describing the variants.
Rpoly: recursive doubling version of polynomial evaluation (Section 3.5).
SSE FFT in-line: workspace version of the binary radix FFT with "step" in-lined. It is the same as genericfft.c
above but uses Intel SSE intrinsics (see Section 3.6).
SSE isamax: SSE example of isamax0, a unit stride isamax (see Section 3.5.7). See the Level-1
BLAS routines.
SSE FFT: SSE version of the workspace binary radix FFT. It is the same as genericfft.c
above but uses Intel SSE intrinsics (see Section 3.6).
Tridiagonal system tests: tests of the tridiagonal system solver; also compares timings for the simple recurrence
method (from Forsythe and Moler's book; see Section 3.5.2).
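
As a point of reference for the two tridiagonal entries above, a one-step recurrence solver for a single
tridiagonal system might look like the sketch below. It assumes that the "simple recurrence" is the usual
forward elimination followed by back substitution without pivoting; it is not taken from the test programs
themselves. The function name tridiag_solve_sketch and the array layout (l = subdiagonal, d = diagonal,
u = superdiagonal, b = right-hand side) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Solves a tridiagonal system of order n in place (hypothetical name).
 *   l[i]: subdiagonal entry in row i   (l[0] is not used)
 *   d[i]: diagonal entry in row i      (overwritten)
 *   u[i]: superdiagonal entry in row i (u[n-1] is not used)
 *   b[i]: right-hand side on entry, solution on exit
 * Assumption: no pivoting, so every pivot d[i] must stay nonzero,
 * as it does for diagonally dominant matrices. */
void tridiag_solve_sketch(size_t n, const double *l, double *d,
                          const double *u, double *b)
{
    size_t i;
    assert(n > 0);

    /* Forward elimination: each row depends only on the previous row. */
    for (i = 1; i < n; i++) {
        double m = l[i] / d[i - 1];
        d[i] -= m * u[i - 1];
        b[i] -= m * b[i - 1];
    }

    /* Back substitution: each unknown depends only on the next one. */
    b[n - 1] /= d[n - 1];
    for (i = n - 1; i-- > 0; )
        b[i] = (b[i] - u[i] * b[i + 1]) / d[i];
}
```

Both loops carry a dependence from one index to the next, so a single system does not vectorize along i.
With many independent systems or right-hand sides, the same recurrence can instead be applied across them
in an inner loop.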