Track: Computers and Computing
Abstract— The actual performance of a parallel algorithm depends on how it is exploited, as well as on the hardware capability and the environment in which it runs, whether for data-intensive or scientific workloads. In this paper, we implement parallel matrix multiplication using CUDA on a GPU and compare the results with sequential execution on the CPU. Tests are also carried out on varied thread topologies to determine the dimensions in which the parallel algorithm performs best. We observe that, for large computational domains, the parallel implementation of matrix multiplication yields a significant reduction in execution time.
Keywords—GPGPU; CUDA; Parallel Computing; Topology; Dimension; Speedup.
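The parallel matrix multiplication summarized above can be sketched as a CUDA kernel in which each thread computes one output element. The kernel and host-function names below are illustrative, not the paper's actual code, and the 16x16 block size is only one example of the topologies the paper varies:

```cuda
// Sketch of a CUDA matrix-multiplication kernel (illustrative names).
// Each thread computes one element C[row][col] of C = A * B for n x n
// row-major matrices already resident in device memory.
__global__ void matMulKernel(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Host-side launch: the block dimensions (here 16x16) define the thread
// topology whose effect on speedup is measured against the sequential CPU run.
void matMul(const float *dA, const float *dB, float *dC, int n) {
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matMulKernel<<<grid, block>>>(dA, dB, dC, n);
    cudaDeviceSynchronize();  // wait for the kernel before timing/reading C
}
```

Because every output element is computed by an independent thread, the speedup over a sequential triple-loop CPU implementation grows with the matrix size, which is consistent with the reduction reported for large computational domains.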