5th Annual International Conference on Industrial Engineering and Operations Management

Optimizing SIMT Architecture using CUDA

Christopher Umahaeyo
Publisher: IEOM Society International
Track: Computers and Computing
Abstract

The behavior of a parallel algorithm depends on how it is exploited, as well as on the hardware, its capability, and the environment in which it runs, whether for data-intensive or scientific purposes. In this paper, we implement parallel matrix multiplication using CUDA on a GPU and compare the results with those of sequential execution on the CPU. Tests are also carried out on varied topologies, with the parallel algorithm executed in its best dimension, to assess suitability. We observe that for large computational domains, the parallel implementation of matrix multiplication provides a significant reduction in execution time.

Keywords—GPGPU; CUDA; Parallel Computing; Topology; Dimension; Speedup.
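The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the names matmulKernel, matmulCpu, matmulGpu and the blockSide parameter are hypothetical, and the sketch assumes square, row-major, single-precision matrices with one GPU thread per output element. Varying blockSide changes the launch configuration, which is one plausible reading of the "topology" and "dimension" experiments mentioned above.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Naive kernel (assumed layout: row-major n x n): each thread computes one element of C = A * B.
__global__ void matmulKernel(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {              // guard threads that fall outside the matrix
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Sequential CPU baseline used for comparison.
void matmulCpu(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, int n) {
    C.assign(static_cast<std::size_t>(n) * n, 0.0f);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Hypothetical host wrapper: copies inputs to the device, launches the kernel with
// blockSide x blockSide threads per block, and copies the result back.
// Returns false if any CUDA call fails (for example, when no GPU is present).
bool matmulGpu(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, int n, int blockSide) {
    std::size_t bytes = static_cast<std::size_t>(n) * n * sizeof(float);
    C.assign(static_cast<std::size_t>(n) * n, 0.0f);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess
           && cudaMalloc(&dB, bytes) == cudaSuccess
           && cudaMalloc(&dC, bytes) == cudaSuccess
           && cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(blockSide, blockSide);
        dim3 grid((n + blockSide - 1) / blockSide, (n + blockSide - 1) / blockSide);
        matmulKernel<<<grid, block>>>(dA, dB, dC, n);
        // The blocking device-to-host copy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dC);  // freeing a null pointer is a no-op
    return ok;
}
```

Timing matmulCpu and matmulGpu on the same inputs and taking the ratio of CPU time to GPU time gives the usual speedup figure. For small n the host-to-device copies tend to dominate, which is consistent with the abstract's observation that the benefit appears for large computational domains.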

Published in: 5th Annual International Conference on Industrial Engineering and Operations Management, Dubai, United Arab Emirates

Date of Conference: March 3-5, 2015

ISBN: 978-0-9855497-2-5
ISSN/E-ISSN: 2169-8767