Showing posts with label CUDA TILES. Show all posts

Friday, 25 March 2016

Tiled Matrix Multiplication Using Shared Memory in CUDA

Today, I am going to discuss matrix multiplication in CUDA. CUDA exposes several kinds of memory, as we already discussed in the previous post, "What is CUDA". Matrix multiplication is a basic but crucial algorithm in engineering and computer science. I assume that anyone reading this post knows how to perform matrix multiplication in at least one programming language (C, C++, Python, etc.).










Wednesday, 27 May 2015

One Dimensional (1D) Image Convolution in CUDA by using TILES

Tiling is a strategy for optimizing a CUDA implementation of an algorithm. It is very useful when we want to get the most out of the GPU hardware available in the system. Compared with a naive CUDA implementation, it makes better use of memory bandwidth and reduces the number of global memory read/write operations. A tiled implementation stages data in the GPU's shared memory, which is much faster than global memory. A naive CUDA implementation typically uses only global memory for all read and write operations, so if those operations are numerous, most of the time is spent just moving data, which results in poor performance.
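To make the idea concrete, here is a minimal sketch of a tiled 1D convolution. It is not the code from the original post: the names `conv1dTiled`, `conv1dOnGpu`, `TILE_SIZE`, `MASK_WIDTH` and `d_mask`, the tile size of 256, the mask width of 5, and the zero padding at the array borders are all assumptions made for illustration. Each block copies its slice of the input, plus a small halo on each side, from global memory into shared memory once, and every thread then reads its neighbours from shared memory.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE_SIZE 256              // assumed threads per block (= outputs per tile)
#define MASK_WIDTH 5               // assumed odd mask width
#define RADIUS (MASK_WIDTH / 2)    // halo elements on each side of a tile

__constant__ float d_mask[MASK_WIDTH];   // convolution mask in constant memory

// Must be launched with blockDim.x == TILE_SIZE.
__global__ void conv1dTiled(const float* in, float* out, int n)
{
    __shared__ float tile[TILE_SIZE + 2 * RADIUS];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    int lid = threadIdx.x + RADIUS;                   // index inside the tile

    // Each thread loads its own element; out-of-range elements are zero.
    tile[lid] = (gid < n) ? in[gid] : 0.0f;

    // The first RADIUS threads also load the left and right halo cells.
    if (threadIdx.x < RADIUS) {
        int left  = gid - RADIUS;
        int right = gid + TILE_SIZE;
        tile[threadIdx.x]     = (left >= 0 && left < n) ? in[left] : 0.0f;
        tile[lid + TILE_SIZE] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();   // the whole tile must be loaded before anyone reads it

    if (gid < n) {
        float sum = 0.0f;
        for (int k = 0; k < MASK_WIDTH; ++k)
            sum += tile[threadIdx.x + k] * d_mask[k];  // shared-memory reads only
        out[gid] = sum;
    }
}

// Hypothetical host helper: copies data to the GPU, runs the kernel, copies back.
std::vector<float> conv1dOnGpu(const std::vector<float>& in, const float* mask)
{
    int n = static_cast<int>(in.size());
    std::vector<float> out(n);
    float *d_in = nullptr, *d_out = nullptr;

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, mask, MASK_WIDTH * sizeof(float));

    int blocks = (n + TILE_SIZE - 1) / TILE_SIZE;
    conv1dTiled<<<blocks, TILE_SIZE>>>(d_in, d_out, n);

    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}

int main()
{
    std::vector<float> in(1000, 1.0f);
    float mask[MASK_WIDTH] = {1.0f, 1.0f, 1.0f, 1.0f, 1.0f};
    std::vector<float> out = conv1dOnGpu(in, mask);

    // With all-ones input and mask, an interior output equals MASK_WIDTH.
    assert(std::fabs(out[500] - 5.0f) < 1e-5f);
    printf("out[0]=%.1f out[500]=%.1f\n", out[0], out[500]);
    return 0;
}
```

In this sketch each input element is read from global memory roughly once per block rather than up to `MASK_WIDTH` times, which is where the saving over a naive kernel comes from. Error checking on the CUDA calls is omitted for brevity and would be needed in real code.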