After learning how to perform addition of two numbers, the next step is to learn how to perform addition and subtraction of two vectors. The only differences between the two programs are the amount of memory required to store the two vectors and the kernel functions that perform the task.
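To make the difference concrete, here is a minimal sketch of what the two element-wise kernels can look like; the kernel and variable names are illustrative, not taken from the original post.

// Minimal sketch: element-wise vector addition and subtraction kernels.
// Names (vecAdd, vecSub, a, b, c, n) are illustrative.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

__global__ void vecSub(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] - b[i];
}

// Host side: allocate n floats per vector with cudaMalloc, copy the inputs with
// cudaMemcpy, then launch, e.g. vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);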
A technical blog addressing computer science issues. The focus is mainly on the tools, technologies, and techniques required to develop different applications using technologies like CUDA, Image Processing, MATLAB, OpenCV, C, C++, and Web Development.
Friday, 17 July 2015
Sunday, 12 July 2015
Addition of two numbers in CUDA: A Simple Approach
Addition is one of the most basic arithmetic operations, and performing it in the C language is a very easy and simple task. In this post, we will convert the C language code into CUDA code. The steps to remember when writing CUDA code for any program are as follows:
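In brief, the usual CUDA workflow is: allocate device memory, copy the inputs from host to device, launch the kernel, copy the result back, and free the device memory. A minimal, illustrative sketch of adding two numbers this way (the names are my own, not from the post):

#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: a single thread adds the two values.
__global__ void addKernel(const int *a, const int *b, int *c)
{
    *c = *a + *b;
}

int main()
{
    int a = 2, b = 3, c = 0;
    int *dA, *dB, *dC;

    // 1. Allocate device memory.
    cudaMalloc((void **)&dA, sizeof(int));
    cudaMalloc((void **)&dB, sizeof(int));
    cudaMalloc((void **)&dC, sizeof(int));

    // 2. Copy the inputs from host to device.
    cudaMemcpy(dA, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, &b, sizeof(int), cudaMemcpyHostToDevice);

    // 3. Launch the kernel with a single thread.
    addKernel<<<1, 1>>>(dA, dB, dC);

    // 4. Copy the result back to the host.
    cudaMemcpy(&c, dC, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d = %d\n", a, b, c);

    // 5. Free device memory.
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}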
Sunday, 5 July 2015
Two Dimensional (2D) Image Convolution in CUDA Using Shared & Constant Memory: An Optimized Way
After learning the concept of two-dimensional (2D) convolution and its implementation in the C language, the next step is to learn how to optimize it. As convolution is one of the most compute-intensive tasks in image processing, it is always better to save the time it requires. So, today I am going to share a technique to optimize the convolution process using CUDA. Here we will use the shared memory and constant memory resources available in CUDA to get a very fast implementation of convolution.
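Roughly, the idea combines two CUDA memory optimizations: keep the small, read-only convolution mask in constant memory, and stage each image tile plus its halo in shared memory so that neighbouring threads reuse data instead of re-reading global memory. A minimal sketch along those lines, with illustrative names and an assumed 3x3 mask, 16x16 tile, and zero padding at the borders (none of these values are from the post):

#define MASK_DIM 3
#define MASK_RADIUS (MASK_DIM / 2)
#define TILE_W 16

// Convolution mask kept in constant memory (cached, read-only for all threads).
__constant__ float d_mask[MASK_DIM * MASK_DIM];

// Each block loads a (TILE_W + 2*radius)^2 tile of the image into shared memory,
// then each thread computes one output pixel from that tile.
__global__ void conv2dShared(const float *in, float *out, int width, int height)
{
    __shared__ float tile[TILE_W + 2 * MASK_RADIUS][TILE_W + 2 * MASK_RADIUS];

    int outCol = blockIdx.x * TILE_W + threadIdx.x;
    int outRow = blockIdx.y * TILE_W + threadIdx.y;

    // Cooperative load: some threads load an extra halo element.
    for (int r = threadIdx.y; r < TILE_W + 2 * MASK_RADIUS; r += blockDim.y)
        for (int c = threadIdx.x; c < TILE_W + 2 * MASK_RADIUS; c += blockDim.x) {
            int imRow = blockIdx.y * TILE_W + r - MASK_RADIUS;
            int imCol = blockIdx.x * TILE_W + c - MASK_RADIUS;
            tile[r][c] = (imRow >= 0 && imRow < height && imCol >= 0 && imCol < width)
                         ? in[imRow * width + imCol] : 0.0f;   // zero padding at borders
        }
    __syncthreads();

    if (outCol < width && outRow < height) {
        float sum = 0.0f;
        for (int i = 0; i < MASK_DIM; ++i)
            for (int j = 0; j < MASK_DIM; ++j)
                sum += d_mask[i * MASK_DIM + j] * tile[threadIdx.y + i][threadIdx.x + j];
        out[outRow * width + outCol] = sum;
    }
}

// Host side (sketch): copy the mask with cudaMemcpyToSymbol(d_mask, hMask, sizeof(hMask));
// launch with dim3 block(TILE_W, TILE_W), dim3 grid((width+TILE_W-1)/TILE_W, (height+TILE_W-1)/TILE_W).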
Tuesday, 23 June 2015
Two Dimensional (2D) Image Convolution: A Basic Approach
Image convolution is a very basic operation in the field of image processing, and it is required by many image processing algorithms. It is also a very compute-intensive task, as it involves an operation on every pixel.
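For reference, a straightforward C version of the operation might look like the sketch below; the function and parameter names are illustrative, and the borders are handled with zero padding (an assumption, since other border policies are possible).

// Naive 2D convolution in plain C: for every output pixel, multiply the
// neighbourhood by the mask and accumulate. Pixels outside the image count as zero.
void convolve2d(const float *in, float *out, int width, int height,
                const float *mask, int maskDim)
{
    int radius = maskDim / 2;
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            for (int i = 0; i < maskDim; ++i)
                for (int j = 0; j < maskDim; ++j) {
                    int r = row + i - radius;
                    int c = col + j - radius;
                    if (r >= 0 && r < height && c >= 0 && c < width)
                        sum += mask[i * maskDim + j] * in[r * width + c];
                }
            out[row * width + col] = sum;
        }
}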
Thursday, 28 May 2015
What is CUDA?
CUDA (Compute Unified Device Architecture) was invented by NVIDIA Corporation. The first release of CUDA came in mid-2007.
CUDA is a parallel computing platform, one of the first of its kind, which enables General Purpose Computing on GPUs (widely known as GPGPU) in a very efficient and easy way. CUDA enables the user to exploit the computing power of the Graphics Processing Unit (GPU) present in the underlying hardware. Before the inception of CUDA, one had to use graphics APIs such as DirectX or OpenGL to program the GPU for general-purpose computation.
Wednesday, 27 May 2015
One Dimensional (1D) Image Convolution in CUDA by using TILES
Tiled algorithms are a special case in CUDA, as this strategy lets us optimize an algorithm's implementation. It is very useful when we want to achieve maximum usage of the GPU hardware available in the system. It has several advantages over a naive CUDA implementation, such as improved memory bandwidth and fewer global memory read/write operations. A tiled implementation uses the shared memory available on the GPU, which is much faster than the GPU's global memory. A naive CUDA implementation uses only global memory for all read and write operations, so if these memory operations are huge in number, more time is wasted just transferring data, which results in poor performance.
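A hedged sketch of what such a tiled 1D convolution kernel can look like is given below; the mask width, tile size, and names are illustrative assumptions, with the mask kept in constant memory and each block staging its input segment plus a halo in shared memory.

#define MASK_WIDTH 5
#define TILE_SIZE 256

__constant__ float d_mask1d[MASK_WIDTH];

// Tiled 1D convolution: each block stages TILE_SIZE input elements plus a halo of
// MASK_WIDTH/2 elements on each side in shared memory, so neighbouring threads
// reuse data instead of re-reading global memory. Launch with TILE_SIZE threads per block.
__global__ void conv1dTiled(const float *in, float *out, int n)
{
    __shared__ float tile[TILE_SIZE + MASK_WIDTH - 1];

    int radius = MASK_WIDTH / 2;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Cooperative load of the tile and its halo; elements outside the signal are zero.
    for (int idx = threadIdx.x; idx < TILE_SIZE + MASK_WIDTH - 1; idx += blockDim.x) {
        int g = blockIdx.x * blockDim.x + idx - radius;
        tile[idx] = (g >= 0 && g < n) ? in[g] : 0.0f;
    }
    __syncthreads();

    if (gid < n) {
        float sum = 0.0f;
        for (int j = 0; j < MASK_WIDTH; ++j)
            sum += d_mask1d[j] * tile[threadIdx.x + j];
        out[gid] = sum;
    }
}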
Thursday, 30 April 2015
One Dimensional (1D) Image Convolution in CUDA
First, let me tell you that if you are reading this page then you are already looking for some advanced stuff in today's technology, as both CUDA and Image Processing are advanced technologies that are in high demand. On this blog we will mainly focus on the use of CUDA (Compute Unified Device Architecture) to improve image processing algorithms. The improvement is mainly with respect to the time and space required by an image processing algorithm. You may refer to the links provided to get more information about both fields.
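As a point of reference for the tiled version described above, a straightforward (non-tiled) 1D convolution kernel, in which every thread reads all of its neighbours directly from global memory, might look like the following sketch; the names and the zero-padding border handling are illustrative assumptions.

// Naive 1D convolution: one thread per output element, all reads from global memory.
__global__ void conv1dNaive(const float *in, const float *mask, float *out,
                            int n, int maskWidth)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int radius = maskWidth / 2;
    float sum = 0.0f;
    // Every input element is fetched from global memory each time it is needed.
    for (int j = 0; j < maskWidth; ++j) {
        int k = i + j - radius;
        if (k >= 0 && k < n)
            sum += mask[j] * in[k];
    }
    out[i] = sum;
}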