Wednesday, March 9, 2011

CUDA: GPGPU without Graphics Knowledge.

[Tip] 
CUDA is NVIDIA’s parallel computing architecture. It enables dramatic increases in computing performance by harnessing the power of the GPU.
 
[Details]
"CUDA programming" and "GPGPU programming" are not the same (although CUDA runs on GPUs). Previously, writing software for a GPU meant programming in the language of the GPU. CUDA permits working with familiar programming concepts while developing software that can run on a GPU. It also avoids the performance overhead of graphics layer APIs by compiling your software directly to the hardware (GPU assembly language, for instance), thereby providing great performance.
With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for CUDA, including image and video processing, computational biology and chemistry, fluid dynamics simulation, CT image reconstruction, seismic analysis, ray tracing, and much more.
Computing is evolving from "central processing" on the CPU to "co-processing" on the CPU and GPU. To enable this new computing paradigm, NVIDIA invented the CUDA parallel computing architecture that is now shipping in GeForce, ION, Quadro, and Tesla GPUs, representing a significant installed base for application developers.

Here is a very simple example; the details are explained at http://drdobbs.com/cpp/207402986.
Sample code to increment all elements of an array on the host (CPU) and on the device (GPU):
void incrementArrayOnHost(float *a, int N )
{
  int i;
  for (i=0; i < N; i++) a[i] = a[i]+1.f;
}
//
__global__ void incrementArrayOnDevice(float *a, int N)
{
/*
In the kernel on the CUDA-enabled device, several built-in variables are available
that were set by the execution configuration of the kernel invocation.
They are:

blockIdx which contains the block index within the grid.
threadIdx contains the thread index within the block.
blockDim contains the number of threads in a block.
*/
  int idx = blockIdx.x*blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx]+1.f;
}

// Launching the CUDA kernel to increment the array a_d of size ARRAY_MAX.
// This launch uses a single block of ARRAY_MAX threads, so ARRAY_MAX must not
// exceed the device's maximum number of threads per block.
incrementArrayOnDevice <<< 1, ARRAY_MAX >>> (a_d, ARRAY_MAX); // a_d is a float array in GPU memory; ARRAY_MAX is its size.
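For arrays larger than one block of threads can cover, the host code launches a grid of blocks: it allocates device memory, copies the data over, computes how many blocks are needed, launches the kernel, and copies the result back. A minimal sketch of that workflow follows (error handling omitted; the array size N and the block size of 256 are illustrative choices, not values from the original post):

```cuda
#include <stdio.h>
#include <stdlib.h>

__global__ void incrementArrayOnDevice(float *a, int N)
{
  // Global index of this thread across the whole grid.
  int idx = blockIdx.x*blockDim.x + threadIdx.x;
  if (idx < N) a[idx] = a[idx] + 1.f;
}

int main(void)
{
  const int N = 1000;                  // illustrative array size
  size_t size = N * sizeof(float);

  float *a_h = (float *)malloc(size);  // host array
  for (int i = 0; i < N; i++) a_h[i] = (float)i;

  float *a_d;                          // device array
  cudaMalloc((void **)&a_d, size);
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);

  // Enough blocks of 256 threads each to cover all N elements;
  // the idx < N check in the kernel handles the partial last block.
  int blockSize = 256;
  int nBlocks = (N + blockSize - 1) / blockSize;
  incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);

  // Copy the result back to the host.
  cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost);
  printf("a_h[0] = %f, a_h[N-1] = %f\n", a_h[0], a_h[N-1]);

  cudaFree(a_d);
  free(a_h);
  return 0;
}
```

Compile with nvcc (e.g. `nvcc increment.cu -o increment`); each element of the array comes back incremented by 1.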


[Reference]

http://drdobbs.com/cpp/207402986
Posted By : Santhosh G.
