Introduction to OpenCL, 2010

  • Published on
    13-May-2015

DESCRIPTION

Introduction to OpenCL, presentation from OpenCL workshop at OzViz 2010 held in Brisbane, Australia.

Transcript

<ul><li>1. Introduction to OpenCL. How to select OpenCL devices, initialise a compute context, allocate device memory, compile and run kernels, and output results. OpenCL Workshop | December 1, 2010 | Brisbane, Australia. Tomasz Bednarz, CESRE.</li></ul><p>2. Welcome to Open Computing Language (OpenCL™). Outline: N-Body Simulation demo; Khronos Group and the OpenCL standard; OpenCL anatomy (platform model, execution model, memory model); short introduction to OpenCL programming (OpenCL C language, supported data types, synchronisation primitives); additional information and resources. OpenCL is a trademark of Apple, Inc. CSIRO. Introduction to OpenCL. OpenCL Workshop at the OzViz 2010, Brisbane, December 2010.
3. N-Body Simulation: demo
4. N-Body Simulation. Reference: Lars Nyland, Mark Harris, Jan Prins. Fast N-Body Simulation with CUDA. In Hubert Nguyen, editor, GPU Gems 3, chapter 31, pages 677-695, Addison Wesley, 2007. Applications: molecular dynamics; astronomical and astrophysical simulations; fluid dynamics simulation; radiosity (radiometric transfer). The brute-force all-pairs approach discussed here computes $N^2$ interactions per time step. It is highly parallel and has high arithmetic intensity. Figure caption: two of these galaxies attract each other.
5. N-Body Simulation (http://developer.nvidia.com/gpugems3). N-Body simulation models the motion of particles subject to a force due to the particle-particle interactions between all particles in the system. Typical example: simulation of stars in a galaxy subject to the gravitational force. Given N bodies with an initial position $\mathbf{x}_i$ and velocity $\mathbf{v}_i$ for $1 \le i \le N$, the force $\mathbf{f}_{ij}$ on body i caused by its gravitational attraction to body j is $\mathbf{f}_{ij} = G \frac{m_i m_j}{\|\mathbf{r}_{ij}\|^2} \cdot \frac{\mathbf{r}_{ij}}{\|\mathbf{r}_{ij}\|}$, where $m_i$ and $m_j$ are the masses of bodies i and j. The total force on body i is $\mathbf{F}_i = \sum_{1 \le j \le N,\ j \ne i} \mathbf{f}_{ij} = G m_i \sum_{1 \le j \le N,\ j \ne i} \frac{m_j \mathbf{r}_{ij}}{\|\mathbf{r}_{ij}\|^3}$. The acceleration is computed as $\mathbf{a}_i = \mathbf{F}_i / m_i$.
Here $\mathbf{r}_{ij} = \mathbf{x}_j - \mathbf{x}_i$.
6. N-Body Simulation. As bodies approach each other, the force between them grows without bound, so a softening factor $\varepsilon^2 > 0$ may be added: $\mathbf{F}_i \approx G m_i \sum_{1 \le j \le N} \frac{m_j \mathbf{r}_{ij}}{(\|\mathbf{r}_{ij}\|^2 + \varepsilon^2)^{3/2}}$. The softening factor limits the magnitude of the force between the bodies, which is desirable for numerical integration of the system state. Acceleration: $\mathbf{a}_i = \mathbf{F}_i / m_i \approx G \sum_{1 \le j \le N} \frac{m_j \mathbf{r}_{ij}}{(\|\mathbf{r}_{ij}\|^2 + \varepsilon^2)^{3/2}}$.
7. N-Body Simulation: parallel concept. Figure labels: outer loop (i), particle i; inner loop (j), particle j; single interaction between i and j. Particles i and j interact with each other. OpenCL can be used to compute the acceleration on all bodies in parallel: N/p work-groups of p work-items process p bodies at a time. Every work-item loads all other body positions from off-chip memory, which gives $N^2$ loads; the computation is bandwidth bound, so performance is poor. Optimisation (using tiles) to be presented in the afternoon session.
8. N-Body Simulation: body-body force calculation. $\mathbf{F}_i \approx G m_i \sum_{1 \le j \le N} \frac{m_j \mathbf{r}_{ij}}{(\|\mathbf{r}_{ij}\|^2 + \varepsilon^2)^{3/2}}$ and $\mathbf{a}_i = \mathbf{F}_i / m_i \approx G \sum_{1 \le j \le N} \frac{m_j \mathbf{r}_{ij}}{(\|\mathbf{r}_{ij}\|^2 + \varepsilon^2)^{3/2}}$. Sample code: http://developer.download.nvidia.com/compute/opencl/sdk/website/samples.html#oclNbody and http://developer.apple.com/library/mac/#samplecode/OpenCL_NBody_Simulation_Example/Introduction/Intro.html
9. N-Body Simulation: demo
10. The Khronos Group
11. http://www.khronos.org/opencl/
12. http://www.khronos.org/opencl/
13. What is OpenCL? (http://www.khronos.org/opencl/) OpenCL (Open Computing Language) is an open, royalty-free standard for programming heterogeneous parallel computing at the intersection of GPU and multi-core CPU capabilities. CPUs: multiple cores driving performance increases; multi-processor programming with threading libraries, e.g. OpenMP. Emerging intersection: heterogeneous computing. GPUs:
Increasingly general-purpose data-parallel computing; graphics APIs and shading languages, vendor compute APIs.
14. What is OpenCL? Roadmap convergence: OpenGL 4.0 and OpenGL ES 2.0 are both streamlined, programmable pipelines. The GL and ES working groups are working on convergence. WebGL is a positive pressure for portable 3D content for all platforms. Desktop visual computing: OpenGL and OpenCL have direct interoperability. OpenCL objects can be created from OpenGL textures, buffer objects and renderbuffers. Parallel computing and visualisation: OpenCL is the centre of a visual computing ecosystem with parallel computations, 3D, video, audio, and image processing on desktop, embedded and mobile systems. Diagram labels: desktop 3D ecosystem; cross-platform desktop 3D; 3D for Web; heterogeneous parallel programming; embedded 3D; surface and synch abstraction; streaming media and image processing. Mobile visual computing: compute, graphics and AV APIs interoperate through EGL. Hundreds of man-years invested by industry experts in a coordinated ecosystem. Streamlined APIs for mobile and embedded graphics, media and compute acceleration. Based on http://www.khronos.org/opencl/
15. OpenCL Timeline. OpenCL 1.0 was released six months after the proposal was created. OpenCL shipped first on Apple's Mac OS X Snow Leopard. 18-month cadence between OpenCL 1.0 and OpenCL 1.1. Backward compatible to protect software investment. Timeline: June 2008: the OpenCL working group is proposed by Apple and a draft spec is contributed to Khronos; December 2008: Khronos publicly releases OpenCL 1.0 as a royalty-free specification; May 2009: Khronos releases OpenCL 1.0 conformance tests to ensure high-quality implementations; 2nd half 2009: multiple conformant implementations ship across diverse OS and platforms; June 2010:
OpenCL 1.1 spec is released and the first implementations ship. Based on http://www.khronos.org/opencl/
16. OCL Quick Reference Cards: http://www.khronos.org/files/opencl-quick-reference-card.pdf
17. Design goals of OpenCL. Enable all compute resources in the system; CPUs, GPUs, and other processors enabled as peers; data- and task-parallel compute model. Efficient parallel programming model; ANSI C99 based kernel language; low-level abstraction. Abstracts the specifics of the underlying hardware; high-performance, but device independent; defines precision requirements for all floating-point computations; consistent results on all platforms and devices. Interoperability with graphics APIs; dedicated support for OpenGL, OpenGL ES and DirectX. Drive future hardware requirements; applicable to both consumer and HPC applications.
18. OpenCL Platform Model
19. It's a heterogeneous world. The platform model encapsulates compute resources. A modern platform includes: one or more CPUs; one or more GPUs; optional accelerators (e.g. DSPs); other? Using OpenCL, programmers write a single portable program that uses ALL resources in the heterogeneous platform. Based on http://www.khronos.org/opencl/
20. OpenCL Platform Model. One Host is connected to one or more Compute Devices. A Compute Device can be a CPU, GPU or other processor. Each Compute Device is composed of one or more Compute Units. A Compute Unit may be a core, multi-processor, etc. Each Compute Unit is further divided into one or more Processing Elements. Processing Elements execute code as SIMD or SPMD. Diagram labels: host, compute devices, compute units, processing elements.
21. Anatomy of OpenCL Application. An OpenCL application consists of device code (written in OpenCL C, executes on the device) and host code (written in C/C++, executes on the host). Host code sends commands to the devices: to transfer data between host memory and device memories, and to execute device code. Diagram labels: host, compute devices, compute units.
22. Anatomy of OpenCL Application. Serial code executes in a Host (CPU) thread. Parallel code executes in many Device (GPU) threads across multiple processing elements. Diagram: an OCL application alternates serial code on the host (CPU) with parallel code on the device (GPU).
23. OpenCL Execution Model
24. OpenCL Execution Model. An OpenCL application runs on a Host, which submits work to the Compute Devices. Work-item: the basic unit of work on an OpenCL device. Kernel: the code for a work-item, which is basically a C function. Program: a collection of kernels and other functions (analogous to a dynamic library), managed by the host. Context: the environment within which work-items execute, which includes devices and their memories and command queues (it contains all resources for computation). Command queue: a queue used by the Host application to submit work (kernel execution instances) to a Device. Work is queued in-order, one queue per device. Work can be executed in-order or out-of-order. Events are used for synchronisation. Diagram labels: context; memory; GPU; CPU; GPU &amp; CPU queues; commands.
25. 
OpenCL Execution Model. Portable execution model that allows a kernel to execute at each point in a problem domain (an N-dimensional computational domain): the task is decomposed into work-items.
Traditional loop as a function in C:
    void addVector(const float *A, const float *B, float *C, int N)
    {
        int index;
        for (index = 0; index [...]
OpenCL C kernel:
    __kernel void addVector(__global const float *A, __global const float *B, __global float *C, int N)
    {
        int index = get_global_id(0);
        [...]
[...] (the transcript omits the rest of slide 25 and slides 26 to 39; the text resumes partway through the platform ID code)
    [...] platformIDs;
    // query the number of available platforms; num_entries is a count, so pass 0 rather than NULL
    err = clGetPlatformIDs(0, NULL, &amp;num_platforms);
    if (err != CL_SUCCESS) { /* handle error */ }
    platformIDs.resize(num_platforms);
    // get all OpenCL platform IDs
    err = clGetPlatformIDs(num_platforms, &amp;platformIDs[0], NULL);
Note: if NULL, the arguments are ignored.
41. Context creation: device IDs. Simple case: get the first GPU associated with the platform:
    cl_device_id device;
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &amp;device, NULL);
Get all device IDs:
    cl_uint nDevices;
    cl_device_type deviceType;
    vector&lt;cl_device_id&gt; deviceIDs;
Signature:
    cl_int clGetDeviceIDs(
        cl_platform_id platform,
        cl_device_type device_type,
        cl_uint num_entries,
        cl_device_id *devices,
        cl_uint *num_devices)
Device types: CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR, CL_DEVICE_TYPE_DEFAULT, CL_DEVICE_TYPE_ALL.
    if (platformIDs.size() == 0) {
        // get number of device IDs for default platform
        // (passing a NULL platform is implementation-defined behaviour)
        err = clGetDeviceIDs(NULL, deviceType, 0, NULL, &amp;nDevices);
    } else {
        // get number of device IDs for selected platform
        err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, 0, NULL, &amp;nDevices);
    }
    deviceIDs.resize(nDevices);
    if (platformIDs.size() == 0) {
        // get default device IDs of default platform
        err = clGetDeviceIDs(NULL, deviceType, nDevices, &amp;deviceIDs[0], NULL);
    } else {
        // get device IDs of selected platform
        err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, nDevices, &amp;deviceIDs[0], NULL);
    }
42. Context creation. Simple example: create a context object:
    cl_context context;
    context = clCreateContext(NULL, 1, &amp;device, NULL, NULL, NULL);
Create an OpenCL context for several devices:
    cl_int err;
    cl_context context;
    context = clCreateContext(NULL, deviceIDs.size(), &amp;deviceIDs[0], NULL, NULL, &amp;err);
    if (err != CL_SUCCESS) { /* handle error */ }
Signature:
    cl_context clCreateContext(
        const cl_context_properties *properties,
        cl_uint num_devices,
        const cl_device_id *devices,
        void (CL_CALLBACK *pfn_notify)(const char *errinfo, const void *private_info, size_t cb, void *user_data),
        void *user_data,
        cl_int *errcode_ret)
cl_context_properties enum: CL_CONTEXT_PLATFORM, CL_CONTEXT_D3D10_DEVICE_KHR, CL_GL_CONTEXT_KHR, CL_EGL_DISPLAY_KHR, ...
43. 
Error Handling and Resource Deallocation. Error handling: all host functions return an error code; context error callback. The callback function may be called asynchronously by OpenCL, and it is the application's responsibility to ensure that the callback function is thread-safe. Resource deallocation: reference counting API, clRetain*() and clRelease*(): clRetainContext(), clReleaseContext(), clRetainMemObject(), clReleaseMemObject(), clRetainKernel(), clReleaseKernel().
44. OpenCL C. Derived from ISO C99. Features added to the language: work-items and work-groups; vector types; synchronisation; address space qualifiers. Also includes a large set of built-in functions: image manipulation; work-item manipulation; math functions.
45. OpenCL C Language Restrictions. No functions defined in C99 standard headers. No recursion supported. Pointers to functions are not permitted. Pointers to pointers are allowed within a kernel, but not as an argument. No variable-length arrays and structures. Bit fields are not supported. Writes to a pointer to a type less than 32 bits are not supported*. Double types are not supported, but reserved. 3D image writes are not supported. *Some restrictions are addressed through extensions.
46. OpenCL C Optional Extensions. Extensions are optional features exposed through OpenCL. The OpenCL working group has already approved many extensions to the OpenCL specification: double precision floating-point types; built-in functions to support doubles; atomic functions*; byte-addressable stores (write to pointers to types &lt; 32 bits)*; 3D image writes; built-in functions to support half types. * New core features in OpenCL 1.1
47. OpenCL C: Data Types. Scalar data types: char, uchar, short, ushort, int, uint, long, ulong, float; bool, intptr_t, ptrdiff_t, size_t, uintptr_t, void, half (storage). Image types: image2d_t, image3d_t, sampler_t, event_t. Vector data types: vector lengths 2, 3*, 4, 8, 16 (char2, ushort4, int8, float16, double2^, ...); endian safe; aligned at vector length; vector operations; built-in functions. * New core features in OpenCL 1.1. ^ Double is an optional type in OpenCL...</p>
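
The contrast on slide 25 between a serial loop and a kernel that handles one index per work-item, together with the split into work-groups of p work-items mentioned on slide 7, can be sketched in plain C without an OpenCL runtime. This is a sequential emulation under stated assumptions, not the workshop's code: `add_vector_item` and `run_ndrange_1d` are hypothetical names, the bounds guard is an assumption about how a padded index range would be handled, and the relation `global_id = group * local_size + local` assumes a global offset of zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel body: one call handles the single
   element that a work-item with this global ID would own. */
static void add_vector_item(size_t global_id, const float *a,
                            const float *b, float *c, size_t n)
{
    if (global_id < n) {  /* skip IDs in the padded tail of the range */
        c[global_id] = a[global_id] + b[global_id];
    }
}

/* Sequential emulation of a 1-D index space split into work-groups.
   A real OpenCL device would run these work-items in parallel. */
static void run_ndrange_1d(size_t n, size_t local_size,
                           const float *a, const float *b, float *c)
{
    assert(local_size > 0);
    /* Round the group count up so that every element is covered. */
    size_t num_groups = (n + local_size - 1) / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        for (size_t local = 0; local < local_size; ++local) {
            /* Global ID from group ID and local ID (zero global offset). */
            size_t global_id = group * local_size + local;
            add_vector_item(global_id, a, b, c, n);
        }
    }
}
```

Calling `run_ndrange_1d(5, 4, a, b, c)` visits global IDs 0 to 7 in two groups of four; the guard in `add_vector_item` leaves IDs 5 to 7 without work, which is why the index range may be rounded up to a multiple of the group size.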