OpenCL (Peter Holvenstot)

  • Published on 17-Dec-2015


Transcript

  • Slide 1
  • OpenCL Peter Holvenstot
  • Slide 2
  • OpenCL: Designed as an API and language specification. Standards are maintained by the Khronos Group; currently at versions 1.0, 1.1, and 1.2. Manufacturers release their own SDKs and drivers. Major backers: Apple, AMD/ATI, Intel.
  • Slide 3
  • OpenCL: An alternative to CUDA, not limited to ATI GPUs. Designed for heterogeneous computing: executable on many devices, including CPUs, GPUs, DSPs, and FPGAs.
  • Slide 4
  • OpenCL: Similar structure of host programs and kernels. The set of compute devices is called a 'context'. Kernels are executed by 'processing elements'. Kernels can be compiled at run time or at build time.
  • Slide 5
  • OpenCL: Task parallelism: many kernels running at once. An OpenCL 1.2 device can be partitioned down to a single compute unit. Built-in kernels expose device-specific functionality.
  • Slide 6
  • Advantages: The same code can run on different devices, and can also be run on NVIDIA GPUs! AMD/ATI is attempting to integrate compute elements into other platforms (Accelerated Processing Units). Limited library of portable math routines: the most common BLAS and FFT routines.
  • Slide 7
  • Performance
  • Slide 8
  • Slide 9
  • Slide 10
  • Disadvantages: No official implementation; vendors may meet the spec or add restrictions (e.g., Apple adds restrictions on work-group size). Devices need appropriate settings to perform well; different capabilities mean different performance. Solution: a tuning/load-balancing framework.
  • Slide 11
  • Non-Optimized Performance
  • Slide 12
  • Slide 13
  • Restrictions: No recursion, variadic functions, or function pointers. Cannot dynamically allocate memory from the device. No native variable-length arrays or double precision. Some restrictions can be worked around by extensions.
  • Slide 14
  • Terminology (CUDA → OpenCL): Scalar Core → Stream Core; Streaming Multiprocessor → Compute Unit; Warp → Wavefront; PTX → Intermediate Language.
  • Slide 15
  • Terminology (CUDA → OpenCL): Host Memory → Host Memory; Global/Device Memory → Global Memory; Constant Memory → Constant Memory; Shared Memory → Local Memory; Local Memory and Registers → Private Memory.
  • Slide 16
  • Terminology (CUDA → OpenCL): Grid → NDRange; Block → Work-group; Thread → Work-item; Thread ID → Global ID; Block Index → Block ID; Thread Index → Local ID.
  • Slide 17
  • References:
    http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/CUDAvsOpenCL.pdf
    https://wiki.aalto.fi/download/attachments/40025977/Cuda+and+OpenCL+API+comparison_presented.pdf
    http://www.hpcwire.com/hpcwire/2012-02-28/opencl_gains_ground_on_cuda.html
    http://www.netlib.org/utk/people/JackDongarra/PAPERS/parcocudaopencl.pdf
    http://www.netlib.org/lapack/lawnspdf/lawn228.pdf
