OpenCL processing elements

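can't query the number of PEs through the core API
(clGetDeviceInfo gives CL_DEVICE_MAX_COMPUTE_UNITS, but nothing for PEs per compute unit)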

PE = virtual scalar processor


kernel = function that runs once per work-item
global size = total number of work-items (the kernel is instantiated once per data element)
local size = number of work-items in one work-group
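
sketch of how the three sizes relate in one dimension. this is plain C that models the index arithmetic, not OpenCL API code; the helper names are made up for illustration, and it assumes a zero global offset and a global size that divides evenly by the local size:

```c
#include <assert.h>
#include <stddef.h>

/* Plain C model of 1-D NDRange index arithmetic (hypothetical helpers).
   Assumes zero global offset and global_size divisible by local_size. */

/* Number of work-groups, i.e. what get_num_groups(0) would report. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Work-group that a work-item belongs to, like get_group_id(0). */
size_t group_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    return global_id / local_size;
}

/* Position of a work-item inside its work-group, like get_local_id(0). */
size_t local_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    return global_id % local_size;
}
```

so global id = group id * local size + local id, e.g. work-item 130 with local size 64 is item 2 of group 2.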

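guess: the work-items of a work-group get packed into the SIMD lanes
(CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE from clGetKernelWorkGroupInfo hints at the lane count)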

a work-group executes on a single compute unit? yes, the spec defines a work-group as work-items that execute on a single compute unit

so basically:
- local size large enough to fill the SIMD / SIMT lanes
- enough groups to cover all compute units
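
rough sketch of the two rules above as host-side arithmetic. plain C, hypothetical helper names; it assumes the compute unit count comes from CL_DEVICE_MAX_COMPUTE_UNITS and that the global size has to be a multiple of the local size (true for OpenCL 1.x):

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of local_size so the padded global size
   splits evenly into work-groups (hypothetical helper). */
size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Returns 1 if there is at least one work-group per compute unit, else 0.
   compute_units is assumed to come from CL_DEVICE_MAX_COMPUTE_UNITS. */
int covers_compute_units(size_t global_size, size_t local_size,
                         size_t compute_units)
{
    assert(local_size > 0);
    return global_size / local_size >= compute_units;
}
```

with padding, the kernel needs a bounds check (something like `if (get_global_id(0) < n)`) so the extra work-items do nothing.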

it's possible to pass NULL as local size; the implementation then picks the work-group size itself



groups also matter because the work-items within one group share __local memory (and can synchronize with barrier()).