I'm trying to program in OpenCL.
There are two types of memory object: buffers and images.
Some blogs, websites, and white papers say that image objects are a little faster than buffers because of caching.
I'm trying to use image objects because of 'clamp' addressing, which makes the kernel code simpler and, in my opinion, faster.
My question: is it possible to use an image object together with local memory, and is that faster than using a buffer object with local memory?
The data flow would be: data -> image object -> copy to local memory -> operations -> write to another image object.
As far as I understand, I cannot use the async_work_group_copy instruction to fill local memory in this case.
So I have to copy and synchronize the local memory manually, which adds a lot of overhead.
The real answer is "it depends". Some implementations don't get any value from doing async_work_group_copy. Image reads may have higher latency than buffer reads when there is a cache hit, but they may get better cache behaviour on some architectures. Clamping, address calculation, and filtering are free operations performed by dedicated hardware, which you'd have to shift into shader code when using buffers, which reduces read latency and may increase throughput.
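To make the "shift into shader code" point concrete, here is a rough sketch of the work a buffer-based kernel has to do by hand for every read. It is written as plain host-side C rather than OpenCL C so it can be compiled and run anywhere. The helper names (`clamp_coord`, `read_clamped`, `box_sum_3x3`) are made up for illustration, and it assumes a row-major single-channel float buffer and edge clamping as in `CLK_ADDRESS_CLAMP_TO_EDGE` (plain `CLK_ADDRESS_CLAMP` returns the border colour instead).

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an integer coordinate into [0, size - 1]. With an image and a
   CLK_ADDRESS_CLAMP_TO_EDGE sampler the hardware does this for you;
   with a buffer it becomes explicit kernel code. */
static int clamp_coord(int v, int size)
{
    assert(size > 0);
    if (v < 0) return 0;
    if (v >= size) return size - 1;
    return v;
}

/* Hypothetical helper: read one pixel from a row-major float buffer,
   clamping out-of-range coordinates to the nearest edge pixel. */
static float read_clamped(const float *buf, int width, int height,
                          int x, int y)
{
    int cx = clamp_coord(x, width);
    int cy = clamp_coord(y, height);
    /* Manual address calculation, also free with image reads. */
    return buf[(size_t)cy * (size_t)width + (size_t)cx];
}

/* Example consumer: sum of the 3x3 neighbourhood around (x, y).
   Every one of the nine reads pays for the clamp and the indexing. */
static float box_sum_3x3(const float *buf, int width, int height,
                         int x, int y)
{
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            sum += read_clamped(buf, width, height, x + dx, y + dy);
        }
    }
    return sum;
}
```

In a real kernel the same clamping and index arithmetic would run per work-item. Whether that costs more than the image path saves is exactly the part that varies between devices.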
If you are going to get big caching benefits from images, local memory may just get in the way. Writing to it, synchronizing, reading from it, calculating addresses, and so on all carry a cost.
Sadly, this is one of those things you'll have to experiment with on your target architectures.