python 2.7 - Benchmarking of PyOpenCL programs
I am trying to benchmark my FFT program on a GPU using PyOpenCL. I see completely different results when using OpenCL profiling and when using Python's time module. To use profiling, I do something like this:

    queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
    <other code>
    for i in range(n):
        events.append(prg3.butterfly(queue, (len(twid),), None, twid_dev, <buffers>))
        events[i].wait()
    for i in range(n):
        elapsed = elapsed + 1e-9 * (events[i].profile.end - events[i].profile.start)
The time module, on the other hand, can be used like this:

    k = time.time()
    for i in range(n):
        event = prg3.butterfly(queue, (len(twid),), None, twid_dev, <buffers>)
    print time.time() - k
Since these two give completely different results for n = 20 (although each result is consistent in itself!), I have the following questions:
- What does event profiling really measure, and does it include the time spent in event.wait()?
- Since the result in case (2) is the same with and without event.wait(), is it the right measure of the time spent just executing the kernel?
Please enlighten me about the correct way to benchmark OpenCL programs in Python.
Your second case only captures the time taken to enqueue the kernel, not to actually run it. These kernel invocation calls return as soon as the kernel is placed in the queue; the kernel then runs asynchronously with your host code. To time the kernel execution as well, simply add a call that waits for all enqueued commands to finish:

    k = time.time()
    for i in range(n):
        event = prg3.butterfly(queue, (len(twid),), None, twid_dev, <buffers>)
    queue.finish()
    print time.time() - k
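The effect can be reproduced without a GPU. The following is only a sketch under stated assumptions: FakeQueue and time_launches are hypothetical names, not part of PyOpenCL, and a background thread that sleeps stands in for a kernel on an in-order queue. It shows why the wall-clock time without a final finish() reflects only the cost of enqueuing.

```python
import threading
import time


class FakeQueue(object):
    """Hypothetical stand-in for an in-order command queue (not PyOpenCL)."""

    def __init__(self):
        self._last = None

    def enqueue(self, seconds):
        # Return immediately after starting the "kernel" in the background,
        # the way a kernel launch returns once the command is queued.
        previous = self._last

        def run():
            if previous is not None:
                previous.join()  # in-order: wait for the preceding command
            time.sleep(seconds)  # pretend to execute the kernel

        worker = threading.Thread(target=run)
        worker.start()
        self._last = worker

    def finish(self):
        # Block until the last command, and so every earlier one, is done.
        if self._last is not None:
            self._last.join()


def time_launches(n, seconds, wait):
    """Wall-clock time for n launches, with or without a final finish()."""
    fake_queue = FakeQueue()
    start = time.time()
    for _ in range(n):
        fake_queue.enqueue(seconds)
    if wait:
        fake_queue.finish()
    elapsed = time.time() - start
    fake_queue.finish()  # always drain the queue before returning
    return elapsed


enqueue_only = time_launches(5, 0.05, wait=False)  # measures enqueue cost only
with_finish = time_launches(5, 0.05, wait=True)    # includes the "kernel" time
```

Here enqueue_only should come out far smaller than with_finish, which mirrors the gap between the two measurements in the question.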
Your first case correctly measures the time spent inside kernel execution, but it unnecessarily blocks the host between each kernel invocation. You can instead enqueue all the commands at once and then call queue.finish() before reading the timings:

    for i in range(n):
        events.append(prg3.butterfly(queue, (len(twid),), None, twid_dev, <buffers>))
    queue.finish()
    for i in range(n):
        elapsed = elapsed + 1e-9 * (events[i].profile.end - events[i].profile.start)
    print elapsed
Both of these approaches should return nearly the same time.
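As a rough illustration of why the two measurements should agree, the sketch below sums the per-event execution time and compares it with the span from the first start to the last end. The names Profile, FakeEvent, kernel_seconds and span_seconds are hypothetical, and the timestamps are invented for the example. The only assumption about PyOpenCL is that an event's profile exposes queued, submit, start and end counters in nanoseconds.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic the shape of event.profile;
# real events would come from kernel launches on a profiling-enabled queue.
Profile = namedtuple("Profile", "queued submit start end")
FakeEvent = namedtuple("FakeEvent", "profile")


def kernel_seconds(events):
    """Total time spent executing, summed over all events (ns to s)."""
    return 1e-9 * sum(ev.profile.end - ev.profile.start for ev in events)


def span_seconds(events):
    """Time from the first kernel start to the last kernel end (ns to s)."""
    first_start = min(ev.profile.start for ev in events)
    last_end = max(ev.profile.end for ev in events)
    return 1e-9 * (last_end - first_start)


# Invented timestamps: two kernels of 2 ms and 3 ms with a 1 us gap.
fake_events = [
    FakeEvent(Profile(queued=0, submit=1000, start=5000, end=2005000)),
    FakeEvent(Profile(queued=2000, submit=3000, start=2006000, end=5006000)),
]

busy = kernel_seconds(fake_events)
span = span_seconds(fake_events)
```

When kernels run back to back, busy and span are close, so the profiling sum and a wall-clock measurement that ends with queue.finish() should differ mainly by launch overhead.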