This changes the enqueue_nd_range_kernel() method to return an
event object. This allows clients to monitor the progress of a
kernel executing on a device.

This cleans up the constructor methods for the OpenCL wrapper
classes and unifies the API used for creating a wrapper class
object from the underlying OpenCL objects.
Now, every wrapper class has a constructor taking the OpenCL
object and an optional boolean retain parameter which indicates
whether the constructor should increment the reference count.
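
As a minimal sketch of the retain convention described above, the
following uses hypothetical stand-ins (fake_cl_object, fake_retain(),
fake_release(), and a generic wrapper class) in place of real OpenCL
handles and the clRetain*()/clRelease*() functions. The default value
of retain (true here) is an assumption; the message only says the
parameter is optional.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted OpenCL object (e.g. cl_mem).
struct fake_cl_object {
    int ref_count;
};
typedef fake_cl_object* fake_cl_handle;

// Hypothetical stand-ins for the clRetain*() / clRelease*() functions.
inline void fake_retain(fake_cl_handle h) { ++h->ref_count; }
inline void fake_release(fake_cl_handle h) {
    assert(h->ref_count > 0);
    --h->ref_count;
}

// Illustrative wrapper class, not the library's actual code.
class wrapper {
public:
    // Takes the underlying object; 'retain' says whether to increment
    // its reference count. The default of true is an assumption.
    explicit wrapper(fake_cl_handle handle, bool retain = true)
        : m_handle(handle)
    {
        if (m_handle && retain) {
            fake_retain(m_handle);
        }
    }

    // Copies share the object, so each copy holds its own reference.
    wrapper(const wrapper& other) : m_handle(other.m_handle)
    {
        if (m_handle) {
            fake_retain(m_handle);
        }
    }

    wrapper& operator=(const wrapper& other)
    {
        if (this != &other) {
            if (other.m_handle) {
                fake_retain(other.m_handle);
            }
            if (m_handle) {
                fake_release(m_handle);
            }
            m_handle = other.m_handle;
        }
        return *this;
    }

    // The destructor always releases exactly one reference.
    ~wrapper()
    {
        if (m_handle) {
            fake_release(m_handle);
        }
    }

    fake_cl_handle get() const { return m_handle; }

private:
    fake_cl_handle m_handle;
};
```

In this sketch, passing retain = false hands the caller's existing
reference over to the wrapper, while retain = true leaves the caller
holding a reference of its own that it must still release.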

This removes the enqueue_wait_for_event() method from the
command_queue class as the clEnqueueWaitForEvents() function
has been deprecated in OpenCL 1.2.