This refactors the system::default_device() method. The default
compute device for the system is now found only once and cached
in a static variable, which eliminates many redundant calls to
clGetPlatformIDs() and clGetDeviceIDs().
Also, the default_cpu_device() and default_gpu_device() methods
have been removed and their usages replaced with default_device().
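The caching described above can be sketched roughly as follows. This
is only an illustration, not the library's actual code: the device
struct, find_default_device() and lookup_count() are hypothetical
stand-ins, and using a function-local static is an assumption about
how the static variable is declared.

```cpp
#include <string>

// Hypothetical stand-in for a compute device; not the library's real class.
struct device {
    std::string name;
};

// Counts how often the expensive lookup runs (for demonstration only).
inline int& lookup_count() {
    static int count = 0;
    return count;
}

// Hypothetical placeholder for the real search, which would call
// clGetPlatformIDs() and clGetDeviceIDs().
inline device find_default_device() {
    ++lookup_count();
    return device{"example-device"};
}

// The function-local static is initialized on the first call only
// (thread-safe since C++11); later calls return the cached object.
inline const device& default_device() {
    static const device cached = find_default_device();
    return cached;
}
```

With this shape, repeated calls to default_device() return a reference
to the same object and the lookup runs a single time.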
This changes the enqueue_nd_range_kernel() method to return an
event object. This allows clients to monitor the progress of a
kernel executing on a device.
This cleans up the constructor methods for the OpenCL wrapper
classes and unifies the API used for creating a wrapper class
object from the underlying OpenCL objects.
Now, every wrapper class has a constructor taking the OpenCL
object and an optional boolean retain parameter which indicates
whether the constructor should increment the reference count.
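A minimal sketch of the retain semantics, assuming a mock handle in
place of a real OpenCL object. The names fake_cl_object, fake_retain(),
fake_release() and wrapper are hypothetical, and the default value of
retain (true here) is an assumption, since the message only says the
parameter is optional.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted OpenCL handle such as cl_mem.
struct fake_cl_object {
    int ref_count;
};

// Stand-ins for the clRetain*/clRelease* function families.
inline void fake_retain(fake_cl_object* obj) { ++obj->ref_count; }
inline void fake_release(fake_cl_object* obj) { --obj->ref_count; }

class wrapper {
public:
    // retain == true:  take a new reference by incrementing the count.
    // retain == false: adopt the reference the caller already holds.
    explicit wrapper(fake_cl_object* obj, bool retain = true) : m_obj(obj) {
        assert(obj != nullptr);
        if (retain) {
            fake_retain(m_obj);
        }
    }

    // The wrapper always drops exactly one reference when destroyed.
    ~wrapper() { fake_release(m_obj); }

    // Copying is disabled to keep the sketch short.
    wrapper(const wrapper&) = delete;
    wrapper& operator=(const wrapper&) = delete;

    fake_cl_object* get() const { return m_obj; }

private:
    fake_cl_object* m_obj;
};
```

Passing retain = false suits handles freshly returned by a clCreate*
call, where the caller's reference is handed over to the wrapper.
Passing true suits handles the caller intends to keep using.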
This removes the enqueue_wait_for_event() method from the
command_queue class, as the clEnqueueWaitForEvents() function
has been deprecated in OpenCL 1.2.