This refactors the system::default_device() method. Now, the
default compute device for the system is only found once and
stored in a static variable. This eliminates many redundant
calls to clGetPlatformIDs() and clGetDeviceIDs().
Also, the default_cpu_device() and default_gpu_device() methods
have been removed and their usages replaced with default_device().
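The caching idea can be sketched with a function-local static. This is a minimal illustration, not the library's implementation: the `device` struct, `find_default_device()`, and `lookup_count()` are hypothetical stand-ins for the real device class and the OpenCL enumeration calls.

```cpp
#include <string>

// Hypothetical stand-in for a compute device handle.
struct device {
    std::string name;
};

// Counts how often the expensive lookup runs (for illustration only).
inline int& lookup_count()
{
    static int count = 0;
    return count;
}

// Stand-in for the expensive platform/device enumeration
// (clGetPlatformIDs() and clGetDeviceIDs() in the real library).
inline device find_default_device()
{
    ++lookup_count();
    return device{"example-device"};
}

// The lookup runs once, on first use; later calls return the cached
// object. Since C++11 this initialization is also thread-safe.
inline const device& default_device()
{
    static const device cached = find_default_device();
    return cached;
}
```

Calling default_device() twice in this sketch returns a reference to the same object and runs the lookup only once.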
This adds checks to the device test-suite to ensure that the
current device supports the partitioning types before attempting
to use the corresponding device::partition_*() methods.
This fixes a couple of narrowing conversion warnings in the
device partitioning methods which were seen when compiling
VexCL with Boost.Compute in C++11 mode.
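For context, a narrowing warning of this kind typically appears when a size_t value is placed inside a braced initializer for an array of signed properties. The sketch below shows one common fix, an explicit static_cast; it is an assumption about the shape of the fix, and `partition_property`, `partition_equally`, and `make_equal_partition_properties()` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cl_device_partition_property,
// which is an intptr_t in OpenCL.
typedef std::intptr_t partition_property;

// Hypothetical constant standing in for CL_DEVICE_PARTITION_EQUALLY;
// the value is used here for illustration only.
const partition_property partition_equally = 0x1086;

inline std::vector<partition_property>
make_equal_partition_properties(std::size_t count)
{
    // Writing `count` directly inside the braces would be a narrowing
    // conversion (size_t to intptr_t) under C++11 list-initialization.
    const partition_property properties[] = {
        partition_equally,
        static_cast<partition_property>(count),
        0 // list terminator
    };
    return std::vector<partition_property>(properties, properties + 3);
}
```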
This adds a get<N>() function which returns the N-th element
of an aggregate type (e.g. vector type, pair, tuple).
This unifies the functionality of, and replaces, the get_pair()
and vector_component() functions.
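A host-side sketch of such a unified accessor might look as follows. The name `get_nth` and the `float4` struct are hypothetical (the struct stands in for an OpenCL vector type); the real get<N>() builds device-side expressions rather than reading host values.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical four-component vector type, standing in for an
// OpenCL vector type such as float4.
struct float4 {
    float s[4];
};

// Vector type: returns the N-th component.
template<std::size_t N>
float get_nth(const float4& v)
{
    static_assert(N < 4, "component index out of range");
    return v.s[N];
}

// Pair: forwards to std::get<N>().
template<std::size_t N, class T1, class T2>
typename std::tuple_element<N, std::pair<T1, T2> >::type
get_nth(const std::pair<T1, T2>& p)
{
    return std::get<N>(p);
}

// Tuple: forwards to std::get<N>().
template<std::size_t N, class... Ts>
typename std::tuple_element<N, std::tuple<Ts...> >::type
get_nth(const std::tuple<Ts...>& t)
{
    return std::get<N>(t);
}
```

With these overloads, one spelling (`get_nth<1>(x)`) works for all three kinds of aggregate.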
This changes the vector class to not auto-initialize values
when it is created or resized. This improves performance by
eliminating a call to fill(). If needed, user code can call
fill() explicitly on the newly allocated values.
This raises the work-group size for the copy() kernel to up to
32 items, depending on the size of the input. This improves the
performance of copy() and related algorithms (e.g. transform()).
This changes the clamp_range() test to use float values instead
of int values. The OpenCL clamp() function is only defined for
float values and this test caused kernel compilation errors on
certain platforms.
Also updates the test to use the new global context.
This adds a clamp_range() algorithm which clamps a range
of values between a low and high value. This is based on
the algorithm of the same name in Boost.Algorithm.
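The semantics can be sketched on the host with standard iterators. This is only an illustration of what clamping a range means, not the OpenCL implementation; `clamp_range_sketch` is a hypothetical name and the argument order is assumed to follow Boost.Algorithm (first, last, out, lo, hi).

```cpp
#include <algorithm>

// Writes each value of [first, last) to `out`, limited to the
// closed interval [lo, hi]. Assumes lo <= hi.
template<class InputIterator, class OutputIterator, class T>
OutputIterator clamp_range_sketch(InputIterator first, InputIterator last,
                                  OutputIterator out,
                                  const T& lo, const T& hi)
{
    for (; first != last; ++first, ++out) {
        // Raise values below lo, then lower values above hi.
        *out = std::min(std::max<T>(*first, lo), hi);
    }
    return out;
}
```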
This removes the documentation for the non-existent platforms()
and platform_count() methods in the platform class. These methods
have been moved to the system class and are documented there.
refs kylelutz/compute#9
The device, context, and queue are now initialized statically in
`context_setup.hpp`. With this change, all tests are able to complete
when an NVIDIA GPU is in exclusive compute mode.
A side effect of the change: the time for all tests to complete
dropped from 15.71 to 13.03 sec on a Tesla C2075.
This adds a test for the enqueue_write_buffer_rect() method
in the command_queue class. This method copies a rectangular
region of memory from the host to a device buffer.
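What a rectangular copy does can be sketched on the host with row pitches and (x, y) origins. This is a two-dimensional, host-only illustration; `copy_rect_sketch` and all of its parameter names are hypothetical and do not mirror the signature of enqueue_write_buffer_rect().

```cpp
#include <cstddef>
#include <cstring>

// Copies a `width` x `height` block of bytes from a host array into a
// destination buffer. The origins are (x, y) offsets in bytes and rows,
// and each pitch is the number of bytes per row of that array.
inline void copy_rect_sketch(const unsigned char* host, std::size_t host_pitch,
                             std::size_t host_x, std::size_t host_y,
                             unsigned char* buffer, std::size_t buffer_pitch,
                             std::size_t buffer_x, std::size_t buffer_y,
                             std::size_t width, std::size_t height)
{
    for (std::size_t row = 0; row < height; ++row) {
        const unsigned char* src =
            host + (host_y + row) * host_pitch + host_x;
        unsigned char* dst =
            buffer + (buffer_y + row) * buffer_pitch + buffer_x;
        std::memcpy(dst, src, width); // one row of the rectangle
    }
}
```

Only the bytes inside the rectangle are written; the rest of the destination is left untouched.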
This changes the enqueue_nd_range_kernel() method to return an
event object. This allows clients to monitor the progress of a
kernel executing on a device.
boost::compute::system::default_device() supports the following
environment variables:
  BOOST_COMPUTE_DEFAULT_DEVICE for device name
  BOOST_COMPUTE_DEFAULT_PLATFORM for OpenCL platform name
  BOOST_COMPUTE_DEFAULT_VENDOR for device vendor name
If one or more of these variables are set, the device that satisfies
all of the given conditions is selected. If no such device is
available, the first available GPU is selected. If there are no GPUs
in the system, the first available CPU is selected. If there is no
CPU either, default_device() returns a null device.
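The selection order described above can be sketched as a plain function over a list of devices. Everything here is hypothetical: the `device_info` struct, the substring matching in `matches()`, and the -1 return value for the null device are assumptions made for illustration. A real implementation would read the three filters with std::getenv().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified description of a compute device.
struct device_info {
    std::string name;
    std::string platform;
    std::string vendor;
    bool is_gpu; // false is treated as "CPU" in this sketch
};

// True if `filter` is empty (variable not set) or occurs in `value`.
// Substring matching is an assumption of this sketch.
inline bool matches(const std::string& value, const std::string& filter)
{
    return filter.empty() || value.find(filter) != std::string::npos;
}

// Returns the index of the selected device, or -1 for the null device.
inline int select_default_device(const std::vector<device_info>& devices,
                                 const std::string& name_filter,
                                 const std::string& platform_filter,
                                 const std::string& vendor_filter)
{
    const bool any_filter = !name_filter.empty() ||
                            !platform_filter.empty() ||
                            !vendor_filter.empty();

    // 1. A device that satisfies every filter that is set.
    if (any_filter) {
        for (std::size_t i = 0; i < devices.size(); ++i) {
            if (matches(devices[i].name, name_filter) &&
                matches(devices[i].platform, platform_filter) &&
                matches(devices[i].vendor, vendor_filter)) {
                return static_cast<int>(i);
            }
        }
    }

    // 2. Otherwise, the first available GPU.
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (devices[i].is_gpu) return static_cast<int>(i);
    }

    // 3. Otherwise, the first available CPU.
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (!devices[i].is_gpu) return static_cast<int>(i);
    }

    // 4. No device at all.
    return -1;
}
```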
The hello_world example is modified to use default_device() instead
of default_gpu_device().
This adds a specialization of multiplies<T> for std::complex<T>
which implements complex number multiplication.
Also adds a simple test using transform() to verify the complex
multiplication works correctly.
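For reference, the arithmetic behind complex multiplication is (a + bi)(c + di) = (ac - bd) + (ad + bc)i. A host-side sketch, using a hypothetical `float2` struct as a stand-in for a two-component device value holding (real, imaginary):

```cpp
// Hypothetical two-component value: x is the real part,
// y is the imaginary part.
struct float2 {
    float x;
    float y;
};

// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
inline float2 complex_multiply(float2 lhs, float2 rhs)
{
    float2 result;
    result.x = lhs.x * rhs.x - lhs.y * rhs.y; // real part
    result.y = lhs.x * rhs.y + lhs.y * rhs.x; // imaginary part
    return result;
}
```

For example, (1 + 2i)(3 + 4i) gives -5 + 10i.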
This fixes an unused variable warning which occurs in the
get_base_iterator_buffer() function when the base iterator
is not a buffer iterator and thus the iter argument is not
used.
This fixes a bug in which boost::result_of() would return the
wrong result type for a function due to the new implementation
using decltype instead of the result_of protocol on compilers
that sufficiently support C++11 (such as clang >= 3.2).
Now, boost::tr1_result_of() is used to explicitly request that
the result_of protocol be used even when decltype is supported
by the compiler.
This cleans up the constructor methods for the OpenCL wrapper
classes and unifies the API used for creating a wrapper class
object from the underlying OpenCL objects.
Now, every wrapper class has a constructor taking the OpenCL
object and an optional boolean retain parameter which indicates
whether the constructor should increment the reference count.
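The constructor pattern can be sketched with a toy reference-counted handle. `raw_handle`, `retain_handle()`, `release_handle()`, and `wrapper` are hypothetical stand-ins for an OpenCL object, the clRetain*()/clRelease*() functions, and a wrapper class; the default value of `retain` shown here is an assumption.

```cpp
// Hypothetical reference-counted handle, standing in for an
// OpenCL object such as cl_mem.
struct raw_handle {
    int ref_count;
};

inline void retain_handle(raw_handle* h)  { ++h->ref_count; }
inline void release_handle(raw_handle* h) { --h->ref_count; }

class wrapper
{
public:
    // Wraps `handle`. If `retain` is true the reference count is
    // incremented, so the caller keeps its own reference. If it is
    // false the wrapper takes over the caller's reference.
    explicit wrapper(raw_handle* handle, bool retain = true)
        : m_handle(handle)
    {
        if (m_handle && retain) {
            retain_handle(m_handle);
        }
    }

    // Always drops the one reference held by the wrapper.
    ~wrapper()
    {
        if (m_handle) {
            release_handle(m_handle);
        }
    }

    raw_handle* get() const { return m_handle; }

private:
    // Copying is omitted to keep the sketch short.
    wrapper(const wrapper&);
    wrapper& operator=(const wrapper&);

    raw_handle* m_handle;
};
```

Passing `retain = false` suits handles that were just created and whose single reference should be owned by the wrapper.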
This updates the constructors for the image2d and image3d
classes to use the new clCreateImage() function instead of
the deprecated clCreateImage2D/3D() functions.