This adds an experimental algorithm like copy_if() which copies
the indices of the values for which the predicate returns true
instead of the values themselves.
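The behavior described above can be sketched on the host with the standard library only. This is not the Boost.Compute implementation; the names copy_index_if and indices_of_negative are hypothetical and chosen for illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Writes the zero-based index of every element in [first, last) for which
// pred returns true to result, and returns the end of the output range.
template<class InputIt, class OutputIt, class Pred>
OutputIt copy_index_if(InputIt first, InputIt last, OutputIt result, Pred pred)
{
    std::size_t index = 0;
    for(; first != last; ++first, ++index){
        if(pred(*first)){
            *result++ = index;
        }
    }
    return result;
}

// Hypothetical helper: collects the indices of all negative values.
inline std::vector<std::size_t> indices_of_negative(const std::vector<int>& values)
{
    std::vector<std::size_t> out;
    copy_index_if(values.begin(), values.end(), std::back_inserter(out),
                  [](int x){ return x < 0; });
    return out;
}
```

Calling indices_of_negative() on the values 1, -2, 3, -4 yields the indices 1 and 3.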
This adds an error handler function which is invoked when an OpenCL
context encounters an error condition. The context error is converted
to a C++ exception containing the error information and thrown.
This adds a new function which will return the named field
from a value. For example, this can be used to return one of
the components of a pair object or to swizzle a vector value.
This adds a new macro to ease the definition of custom user
functions. The BOOST_COMPUTE_FUNCTION() macro creates a new
boost::compute::function<> object with the provided return
type, argument types, function name and OpenCL source code.
This refactors the invoked_function<> classes. Previously each
function arity (e.g. unary, binary) had a separate invoked_function<>
template class. Now they all use the same class which simplifies the
logic in function<> and meta_kernel.
This fixes a bug in which type definitions were being inserted
into a meta_kernel multiple times. It also forces zip_iterator to
insert its type definitions when used in a kernel.
This adds a macro for registering custom type names for C++ types
to be used in OpenCL kernel code. Internally the macro specializes
the type_name<T>() function.
This adds a new unpack() function adaptor which converts
a function with N arguments to a function which takes a
single tuple argument with N components.
This is useful for calling built-in functions with the tuple
values returned from zip_iterator. This also removes the now
unneeded binary_transform_iterator.
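The idea behind such an adaptor can be sketched in host-side C++14. This is only an illustration of the technique, not the Boost.Compute code; the unpacked<> struct and the add() function are hypothetical. In C++17 the same effect is available through std::apply().

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Wraps a function taking N arguments so that it can be called with a
// single tuple (or pair) argument holding N components.
template<class F>
struct unpacked
{
    F f;

    // Expands the tuple into N separate arguments via an index sequence.
    template<class Tuple, std::size_t... I>
    auto call(const Tuple& t, std::index_sequence<I...>) const
    {
        return f(std::get<I>(t)...);
    }

    template<class Tuple>
    auto operator()(const Tuple& t) const
    {
        return call(t, std::make_index_sequence<std::tuple_size<Tuple>::value>());
    }
};

template<class F>
unpacked<F> unpack(F f)
{
    return unpacked<F>{f};
}

// Example binary function to adapt.
inline int add(int a, int b) { return a + b; }
```

With this sketch, unpack(add)(std::make_tuple(2, 3)) evaluates add(2, 3).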
This adds a test for computing the minimum and maximum
values of a vector simultaneously using reduce() with a
custom reduction function.
Also fixes a bug in reduce() in which inplace_reduce() was
being used even if the input type and result type differed.
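A host-side analogue of the min/max reduction, using std::accumulate() rather than the Boost.Compute reduce(), might look as follows. Note that the result type (a pair) differs from the input type (int), which is the situation the bug fix above concerns. The function name minmax_reduce is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Computes the minimum and maximum of values in a single pass using a
// custom reduction function. Requires a non-empty input.
inline std::pair<int, int> minmax_reduce(const std::vector<int>& values)
{
    assert(!values.empty());

    // The accumulator holds (min so far, max so far).
    auto combine = [](std::pair<int, int> acc, int x){
        return std::make_pair(std::min(acc.first, x), std::max(acc.second, x));
    };

    return std::accumulate(values.begin() + 1, values.end(),
                           std::make_pair(values.front(), values.front()),
                           combine);
}
```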
This fixes an issue in which the source strings for binary
and ternary functions were not being stored and thus not
being inserted into kernels when they were invoked.
This adds a program cache which can be used by algorithms and other
functions to store programs which may be re-used. This improves
performance by reducing the need for costly recompilation of commonly
used programs.
Program caches are context specific and multiple copies of the same
context will use the same program cache. They are created and accessed
by the global get_program_cache() function.
For now, only a few algorithms and functions (radix sort, Mersenne
twister, fixed-size sorts) make use of the program cache.
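The caching scheme can be sketched roughly as below. Everything here is a simplified stand-in: the program struct replaces a compiled OpenCL program, an int replaces the context handle, and the get()/insert() member names are assumptions rather than the library's actual interface. The sketch is also not thread-safe.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for a compiled program (hypothetical).
struct program { std::string source; };

class program_cache
{
public:
    // Returns the cached program for key, or a null pointer if absent.
    std::shared_ptr<program> get(const std::string& key) const
    {
        auto it = programs_.find(key);
        return it == programs_.end() ? nullptr : it->second;
    }

    // Stores a program under key, replacing any previous entry.
    void insert(const std::string& key, std::shared_ptr<program> p)
    {
        assert(p && "null programs are not cached");
        programs_[key] = std::move(p);
    }

private:
    std::map<std::string, std::shared_ptr<program>> programs_;
};

// Returns the cache for a context, creating it on first use. All copies
// of the same context (same id in this sketch) share one cache.
inline std::shared_ptr<program_cache> get_program_cache(int context_id)
{
    static std::map<int, std::shared_ptr<program_cache>> caches;
    std::shared_ptr<program_cache>& slot = caches[context_id];
    if(!slot){
        slot = std::make_shared<program_cache>();
    }
    return slot;
}
```

An algorithm would first call get() with its cache key and only build and insert() the program when the lookup returns null.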
This adds a sort_by_transform() algorithm which sorts a set of
values based on the value of a transform function.
For example, this can be used to sort a set of vectors by their
length (when used with the length<T>() function) or by a single
component (when used with the get<N>() function).
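As a rough host-side illustration (not the Boost.Compute signature, which may differ), sorting by a transform amounts to comparing the transformed values. The vec2 struct and vec_length() are hypothetical, and the sketch assumes ascending order with operator<.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <vector>

// Minimal 2-component vector type used only for this example.
struct vec2 { float x, y; };

// Euclidean length of a vec2.
inline float vec_length(const vec2& v) { return std::hypot(v.x, v.y); }

// Sorts [first, last) in ascending order of transform(value).
template<class RandomIt, class Transform>
void sort_by_transform(RandomIt first, RandomIt last, Transform transform)
{
    using value_type = typename std::iterator_traits<RandomIt>::value_type;
    std::sort(first, last,
              [&transform](const value_type& a, const value_type& b){
                  return transform(a) < transform(b);
              });
}
```

Passing vec_length as the transform sorts vectors by length; a lambda returning v.x would sort by a single component instead.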
This adds a new sort_by_key() algorithm which sorts a range
of values by a range of keys with a comparison operator.
For now this is only implemented by the serial insertion sort
algorithm. In the future it will be ported to the other sorting
algorithms (e.g. radix sort).
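A serial insertion sort that moves values along with their keys can be sketched as follows. This is a host-side illustration on std::vector, not the device kernel, and the signature is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts keys according to comp and applies the same permutation to values.
// Serial insertion sort: O(n^2), suitable only for small inputs.
template<class K, class V, class Compare>
void sort_by_key(std::vector<K>& keys, std::vector<V>& values, Compare comp)
{
    assert(keys.size() == values.size());

    for(std::size_t i = 1; i < keys.size(); ++i){
        K key = keys[i];
        V value = values[i];

        // Shift larger elements one slot to the right.
        std::size_t j = i;
        while(j > 0 && comp(key, keys[j - 1])){
            keys[j] = keys[j - 1];
            values[j] = values[j - 1];
            --j;
        }

        keys[j] = key;
        values[j] = value;
    }
}
```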
This adds an output iterator result argument to the reduce()
algorithm. Now, instead of returning the reduced result, the
result is written to an output iterator. This allows the value
to stay on the device and avoids a device-to-host copy in cases
where the result is not needed on the host (e.g. it is part of
a larger computation).
This is an API-breaking change for users of reduce(). Affected code
should now declare a result variable and then pass a pointer to it
as the new result argument.
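The shape of the new calling convention can be illustrated with a host-side analogue. The name reduce_to and its exact signature are hypothetical; the real reduce() operates on device ranges and takes additional arguments.

```cpp
#include <cassert>
#include <vector>

// Reduces the non-empty range [first, last) with op and writes the single
// result through the output iterator instead of returning it.
template<class InputIt, class OutputIt, class BinaryOp>
void reduce_to(InputIt first, InputIt last, OutputIt result, BinaryOp op)
{
    assert(first != last);
    auto acc = *first;
    for(++first; first != last; ++first){
        acc = op(acc, *first);
    }
    *result = acc;
}

// Caller side: declare a result variable and pass a pointer to it.
inline int sum_of(const std::vector<int>& values)
{
    int sum = 0;
    reduce_to(values.begin(), values.end(), &sum,
              [](int a, int b){ return a + b; });
    return sum;
}
```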
This adds a copy() specialization for host-to-host transfers
which simply forwards the call to std::copy().
This is useful in templated algorithms which may in certain
circumstances copy() between data ranges on the host.
This adds a new scan_on_cpu() algorithm which implements the scan()
algorithm for CPU devices. Also renames the existing scan() algorithm
to scan_on_gpu().
This fixes some test failures on POCL which were caused by the prior
GPU scan() algorithm not functioning properly with POCL.
This changes the checks for the device type to use the bitwise-and
operator instead of the equality operator. The returned type is a
bitfield, so an equality comparison fails when multiple bits are set.
This fixes a bug on POCL which returns the device type as a
combination of CL_DEVICE_TYPE_DEFAULT and CL_DEVICE_TYPE_CPU. Now
the correct device type (device::cpu) is detected for POCL.
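The difference between the two checks can be shown with a small self-contained sketch. The constants below mirror the CL_DEVICE_TYPE_* bit flags from the OpenCL headers but are redefined locally so the example compiles without OpenCL; the function names are hypothetical.

```cpp
#include <cstdint>

// Local mirrors of the OpenCL device type bit flags.
using device_type_bits = std::uint64_t;
constexpr device_type_bits device_type_default = 1u << 0;
constexpr device_type_bits device_type_cpu     = 1u << 1;
constexpr device_type_bits device_type_gpu     = 1u << 2;

// Old check: only matches when CPU is the sole bit set.
inline bool is_cpu_by_equality(device_type_bits type)
{
    return type == device_type_cpu;
}

// New check: matches whenever the CPU bit is set, regardless of other bits.
inline bool is_cpu(device_type_bits type)
{
    return (type & device_type_cpu) != 0;
}
```

For a device reporting default and CPU together, the equality check fails while the bitwise-and check succeeds.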
This fixes an issue in which comparison operators (e.g. <, ==)
in lambda expressions would return the wrong result type causing
compilation errors.
Also adds a few test cases to ensure the correct result type
and that lambda expressions can be properly used with count_if().
This adds a random number distribution which generates random
numbers in a uniform distribution.
Also adds a convenience algorithm which fills a range with
uniformly distributed random numbers between two values.
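For comparison, the equivalent operation on the host can be written with the standard <random> facilities. This sketch is not the device implementation; uniform_fill is a hypothetical name and the fixed seed parameter is only there to make runs repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns n random numbers uniformly distributed between lo and hi,
// generated with a Mersenne twister engine seeded with seed.
inline std::vector<float> uniform_fill(std::size_t n, float lo, float hi,
                                       unsigned seed)
{
    assert(lo < hi);
    std::mt19937 engine(seed);
    std::uniform_real_distribution<float> dist(lo, hi);

    std::vector<float> out(n);
    for(float& x : out){
        x = dist(engine);
    }
    return out;
}
```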
This adds an enqueue_migrate_memory_objects() method to the
command_queue class which allows memory objects to be migrated
between compute devices and to the host.
This makes a few tweaks to the reduce() algorithm in order to
improve performance. An unnecessary barrier() has been removed
and now multiple values are reduced on the initial read.
This changes the meta_kernel::add_arg() overload with a name
and a value to a separate method. This fixes a conflict when
using add_arg() with string values.
This adds a specialization for the get<N>() function when used
with zip_iterators. Now, only the N'th iterator for the expression
will be dereferenced instead of dereferencing all of the iterators
into a tuple and then extracting the N'th component.
This removes the cv-qualifiers for the value-type returned from
get<N>() expressions. This fixes issues when specializing based
on the type (e.g. pair, tuple).
This fixes a bug in the meta_kernel streaming operators with
float values. Now, float scalar and vector literals are inserted
into the kernel source with the proper 'f' suffix.
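One way to produce such literals is sketched below. This is an assumption about how the formatting could be done, not the meta_kernel code itself; float_literal and float4_literal are hypothetical names. The sketch prints max_digits10 significant digits so the value round-trips, and uses showpoint so a decimal point is always present.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Formats x as an OpenCL C float literal, e.g. 1.5f -> "1.50000000f".
inline std::string float_literal(float x)
{
    std::ostringstream s;
    s << std::showpoint
      << std::setprecision(std::numeric_limits<float>::max_digits10)
      << x << 'f';
    return s.str();
}

// Formats four floats as an OpenCL C float4 vector literal.
inline std::string float4_literal(const float (&v)[4])
{
    std::string s = "(float4)(";
    for(int i = 0; i < 4; ++i){
        if(i != 0){
            s += ", ";
        }
        s += float_literal(v[i]);
    }
    return s + ")";
}
```

Without the suffix, a literal such as 1.5 is a double in OpenCL C, which can fail to compile or change overload resolution on devices without double support.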
This makes some improvements to the system::find_default_device()
method. Now, the devices on the system will only be queried once
when searching for the default device. This reduces the number of
calls to clGetPlatformIDs() and clGetDeviceIDs().
Also, in the case that no GPU or CPU devices are found, the first
device on the system will be selected as the default device. This
fixes issues when using Boost.Compute with POCL.
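The selection logic can be sketched over a device list that has already been queried once. The device_info struct is a hypothetical stand-in for a real device handle, the type constants mirror the OpenCL CL_DEVICE_TYPE_* bit flags, and preferring GPUs over CPUs is an assumption of this sketch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Local mirrors of the OpenCL device type bit flags.
constexpr std::uint64_t type_default     = 1u << 0;
constexpr std::uint64_t type_cpu         = 1u << 1;
constexpr std::uint64_t type_gpu         = 1u << 2;
constexpr std::uint64_t type_accelerator = 1u << 3;

// Hypothetical stand-in for a queried device.
struct device_info
{
    std::string name;
    std::uint64_t type;
};

// Picks the default device from a list queried once up front: the first
// GPU if any, otherwise the first CPU, otherwise the first device.
// Returns a null pointer if the list is empty.
inline const device_info*
find_default_device(const std::vector<device_info>& devices)
{
    for(const device_info& d : devices){
        if(d.type & type_gpu) return &d;
    }
    for(const device_info& d : devices){
        if(d.type & type_cpu) return &d;
    }
    return devices.empty() ? nullptr : &devices.front();
}
```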