This changes the vector<T> constructors which copy or initialize
data to take a queue argument used for performing the operations.
Previously they just took a context argument used to initialize the
buffer and then created a new command queue to use. This improves
performance by not requiring a new command queue and also fixes issues
when performing operations on a different command queue while the
vector was still being initialized.
This uses Boost.Preprocessor macros to allow zip iterators to work with
an arbitrary number of elements (the current limit is the maximum
boost::tuple size, which is 10 by default).
Refs #50
This makes the online cache use the SHA-1 hash of the program source as
its key. It also introduces the boost::compute::detail::sha1() function,
which is moved from compute::program into its own header file.
The detail::getenv() function was not declared inline, which led to
`multiple definition` errors at link time when a program consisted of
multiple objects that included Boost.Compute headers.
This declares the function inline and adds the core.multiple_objects
test.
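A minimal sketch of the kind of header-defined function involved. The
name getenv_sketch is hypothetical and only mirrors the role of
detail::getenv(); the actual Boost.Compute definition may differ.

```cpp
#include <cassert>
#include <cstdlib>

// Imagine this definition living in a header included by several .cpp
// files. Without `inline`, each object file would emit its own external
// definition and the linker would report `multiple definition` errors.
// With `inline`, identical definitions in several objects are allowed.
inline const char* getenv_sketch(const char* name)
{
    assert(name != 0);
    return std::getenv(name); // null pointer when the variable is unset
}
```

Calling getenv_sketch("HOME") from two different translation units that
both include the header then links cleanly.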
Instead of building the program from source with the added comment
block (used to distinguish between different platforms and devices
when the offline cache is in use), only use the altered source for the
hash computation. This way users will not get unexpected results from
program.source().
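The idea can be sketched on the host as follows. Program, ProgramCache
and the device_tag parameter are hypothetical names for illustration,
and std::hash stands in for SHA-1 only to keep the sketch free of
dependencies; none of this is the actual Boost.Compute code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled program; only the source is kept.
struct Program {
    std::string source;
};

// Hypothetical cache keyed by a hash of the device-tagged source.
class ProgramCache {
public:
    ProgramCache() : builds_(0) {}

    Program& get_or_build(const std::string& source,
                          const std::string& device_tag)
    {
        assert(!device_tag.empty());

        // The comment block with the device tag only influences the key ...
        const std::string keyed = "// " + device_tag + "\n" + source;
        const std::size_t key = std::hash<std::string>()(keyed);

        std::map<std::size_t, Program>::iterator it = cache_.find(key);
        if (it == cache_.end()) {
            // ... while the program is "built" from the unmodified source.
            ++builds_;
            it = cache_.insert(std::make_pair(key, Program{source})).first;
        }
        return it->second;
    }

    int builds() const { return builds_; }

private:
    std::map<std::size_t, Program> cache_;
    int builds_;
};
```

The same source requested for two different device tags yields two
cache entries, yet each entry's source is exactly what the user passed.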
This adds interoperability support between Boost.Compute and various
other C/C++ libraries (Eigen, OpenCV, OpenGL, Qt and VTK). This eases
development for users who combine external libraries with Boost.Compute.
See kylelutz/compute#21
This adds the program::build_with_source() function, which both creates
and builds the program for the given context with the supplied source
and compile options. If the BOOST_COMPUTE_USE_OFFLINE_CACHE macro is
defined, it also saves the compiled program binary for reuse in the
offline cache located in the $HOME/.boost_compute folder on UNIX-like
systems and in the %APPDATA%/boost_compute folder on Windows.
All internal uses of program::create_with_source() followed by
program::build() are replaced with program::build_with_source().
This adds an improved reduce() algorithm implementation for
GPUs. Also adds checks to accumulate() which allow it to
use the higher-performance reduce() algorithm if possible.
This adds an overload of the reduce() function which
uses plus<T>() as the reduction function. This simplifies the common
case of calculating the sum for a range of values.
This removes the init argument from reduce. This simplifies the
implementation and avoids copying a value from the host to the
device on every call to reduce.
If an initial value is required, the accumulate function can be
called instead.
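A host-side analogue of the split between the two algorithms. The
names reduce_sketch and accumulate_sketch and their signatures are
assumptions for illustration; the device versions take different
arguments. The sketch also assumes a non-empty input range for the
init-free form.

```cpp
#include <cassert>
#include <functional>
#include <iterator>

// Init-free reduction: combines the elements with each other only.
// Assumes a non-empty range, since there is no initial value to return.
template<class Iterator, class BinaryFunction>
typename std::iterator_traits<Iterator>::value_type
reduce_sketch(Iterator first, Iterator last, BinaryFunction op)
{
    assert(first != last);
    typename std::iterator_traits<Iterator>::value_type result = *first;
    for (++first; first != last; ++first) {
        result = op(result, *first);
    }
    return result;
}

// Convenience overload: defaults to std::plus, i.e. computes the sum.
template<class Iterator>
typename std::iterator_traits<Iterator>::value_type
reduce_sketch(Iterator first, Iterator last)
{
    typedef typename std::iterator_traits<Iterator>::value_type T;
    return reduce_sketch(first, last, std::plus<T>());
}

// Accumulation keeps the explicit initial value.
template<class Iterator, class T, class BinaryFunction>
T accumulate_sketch(Iterator first, Iterator last, T init, BinaryFunction op)
{
    for (; first != last; ++first) {
        init = op(init, *first);
    }
    return init;
}
```

For the values 1, 2, 3, 4, reduce_sketch gives the plain sum, while
accumulate_sketch with an initial value of 100 adds that value on top.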
This fixes a compilation error which occurs on Windows when
registering the default error handler callback when creating
a new context object.
In OpenCL 1.1 and later the callback function is expected to
use the __stdcall calling convention on WIN32 platforms, where the
OpenCL headers provide it through the CL_CALLBACK macro. If
CL_CALLBACK is available, the BOOST_COMPUTE_CL_CALLBACK macro is
defined to it and is then used to annotate the callback functions.
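A sketch of the pattern, written so that it compiles without the OpenCL
headers. SKETCH_CL_CALLBACK and on_context_error_sketch are
hypothetical names; the parameter list follows the context
notification callback that OpenCL defines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Forward CL_CALLBACK when the OpenCL headers defined it; otherwise
// expand to nothing so the annotation is harmless.
#ifdef CL_CALLBACK
#  define SKETCH_CL_CALLBACK CL_CALLBACK
#else
#  define SKETCH_CL_CALLBACK
#endif

// Callback annotated with the macro. It records the error text in the
// std::string passed through user_data.
void SKETCH_CL_CALLBACK
on_context_error_sketch(const char* errinfo,
                        const void* private_info,
                        std::size_t cb,
                        void* user_data)
{
    (void) private_info;
    (void) cb;
    assert(user_data != 0);
    *static_cast<std::string*>(user_data) = errinfo ? errinfo : "";
}
```

Because the annotation sits between the return type and the function
name, the same declaration works with or without a calling convention.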
This adds an experimental algorithm like copy_if() which copies
the indices of the values for which the predicate returns true
instead of the values themselves.
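A sequential host-side sketch of the behaviour described above. The
name copy_index_if_sketch and the std::vector interface are assumptions
for illustration; the device algorithm works on iterators and a command
queue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the positions of the elements that satisfy the predicate,
// in increasing order, rather than the elements themselves.
template<class T, class Predicate>
std::vector<std::size_t>
copy_index_if_sketch(const std::vector<T>& input, Predicate pred)
{
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (pred(input[i])) {
            indices.push_back(i);
        }
    }
    assert(indices.size() <= input.size());
    return indices;
}
```

For the input 5, -1, 7, -3 and a predicate that tests for negative
values, the result holds the indices 1 and 3.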
This adds an error handler function which is invoked when an OpenCL
context encounters an error condition. The context error is converted
to a C++ exception containing the error information and thrown.
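A minimal sketch of such a handler, assuming the callback parameter
list that OpenCL uses for context notifications. The names
context_error_sketch and throwing_error_handler_sketch are hypothetical
and do not reproduce the library's own exception class.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical exception type carrying the context error information.
class context_error_sketch : public std::runtime_error
{
public:
    context_error_sketch(const std::string& info,
                         std::size_t private_info_size)
        : std::runtime_error(info),
          private_info_size_(private_info_size)
    {
    }

    std::size_t private_info_size() const { return private_info_size_; }

private:
    std::size_t private_info_size_;
};

// Handler with the shape of a context notification callback: it turns
// the reported error into a C++ exception and throws it.
void throwing_error_handler_sketch(const char* errinfo,
                                   const void* private_info,
                                   std::size_t cb,
                                   void* user_data)
{
    (void) user_data;
    assert(cb == 0 || private_info != 0);
    throw context_error_sketch(errinfo ? errinfo : "unknown context error",
                               cb);
}
```

A caller can then wrap OpenCL work in a try block and catch
context_error_sketch to read the message through what().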
This adds a new function which will return the named field
from a value. For example, this can be used to return one of
the components of a pair object or to swizzle a vector value.
This adds a new macro to ease the definition of custom user
functions. The BOOST_COMPUTE_FUNCTION() macro creates a new
boost::compute::function<> object with the provided return
type, argument types, function name and OpenCL source code.
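To illustrate how a macro can bundle these pieces, here is a sketch
that stringizes its arguments into a source string. SKETCH_FUNCTION,
function_sketch and the argument order are assumptions; this is not the
definition of BOOST_COMPUTE_FUNCTION().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for boost::compute::function<>: it only keeps
// the function name and the generated source text.
struct function_sketch {
    std::string name;
    std::string source;
};

inline function_sketch make_function_sketch(const std::string& return_type,
                                            const std::string& name,
                                            const std::string& args,
                                            const std::string& body)
{
    assert(!name.empty());
    function_sketch f;
    f.name = name;
    f.source = "inline " + return_type + " " + name + args + body;
    return f;
}

// Declares a function_sketch object called `name`. The body must not
// contain top-level commas, since the preprocessor would split it.
#define SKETCH_FUNCTION(return_type, name, args, body) \
    const function_sketch name = \
        make_function_sketch(#return_type, #name, #args, #body)
```

Writing SKETCH_FUNCTION(int, add_four, (int x), { return x + 4; });
then yields an object named add_four whose source member holds the
complete function definition as text.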
This refactors the invoked_function<> classes. Previously each
function arity (e.g. unary, binary) had a separate invoked_function<>
template class. Now they all use the same class which simplifies the
logic in function<> and meta_kernel.
This fixes a bug in which type definitions were being inserted
into a meta_kernel multiple times. It also forces zip_iterator to
insert its type definitions when used in a kernel.
This adds a macro for registering custom type names for C++ types
to be used in OpenCL kernel code. Internally the macro specializes
the type_name<T>() function.
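A sketch of how a macro can specialize a type-name function.
type_name_sketch, SKETCH_TYPE_NAME, kernel_argument_sketch and the
Particle type are hypothetical names; the library's own macro and
function may differ in detail.

```cpp
#include <cassert>
#include <string>

// Primary template is only declared: unregistered types fail to link.
template<class T>
const char* type_name_sketch();

// Built-in names provided up front.
template<> inline const char* type_name_sketch<int>() { return "int"; }
template<> inline const char* type_name_sketch<float>() { return "float"; }

// Registers `name` as the kernel-side spelling of the C++ type `type`
// by emitting an explicit specialization.
#define SKETCH_TYPE_NAME(type, name) \
    template<> inline const char* type_name_sketch<type>() { return #name; }

// A user-defined type and its registration (at namespace scope).
struct Particle { float x, y; };
SKETCH_TYPE_NAME(Particle, particle_t)

// Example consumer: builds a kernel argument declaration as text.
template<class T>
std::string kernel_argument_sketch(const std::string& arg_name)
{
    assert(!arg_name.empty());
    return std::string("__global ") + type_name_sketch<T>() + "* " + arg_name;
}
```

Generated kernel source can then refer to Particle values by the
registered name particle_t.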