mirror of https://github.com/boostorg/compute.git synced 2026-02-01 08:22:17 +00:00

68 Commits

Author SHA1 Message Date
Kyle Lutz
3bc5bfaf78 Remove timer class
This removes the timer class. The technique of measuring the time
difference between two different OpenCL markers on a command queue
is not portable to all OpenCL implementations (only works on NVIDIA).

A new internal timer class has been added which uses boost::chrono
(or std::chrono if BOOST_COMPUTE_TIMER_USE_STD_CHRONO is defined).
This new timer is used by the benchmarks to measure time elapsed
on the host.
2013-05-20 21:08:42 -04:00
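The host-side timing approach described in the "Remove timer class" commit can be sketched roughly as follows. This is a minimal illustration using std::chrono only; the class name `host_timer` and its members are hypothetical and are not taken from the library's internal timer.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical host-side stopwatch; not the library's internal timer class.
class host_timer
{
public:
    typedef std::chrono::steady_clock clock_type;

    host_timer() : m_start(clock_type::now()) {}

    // Restart the measurement from the current instant.
    void start() { m_start = clock_type::now(); }

    // Nanoseconds elapsed on the host since construction or the last start().
    long long elapsed_ns() const
    {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock_type::now() - m_start).count();
    }

private:
    clock_type::time_point m_start;
};
```

Because the clock runs on the host, a benchmark built on such a timer would need to wait for the device work to finish before reading the elapsed time.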
Kyle Lutz
fab7be5f43 Add inplace_merge() algorithm
This adds a simple inplace_merge() algorithm which merges
two contiguous sorted ranges in-place.

For now, the implementation simply copies the ranges to
two temporary vectors and calls merge().
2013-05-20 20:50:12 -04:00
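The copy-then-merge strategy from the "Add inplace_merge() algorithm" commit can be illustrated on the host with standard containers. This is only a sketch of the idea: the function name `inplace_merge_via_copies` is hypothetical, and it uses std::vector and std::merge rather than the library's device vectors and merge().

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Host-side sketch: merge the sorted ranges [first, middle) and
// [middle, last) in place by copying each half to a temporary vector
// and merging the two temporaries back over the original range.
template<class Iterator>
void inplace_merge_via_copies(Iterator first, Iterator middle, Iterator last)
{
    typedef typename std::iterator_traits<Iterator>::value_type T;
    const std::vector<T> left(first, middle);
    const std::vector<T> right(middle, last);
    std::merge(left.begin(), left.end(), right.begin(), right.end(), first);
}
```

The cost of this simplicity is temporary storage equal to the size of the whole range.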
Kyle Lutz
b43e79b983 Add support for get<N>() in lambda expressions
This adds support for using the get<N>() function in lambda
expressions to extract a single component of an aggregate type.

Also adds a test of using boost::tuple<> to store a user-defined
data type on the device and sort them by their first component
using a lambda expression as the comparator.
2013-05-20 20:50:10 -04:00
Kyle Lutz
e46828a9d6 Fix issues involving iterators with void value_type
This fixes a few issues encountered when using iterators with a
void value_type (e.g. std::insert_iterator<>).

The is_contiguous_iterator meta-function was refactored to always
return false for iterators with a void value_type and avoid
instantiating types for containers with a void value_type
(e.g. std::vector<void>::iterator) which previously resulted
in compilation errors.
2013-05-20 19:57:13 -04:00
Kyle Lutz
4ab37ada07 Add system-wide default command queue
This adds a system-wide default command queue. This queue is
accessible via the new static system::default_queue() method.
The default command queue is created for the default compute
device in the default context and is analogous to the default
stream in CUDA.

This changes how algorithms operate when invoked without an
explicit command queue. Previously, each algorithm had two
overloads, the first expected a command queue to be explicitly
passed and the second would create and use a temporary command
queue. Now, all algorithms take a command queue argument which
has a default value equal to system::default_queue().

This fixes a number of race conditions and performance issues
throughout the library associated with creating, using, and
destroying many separate command queues.
2013-05-15 20:59:56 -04:00
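The pattern described in the "Add system-wide default command queue" commit (a single overload whose queue argument defaults to a shared instance) might look roughly like this. The types and functions here (`mock_queue`, `default_queue`, `run_algorithm`) are hypothetical stand-ins, not the library's classes.

```cpp
#include <cassert>

// Hypothetical stand-in for a command queue; it only counts submitted work.
struct mock_queue
{
    int submitted;
    mock_queue() : submitted(0) {}
};

// One shared queue, created on first use and reused afterwards.
inline mock_queue& default_queue()
{
    static mock_queue queue;
    return queue;
}

// A single overload: callers may pass a queue or fall back to the shared one.
inline void run_algorithm(mock_queue& queue = default_queue())
{
    ++queue.submitted;
}
```

Calling `run_algorithm()` with no argument reuses the same queue every time, so no temporary queue is created and destroyed per call.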
Kyle Lutz
a2bda0610d Fix memory issues with device_ptr and allocator
This fixes a few memory handling issues between device_ptr,
buffer_iterator, buffer_value, allocator, and malloc/free.

Previously, memory buffers that were allocated by allocator and
malloc were being retained (via clRetainMemObject() in buffer's
constructor) by device_ptr, buffer_iterator and buffer_value.

Now, false is passed for the retain parameter to buffer's
constructor so that the buffer's reference count is not
incremented. Furthermore, the classes now set the buffer to
null before being destructed so that they will not decrement its
reference count (which normally occurs in buffer's destructor).

The main effect of this change is that objects which refer to a
memory buffer but do not own it (e.g. device_ptr, buffer_iterator)
will not modify the reference count for the buffer. This fixes a
number of memory leaks which occurred in longer running programs.
2013-05-13 22:27:02 -04:00
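The ownership rule from the "Fix memory issues with device_ptr and allocator" commit can be modelled with a mock reference count. Everything below (`mock_mem`, `mock_buffer`, `mock_device_ptr`, `reset()`) is hypothetical and only mirrors the behaviour the message describes; it makes no OpenCL calls.

```cpp
#include <cassert>

// Hypothetical stand-in for an OpenCL memory object with a reference count.
struct mock_mem { int ref_count; };

// Owning wrapper: optionally retains on construction and releases in its
// destructor unless the handle has been set to null first.
class mock_buffer
{
public:
    explicit mock_buffer(mock_mem* mem, bool retain = true) : m_mem(mem)
    {
        if(m_mem && retain) ++m_mem->ref_count;
    }
    ~mock_buffer()
    {
        if(m_mem) --m_mem->ref_count;
    }
    void reset() { m_mem = 0; } // forget the handle without releasing it

private:
    mock_buffer(const mock_buffer&);            // non-copyable in this sketch
    mock_buffer& operator=(const mock_buffer&);
    mock_mem* m_mem;
};

// Non-owning view: wraps the buffer with retain = false and nulls it
// before destruction, so the reference count is never modified.
class mock_device_ptr
{
public:
    explicit mock_device_ptr(mock_mem* mem) : m_buffer(mem, false) {}
    ~mock_device_ptr() { m_buffer.reset(); }

private:
    mock_buffer m_buffer;
};
```

With this arrangement, creating and destroying any number of non-owning views leaves the count where the owner set it.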
Kyle Lutz
a5ddeae614 Add scalar<T> container
This adds a new scalar<T> "container" which stores a single
value in a memory buffer. This simplifies memory handling in
algorithms which read and write a single value.
2013-05-11 20:20:27 -04:00
Kyle Lutz
130f8c30f1 Rename kernel::num_args() method to arity()
This renames the kernel::num_args() method to arity().
2013-05-11 20:15:00 -04:00
Kyle Lutz
ffec5fd34a Remove unnecessary includes from transform_reduce
This removes a couple of unnecessary includes from the
transform_reduce.hpp header file.
2013-05-11 20:10:28 -04:00
Kyle Lutz
178676df4f Refactor the system::default_device() method
This refactors the system::default_device() method. Now, the
default compute device for the system is only found once and
stored in a static variable. This eliminates many redundant
calls to clGetPlatformIDs() and clGetDeviceIDs().

Also, the default_cpu_device() and default_gpu_device() methods
have been removed and their usages replaced with default_device().
2013-05-10 22:49:05 -04:00
Kyle Lutz
d40eddc56b Fix compilation error with get<N>() and tuple
This fixes a compilation error which occurred when using
the get<N>() function with tuple types.
2013-05-10 21:51:28 -04:00
Kyle Lutz
705b3f35a3 Fix narrowing conversion warnings in device
This fixes a couple of narrowing conversion warnings in the
device partitioning methods which were seen when compiling
VexCL with Boost.Compute in C++11 mode.
2013-05-09 22:04:00 -04:00
Kyle Lutz
9a64f6b39a Add get<N>() function
This adds a get<N>() function which returns the n'th element
of an aggregate type (e.g. vector type, pair, tuple).

This unifies the functionality of, and replaces, the get_pair()
and vector_component() functions.
2013-05-05 12:46:05 -04:00
Kyle Lutz
3e840fa306 Add transform_if() algorithm
This adds a new algorithm named transform_if() which applies
a given unary function to an input value only if it passes a
separate predicate function.
2013-05-05 11:51:21 -04:00
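A host-side sketch of the idea behind the "Add transform_if() algorithm" commit is shown below. The commit message does not say what happens to values that fail the predicate; this sketch assumes they are skipped, so the output is compacted. The name `transform_if_sketch` and the parameter order are hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Host-side sketch: write op(x) to the output for each x where pred(x) holds.
// Assumption: values failing the predicate are skipped (compacted output).
template<class InputIt, class OutputIt, class UnaryOp, class Predicate>
OutputIt transform_if_sketch(InputIt first, InputIt last, OutputIt result,
                             UnaryOp op, Predicate pred)
{
    for(; first != last; ++first){
        if(pred(*first)){
            *result = op(*first);
            ++result;
        }
    }
    return result;
}
```

The returned iterator marks the end of the written output, which lets the caller learn how many values passed the predicate.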
Kyle Lutz
49a34442e5 Remove unused histogram() algorithm
This removes the unused histogram() algorithm.
2013-05-05 10:56:14 -04:00
Dominic Meiser
7c5e321c2a Fixing build issues under windows
2013-05-03 18:37:09 -04:00
Kyle Lutz
3e93d01475 Add default constructors to image2d and image3d
This adds default constructors to the image2d and image3d
classes which initialize them with null memory objects.
2013-05-02 21:01:30 -04:00
Kyle Lutz
5d28d3887e Make pick_copy_work_group_size() inline
This makes the pick_copy_work_group_size() function inline.
2013-05-02 20:55:22 -04:00
Kyle Lutz
0ab2fe85eb Don't auto-initialize values in vector
This changes the vector class to not auto-initialize values
when it is created or resized. This improves performance by
eliminating a call to fill(). If needed, user code can call
fill() explicitly on the newly allocated values.
2013-04-27 10:30:26 -04:00
Kyle Lutz
03195275b3 Increase work-group size for copy() kernel
This increases the work-group size for the copy() kernel to be
up to 32 items based on the size of the input. This increases the
performance of copy() and related algorithms (e.g. transform()).
2013-04-27 10:21:47 -04:00
Kyle Lutz
ea107ae5d6 Add clamp_range() algorithm
This adds a clamp_range() algorithm which clamps a range
of values between a low and high value. This is based on
the algorithm of the same name in Boost.Algorithm.
2013-04-22 22:06:04 -04:00
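The clamping behaviour named in the "Add clamp_range() algorithm" commit can be sketched on the host as follows. The signature below is modelled loosely on the Boost.Algorithm function of the same name; `clamp_range_sketch` itself is a hypothetical name, and the device implementation may differ.

```cpp
#include <cassert>
#include <vector>

// Host-side sketch: copy [first, last) to out, limiting each value to the
// closed interval [lo, hi]. The output may alias the input for in-place use.
template<class InputIt, class OutputIt, class T>
OutputIt clamp_range_sketch(InputIt first, InputIt last, OutputIt out,
                            const T& lo, const T& hi)
{
    assert(!(hi < lo)); // the bounds must describe a non-empty interval
    for(; first != last; ++first, ++out){
        const T& value = *first;
        *out = (value < lo) ? lo : (hi < value) ? hi : value;
    }
    return out;
}
```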
Kyle Lutz
8142e5d5f9 Add move-constructors to wrapper classes
This adds move-constructors and move-assignment operators
to the OpenCL wrapper classes.
2013-04-17 20:45:04 -04:00
Kyle Lutz
4bdec761cd Add memory_object::reference_count() method
This adds a reference_count() method to the memory_object
class which returns its current reference count.
2013-04-13 11:07:04 -04:00
Kyle Lutz
d58b7c0902 Return event from command_queue::enqueue_task()
This changes the command_queue::enqueue_task() method to return
an event object.
2013-04-13 10:23:29 -04:00
Kyle Lutz
da4cb81679 Return event from command_queue::enqueue_nd_range_kernel()
This changes the enqueue_nd_range_kernel() method to return an
event object. This allows clients to monitor the progress of a
kernel executing on a device.
2013-04-13 10:23:01 -04:00
Kyle Lutz
001b3ff7fe Add get() methods to wrapper classes
This adds a get() method to each wrapper class which returns
a reference to the underlying OpenCL object.
2013-04-13 09:44:51 -04:00
Denis Demidov
8b78d4187d Adds support for selecting devices with environment variables
boost::compute::system::default_device() supports the following
environment variables:

BOOST_COMPUTE_DEFAULT_DEVICE   for device name
BOOST_COMPUTE_DEFAULT_PLATFORM for OpenCL platform name
BOOST_COMPUTE_DEFAULT_VENDOR   for device vendor name

If one or more of these variables are set, then the device that
satisfies all conditions gets selected. If such a device is
unavailable, then the first available GPU is selected. If there are
no GPUs in the system, then the first available CPU is selected.
Otherwise, default_device() returns a null device.

The hello_world example is modified to use default_device() instead
of default_gpu_device().
2013-04-12 17:22:25 -04:00
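The selection order described in the "Adds support for selecting devices with environment variables" commit can be sketched over a plain list of device descriptions. The struct, enum, and function names are hypothetical, no OpenCL calls are made, and exact string matching is an assumption (the message does not say how names are compared).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

enum device_kind { cpu_device, gpu_device, other_device };

// Hypothetical device description; real devices are queried through OpenCL.
struct device_info
{
    std::string name;
    std::string platform;
    std::string vendor;
    device_kind kind;
};

// Read an environment variable, returning an empty string when it is unset.
inline std::string env_or_empty(const char* variable)
{
    const char* value = std::getenv(variable);
    return value ? std::string(value) : std::string();
}

// An empty filter means "no constraint". Assumption: exact string match.
inline bool matches(const std::string& filter, const std::string& value)
{
    return filter.empty() || filter == value;
}

// First device of the given kind, or a null pointer if there is none.
inline const device_info* first_of_kind(const std::vector<device_info>& devices,
                                        device_kind kind)
{
    for(std::size_t i = 0; i < devices.size(); ++i){
        if(devices[i].kind == kind) return &devices[i];
    }
    return 0;
}

// Filtered match first, then the first GPU, then the first CPU, else null.
inline const device_info* select_device(const std::vector<device_info>& devices,
                                        const std::string& name,
                                        const std::string& platform,
                                        const std::string& vendor)
{
    if(!name.empty() || !platform.empty() || !vendor.empty()){
        for(std::size_t i = 0; i < devices.size(); ++i){
            const device_info& d = devices[i];
            if(matches(name, d.name) && matches(platform, d.platform) &&
               matches(vendor, d.vendor)){
                return &d;
            }
        }
    }
    if(const device_info* gpu = first_of_kind(devices, gpu_device)) return gpu;
    return first_of_kind(devices, cpu_device);
}
```

A caller would pass `env_or_empty("BOOST_COMPUTE_DEFAULT_DEVICE")`, `env_or_empty("BOOST_COMPUTE_DEFAULT_PLATFORM")`, and `env_or_empty("BOOST_COMPUTE_DEFAULT_VENDOR")` as the three filters.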
Kyle Lutz
1be19a6305 Add multiplies<T> specialization for std::complex<T>
This adds a specialization of multiplies<T> for std::complex<T>
which implements complex number multiplication.

Also adds a simple test using transform() to verify the complex
multiplication works correctly.
2013-04-10 22:04:04 -04:00
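The arithmetic behind the "Add multiplies<T> specialization for std::complex<T>" commit is the usual product (a + bi)(c + di) = (ac - bd) + (ad + bc)i. A host-side sketch follows; storing the value as a pair of floats is an assumption, and `float2_sketch` and `complex_multiply` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical two-component value holding (real, imaginary) parts.
struct float2_sketch { float x, y; };

// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
inline float2_sketch complex_multiply(float2_sketch p, float2_sketch q)
{
    float2_sketch r;
    r.x = p.x * q.x - p.y * q.y; // real part
    r.y = p.x * q.y + p.y * q.x; // imaginary part
    return r;
}
```

A component-wise product would give (ac, bd) instead, which is why a dedicated specialization is needed for complex values.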
Kyle Lutz
8d13920dc4 Move swizzle_iterator to detail namespace
This moves the swizzle_iterator class to the detail
namespace.
2013-04-10 21:51:24 -04:00
Kyle Lutz
bcc3aed40f Move pixel_input_iterator to detail namespace
This moves the pixel_input_iterator class to the detail
namespace.
2013-04-10 21:38:05 -04:00
Kyle Lutz
5cce555d8c Move binary_transform_iterator to detail namespace
This moves the binary_transform_iterator class to the
detail namespace.
2013-04-10 21:33:29 -04:00
Kyle Lutz
e30ec9f26c Move adjacent_transform_iterator to detail namespace
This moves the adjacent_transform_iterator class to the
detail namespace.
2013-04-10 21:24:15 -04:00
Kyle Lutz
6dd6e11c7d Fix unused variable warning in get_base_iterator_buffer()
This fixes an unused variable warning which occurs in the
get_base_iterator_buffer() function when the base iterator
is not a buffer iterator and thus the iter argument is not
used.
2013-04-10 21:09:17 -04:00
Kyle Lutz
6fdffd8a2b Replace usages of result_of() with tr1_result_of()
This fixes a bug in which boost::result_of() would return the
wrong result type for a function due to the new implementation
using decltype instead of the result_of protocol on compilers
that sufficiently support C++11 (such as clang >= 3.2).

Now, boost::tr1_result_of() is used to explicitly request that
the result_of protocol be used even when decltype is supported
by the compiler.
2013-04-10 20:17:34 -04:00
Kyle Lutz
652f99e449 Fix bug in get_buffer() for iterator adaptors
This fixes a bug in which the get_buffer() method was not properly
disabled for iterator adaptors with a non-buffer base iterator.
2013-04-09 21:56:24 -04:00
Kyle Lutz
5164ab4bd0 Cleanup constructors for wrapper classes
This cleans up the constructor methods for the OpenCL wrapper
classes and unifies the API used for creating a wrapper class
object from the underlying OpenCL objects.

Now, every wrapper class has a constructor taking the OpenCL
object and an optional boolean retain parameter which indicates
whether the constructor should increment the reference count.
2013-04-07 15:03:24 -04:00
Kyle Lutz
25a084deda Fix indentation in kernel::get_arg_info()
This fixes the indentation in the kernel::get_arg_info()
method.
2013-04-07 12:57:26 -04:00
Kyle Lutz
48e1bb4da0 Update image2d/3d constructors for OpenCL 1.2
This updates the constructors for the image2d and image3d
classes to use the new clCreateImage() function instead of
the deprecated clCreateImage2D/3D() functions.
2013-03-31 15:01:30 -04:00
Kyle Lutz
d56e58b48e Add OpenCL 1.2 error codes to runtime_exception
This adds support for the OpenCL 1.2 error codes to the
runtime_exception class.
2013-03-31 14:58:00 -04:00
Kyle Lutz
0aa3d024dc Fix command_queue::enqueue_marker() for OpenCL 1.2
This changes the enqueue_marker() method in the command_queue
class to use clEnqueueMarkerWithWaitList() instead of the
deprecated clEnqueueMarker() function when compiling with
OpenCL 1.2.
2013-03-31 12:08:23 -04:00
Kyle Lutz
7e7e09b704 Fix command_queue::enqueue_barrier() for OpenCL 1.2
This changes the enqueue_barrier() method in the command_queue
class to use clEnqueueBarrierWithWaitList() instead of the
deprecated clEnqueueBarrier() function when compiling with
OpenCL 1.2.
2013-03-31 12:03:38 -04:00
Kyle Lutz
52fef4de6b Remove command_queue::enqueue_wait_for_event() method
This removes the enqueue_wait_for_event() method from the
command_queue class as the clEnqueueWaitForEvents() function
has been deprecated in OpenCL 1.2.
2013-03-31 11:59:14 -04:00
Kyle Lutz
c7a3bc8af6 Move unload_compiler() method to platform
This moves the unload_compiler() method from the system class
to the platform class. Also changes the method to use the
clUnloadPlatformCompiler() function instead of the deprecated
clUnloadCompiler() when compiling with OpenCL 1.2.
2013-03-31 11:29:40 -04:00
Kyle Lutz
d28354184c Move get_extension_function_address() method to platform
This moves the get_extension_function_address() method from
the system class to the platform class. Also changes the method
to use the clGetExtensionFunctionAddressForPlatform() function
instead of the deprecated clGetExtensionFunctionAddress() when
compiling with OpenCL 1.2.
2013-03-31 11:26:18 -04:00
Kyle Lutz
00fcb737cc Fix bug in move-constructor for vector<T>
This fixes a bug in the move-constructor for the vector<T>
class.

Previously, the moved-from object was also deallocating the
memory buffer leading to an error when the moved-to object
attempted to use it. Now, the constructor checks if the buffer
is non-empty before deallocating it.
2013-03-30 19:55:51 -04:00
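The failure mode in the "Fix bug in move-constructor for vector<T>" commit (a moved-from object freeing storage that the moved-to object still uses) can be reproduced with a small host-side class. This is a sketch with hypothetical names; here the emptiness check sits in the destructor, which may not match where the library places it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Counts storage blocks that are currently allocated (for illustration only).
static int live_allocations = 0;

class mock_vector
{
public:
    explicit mock_vector(std::size_t n) : m_data(new int[n]), m_size(n)
    {
        ++live_allocations;
    }

    // Take over the storage and leave the source empty.
    mock_vector(mock_vector&& other)
        : m_data(other.m_data), m_size(other.m_size)
    {
        other.m_data = 0;
        other.m_size = 0;
    }

    // Only deallocate when this object still holds storage.
    ~mock_vector()
    {
        if(m_data){
            delete[] m_data;
            --live_allocations;
        }
    }

    std::size_t size() const { return m_size; }

private:
    int* m_data;
    std::size_t m_size;
};
```

Without the check and the reset of the source, both objects would release the same storage.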
Kyle Lutz
1161f89031 Make get_object_info() inline
This marks the get_object_info() method as inline.
2013-03-30 19:38:40 -04:00
Kyle Lutz
d585fbebad Make stream operator for vector types inline
This marks the stream operator for vector types as
inline.
2013-03-30 19:35:56 -04:00
Kyle Lutz
da1d7794b5 Remove support for cl_half
This removes support for cl_half (typedef'd to half_).

The issue is that the cl_half type is indistinguishable
from the cl_ushort type (both are typedefs for uint16_t)
which caused the cl_khr_fp16 pragma to be injected into
kernels using cl_ushort which causes errors on platforms
that do not support the cl_khr_fp16 extension.
2013-03-27 00:09:51 -04:00
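The problem described in the "Remove support for cl_half" commit follows from how typedefs work in C++: two aliases of the same type are the same type, so a template cannot tell them apart. The sketch below uses hypothetical aliases and a hypothetical trait (`needs_fp16_pragma`) rather than the OpenCL headers.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical aliases mirroring the situation described above.
typedef std::uint16_t half_sketch;
typedef std::uint16_t ushort_sketch;

// A trait intended to flag only the half type...
template<class T>
struct needs_fp16_pragma { static const bool value = false; };

// ...but this specialization is really a specialization for std::uint16_t.
template<>
struct needs_fp16_pragma<half_sketch> { static const bool value = true; };

// The two aliases name one and the same type.
static_assert(std::is_same<half_sketch, ushort_sketch>::value,
              "typedefs do not create distinct types");
```

As a result, `needs_fp16_pragma<ushort_sketch>::value` is also true, which corresponds to the pragma being injected into kernels that only use the unsigned short type.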
Kyle Lutz
4338b311f7 Add device::partition() method
This adds a new set of methods to the device class allowing
device objects to be partitioned into multiple sub-devices
using the clCreateSubDevices() function.

For now, device partitioning is only supported on systems
with OpenCL version 1.2 (or later).
2013-03-26 23:36:54 -04:00
Kyle Lutz
4752fb2404 Support returning std::vector<T> from get_info<T>()
This adds support for returning a std::vector<T> from the
various get_info<T>() methods. This provides a simpler
interface to get the values in an array returned from one
of the clGet*Info() functions.

This also adds a test using the new API to get the maximum
work item sizes in each dimension for a device.
2013-03-26 22:44:56 -04:00