This adds an output iterator result argument to the reduce()
algorithm. Now, instead of returning the reduced result, the
result is written to an output iterator. This allows the value
to stay on the device and avoids a device-to-host copy in cases
where the result is not needed on the host (e.g. it is part of
a larger computation).
This is an API breaking change to users of reduce(). Affected code
should now declare a result variable and then pass a pointer to it
as the new result argument.
refs kylelutz/compute#9
device, context, and queue are initialized statically in `context_setup.hpp`.
With this change all tests are able to complete when an NVIDIA GPU is in
exclusive compute mode.
Side effect of the change:
Time for all tests to complete reduced from 15.71 to 13.03 sec Tesla C2075.