This fixes the check for the local memory size in the
get_work_group_info kernel test.
While the kernel only allocates 16 float's, some platforms
will use more local memory. This changes the test to check
for at least 16 float's worth of local memory.
This adds a test for the sort() method which sorts a container
of 3D vectors by their length. This uses a lambda expression to
generate the compare function for the sort() algorithm.
This implements the merge() algorithm which merges two
ranges of sorted values into a single sorted range.
The current implementation uses a simple serial merge
algorithm. A GPU optimized version is coming soon.