Sample: CUDA Parallel Prefix Sum with Shuffle Intrinsics (SHFL_Scan)
Minimum spec: SM 3.0

This example demonstrates how to use the shuffle intrinsic __shfl_up to perform a scan operation across a thread block. A GPU with Compute Capability 3.0 (SM 3.0) or higher is required to run the sample.
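
The idea can be sketched as follows. This is a minimal illustration, not the sample's actual source: the names warp_inclusive_scan and block_inclusive_scan, the BLOCK_SIZE of 256, and the single-block launch are all assumptions made for the example. It also uses __shfl_up_sync, which replaces __shfl_up on CUDA 9 and later; on older toolkits the equivalent call would be __shfl_up(value, offset). Each warp scans its own 32 values with shuffles, the last lane of each warp publishes the warp total to shared memory, the first warp scans those totals, and every later warp adds the total of the warps before it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARP_SIZE  32
#define BLOCK_SIZE 256   // assumed for this sketch: a multiple of 32, at most 1024

// Inclusive scan across the 32 lanes of one warp (hypothetical helper).
// After the step with a given offset, each lane holds the sum of up to
// 2 * offset consecutive inputs ending at its own position.
__device__ int warp_inclusive_scan(int value, int lane)
{
    for (int offset = 1; offset < WARP_SIZE; offset *= 2) {
        // Read the value held by the lane `offset` positions below this one.
        int neighbor = __shfl_up_sync(0xffffffffu, value, offset);
        // Lanes below `offset` have no such neighbor and keep their value.
        if (lane >= offset) value += neighbor;
    }
    return value;
}

// Inclusive scan across a single thread block of BLOCK_SIZE threads.
__global__ void block_inclusive_scan(const int *in, int *out)
{
    __shared__ int warp_totals[BLOCK_SIZE / WARP_SIZE];

    const int tid  = threadIdx.x;
    const int lane = tid % WARP_SIZE;
    const int warp = tid / WARP_SIZE;
    const int num_warps = BLOCK_SIZE / WARP_SIZE;

    // Step 1: scan within each warp.
    int value = warp_inclusive_scan(in[tid], lane);

    // Step 2: the last lane of each warp holds that warp's total.
    if (lane == WARP_SIZE - 1) warp_totals[warp] = value;
    __syncthreads();

    // Step 3: the first warp scans the per-warp totals.
    if (warp == 0) {
        int total = (lane < num_warps) ? warp_totals[lane] : 0;
        total = warp_inclusive_scan(total, lane);
        if (lane < num_warps) warp_totals[lane] = total;
    }
    __syncthreads();

    // Step 4: add the combined total of all preceding warps.
    if (warp > 0) value += warp_totals[warp - 1];
    out[tid] = value;
}

int main()
{
    int h_in[BLOCK_SIZE], h_out[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) h_in[i] = i % 5;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,  sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    block_inclusive_scan<<<1, BLOCK_SIZE>>>(d_in, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Compare against a sequential inclusive scan on the host.
    int running = 0, mismatches = 0;
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        running += h_in[i];
        if (h_out[i] != running) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches == 0 ? 0 : 1;
}
```

The sketch handles exactly one block; scanning a larger array would need an extra pass that combines per-block totals, which is omitted here. Assuming a CUDA 9 or later toolkit, it should build with something like nvcc -o shfl_scan_sketch shfl_scan_sketch.cu (the file name is hypothetical).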

Key concepts:
