5. Limitations

The following are known issues with the current release.
  • Because of a security vulnerability, profiling tools had to disable all features for non-root and non-admin users. As a result, CUPTI cannot profile the application for such users when using Windows driver 419.17 or later, or Linux driver 418.43 or later. More details about the issue and the solutions can be found on this web page.
  • The new profiling APIs in the header cupti_profiler_target.h and the Perfworks metric APIs in the headers nvperf_host.h and nvperf_target.h are supported only for devices with compute capability 7.0 and 7.5. For devices with compute capability 7.0 and lower, use the events and metrics APIs from the headers cupti_events.h and cupti_metrics.h, respectively.
  • Both the old and the new metric APIs are supported for compute capability 7.0. This is to enable transition of code to the new metric APIs, but you cannot mix the usage of the old and the new metric APIs.
  • The new profiling APIs in the header cupti_profiler_target.h and the Perfworks metric APIs in the headers nvperf_host.h and nvperf_target.h are supported on Linux x86, Windows and IBM POWER platforms. They are not supported on Mac and Tegra platforms.
  • Profiling results might be inconsistent when auto boost is enabled. By default, the profiler tries to disable auto boost, but this can fail under some conditions; profiling then continues and the results will be inconsistent. The API cuptiGetAutoBoostState() can be used to query the auto boost state of the device. This API returns the error CUPTI_ERROR_NOT_SUPPORTED on devices that don't support auto boost. Note that auto boost is supported only on certain Tesla devices with compute capability 3.0 and higher.
  • CUPTI doesn't populate deprecated activity structures; instead, the newer version of the activity structure is filled with the information.
  • While collecting events in continuous mode, event reporting may be delayed; that is, event values may be returned by a later call to the readEvent(s) API, and the event values for the last readEvent(s) call may be lost.
  • Because CUPTI profiles one instance of a domain by default, the profiled domain instance may report an event value of 0 if it had no workload. To profile all instances of the domain, set the event group attribute CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES through the API cuptiEventGroupSetAttribute().
  • Starting with CUDA Toolkit 9.0, CUPTI doesn't support CUDA Dynamic Parallelism (CDP) kernel launch tracing and source level metrics for devices with compute capability 7.0 and later.
  • CUPTI doesn't support tracing and profiling on virtualized GPUs.
  • Profiling results might be incorrect for CUDA applications compiled with an nvcc version older than 9.0 for devices with compute capability 6.0 and 6.1. The profiling session will continue, and CUPTI will report the condition using the error code CUPTI_ERROR_CUDA_COMPILER_NOT_COMPATIBLE. It is advised to recompile the application code with nvcc version 9.0 or later. Ignore this warning if the code is already compiled with the recommended nvcc version.
  • Because of the low resolution of the timer on Windows, the start and end timestamps can be the same for activities with a short execution duration.
  • Profiling (event and metric collection) is not supported for multidevice cooperative kernels, that is, kernels launched by using the API functions cudaLaunchCooperativeKernelMultiDevice or cuLaunchCooperativeKernelMultiDevice.
  • An application that calls CUPTI APIs cannot be used with NVIDIA tools such as nvprof, NVIDIA Visual Profiler, Nsight Compute, Nsight Systems, NVIDIA Nsight Visual Studio Edition, cuda-gdb and cuda-memcheck.
  • Profiling is not supported for CUDA kernel nodes launched by a CUDA Graph.
  • CUDA runtime and driver API callbacks for kernel launch are not issued when the stream is in capture mode.
  • Tracing of a CUDA Graph may change its performance characteristics.
  • PCIe and NVLink records are not captured when CUPTI is initialized lazily after CUDA initialization.
  • CUPTI fails to profile an OpenACC application when the OpenACC library linked with the application is missing the definition of one or more OpenACC API routines. This is indicated by the error code CUPTI_ERROR_OPENACC_UNDEFINED_ROUTINE.
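The compute capability rules in the list above (old events and metrics APIs for 7.0 and lower, new profiling and Perfworks APIs for 7.0 and 7.5, both for 7.0) can be encoded in a small helper that an application might consult before choosing which headers to use. The following is a minimal sketch, not part of CUPTI: the names MetricApiMask and supported_metric_apis are hypothetical, and the helper encodes only the ranges stated above. Compute capabilities that the list does not mention (for example 7.2) are conservatively reported as unsupported, which is an assumption of this sketch. Platform restrictions (Mac, Tegra) are not modeled.

```c
#include <stdbool.h>

/* Hypothetical bit flags for illustration only; these are not CUPTI types. */
typedef enum {
    METRIC_API_NONE     = 0,       /* neither API family, per the list above */
    METRIC_API_LEGACY   = 1 << 0,  /* cupti_events.h and cupti_metrics.h */
    METRIC_API_PROFILER = 1 << 1   /* cupti_profiler_target.h, nvperf_host.h,
                                      nvperf_target.h */
} MetricApiMask;

/*
 * Returns a bitmask of the metric API families that the limitations list
 * states are supported for a device with the given compute capability.
 * Assumption: any compute capability not named in the list yields
 * METRIC_API_NONE.
 */
static unsigned supported_metric_apis(int major, int minor)
{
    unsigned mask = METRIC_API_NONE;

    /* Old events and metrics APIs: compute capability 7.0 and lower. */
    bool legacy = (major < 7) || (major == 7 && minor == 0);
    if (legacy) {
        mask |= METRIC_API_LEGACY;
    }

    /* New profiling and Perfworks metric APIs: 7.0 and 7.5 only. */
    bool profiler = (major == 7) && (minor == 0 || minor == 5);
    if (profiler) {
        mask |= METRIC_API_PROFILER;
    }

    return mask;
}
```

For a compute capability 7.0 device the helper returns both flags. Because the old and the new metric APIs cannot be mixed, the caller still has to pick exactly one of them for the whole application. The major and minor values would typically come from a device attribute query in the CUDA driver or runtime API.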