#define CUPTI_AUTO_BOOST_INVALID_CLIENT_PID 0
An invalid/unknown process ID.
#define CUPTI_CORRELATION_ID_UNKNOWN 0
An invalid/unknown correlation ID. A correlation ID of this value indicates that there is no correlation for the activity record.
#define CUPTI_FUNCTION_INDEX_ID_INVALID 0
An invalid function index ID.
#define CUPTI_GRID_ID_UNKNOWN 0LL
An invalid/unknown grid ID.
#define CUPTI_MAX_NVLINK_PORTS 16
Maximum number of NVLink ports.
#define CUPTI_NVLINK_INVALID_PORT -1
Invalid/unknown NVLink port number.
#define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN 0
The source-locator ID that indicates an unknown source location. No actual CUpti_ActivitySourceLocator object corresponds to this value.
#define CUPTI_SYNCHRONIZATION_INVALID_VALUE -1
An invalid/unknown value.
#define CUPTI_TIMESTAMP_UNKNOWN 0LL
An invalid/unknown timestamp for a start, end, queued, submitted, or completed time.
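The sentinels above are meant to be compared against record fields before those fields are used. The following is a minimal sketch, assuming the start and end timestamps have been read from a record as uint64_t nanosecond values. record_duration_ns is a hypothetical helper, not a CUPTI function, and the #ifndef fallback only mirrors the value documented above so the snippet compiles without cupti_activity.h.

```c
#include <stdint.h>

/* Fallback mirroring the value documented above; cupti_activity.h
   provides the real definition. */
#ifndef CUPTI_TIMESTAMP_UNKNOWN
#define CUPTI_TIMESTAMP_UNKNOWN 0LL
#endif

/* Hypothetical helper: elapsed nanoseconds between two record timestamps,
   or 0 when either timestamp is unknown or end precedes start. */
static uint64_t record_duration_ns(uint64_t start, uint64_t end)
{
    const uint64_t unknown = (uint64_t)CUPTI_TIMESTAMP_UNKNOWN;
    if (start == unknown || end == unknown || end < start) {
        return 0;
    }
    return end - start;
}
```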
typedef void (*CUpti_BuffersCallbackCompleteFunc)(CUcontext context, uint32_t streamId, uint8_t *buffer, size_t size, size_t validSize)
This callback function returns to the CUPTI client a buffer containing activity records. The buffer contains validSize bytes of activity records, which should be read using cuptiActivityGetNextRecord. The number of dropped records can be read using cuptiActivityGetNumDroppedRecords. After this call, CUPTI relinquishes ownership of the buffer and will not use it anymore. The client may return the buffer to CUPTI using the CUpti_BuffersCallbackRequestFunc callback. Note: From CUDA 6.0 onwards, all buffers returned by this callback are global buffers, i.e., there are no context- or stream-specific buffers. The user needs to parse the global buffer to extract the context- or stream-specific activity records.
context | The context this buffer is associated with. If NULL, the buffer is associated with the global activities. This field is deprecated as of CUDA 6.0 and will always be NULL. | |
streamId | The stream ID this buffer is associated with. This field is deprecated as of CUDA 6.0 and will always be 0. | |
buffer | The activity record buffer. | |
size | The total size of the buffer in bytes as set in CUpti_BuffersCallbackRequestFunc. | |
validSize | The number of valid bytes in the buffer. |
typedef void (*CUpti_BuffersCallbackRequestFunc)(uint8_t **buffer, size_t *size, size_t *maxNumRecords)
This callback function signals the CUPTI client that an activity buffer is needed by CUPTI. The activity buffer is used by CUPTI to store activity records. The callback function can decline the request by setting *buffer to NULL. In this case CUPTI may drop activity records.
buffer | Returns the new buffer. If set to NULL then no buffer is returned. | |
size | Returns the size of the returned buffer. | |
maxNumRecords | Returns the maximum number of records that should be placed in the buffer. If 0 then the buffer is filled with as many records as possible. If > 0 the buffer is filled with at most that many records before it is returned. |
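A minimal sketch of a request callback with the CUpti_BuffersCallbackRequestFunc signature. The names bufferRequested and releaseActivityBuffer, the 32 KiB size, and the 8-byte alignment (the value the CUPTI samples use) are assumptions for illustration. Such a callback would be registered with cuptiActivityRegisterCallbacks together with a completion callback, which is omitted here because it needs the CUDA types.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifndef CUPTIAPI
#define CUPTIAPI /* cupti.h defines the real calling-convention macro */
#endif

#define ACTIVITY_BUF_SIZE  (32 * 1024)  /* illustrative size in bytes */
#define ACTIVITY_BUF_ALIGN 8            /* alignment used by the CUPTI samples */

/* Same signature as CUpti_BuffersCallbackRequestFunc. */
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords)
{
    /* C11 aligned_alloc: the size must be a multiple of the alignment. */
    uint8_t *buf = (uint8_t *)aligned_alloc(ACTIVITY_BUF_ALIGN, ACTIVITY_BUF_SIZE);
    if (buf != NULL) {
        /* Only required when CUPTI_ACTIVITY_ATTR_ZEROED_OUT_ACTIVITY_BUFFER is
           non-zero; otherwise CUPTI zeroes the buffer on the application thread. */
        memset(buf, 0, ACTIVITY_BUF_SIZE);
    }
    *buffer = buf;                                /* NULL declines the request */
    *size = (buf != NULL) ? ACTIVITY_BUF_SIZE : 0;
    *maxNumRecords = 0;                           /* 0: fill with as many records as fit */
}

/* Once the completion callback has read validSize bytes of records, CUPTI
   no longer uses the buffer; the client may free it (or hand it back). */
static void releaseActivityBuffer(uint8_t *buffer)
{
    free(buffer);
}
```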
These attributes are used to control the behavior of the activity API.
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE |
The device memory size (in bytes) reserved for storing profiling data for concurrent kernels (activity kind CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL), for each buffer on a context. The value is a size_t. There is a limit on how many device buffers can be allocated per context; the user can query and set this limit using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT. CUPTI does not pre-allocate all the buffers; it pre-allocates only as many buffers as set by the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE. When all of the data in a buffer is consumed, the buffer is added to the reuse pool, and CUPTI picks a buffer from this pool when a new buffer is needed. Thus the memory footprint does not scale with the kernel count. Applications with a high density of kernels might cause CUPTI to allocate more device buffers. CUPTI allocates another buffer only when it runs out of buffers in the reuse pool. Since buffer allocation happens in the main application thread, this might result in stalls in the critical path. CUPTI pre-allocates 3 buffers of the same size to mitigate this issue. The user can query and set the pre-allocation limit using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE. A larger buffer size leaves less device memory for the application. A smaller buffer size increases the risk of dropping timestamps for kernel records if too many kernels are launched or replayed at one time. This value only applies to new buffer allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations. The default value is 3200000 (about 3.2 MB), which can accommodate profiling data for 100,000 kernels. Note: Starting with the CUDA 11.2 release, CUPTI allocates profiling buffers in pinned host memory by default, as this might help improve the performance of the tracing run. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED for more details. The buffer size and the maximum number of buffers in the pool are still controlled by the attributes CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE and CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT. Note: The actual amount of device memory per buffer reserved by CUPTI might be larger. |
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP |
The device memory size (in bytes) reserved for storing profiling data for CDP operations for each buffer on a context. The value is a size_t. A larger buffer size means fewer flush operations but consumes more device memory. This value only applies to new allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations. The default value is 8388608 (8 MB). Note: The actual amount of device memory per context reserved by CUPTI might be larger. |
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT |
The maximum number of device memory buffers per context. The value is a size_t. For an application with a high rate of kernel launches, a bigger pool limit helps in timestamp collection for all the kernels, at the expense of a larger memory footprint. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE for more details. Setting this value does not modify the number of memory buffers currently stored. Set this value before initializing CUDA to ensure the limit is not exceeded. The default value is 250. |
CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE |
The profiling semaphore pool size reserved for storing profiling data for memcopies and serialized kernel tracing (activity kind CUPTI_ACTIVITY_KIND_KERNEL) for each context. The value is a size_t. There is a limit on how many semaphore pools can be allocated per context; the user can query and set this limit using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT. CUPTI does not pre-allocate all the semaphore pools; it pre-allocates only as many semaphore pools as set by the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE. When all of the data in a semaphore pool is consumed, the pool is added to the reuse pool, and CUPTI picks a semaphore pool from the reuse pool when a new semaphore pool is needed. Thus the memory footprint does not scale with the memcopy and kernel count. Applications with a high density of memcopies and kernels might cause CUPTI to allocate more semaphore pools. CUPTI allocates another semaphore pool only when it runs out of semaphore pools in the reuse pool. Since semaphore pool allocation happens in the main application thread, this might result in stalls in the critical path. CUPTI pre-allocates 3 semaphore pools of the same size to mitigate this issue. The user can query and set the pre-allocation limit using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE. A larger semaphore pool size leaves less device memory for the application. A smaller semaphore pool size increases the risk of dropping timestamps for memcopy and kernel records if too many memcopies or kernels are issued or launched at one time. This value only applies to new semaphore pool allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations. The default value is 25000, which can accommodate profiling data for up to 25,000 memcopies and kernels combined. |
CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT |
The maximum number of profiling semaphore pools per context. The value is a size_t. For an application with a high rate of memcopies and kernel launches, a bigger pool limit helps in timestamp collection for all the memcopies and kernels, at the expense of a larger device memory footprint. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE for more details. Set this value before initializing CUDA to ensure the limit is not exceeded. The default value is 250. |
CUPTI_ACTIVITY_ATTR_ZEROED_OUT_ACTIVITY_BUFFER |
Flag indicating whether the user provides zero-initialized activity buffers. The value is a uint8_t. If the value of this attribute is non-zero, the user should provide a zero-filled buffer in CUpti_BuffersCallbackRequestFunc. If the user does not provide a zero-filled buffer after setting this to non-zero, the activity buffer may contain some uninitialized values when CUPTI returns it in CUpti_BuffersCallbackCompleteFunc. If the value of this attribute is zero, CUPTI initializes the user buffer received in CUpti_BuffersCallbackRequestFunc to zero before filling it. If the user sets this to zero, a few stalls may appear in the critical path because CUPTI zeroes out the buffer in the main thread. Set this value before returning from CUpti_BuffersCallbackRequestFunc to ensure it is considered for all subsequent user buffers. The default value is 0. |
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE |
Number of device buffers to pre-allocate for a context during the initialization phase. The value is a size_t. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE for details. This value must be less than the maximum number of device buffers set using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT. Set this value before initializing CUDA or before creating a context to ensure it is considered by CUPTI. The default value is 3, so that CUPTI can ping-pong between these buffers (if possible). |
CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE |
Number of profiling semaphore pools to pre-allocate for a context during the initialization phase. The value is a size_t. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE for details. This value must be less than the maximum number of profiling semaphore pools set using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT. Set this value before initializing CUDA or before creating a context to ensure it is considered by CUPTI. The default value is 3, so that CUPTI can ping-pong between these pools (if possible). |
CUPTI_ACTIVITY_ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED |
Allocate page-locked (pinned) host memory for storing profiling data for concurrent kernels for each buffer on a context. The value is a uint8_t. Starting with the CUDA 11.2 release, CUPTI allocates profiling buffers in pinned host memory by default, as this might help improve the performance of the tracing run. Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. For this reason, the user might want to move these buffers from pinned host memory to device memory by setting this attribute to 0. The default value is 1. |
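The sizing attributes above interact. With the documented defaults, each concurrent-kernel buffer works out to 3,200,000 / 100,000 = 32 bytes of profiling data per kernel, and each semaphore pool covers 25,000 memcopies and kernels. The sketch below, with hypothetical names throughout, turns the documented defaults and the rule that each pre-allocate count must be less than its pool limit into a pre-flight check that a tool might run before passing the values to cuptiActivitySetAttribute. It computes requested sizes only; as noted above, CUPTI may reserve more per buffer.

```c
#include <stddef.h>

/* Hypothetical container for the sizing attributes described above. */
typedef struct {
    size_t device_buffer_size;          /* ..._DEVICE_BUFFER_SIZE, bytes */
    size_t device_buffer_pre_allocate;  /* ..._DEVICE_BUFFER_PRE_ALLOCATE_VALUE */
    size_t device_buffer_pool_limit;    /* ..._DEVICE_BUFFER_POOL_LIMIT */
    size_t semaphore_pool_size;         /* ..._PROFILING_SEMAPHORE_POOL_SIZE, records */
    size_t semaphore_pre_allocate;      /* ..._PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE */
    size_t semaphore_pool_limit;        /* ..._PROFILING_SEMAPHORE_POOL_LIMIT */
} ActivityPoolConfig;

/* The default values documented in this section. */
static const ActivityPoolConfig kDocumentedDefaults = {
    3200000, 3, 250, 25000, 3, 250
};

/* Documented rule: each pre-allocate count must be less than its pool limit. */
static int pool_config_is_valid(const ActivityPoolConfig *c)
{
    return c->device_buffer_pre_allocate < c->device_buffer_pool_limit &&
           c->semaphore_pre_allocate < c->semaphore_pool_limit;
}

/* Requested concurrent-kernel buffer bytes per context at start-up. */
static size_t initial_device_buffer_bytes(const ActivityPoolConfig *c)
{
    return c->device_buffer_size * c->device_buffer_pre_allocate;
}

/* Requested concurrent-kernel buffer bytes per context at the pool limit. */
static size_t max_device_buffer_bytes(const ActivityPoolConfig *c)
{
    return c->device_buffer_size * c->device_buffer_pool_limit;
}

/* Upper bound on memcopy/kernel records the semaphore pools of one
   context can hold at one time. */
static size_t max_semaphore_records(const ActivityPoolConfig *c)
{
    return c->semaphore_pool_size * c->semaphore_pool_limit;
}
```

With the defaults, a context starts with 9,600,000 requested bytes of concurrent-kernel buffers and can grow to 800,000,000 bytes if all 250 buffers are needed.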
enum CUpti_ActivityFlag |
Activity record flags. Flags can be combined by bitwise OR to associate multiple flags with an activity record. Each flag is specific to a certain activity kind, as noted below.
CUPTI_ACTIVITY_FLAG_NONE | Indicates the activity record has no flags. |
CUPTI_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS | Indicates the activity represents a device that supports concurrent kernel execution. Valid for CUPTI_ACTIVITY_KIND_DEVICE. |
CUPTI_ACTIVITY_FLAG_DEVICE_ATTRIBUTE_CUDEVICE | Indicates if the activity represents a CUdevice_attribute value or a CUpti_DeviceAttribute value. Valid for CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE. |
CUPTI_ACTIVITY_FLAG_MEMCPY_ASYNC | Indicates the activity represents an asynchronous memcpy operation. Valid for CUPTI_ACTIVITY_KIND_MEMCPY. |
CUPTI_ACTIVITY_FLAG_MARKER_INSTANTANEOUS | Indicates the activity represents an instantaneous marker. Valid for CUPTI_ACTIVITY_KIND_MARKER. |
CUPTI_ACTIVITY_FLAG_MARKER_START | Indicates the activity represents a region start marker. Valid for CUPTI_ACTIVITY_KIND_MARKER. |
CUPTI_ACTIVITY_FLAG_MARKER_END | Indicates the activity represents a region end marker. Valid for CUPTI_ACTIVITY_KIND_MARKER. |
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE | Indicates the activity represents an attempt to acquire a user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER. |
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_SUCCESS | Indicates the activity represents success in acquiring the user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER. |
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_FAILED | Indicates the activity represents failure in acquiring the user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER. |
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_RELEASE | Indicates the activity represents releasing a reservation on a user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER. |
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_NONE | Indicates the activity represents a marker that does not specify a color. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA. |
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_ARGB | Indicates the activity represents a marker that specifies a color in alpha-red-green-blue format. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA. |
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK | The number of bytes requested by each thread. Valid for CUpti_ActivityGlobalAccess3. |
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD | If this bit is set, the access was a load; otherwise it was a store. Valid for CUpti_ActivityGlobalAccess3. |
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED | If this bit is set, the load access was cached; otherwise it was uncached. Valid for CUpti_ActivityGlobalAccess3. |
CUPTI_ACTIVITY_FLAG_METRIC_OVERFLOWED | If this bit is set, the metric value overflowed. Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance. |
CUPTI_ACTIVITY_FLAG_METRIC_VALUE_INVALID | If this bit is set, the metric value could not be calculated. This occurs when one or more values required to calculate the metric are missing. Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance. |
CUPTI_ACTIVITY_FLAG_INSTRUCTION_VALUE_INVALID | If this bit is set, the source-level metric value could not be calculated. This occurs when one or more values required to calculate the source-level metric cannot be evaluated. Valid for CUpti_ActivityInstructionExecution. |
CUPTI_ACTIVITY_FLAG_INSTRUCTION_CLASS_MASK | The mask for the instruction class, CUpti_ActivityInstructionClass. Valid for CUpti_ActivityInstructionExecution and CUpti_ActivityInstructionCorrelation. |
CUPTI_ACTIVITY_FLAG_FLUSH_FORCED | When calling cuptiActivityFlushAll, this flag can be set to force CUPTI to flush all records in the buffer, whether finished or not. |
CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_SIZE_MASK | The number of bytes requested by each thread. Valid for CUpti_ActivitySharedAccess. |
CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_LOAD | If this bit is set, the access was a load; otherwise it was a store. Valid for CUpti_ActivitySharedAccess. |
CUPTI_ACTIVITY_FLAG_MEMSET_ASYNC | Indicates the activity represents an asynchronous memset operation. Valid for CUPTI_ACTIVITY_KIND_MEMSET. |
CUPTI_ACTIVITY_FLAG_THRASHING_IN_CPU | Indicates the activity represents thrashing in CPU. Valid for counter of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER. |
CUPTI_ACTIVITY_FLAG_THROTTLING_IN_CPU | Indicates the activity represents page throttling in CPU. Valid for counter of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER. |
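Several flags above pack more than one piece of information into a single field: a size is extracted with a mask, and single bits are tested with a bitwise AND. A minimal sketch for the flags of a CUpti_ActivityGlobalAccess3 record follows. The numeric values of the local GA_* constants are an assumption about cupti_activity.h, where the real enumerators are defined; code built against CUPTI should use the CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_* names directly. decode_global_access_flags and GlobalAccessInfo are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins for the CUpti_ActivityFlag bits used here. The numeric
   values are assumed to match cupti_activity.h. */
enum {
    GA_SIZE_MASK = 0xFF,    /* ..._GLOBAL_ACCESS_KIND_SIZE_MASK */
    GA_LOAD      = 1 << 8,  /* ..._GLOBAL_ACCESS_KIND_LOAD */
    GA_CACHED    = 1 << 9   /* ..._GLOBAL_ACCESS_KIND_CACHED */
};

typedef struct {
    unsigned bytes_per_thread;  /* value of the masked size field */
    int is_load;                /* 0 means a store */
    int is_cached;              /* only reported for loads */
} GlobalAccessInfo;

/* Hypothetical decoder for the flags field of a global-access record. */
static GlobalAccessInfo decode_global_access_flags(uint32_t flags)
{
    GlobalAccessInfo info;
    info.bytes_per_thread = flags & GA_SIZE_MASK;
    info.is_load = (flags & GA_LOAD) != 0;
    info.is_cached = info.is_load && (flags & GA_CACHED) != 0;
    return info;
}
```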
SASS instructions are broadly divided into different classes. Each enum value represents a classification.
enum CUpti_ActivityKind |
Each activity record kind represents information about a GPU or an activity occurring on a CPU or GPU. Each kind is associated with an activity record structure that holds the information associated with the kind.
CUpti_ActivityKernel6
CUpti_ActivityInstructionExecution
CUpti_ActivityUnifiedMemoryCounter
CUpti_ActivityPCSamplingRecordInfo
CUpti_ActivityInstructionCorrelation
CUpti_ActivityExternalCorrelation
CUPTI_ACTIVITY_KIND_INVALID | The activity record is invalid. |
CUPTI_ACTIVITY_KIND_MEMCPY | A host<->host, host<->device, or device<->device memory copy. The corresponding activity record structure is CUpti_ActivityMemcpy4. |
CUPTI_ACTIVITY_KIND_MEMSET | A memory set executing on the GPU. The corresponding activity record structure is CUpti_ActivityMemset3. |
CUPTI_ACTIVITY_KIND_KERNEL | A kernel executing on the GPU. This activity kind may significantly change the overall performance characteristics of the application because all kernel executions are serialized on the GPU. The other kernel activity kind, CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL, does not break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel6. |
CUPTI_ACTIVITY_KIND_DRIVER | A CUDA driver API function execution. The corresponding activity record structure is CUpti_ActivityAPI. |
CUPTI_ACTIVITY_KIND_RUNTIME | A CUDA runtime API function execution. The corresponding activity record structure is CUpti_ActivityAPI. |
CUPTI_ACTIVITY_KIND_EVENT | An event value. The corresponding activity record structure is CUpti_ActivityEvent. |
CUPTI_ACTIVITY_KIND_METRIC | A metric value. The corresponding activity record structure is CUpti_ActivityMetric. |
CUPTI_ACTIVITY_KIND_DEVICE | Information about a device. The corresponding activity record structure is CUpti_ActivityDevice2. |
CUPTI_ACTIVITY_KIND_CONTEXT | Information about a context. The corresponding activity record structure is CUpti_ActivityContext. |
CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL | A kernel executing on the GPU. This activity kind doesn't break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel6. |
CUPTI_ACTIVITY_KIND_NAME | Thread, device, context, etc. name. The corresponding activity record structure is CUpti_ActivityName. |
CUPTI_ACTIVITY_KIND_MARKER | Instantaneous, start, or end marker. The corresponding activity record structure is CUpti_ActivityMarker2. |
CUPTI_ACTIVITY_KIND_MARKER_DATA | Extended, optional, data about a marker. The corresponding activity record structure is CUpti_ActivityMarkerData. |
CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR | Source location information for source-level results. The corresponding activity record structure is CUpti_ActivitySourceLocator. |
CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS | Results for source-level global access. The corresponding activity record structure is CUpti_ActivityGlobalAccess3. |
CUPTI_ACTIVITY_KIND_BRANCH | Results for source-level branch. The corresponding activity record structure is CUpti_ActivityBranch2. |
CUPTI_ACTIVITY_KIND_OVERHEAD | Overhead activity records. The corresponding activity record structure is CUpti_ActivityOverhead. |
CUPTI_ACTIVITY_KIND_CDP_KERNEL | A CDP (CUDA Dynamic Parallelism) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityCdpKernel. This activity cannot be directly enabled or disabled. It is enabled and disabled through the concurrent kernel activity, i.e., CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL. |
CUPTI_ACTIVITY_KIND_PREEMPTION | Preemption activity record indicating a preemption of a CDP (CUDA Dynamic Parallelism) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityPreemption. |
CUPTI_ACTIVITY_KIND_ENVIRONMENT | Environment activity records indicating power, clock, thermal, etc. levels of the GPU. The corresponding activity record structure is CUpti_ActivityEnvironment. |
CUPTI_ACTIVITY_KIND_EVENT_INSTANCE | An event value associated with a specific event domain instance. The corresponding activity record structure is CUpti_ActivityEventInstance. |
CUPTI_ACTIVITY_KIND_MEMCPY2 | A peer-to-peer memory copy. The corresponding activity record structure is CUpti_ActivityMemcpyPtoP3. |
CUPTI_ACTIVITY_KIND_METRIC_INSTANCE | A metric value associated with a specific metric domain instance. The corresponding activity record structure is CUpti_ActivityMetricInstance. |
CUPTI_ACTIVITY_KIND_INSTRUCTION_EXECUTION | Results for source-level instruction execution. The corresponding activity record structure is CUpti_ActivityInstructionExecution. |
CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER | Unified Memory counter record. The corresponding activity record structure is CUpti_ActivityUnifiedMemoryCounter2. |
CUPTI_ACTIVITY_KIND_FUNCTION | Device global/function record. The corresponding activity record structure is CUpti_ActivityFunction. |
CUPTI_ACTIVITY_KIND_MODULE | CUDA Module record. The corresponding activity record structure is CUpti_ActivityModule. |
CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE | A device attribute value. The corresponding activity record structure is CUpti_ActivityDeviceAttribute. |
CUPTI_ACTIVITY_KIND_SHARED_ACCESS | Results for source-level shared access. The corresponding activity record structure is CUpti_ActivitySharedAccess. |
CUPTI_ACTIVITY_KIND_PC_SAMPLING | Enable PC sampling for kernels. This will serialize kernels. The corresponding activity record structure is CUpti_ActivityPCSampling3. |
CUPTI_ACTIVITY_KIND_PC_SAMPLING_RECORD_INFO | Summary information about PC sampling records. The corresponding activity record structure is CUpti_ActivityPCSamplingRecordInfo. |
CUPTI_ACTIVITY_KIND_INSTRUCTION_CORRELATION | SASS/source line-by-line correlation record. This generates SASS/source correlation for functions that have source-level analysis or PC sampling results. The records are generated only when either source-level analysis or PC sampling activity is enabled. The corresponding activity record structure is CUpti_ActivityInstructionCorrelation. |
CUPTI_ACTIVITY_KIND_OPENACC_DATA | OpenACC data events. The corresponding activity record structure is CUpti_ActivityOpenAccData. |
CUPTI_ACTIVITY_KIND_OPENACC_LAUNCH | OpenACC launch events. The corresponding activity record structure is CUpti_ActivityOpenAccLaunch. |
CUPTI_ACTIVITY_KIND_OPENACC_OTHER | OpenACC other events. The corresponding activity record structure is CUpti_ActivityOpenAccOther. |
CUPTI_ACTIVITY_KIND_CUDA_EVENT | Information about a CUDA event. The corresponding activity record structure is CUpti_ActivityCudaEvent. |
CUPTI_ACTIVITY_KIND_STREAM | Information about a CUDA stream. The corresponding activity record structure is CUpti_ActivityStream. |
CUPTI_ACTIVITY_KIND_SYNCHRONIZATION | Records for synchronization management. The corresponding activity record structure is CUpti_ActivitySynchronization. |
CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION | Records for correlation of different programming APIs. The corresponding activity record structure is CUpti_ActivityExternalCorrelation. |
CUPTI_ACTIVITY_KIND_NVLINK | NVLink information. The corresponding activity record structure is CUpti_ActivityNvLink3. |
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT | Instantaneous Event information. The corresponding activity record structure is CUpti_ActivityInstantaneousEvent. |
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT_INSTANCE | Instantaneous Event information for a specific event domain instance. The corresponding activity record structure is CUpti_ActivityInstantaneousEventInstance. |
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC | Instantaneous Metric information. The corresponding activity record structure is CUpti_ActivityInstantaneousMetric. |
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC_INSTANCE | Instantaneous Metric information for a specific metric domain instance. The corresponding activity record structure is CUpti_ActivityInstantaneousMetricInstance. |
CUPTI_ACTIVITY_KIND_MEMORY | Memory activity tracking allocation and freeing of memory. The corresponding activity record structure is CUpti_ActivityMemory. |
CUPTI_ACTIVITY_KIND_PCIE | PCI devices information used for PCI topology. The corresponding activity record structure is CUpti_ActivityPcie. |
CUPTI_ACTIVITY_KIND_OPENMP | OpenMP parallel events. The corresponding activity record structure is CUpti_ActivityOpenMp. |
CUPTI_ACTIVITY_KIND_INTERNAL_LAUNCH_API | A CUDA driver kernel launch occurring outside of any public API function execution. Tools can handle these like records for driver API launch functions, although the cbid field is not used here. The corresponding activity record structure is CUpti_ActivityAPI. |
CUPTI_ACTIVITY_KIND_MEMORY2 | Memory activity tracking allocation and freeing of memory. The corresponding activity record structure is CUpti_ActivityMemory2. |
CUPTI_ACTIVITY_KIND_MEMORY_POOL | Memory pool activity tracking creation, destruction, and trimming of the memory pool. The corresponding activity record structure is CUpti_ActivityMemoryPool. |
Each kind represents the source and destination targets of a memory copy. Targets are host, device, and array.
Each kind represents the type of the memory accessed by a memory operation/copy.
Describes the type of memory operation, to be used with CUpti_ActivityMemory2.
Describes the type of memory pool operation, to be used with CUpti_ActivityMemoryPool.
Describes the type of memory pool, to be used with CUpti_ActivityMemory2.
The sampling period can be set using cuptiActivityConfigurePCSampling.
The types of stream to be used with CUpti_ActivityStream.
The types of synchronization to be used with CUpti_ActivitySynchronization.
CUPTI uses different methods to obtain the thread-id depending on the support and the underlying platform. This enum documents these methods for each type. APIs cuptiSetThreadIdType and cuptiGetThreadIdType can be used to set and get the thread-id type.
This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT.
Many activities are associated with the Unified Memory mechanism; among them are transfers from host to device and from device to host, and page faults on the host side.
This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH.
This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP.
This indicates the virtualization mode in which the CUDA device is running.
enum CUpti_DevType |
The possible reasons that a clock can be throttled. There can be more than one reason that a clock is being throttled so these types can be combined by bitwise OR. These are used in the clocksThrottleReason field in the Environment Activity Record.
Custom correlation kinds are reserved for usage in external tools.
enum CUpti_LinkFlag |
Describes link properties, to be used with CUpti_ActivityNvLink.
enum CUpti_PcieDeviceType |
enum CUpti_PcieGen |
CUptiResult cuptiActivityConfigurePCSampling(CUcontext ctx, CUpti_ActivityPCSamplingConfig *config)
For Pascal and older GPU architectures this API must be called before enabling activity kind CUPTI_ACTIVITY_KIND_PC_SAMPLING. There is no such requirement for Volta and newer GPU architectures.
For Volta and newer GPU architectures, if this API is called in the middle of execution, the PC sampling configuration will be updated for subsequent kernel launches.
ctx | The context | |
config | A pointer to CUpti_ActivityPCSamplingConfig structure containing PC sampling configuration. |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_OPERATION | if this API is called while some valid event collection method is set. | |
CUPTI_ERROR_INVALID_PARAMETER | if config is NULL or any parameter in the config structures is not a valid value | |
CUPTI_ERROR_NOT_SUPPORTED | Indicates that the system/device does not support PC sampling |
CUptiResult cuptiActivityConfigureUnifiedMemoryCounter(CUpti_ActivityUnifiedMemoryCounterConfig *config, uint32_t count)
config | A pointer to CUpti_ActivityUnifiedMemoryCounterConfig structures containing Unified Memory counter configuration. | |
count | Number of Unified Memory counter configuration structures |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_PARAMETER | if config is NULL or any parameter in the config structures is not a valid value | |
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED | One potential reason is that the platform (OS/arch) does not support the unified memory counters | |
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_DEVICE | Indicates that the device does not support the unified memory counters | |
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_NON_P2P_DEVICES | Indicates that a multi-GPU configuration without P2P support between any pair of devices does not support the unified memory counters |
CUptiResult cuptiActivityDisable(CUpti_ActivityKind kind)
Disable collection of a specific kind of activity record. Multiple kinds can be disabled by calling this function multiple times. By default all activity kinds are disabled for collection.
kind | The kind of activity record to stop collecting |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_KIND | if the activity kind is not supported |
CUptiResult cuptiActivityDisableContext(CUcontext context, CUpti_ActivityKind kind)
Disable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records. Multiple kinds can be disabled by calling this function multiple times.
context | The context for which activity is to be disabled | |
kind | The kind of activity record to stop collecting |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_KIND | if the activity kind is not supported |
CUptiResult cuptiActivityEnable(CUpti_ActivityKind kind)
Enable collection of a specific kind of activity record. Multiple kinds can be enabled by calling this function multiple times. By default all activity kinds are disabled for collection.
kind | The kind of activity record to collect |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_NOT_COMPATIBLE | if the activity kind cannot be enabled | |
CUPTI_ERROR_INVALID_KIND | if the activity kind is not supported |
CUptiResult cuptiActivityEnableContext(CUcontext context, CUpti_ActivityKind kind)
Enable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records enabled by cuptiActivityEnable. Multiple kinds can be enabled by calling this function multiple times.
context | The context for which activity is to be enabled | |
kind | The kind of activity record to collect |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_NOT_COMPATIBLE | if the activity kind cannot be enabled | |
CUPTI_ERROR_INVALID_KIND | if the activity kind is not supported |
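The precedence between the global and per-context settings can be summarized as a small model. This is a hypothetical illustration of the rule stated above, not CUPTI code; it does not capture, for example, kinds that cannot be enabled on a given context.

```c
/* Hypothetical model of the documented precedence, not CUPTI code. */
typedef enum {
    CTX_SETTING_NONE,      /* no per-context call made for this kind */
    CTX_SETTING_ENABLED,   /* last per-context call was cuptiActivityEnableContext */
    CTX_SETTING_DISABLED   /* last per-context call was cuptiActivityDisableContext */
} ContextSetting;

/* Returns 1 if the kind would be collected on the context, else 0. */
static int kind_is_collected(int globally_enabled, ContextSetting ctx)
{
    switch (ctx) {
    case CTX_SETTING_ENABLED:
        return 1;
    case CTX_SETTING_DISABLED:
        return 0;
    default:
        /* No per-context override: the cuptiActivityEnable/Disable state applies. */
        return globally_enabled != 0;
    }
}
```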
CUptiResult cuptiActivityEnableLatencyTimestamps(uint8_t enable)
This API controls the collection of queued and submitted timestamps for kernels whose records are provided through the struct CUpti_ActivityKernel6. The default value is 0, i.e., these timestamps are not collected. This API must be called before CUDA is initialized, and the setting should not be changed during the profiling session.
enable | A boolean denoting whether these timestamps should be collected |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED |
CUptiResult cuptiActivityEnableLaunchAttributes(uint8_t enable)
This API controls the collection of launch attributes for kernels whose records are provided through the struct CUpti_ActivityKernel6. The default value is 0, i.e., these attributes are not collected.
enable | A boolean denoting whether these launch attributes should be collected |
CUptiResult cuptiActivityFlush(CUcontext context, uint32_t streamId, uint32_t flag)
This function does not return until all activity records associated with the specified context/stream are returned to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks. To ensure that all activity records are complete, the requested stream(s), if any, are synchronized.
If context is NULL, the global activity records (i.e., those not associated with a particular stream) are flushed; in this case no streams are synchronized. If context is a valid CUcontext and streamId is 0, the buffers of all streams of this context are flushed. Otherwise, the buffers of the specified stream in this context are flushed.
Before calling this function, the buffer-handling callback API must be activated by calling cuptiActivityRegisterCallbacks.
context | A valid CUcontext or NULL. | |
streamId | The stream ID. | |
flag | The flag can be set to indicate a forced flush. See CUpti_ActivityFlag |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_OPERATION | if not preceded by a successful call to cuptiActivityRegisterCallbacks | |
CUPTI_ERROR_UNKNOWN | an internal error occurred |
CUptiResult cuptiActivityFlushAll(uint32_t flag)
This function returns the activity records associated with all contexts/streams (and the global buffers not associated with any stream) to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks.
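This is a blocking call, but it does not implicitly issue any CUDA synchronization calls, so it is not guaranteed that all activities have completed on the underlying devices. An activity record is considered complete when all of its information, including timestamps if any, has been filled in. If all activity records are expected to be delivered with complete information, it is the client's responsibility to issue the necessary CUDA synchronization calls before calling this function.
Behavior of the function based on the input flag: with the default flush (flag set to 0), it returns all activity buffers whose activity records are all complete; the buffers need not be full, and buffers containing one or more incomplete records are not returned. With a forced flush (flag set to CUPTI_ACTIVITY_FLAG_FLUSH_FORCED), it returns all activity buffers, including those containing one or more incomplete activity records. A forced flush before the profiling session terminates, for example in an at-exit handler, allows the remaining buffers to be delivered.
Before calling this function, the buffer-handling callback API must be activated by calling cuptiActivityRegisterCallbacks.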
flag | The flag can be set to indicate a forced flush. See CUpti_ActivityFlag |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_OPERATION | if not preceded by a successful call to cuptiActivityRegisterCallbacks | |
CUPTI_ERROR_UNKNOWN | an internal error occurred |
CUptiResult cuptiActivityFlushPeriod(uint32_t time)
CUPTI creates a worker thread to minimize perturbation of application-created threads. CUPTI offloads certain operations from the application threads to the worker thread, including the synchronization of profiling resources between host and device and the delivery of activity buffers to the client through the callback registered in cuptiActivityRegisterCallbacks. For performance reasons, CUPTI wakes up the worker thread based on certain heuristics.
This API controls the flush period of the worker thread. This setting overrides the CUPTI heuristics. Setting time to zero disables the periodic flush and restores the default behavior.
cuptiActivityFlushAll can still be used to flush data on demand, even when the client has set a periodic flush.
time | Flush period in milliseconds |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED |
CUptiResult cuptiActivityGetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value)
Read an activity API attribute and return it in *value.
attr | The attribute to read | |
valueSize | Size of the buffer pointed to by value; on return, the number of bytes written to value | |
value | Returns the value of the attribute |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_PARAMETER | if valueSize or value is NULL, or if attr is not an activity attribute | |
CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT | Indicates that the value buffer is too small to hold the attribute value. |
CUptiResult cuptiActivityGetNextRecord(uint8_t *buffer, size_t validBufferSizeBytes, CUpti_Activity **record)
This is a helper function to iterate over the activity records in a buffer. A buffer of activity records is typically obtained by receiving a CUpti_BuffersCallbackCompleteFunc callback.
An example of typical usage:
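    CUpti_Activity *record = NULL;  // NULL: start from the first record
    CUptiResult status = CUPTI_SUCCESS;
    do {
        status = cuptiActivityGetNextRecord(buffer, validSize, &record);
        if (status == CUPTI_SUCCESS) {
            // Use record here...
        } else if (status == CUPTI_ERROR_MAX_LIMIT_REACHED) {
            break;  // no more records in the buffer
        } else {
            goto Error;  // unexpected failure; "Error" is a cleanup label defined by the caller
        }
    } while (1);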
buffer | The buffer containing activity records | |
record | On input, the previous record returned by cuptiActivityGetNextRecord; on output, the next activity record in the buffer. If the input value is NULL, the first activity record in the buffer is returned. Records of kind CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL may contain invalid (0) timestamps, indicating that no timing information could be collected due to lack of device memory. | |
validBufferSizeBytes | The number of valid bytes in the buffer. |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_MAX_LIMIT_REACHED | if there are no more records in the buffer | |
CUPTI_ERROR_INVALID_PARAMETER | if buffer is NULL. |
CUptiResult cuptiActivityGetNumDroppedRecords(CUcontext context, uint32_t streamId, size_t *dropped)
Get the number of records that were dropped because of insufficient buffer space. The dropped count includes records that could not be recorded because CUPTI did not have activity buffer space available (because the CUpti_BuffersCallbackRequestFunc callback did not return an empty buffer of sufficient size), as well as CDP records that could not be recorded because the device-side buffer was full (its size is controlled by the CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP attribute). The dropped count maintained for the queue is reset to zero when this function is called.
context | The context, or NULL to get dropped count from global queue | |
streamId | The stream ID | |
dropped | The number of records that were dropped since the last call to this function. |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_PARAMETER | if dropped is NULL |
CUptiResult cuptiActivityPopExternalCorrelationId(CUpti_ExternalCorrelationKind kind, uint64_t *lastId)
This function notifies CUPTI that the calling thread is leaving an external API region.
kind | The kind of external API that activities should be correlated with. | |
lastId | If the function returns successfully, contains the last external correlation ID for this kind. Can be NULL. |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | The external API kind is invalid. | |
CUPTI_ERROR_QUEUE_EMPTY | No external ID is currently associated with kind. |
CUptiResult cuptiActivityPushExternalCorrelationId(CUpti_ExternalCorrelationKind kind, uint64_t id)
This function notifies CUPTI that the calling thread is entering an external API region. When a CUPTI activity API record is created while within an external API region and CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION is enabled, the activity API record will be preceded by a CUpti_ActivityExternalCorrelation record for each CUpti_ExternalCorrelationKind.
kind | The kind of external API that activities should be correlated with. | |
id | External correlation ID. |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | The external API kind is invalid |
CUptiResult cuptiActivityRegisterCallbacks(CUpti_BuffersCallbackRequestFunc funcBufferRequested, CUpti_BuffersCallbackCompleteFunc funcBufferCompleted)
This function registers two callback functions to be used in asynchronous buffer handling. If registered, activity record buffers are handled using asynchronous requested/completed callbacks from CUPTI.
Registering these callbacks prevents the client from using CUPTI's blocking enqueue/dequeue functions.
funcBufferRequested | callback which is invoked when an empty buffer is requested by CUPTI | |
funcBufferCompleted | callback which is invoked when a buffer containing activity records is available from CUPTI |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | if either funcBufferRequested or funcBufferCompleted is NULL |
CUptiResult cuptiActivitySetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value)
Write an activity API attribute.
attr | The attribute to write | |
valueSize | The size, in bytes, of the value | |
value | The attribute value to write |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_PARAMETER | if valueSize or value is NULL, or if attr is not an activity attribute | |
CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT | Indicates that the value buffer is too small to hold the attribute value. |
CUptiResult cuptiComputeCapabilitySupported(int major, int minor, int *support)
This function checks whether devices of a given compute capability are supported. It sets *support when the compute capability is supported by the current version of CUPTI, and clears it otherwise. Because this version of CUPTI might not support all GPUs that share the same compute capability, use cuptiDeviceSupported instead, which provides accurate information for a specific device.
major | The major revision number of the compute capability | |
minor | The minor revision number of the compute capability | |
support | Pointer to an integer to return the support status |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | if support is NULL |
CUptiResult cuptiDeviceSupported(CUdevice dev, int *support)
This function checks whether a compute device is supported. It sets *support when the device is supported by the current version of CUPTI, and clears it otherwise.
dev | The device handle returned by CUDA Driver API cuDeviceGet | |
support | Pointer to an integer to return the support status |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | if support is NULL | |
CUPTI_ERROR_INVALID_DEVICE | if dev is not a valid device |
CUptiResult cuptiDeviceVirtualizationMode(CUdevice dev, CUpti_DeviceVirtualizationMode *mode)
This function is used to query the virtualization mode of the CUDA device.
dev | The device handle returned by CUDA Driver API cuDeviceGet | |
mode | Pointer to a CUpti_DeviceVirtualizationMode to return the virtualization mode |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_DEVICE | if dev is not a valid device | |
CUPTI_ERROR_INVALID_PARAMETER | if mode is NULL |
CUptiResult cuptiFinalize(void)
This API detaches CUPTI from the running process. It destroys and cleans up all the resources associated with CUPTI in the current process. After CUPTI detaches from the process, the process keeps running with no CUPTI attached to it. For safe operation, it is recommended that this API be invoked from the exit callsite of a CUDA Driver or Runtime API. Otherwise, the CUPTI client must ensure that the required CUDA synchronization and CUPTI activity buffer flush are done before calling the API. Sample code showing the usage of the API in a CUPTI callback handler:
    // detachCupti is a flag defined and set by the client when a detach is requested
    void CUPTIAPI cuptiCallbackHandler(void *userdata, CUpti_CallbackDomain domain,
                                       CUpti_CallbackId cbid, void *cbdata)
    {
        const CUpti_CallbackData *cbInfo = (const CUpti_CallbackData *)cbdata;

        // Take this code path when CUPTI detach is requested
        if (detachCupti) {
            switch (domain) {
            case CUPTI_CB_DOMAIN_RUNTIME_API:
            case CUPTI_CB_DOMAIN_DRIVER_API:
                if (cbInfo->callbackSite == CUPTI_API_EXIT) {
                    // Call the CUPTI detach API
                    cuptiFinalize();
                }
                break;
            default:
                break;
            }
        }
    }
CUptiResult cuptiGetAutoBoostState(CUcontext context, CUpti_ActivityAutoBoostState *state)
Profiling results can be inconsistent when auto boost is enabled. CUPTI tries to disable auto boost while profiling, but this can fail if the user does not have the required permissions or if the CUDA_AUTO_BOOST environment variable is set. This function can be used to query whether auto boost is enabled.
context | A valid CUcontext. | |
state | A pointer to CUpti_ActivityAutoBoostState structure which contains the current state and the id of the process that has requested the current state |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | if CUcontext or state is NULL | |
CUPTI_ERROR_NOT_SUPPORTED | Indicates that the device does not support auto boost | |
CUPTI_ERROR_UNKNOWN | an internal error occurred |
CUptiResult cuptiGetContextId(CUcontext context, uint32_t *contextId)
Get the ID of a context.
context | The context | |
contextId | Returns a process-unique ID for the context |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_CONTEXT | The context is NULL or not valid. | |
CUPTI_ERROR_INVALID_PARAMETER | if contextId is NULL |
CUptiResult cuptiGetDeviceId(CUcontext context, uint32_t *deviceId)
If context is NULL, returns the ID of the device that contains the currently active context. If context is non-NULL, returns the ID of the device that contains that context. Operates in a similar manner to cudaGetDevice() or cuCtxGetDevice() but may be called from within callback functions.
context | The context, or NULL to indicate the current context. | |
deviceId | Returns the ID of the device that contains the context. |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_DEVICE | if unable to get device ID | |
CUPTI_ERROR_INVALID_PARAMETER | if deviceId is NULL |
CUptiResult cuptiGetGraphId(CUgraph graph, uint32_t *pId)
Returns the unique ID of the CUDA graph.
graph | The graph. | |
pId | Returns the unique ID of the graph |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_PARAMETER | if graph is NULL |
CUptiResult cuptiGetGraphNodeId(CUgraphNode node, uint64_t *nodeId)
Returns the unique ID of the CUDA graph node.
node | The graph node. | |
nodeId | Returns the unique ID of the node |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_PARAMETER | if node is NULL |
CUptiResult cuptiGetLastError(void)
Returns the last error produced by any CUPTI API call or callback in the same host thread, and resets it to CUPTI_SUCCESS.
CUptiResult cuptiGetStreamId(CUcontext context, CUstream stream, uint32_t *streamId)
Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).
context | If non-NULL, the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL. | |
stream | The stream | |
streamId | Returns a context-unique ID for the stream |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_STREAM | if unable to get stream ID, or if context is non-NULL and stream does not belong to the context | |
CUPTI_ERROR_INVALID_PARAMETER | if streamId is NULL |
CUptiResult cuptiGetStreamIdEx(CUcontext context, CUstream stream, uint8_t perThreadStream, uint32_t *streamId)
Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).
context | If non-NULL, the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL. | |
stream | The stream | |
perThreadStream | Flag indicating whether the program is compiled for per-thread streams | |
streamId | Returns a context-unique ID for the stream |
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_INITIALIZED | ||
CUPTI_ERROR_INVALID_STREAM | if unable to get stream ID, or if context is non-NULL and stream does not belong to the context | |
CUPTI_ERROR_INVALID_PARAMETER | if streamId is NULL |
CUptiResult cuptiGetThreadIdType(CUpti_ActivityThreadIdType *type)
Returns the thread-ID type used by CUPTI.
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | if type is NULL |
CUptiResult cuptiGetTimestamp(uint64_t *timestamp)
Returns a timestamp normalized to correspond with the start and end timestamps reported in the CUPTI activity records. The timestamp is reported in nanoseconds.
timestamp | Returns the CUPTI timestamp |
CUPTI_SUCCESS | ||
CUPTI_ERROR_INVALID_PARAMETER | if timestamp is NULL |
CUptiResult cuptiSetThreadIdType(CUpti_ActivityThreadIdType type)
CUPTI uses the method corresponding to the specified type to generate thread IDs. See enum CUpti_ActivityThreadIdType for the list of methods. Activity records that have a thread-ID field contain a value generated by this method. The thread-ID type must not be changed during the profiling session, to avoid thread-ID mismatches across activity records.
CUPTI_SUCCESS | ||
CUPTI_ERROR_NOT_SUPPORTED | if type is not supported on the platform |