NVIDIA CUDA Architecture
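Fundamental concepts and components in the CUDA architecture:

thread: the smallest unit of execution. Each thread runs one instance of a kernel.
core: a hardware execution unit inside a streaming multiprocessor that carries out a thread's instructions.
kernel: a function that runs on the GPU and is executed by many threads in parallel.
block: a collection of parallel threads.
grid: a collection of parallel thread blocks.
warp: a set of threads (commonly 32) that are executed simultaneously. A thread block is not run as one unit; it is split into these smaller groups, and the warps are what the hardware schedules.
streaming multiprocessor (SM): the hardware unit on which warps are scheduled to execute. A grid may contain more blocks than the GPU can hold at once; the number of blocks that can run at the same time is limited by the number of SMs and the resources of each SM. Each SM has its own shared memory, which is shared by the threads of a block running on it.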
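
To make the hierarchy above concrete, here is a minimal sketch in which every thread records which block, warp, and lane it belongs to, and the host then prints a few of them along with some device properties. The helper names (blocksNeeded, warpIndex, laneIndex), the kernel name whereAmI, and the sizes (200 threads, 64 threads per block) are assumptions made for illustration and are not part of the CUDA API. The sketch assumes a one-dimensional grid and block and a GPU that supports managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers; the names are illustrative, not part of CUDA.

// Number of blocks needed to cover n threads, rounding up.
__host__ __device__ inline int blocksNeeded(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Which warp of its block a thread falls into (1D block assumed).
__host__ __device__ inline int warpIndex(int threadInBlock, int warpSz)
{
    return threadInBlock / warpSz;
}

// Position of a thread inside its warp (its "lane").
__host__ __device__ inline int laneIndex(int threadInBlock, int warpSz)
{
    return threadInBlock % warpSz;
}

// Kernel: each thread writes down where it sits in the hierarchy.
__global__ void whereAmI(int *blockOf, int *warpOf, int *laneOf, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (gid >= n) return;  // the last block may have spare threads
    blockOf[gid] = blockIdx.x;
    warpOf[gid]  = warpIndex(threadIdx.x, warpSize);  // warpSize is a CUDA built-in
    laneOf[gid]  = laneIndex(threadIdx.x, warpSize);
}

int main()
{
    const int n = 200;               // total threads wanted (arbitrary)
    const int threadsPerBlock = 64;  // arbitrary block size
    const int blocks = blocksNeeded(n, threadsPerBlock);

    int *blockOf, *warpOf, *laneOf;
    cudaMallocManaged(&blockOf, n * sizeof(int));
    cudaMallocManaged(&warpOf,  n * sizeof(int));
    cudaMallocManaged(&laneOf,  n * sizeof(int));

    // Launch a grid of `blocks` blocks, each with `threadsPerBlock` threads.
    whereAmI<<<blocks, threadsPerBlock>>>(blockOf, warpOf, laneOf, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print a sample of threads rather than all of them.
    for (int i = 0; i < n; i += 40)
        printf("thread %3d -> block %d, warp %d, lane %2d\n",
               i, blockOf[i], warpOf[i], laneOf[i]);

    // Query the hardware side: SM count, warp size, shared memory.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("SMs: %d, warp size: %d, shared memory per block: %zu bytes\n",
           prop.multiProcessorCount, prop.warpSize, prop.sharedMemPerBlock);

    cudaFree(blockOf);
    cudaFree(warpOf);
    cudaFree(laneOf);
    return 0;
}
```

Saved under a name such as whereami.cu (hypothetical), this should build with nvcc whereami.cu -o whereami. The grid and block sizes are chosen by the programmer at launch time, while the warp size and the number of SMs are properties of the device, which is why the sketch reads them from cudaDeviceProp instead of hard-coding them.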