
GPUs Part 3 - Going from here

Written by Romit Jain

Hopefully, you have read Part 1 and Part 2 of the Learning about GPUs series. This part is an index of useful resources for building a more advanced understanding of GPUs.

Learning about the fundamentals

  1. [Book] Programming Massively Parallel Processors: A Hands-on Approach by David B. Kirk and Wen-mei W. Hwu
    1. This is the best resource for learning about parallel programming and GPUs. The first 4 chapters explain the fundamentals of GPU hardware and its programming model.
  2. [YouTube playlist] Videos 12 to 14 in COS 436
  3. CUDA Mode
    1. A very good resource for learning about GPUs, CUDA, and Triton. The community also has a very active Discord server.
  4. CUDA C++ Programming Guide
    1. The official guide from NVIDIA, useful as a reference.
  5. [YouTube playlist] CUDA teaching center
    1. A short series for getting started with CUDA and refreshing your knowledge of GPU hardware.

Notable Talks

  1. GTC 2021 - How GPU Computing Works
  2. GPU Optimization session hosted by Chip Huyen
  3. GTC 2022 - How CUDA Programming Works - Stephen Jones, CUDA Architect, NVIDIA
  4. Bringing Clang and C++ to GPUs: An Open-Source, CUDA-Compatible GPU C++ Compiler

Notable blogs

  1. What every developer should know about GPU computing
    1. A gentle introduction to the GPU programming model.
  2. What Shapes Do Matrix Multiplications Like?
    1. Puzzles to test your understanding of GPU hardware.
  3. Making Deep Learning Go Brrrr From First Principles
  4. How is LLaMa.cpp possible?

Programming tutorials

  1. Tiled matrix multiplication in CUDA
  2. Matrix multiplication in pure CUDA: How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
  3. GPU Puzzles by srush
  4. Triton Puzzles by srush
  5. llm.c: LLM training in raw C/CUDA

Citations

For attribution, please cite this as:

@article{romit2024gpus3,
  title   = {GPUs Part 3},
  author  = {Jain, Romit},
  journal = {cmeraki.github.io},
  year    = {2024},
  month   = {June},
  url     = {https://cmeraki.github.io/gpu-part3.html}
}