in 2025, if you want to become a successful AI engineer or researcher, you should NOT learn CUDA. furthermore, i'd guess that 80% of successful ML researchers have never written a CUDA kernel.

practical ML is about training models and using them to make predictions. this has nothing to do with CUDA. CUDA is necessary in two cases:

(a) you are developing a radically new model that isn't easily expressible in PyTorch or JAX (e.g. Mamba)
(b) you are running into performance bottlenecks in existing CUDA code and need to make it faster

i doubt that either case applies to you. chances are you aren't building the next Mamba, and the bottlenecks you'll hit in practice are different. instead, you should work on:

- finding the right data or hardware
- setting things up properly
- distributing efficiently across hardware
- researching new efficient ways to run models that other people are working on (like vLLM and SGLang)
- or, better yet, your eval pipeline: find ways to measure your model's performance that are more realistic, comprehensive, efficient, and fair

TLDR: want to learn? spend your time tinkering with models in PyTorch and JAX, not writing matrix multiplications.
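to make "tinkering" concrete, here's a minimal sketch of the kind of work i mean: a tiny PyTorch training loop (the model, data, and hyperparameters are placeholder choices, not a recipe) that never touches CUDA directly:

```python
import torch
import torch.nn as nn

# placeholder model -- in practice, whatever architecture you're experimenting with
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# fake data standing in for a real dataset
x, y = torch.randn(256, 32), torch.randn(256, 1)

# the closest you get to GPU programming: one device move
device = "cuda" if torch.cuda.is_available() else "cpu"
model, x, y = model.to(device), x.to(device), y.to(device)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()  # autograd dispatches to pre-built (CUDA) kernels for you
    optimizer.step()
```

every matrix multiplication in there runs on kernels someone else already wrote and tuned. your time goes into the data, the model, and the evals.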