6 Keynotes from the First CUDA MODE IRL Hackathon

On the last weekend in September, Accel, in collaboration with PyTorch and NVIDIA, held the first-ever CUDA MODE IRL hackathon. What started as a reading-group Discord community founded by Mark Saroufim and Andreas Köpf evolved into an incredible 14-hour event in SF that attracted 200 developers from across the globe.

*Accel's Casey Aylward welcomes the group*

From 10 a.m. to midnight, participants hacked, attended technical talks, and collaborated with session leads who are experts in their fields.

Keynote Presentations

We were also fortunate to intersperse the day with 6 technical talks from some of the most important builders and researchers in the space:

Tri Dao: Together.ai

Tri Dao, the inventor of FlashAttention, presented the latest iteration of his work, FlashAttention-3, and discussed its core ideas, including warp group MMA and the Tensor Memory Accelerator (TMA).

Supriya Rao: PyTorch Insights

Supriya Rao from PyTorch shared some of the latest developments in PyTorch for easy quantization and sparsity with torchao.

Andrej Karpathy: Eureka Labs and llm.c

Andrej Karpathy shared the story of how he built llm.c and encouraged the audience to build more reference architectures that can fit in an LLM's context length. Andrej also broke down his keynote further in this excellent tweet thread.

Lily (Xiaoxuan) Liu: vLLM

In the evening session, Lily Liu presented vLLM, the most popular high-performance library for LLM inference, with a deep dive on speculative decoding.

Tim Dettmers: The Power of Open Source

Tim Dettmers, the original author of the phrase “CUDA MODE,” delivered a talk on how open source can win over closed source.

Wen-mei Hwu: How to Pick a Hard Problem

Wen-mei Hwu from NVIDIA discussed how you can pick a hard problem that can sustain you for a decade.

Hackathon Day

During the hackathon, developers paired up to work on GPU optimizations, performance tuning, and scalable inference, collaborating throughout the day.

By the early evening, over 40 teams had submitted their projects, with the top five winning teams receiving $40k worth of compute credits from our generous sponsors (Anyscale, Fal.ai, Lambda Labs, Modal Labs, Nebius.ai, Oracle, Prime Intellect and Together.ai) and the top three teams receiving NVIDIA RTX 4080 GPUs signed by CEO Jensen Huang.

Shoutout to the top 3 teams, who worked on flexible attention masks in CUTLASS, a NCCL rewrite using Triton, and PyTorch binaries without libtorch.

We loved seeing the energy of this community IRL. The community has recently rebranded to “GPU MODE” to be inclusive of more hardware architectures and programming languages. If this excites you, please join the Discord to get involved and hear about future community events.

—Casey and Mark Saroufim, co-founder of the GPU Mode community