ADIA Lab Seminar Series

Scalable and Efficient AI: From Supercomputers to Smartphones

13 February 2024, 4:00 PM UAE time

Professor Torsten Hoefler

Professor of Computer Science at ETH Zurich, winner of the ACM Gordon Bell Prize (2019), a member of Academia Europaea, and a Fellow of the ACM and IEEE.

Following a “Performance as a Science” vision, Professor Hoefler combines mathematical models of architectures and applications to design optimized computing systems. Before joining ETH Zurich, he led the performance modeling and simulation efforts for Blue Waters, the first sustained-petascale supercomputer, at the University of Illinois at Urbana-Champaign. He is also a key contributor to the Message Passing Interface (MPI) standard, where he chaired the "Collective Operations and Topologies" working group.

Torsten won Best Paper Awards at ACM/IEEE Supercomputing in 2010, 2013, 2014, 2019, and 2022, as well as at other international conferences. He has published numerous peer-reviewed scientific articles and received many prizes for his work, including the IEEE CS Sidney Fernbach Memorial Award in 2022 and the ACM Gordon Bell Prize in 2019. Torsten was elected to the first steering committee of ACM's SIGHPC in 2013 and has been re-elected every term since. His research interests revolve around the central theme of performance-centric system design and include scalable networks, parallel programming techniques, and performance modeling for large-scale simulations and artificial intelligence systems.

Seminar Overview:

Billion-parameter artificial intelligence models have shown exceptional performance on a wide variety of tasks, ranging from natural language processing, computer vision, and image generation to mathematical reasoning and algorithm generation. These models usually require large parallel computing systems, often called "AI supercomputers", for their initial training. We will outline several techniques, from data ingestion and parallelization to accelerator optimization, that improve the efficiency of such training systems. Yet training large models is only a small fraction of practical artificial intelligence computation. Efficient inference is even more challenging: models with hundreds of billions of parameters are expensive to use. We continue by discussing model compression and optimization techniques, such as fine-grained sparsity and quantization, that reduce model size and significantly improve efficiency during inference. These techniques may eventually enable inference with powerful models on hand-held devices. Time permitting, the talk will also touch on reasoning in large language models, drawing on our recent Graph of Thoughts work.
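
To give a concrete flavour of the compression techniques mentioned above, the sketch below illustrates symmetric per-tensor int8 post-training quantization in plain NumPy. It is a minimal illustrative example under assumed values (the 4096x4096 weight matrix and its scale are arbitrary), not the specific methods presented in the talk:

    import numpy as np

    # Simulated float32 weight matrix standing in for one layer of a large model.
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.02, size=(4096, 4096)).astype(np.float32)

    # Symmetric per-tensor int8 quantization:
    # map the range [-max|w|, +max|w|] onto the integer range [-127, 127].
    scale = np.abs(weights).max() / 127.0
    q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

    # Dequantize to estimate the accuracy cost of the compression.
    deq = q_weights.astype(np.float32) * scale
    mean_abs_err = np.abs(weights - deq).mean()

    print(f"float32 size: {weights.nbytes / 2**20:.1f} MiB")    # 64.0 MiB
    print(f"int8 size:    {q_weights.nbytes / 2**20:.1f} MiB")  # 16.0 MiB
    print(f"mean |error|: {mean_abs_err:.6f}")

Storing the weights as int8 plus a single float32 scale cuts memory roughly fourfold relative to float32; fine-grained sparsity would shrink the model further by storing only the non-zero weights.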

Tuesday, 13 February 2024

Presentation: 4:00 PM - 5:00 PM
Networking for in-person guests: 5:00 PM - 6:00 PM

ADGM Academy, 20th floor, Al Maqam Tower
Al Maryah Island, Abu Dhabi