Revolutionizing AI Inference: Unprecedented Scale, Trillion-Parameter Models in Real Time!
DeepSpeed-Inference accelerates large transformer models by partitioning them across multiple GPUs, enabling inference on models too large to fit on a single device. Compared to existing methods, it reduces latency by up to 6.4x and raises throughput by more than 1.5x. By scaling to hundreds of GPUs, it serves trillion-parameter models in real time, handling models 25x larger than previously possible while sustaining 84 trillion operations per second.
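The core idea behind multi-GPU model partitioning can be illustrated with a toy sketch. Below, a weight matrix too large for one device is split column-wise across several simulated "devices", each computes a partial output, and the slices are concatenated to recover the full result. This is an illustrative NumPy sketch of tensor parallelism in general, not DeepSpeed-Inference's actual implementation, which uses custom CUDA kernels and collective communication.

```python
import numpy as np

def column_parallel_matmul(x, w, num_devices):
    """Toy tensor parallelism: shard w column-wise across num_devices,
    multiply each shard independently, then concatenate the partial outputs
    (the step a real system would do with an all-gather across GPUs)."""
    shards = np.array_split(w, num_devices, axis=1)    # one shard per device
    partial_outputs = [x @ shard for shard in shards]  # computed in parallel
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # activations
w = rng.standard_normal((8, 16))   # weight matrix to be sharded

sharded = column_parallel_matmul(x, w, num_devices=4)
full = x @ w
assert np.allclose(sharded, full)  # sharded result matches single-device matmul
```

Because each device holds only a fraction of the weights, the aggregate model can exceed any single device's memory, which is how systems in this class fit models with trillions of parameters.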