

AI Interdisciplinary Lecture Series, No. 41: FP4 quantization and its real-world applications on LLMs and diffusion models

Posted: 2025-03-19


Speaker: Zhiyu Cheng

                   NVIDIA


Host: Assistant Professor Meng Li



Time: March 27, 2025, 11:00-12:00

Tencent Meeting: 463-154-887


Title:


      FP4 quantization and its real-world applications on LLMs and diffusion models


Abstract:

As large language models (LLMs) and diffusion models grow in complexity, efficient inference has become a pressing concern. In this talk, we introduce FP4 quantization—an emerging technique that substantially reduces memory usage and computational costs with minimal accuracy trade-offs. We begin by discussing the FP4 numerical format. Next, we delve into the quantization workflow, highlighting both post-training quantization (PTQ) and quantization-aware training (QAT) algorithms, along with practical recipes and best practices for successful implementation on LLMs and diffusion models. We then present quantitative and qualitative results to illustrate FP4 quantization’s impact on real-world applications, such as text, image and video generation. Finally, we introduce the NVIDIA TensorRT Model Optimizer, detailing its capabilities for FP4 quantization and streamlined deployment through TensorRT-LLM.
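To make the core idea concrete, below is a minimal NumPy sketch of simulated ("fake") FP4 quantization with per-block scaling, in the spirit of the PTQ workflow described above. The function name, the block size of 16, and the use of full-precision block scales are illustrative assumptions, not the talk's actual recipe (production NVFP4, for instance, stores the per-block scales in a low-precision format rather than FP32/FP64):

```python
import numpy as np

# Magnitudes representable in the FP4 E2M1 format
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(x, block_size=16):
    """Simulated FP4 quantization with per-block scaling.

    Assumes x.size is a multiple of block_size. Each block shares one
    scale chosen so the block's largest magnitude maps onto 6.0 (the
    largest E2M1 value); every element is rounded to the nearest
    representable FP4 magnitude and then rescaled back.
    """
    x = np.asarray(x, dtype=np.float64)
    blocks = x.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 6.0
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid division by zero
    scaled = blocks / scale
    # Round each magnitude to the nearest E2M1 grid point, keep the sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_E2M1_VALUES).argmin(axis=-1)
    q = np.sign(scaled) * FP4_E2M1_VALUES[idx]
    return (q * scale).reshape(x.shape)

# Example: quantize 16 weights spanning [-6, 6] as a single block.
w = np.linspace(-6.0, 6.0, 16)
wq = fake_quantize_fp4(w, block_size=16)
```

The per-block scale is what keeps accuracy loss small: with only eight magnitudes available, sharing one scale across a small block lets each block use the full E2M1 range for its own dynamic range instead of one global grid for the whole tensor.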


Speaker Bio:

Zhiyu Cheng is a manager at NVIDIA, where he focuses on driving algorithm and software development to optimize large-scale inference for generative AI workloads, including large language models (LLMs), vision language models (VLMs), and diffusion models on NVIDIA's latest platforms. He has over 10 years of industry experience in efficient deep learning, gained across roles at NXP, Xilinx, Baidu, and OmniML (acquired by NVIDIA). Zhiyu has a record of over 30 published papers and patents. He holds a Ph.D. in electrical and computer engineering from the University of Illinois, with a thesis in the field of information theory.

