
Speaker: Zhiyu Cheng
NVIDIA
Host: Assistant Professor Li Meng (李萌)
Time: March 27, 2025, 11:00–12:00
Tencent Meeting ID: 463-154-887
Title:
FP4 quantization and its real-world applications on LLMs and diffusion models
Abstract:
As large language models (LLMs) and diffusion models grow in complexity, efficient inference has become a pressing concern. In this talk, we introduce FP4 quantization—an emerging technique that substantially reduces memory usage and computational costs with minimal accuracy trade-offs. We begin by discussing the FP4 numerical format. Next, we delve into the quantization workflow, highlighting both post-training quantization (PTQ) and quantization-aware training (QAT) algorithms, along with practical recipes and best practices for successful implementation on LLMs and diffusion models. We then present quantitative and qualitative results to illustrate FP4 quantization’s impact on real-world applications, such as text, image and video generation. Finally, we introduce the NVIDIA TensorRT Model Optimizer, detailing its capabilities for FP4 quantization and streamlined deployment through TensorRT-LLM.
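To make the FP4 numerical format mentioned in the abstract concrete, the sketch below simulates a round-trip through the 4-bit E2M1 floating-point format (1 sign, 2 exponent, 1 mantissa bit), whose magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}. This is only an illustrative "fake quantization" with a single per-tensor scale, not the production recipe used by NVIDIA's tools (which use finer-grained scaling); the function name is hypothetical.

```python
import numpy as np

# The 15 distinct values representable in FP4 E2M1 (negative zero folded in).
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_POS[:0:-1], FP4_POS])

def fake_quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Simulated per-tensor FP4 quantization: scale x into the E2M1
    range, snap each element to the nearest representable value,
    then rescale back (the round-trip used when simulating PTQ error)."""
    scale = np.max(np.abs(x)) / 6.0  # 6.0 is the largest E2M1 magnitude
    if scale == 0:
        return x.copy()
    # For each element, pick the closest point on the FP4 grid.
    idx = np.abs(x[..., None] / scale - FP4_GRID).argmin(axis=-1)
    return FP4_GRID[idx] * scale

x = np.array([0.07, -1.3, 2.2, 5.9])
print(fake_quantize_fp4(x))  # every output lies on the scaled FP4 grid
```

Comparing the output against the input shows the quantization error FP4 introduces; PTQ calibrates scales to minimize this error on real activations, while QAT trains the model with this round-trip in the loop so the weights adapt to it.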
Speaker Bio:
Zhiyu Cheng is a manager at NVIDIA, where he focuses on driving algorithm and software development to optimize large-scale inference for generative AI workloads, including large language models (LLMs), vision language models (VLMs), and diffusion models on NVIDIA's latest platforms. He has over 10 years of industry experience in efficient deep learning, gained at NXP, Xilinx, Baidu, and OmniML (acquired by NVIDIA). Zhiyu has a record of over 30 published papers and patents. He holds a Ph.D. in electrical and computer engineering from the University of Illinois, with a thesis in the field of information theory.