SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Original Paper: https://arxiv.org/abs/2211.10438

By: Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han

Abstract: Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
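
For context, the paper's central trick is to migrate quantization difficulty from activation outlier channels into the weights via a per-channel smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha) (alpha is typically 0.5), a mathematically equivalent transform applied before quantization. The NumPy sketch below illustrates only that smoothing step, not the full W8A8 pipeline; the function name `smooth` and the toy shapes are illustrative assumptions, not taken from the authors' code release.

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """Per-channel smoothing in the spirit of SmoothQuant:
    shift quantization difficulty from activations X to weights W
    while keeping the product X @ W exactly unchanged."""
    # Per-input-channel absolute maxima (calibration statistics).
    act_max = np.abs(X).max(axis=0)   # shape: (in_features,)
    w_max = np.abs(W).max(axis=1)     # shape: (in_features,)
    # Smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).
    s = act_max**alpha / w_max**(1 - alpha)
    X_smooth = X / s                  # activation outliers are flattened
    W_smooth = W * s[:, None]         # weights absorb the scale
    return X_smooth, W_smooth

# Sanity check on toy data with one activation outlier channel:
# the transform is equivalence-preserving in full precision.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8)) * np.array([1, 50, 1, 1, 1, 1, 1, 1])
W = rng.standard_normal((8, 16))
Xs, Ws = smooth(X, W)
assert np.allclose(X @ W, Xs @ Ws)
```

After smoothing, both X_smooth and W_smooth have milder per-channel dynamic ranges, which is what makes simple 8-bit quantization of both operands viable.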