
Technical Name: An Efficient End-to-End Large Language Model (LLM) Accelerator Chip Optimized for Edge Computing
Project Operator: National Yang Ming Chiao Tung University
Project Host: 黃俊達
Scientific Breakthrough
Our team developed MMA, a high-efficiency accelerator optimized for the Transformer architecture and LLM inference. Featuring low power consumption and high performance, it is currently the only solution capable of running LLMs entirely on edge devices. Compared with Georgia Tech's MicroScopiQ, Seoul National University's Tender, and Korea University's OPAL, our design is the first to fully realize the vision of LLM on Edge, making secure, real-time AI possible.
Industrial Applicability
Our team developed MMA, the world's first accelerator enabling full LLM execution on edge devices. Using MX-format computation and dynamic quantization, it improves both efficiency and accuracy. It supports multiple precision formats and nonlinear functions, optimizes data transfer, and minimizes latency. When running Llama2-7b, it achieves 95% resource utilization with a perplexity (PPL) only 0.3 higher than the original full-precision model. It is currently the only end-to-end LLM-on-edge solution.
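The MX-format quantization mentioned above groups values into small blocks that share a single power-of-two scale, so narrow integer elements can cover a wide dynamic range. The following is a minimal illustrative sketch of that idea in Python; the block size of 32, the 8-bit element width, and the function names are assumptions for illustration, not MMA's actual implementation.

```python
import numpy as np

def mx_quantize_block(x, block_size=32, elem_bits=8):
    """Sketch of MX-style block quantization: each block of `block_size`
    values shares one power-of-two scale (an E8M0-style exponent), and
    elements are stored as narrow signed integers.

    Hypothetical helper for illustration; not the MMA chip's datapath.
    """
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    qmax = 2 ** (elem_bits - 1) - 1  # e.g. 127 for 8-bit elements
    # Shared per-block scale: smallest power of two such that the
    # block's max magnitude fits into the integer element range.
    max_mag = np.max(np.abs(blocks), axis=1, keepdims=True)
    exp = np.where(max_mag > 0, np.ceil(np.log2(max_mag / qmax)), 0.0)
    scale = (2.0 ** exp).astype(np.float32)
    q = np.clip(np.round(blocks / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def mx_dequantize(q, scale, orig_len):
    """Reconstruct approximate float values from quantized blocks."""
    return (q.astype(np.float32) * scale).reshape(-1)[:orig_len]
```

Because the shared scale is restricted to a power of two, hardware can apply it with a simple exponent shift instead of a multiplier, which is one reason block formats of this kind suit low-power accelerators.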
Matching Needs
LLMs like ChatGPT are widely used in daily life. Our chip design enables real-time LLM inference without relying on the cloud or GPUs, reducing energy use and protecting data privacy. Imagine a future where anyone can run an LLM directly on their smartphone, or interact with a compact personal chatbot anytime, anywhere, completely offline and without worrying about data leaks. The solution not only enhances convenience but also resolves concerns around privacy, bringing AI into everyday life.
Keywords: Large Language Model (LLM), Accelerator, AI Chip, Model Quantization, Edge Computing, MX Format
  • Contact
  • Hung-Ming Wang