• Technical Name
  • Hybrid CNN Accelerator System Design and the Associated Model Training/Analyzing Tools
  • Operator
  • National Chiao Tung University
  • Booth
  • Online display only
  • Contact
  • 蔡家齊
  • Email
  • apple.35932003@gmail.com
Technical Description Hybrid CNN DLA system
1. Use the SPE to handle CNN operations efficiently, with support for 1-, 2-, 4-, and 8-bit CNN operations. This allows the Hybrid CNN deep learning accelerator (DLA) to operate in a high-efficiency mode.
2. Use an Input Ping-Pong Buffer to optimize DRAM access and the scheduling of the Hybrid CNN DLA. Use a 2-Stage Input Buffer to reduce data-transmission latency among the 2-D Systolic PE Arrays.
3. Use an independent Partial Sum Sorter to access on-chip memory, allowing 100% hardware utilization.
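The ping-pong buffering in point 2 can be illustrated with a minimal sketch: while the PE array computes on one buffer, the next DRAM tile is fetched into the other, hiding transfer latency. The tile names and the `ping_pong_schedule` helper below are illustrative assumptions, not the actual RTL scheduling logic.

```python
def ping_pong_schedule(tiles):
    """Return the (compute, fetch) overlap per step for a list of input tiles.

    Two on-chip buffers alternate roles: one feeds the PE array while the
    other receives the next tile from DRAM.
    """
    schedule = []
    buffers = [None, None]            # two on-chip input buffers
    buffers[0] = tiles[0]             # prefetch the first tile into buffer 0
    for i in range(len(tiles)):
        cur, nxt = i % 2, (i + 1) % 2
        fetch = tiles[i + 1] if i + 1 < len(tiles) else None
        if fetch is not None:
            buffers[nxt] = fetch      # DRAM fetch goes into the idle buffer
        schedule.append((buffers[cur], fetch))  # compute overlaps the fetch
    return schedule

print(ping_pong_schedule(["t0", "t1", "t2"]))
# → [('t0', 't1'), ('t1', 't2'), ('t2', None)]
```

Each step computes on one tile while the following tile is in flight, so DRAM latency is only exposed on the very first prefetch.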
Hybrid CNN Training/Analyzing Tools
1. Using knowledge transfer, dynamic quantization, and other techniques, this tool can greatly simplify the Hybrid CNN model.
2. Use SSTE to solve the gradient-deviation problem in back-propagation when training at the bit-accurate level. By recording the overflow rate, it blocks gradients from nodes with a high overflow rate.
3. Our team developed an automatic analyzing/training tool (ezQUANT) that supports automatic quantization and fine-tuning of DL models.
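The overflow-aware straight-through-estimator idea in point 2 can be sketched as follows. This is a simplified illustration under assumed details (uniform symmetric quantization, a hypothetical `threshold` for gating), not the actual SSTE implementation inside ezQUANT.

```python
import numpy as np

def quantize_forward(x, bits, scale):
    """Uniform signed quantization of x to `bits` bits; records overflow rate."""
    qmax = 2 ** (bits - 1) - 1
    q = np.round(x / scale)
    overflow_rate = float(np.mean(np.abs(q) > qmax))  # fraction of clipped values
    q = np.clip(q, -qmax - 1, qmax)
    return q * scale, overflow_rate

def ste_backward(grad_out, overflow_rate, threshold=0.1):
    """Straight-through estimator: pass gradients through the non-differentiable
    rounding step unchanged, but zero them for nodes whose recorded overflow
    rate exceeds a threshold (the gradient gating described above)."""
    if overflow_rate > threshold:
        return np.zeros_like(grad_out)
    return grad_out

# Forward: 10.0 overflows a 4-bit range with scale 0.5, so it is clipped.
xq, ov = quantize_forward(np.array([0.0, 0.5, 10.0]), bits=4, scale=0.5)
print(xq, ov)   # → [0.  0.5 3.5] 0.3333333333333333
```

With one of three values overflowing, the recorded rate exceeds the (assumed) 0.1 threshold, so `ste_backward` would suppress this node's gradient during back-propagation.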
Scientific Breakthrough 1. The Hybrid CNN DLA proposed in this work supports 1/1-, 2/2-, 4/4-, and 8/8-bit CNN operations.
2. On a Xilinx ZCU102 FPGA, it achieves a peak performance of 691.2 GOPS at 8/8-bit, 1382.4 GOPS at 4/4-bit, 2764.8 GOPS at 2/2-bit, and 5529.6 GOPS at 1/1-bit.
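The quoted peak figures are mutually consistent: throughput doubles each time the operand width halves, as expected when the same PE fabric performs proportionally more narrow MACs per cycle. A quick arithmetic check (the scaling rule is our reading of the numbers, not a quoted claim):

```python
# Peak throughput scales as (8 / bits) relative to the 8/8-bit baseline.
base_gops = 691.2  # quoted 8/8-bit peak on the Xilinx ZCU102
peaks = {bits: base_gops * (8 // bits) for bits in (8, 4, 2, 1)}
print(peaks)  # → {8: 691.2, 4: 1382.4, 2: 2764.8, 1: 5529.6}
```

All four values match the figures reported above.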
3. The first Hybrid CNN training methodology, which greatly compresses the model size while preserving accuracy.
4. The first bit-accurate-level software in the world, which solves the problem of analyzing the accuracy loss that arises when porting a model onto a DLA, a problem left open in existing papers and industrial AI DLAs.
Industrial Applicability For industrial applications, our Hybrid CNN model training tool can reduce the model size efficiently, and the resulting model can then run inference at higher speed on our customized DLA. The software tool can be used for CNN model training and re-training, and we can provide the corresponding DLA to run inference in edge applications. This solution avoids the quantization error that arises when porting a floating-point model to other edge devices.