RADACS: Resolution-Aware Diffusion Accelerator

Grand Prize (1st of 73 teams nationwide), 2024 AI Semiconductor Idea & Design Competition (AI-ON).

Team project with Kyungjun Oh and Omin Kwon for the 2024 AI Semiconductor Idea & Design Competition (AI-ON), hosted by the Korean Ministry of Science and ICT and IITP. Building on the FPGA-based diffusion acceleration work from my Embedded System Design course, we proposed RADACS, a hardware architecture that, to our knowledge, is the first to build computational staleness directly into the accelerator design rather than treating it purely as a software-level technique. Computational staleness here means the strong feature similarity across adjacent diffusion timesteps, which DeepCache and AsyncDiff exploit algorithmically.
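To make the staleness idea concrete, here is a minimal software sketch of reusing a deep-branch feature across adjacent timesteps. It is a toy illustration only, not the RADACS dataflow and not the actual DeepCache or AsyncDiff algorithm: `StaleFeatureCache`, `deep_branch`, `shallow_branch`, and the refresh `interval` are all hypothetical names and parameters chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StaleFeatureCache:
    """Hypothetical cache: recompute the deep feature every `interval` steps,
    and serve the stale copy on the steps in between."""
    interval: int
    cached: Optional[List[float]] = None
    refreshes: int = 0

    def get(self, step: int, compute: Callable[[], List[float]]) -> List[float]:
        # Refresh on the first call and on every step divisible by `interval`.
        if self.cached is None or step % self.interval == 0:
            self.cached = compute()
            self.refreshes += 1
        return self.cached


def deep_branch(x: List[float]) -> List[float]:
    # Toy stand-in for the expensive deep (low-resolution) U-Net levels.
    return [0.5 * v for v in x]


def shallow_branch(x: List[float], deep: List[float]) -> List[float]:
    # Toy stand-in for the shallow levels, which run on every step.
    return [a + b for a, b in zip(x, deep)]


def run_denoising(x: List[float], num_steps: int, interval: int):
    """Run a toy denoising loop; return the final state and the refresh count."""
    cache = StaleFeatureCache(interval)
    for step in range(num_steps):
        deep = cache.get(step, lambda: deep_branch(x))
        x = shallow_branch(x, deep)
    return x, cache.refreshes


if __name__ == "__main__":
    _, refreshes = run_denoising([1.0, 2.0], num_steps=10, interval=3)
    print(f"deep branch computed {refreshes} times over 10 steps")
```

With `interval=1` the deep branch runs on every step, which corresponds to the uncached baseline. Larger intervals trade accuracy for fewer deep-branch evaluations.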

RADACS pairs two specialized matmul units to match the asymmetric M/K/N shapes that arise from U-Net's hierarchical resolution structure: a High-Res MMU (HMMU) with a large weight buffer for shallow U-Net levels, and a Low-Res MMU (LMMU) with dual memory controllers for deep levels. These units are organized into Dual and Quad U-Net Engines (DU-NE / QU-NE) connected through a feature cache that realizes staleness-based dataflow at the hardware level, which makes the design naturally compatible with algorithmic acceleration techniques such as DeepCache and AsyncDiff. A Vitis HLS prototype on the Xilinx Zynq-7000 demonstrated 1.17× and 1.47× speedups on high- and low-resolution matmul operations, respectively, over a baseline single-MMU design.
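The shape asymmetry can be illustrated with a short sketch that lowers a 3×3 convolution to a matmul (im2col style) at several U-Net levels. Everything here is an assumption made for illustration: the level resolutions and channel counts are example values, `conv_as_matmul` and `pick_unit` are hypothetical helpers, and the resolution threshold used to choose between HMMU and LMMU is invented rather than taken from the RADACS design.

```python
from typing import NamedTuple


class MatmulShape(NamedTuple):
    m: int  # output pixels (rows of the activation matrix)
    k: int  # reduction dimension: input channels * kernel area
    n: int  # output channels (columns of the weight matrix)


def conv_as_matmul(height: int, width: int, c_in: int, c_out: int,
                   kernel: int = 3) -> MatmulShape:
    """Matmul shape of a stride-1, same-padded convolution lowered via im2col."""
    return MatmulShape(m=height * width, k=c_in * kernel * kernel, n=c_out)


def pick_unit(height: int, width: int, threshold: int = 32) -> str:
    """Hypothetical dispatch rule: high-resolution levels go to the HMMU."""
    return "HMMU" if min(height, width) >= threshold else "LMMU"


# Illustrative (height, width, channels) per level; assumed, not from RADACS.
LEVELS = [(64, 64, 320), (32, 32, 640), (16, 16, 1280), (8, 8, 1280)]

if __name__ == "__main__":
    for h, w, c in LEVELS:
        shape = conv_as_matmul(h, w, c, c)
        print(f"{h:>2}x{w:<2} C={c:<4} -> M={shape.m:<5} K={shape.k:<6} "
              f"N={shape.n:<5} unit={pick_unit(h, w)}")
```

Under these assumed values, M shrinks by 64× from the shallowest to the deepest level while K and N grow by 4×, which is the kind of mismatch a single fixed-shape matmul unit handles poorly.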

The Grand Prize (1st out of 73 teams nationwide) included a 10-day Silicon Valley research trip, with site visits and seminars at NVIDIA, Broadcom, Google, K-ASIC, UC Berkeley, and Stanford, and open Q&A sessions with researchers and engineers at each. As an undergraduate, I found this an unusually direct window into the global semiconductor and AI hardware ecosystem.

Press coverage (Korean) · Final presentation (PDF)