Title:Evolution of Diffusion Models: From Birth to Enhanced Efficiency and Controllability
Generative AI: The Evolution of Diffusion Models, from Birth to Improved Generation Efficiency and Controllability
Speaker:Dr. Chieh-Hsin Lai (Research Scientist at Sony AI)
Time:2024.02.27 (Tue.) 16:00 – 17:00
Venue:Room 101, 1F, General Building III (第三綜合大樓)
Tea time:15:30, Room 707
Bio:
- B.S. Mathematics, National Tsing Hua University
- Ph.D. Mathematics, University of Minnesota - Twin Cities
- Research scientist/Tech leader at Sony AI
- Chieh-Hsin Lai is a research scientist at Sony AI, focusing on robustness, deep generative models (especially diffusion models), and theoretical deep learning.
Abstract:
Diffusion models, pioneers in Generative AI, have significantly propelled the creation of synthetic images, audio, 3D objects/scenes, and proteins. Beyond their role in generation, these models have found practical applications in tasks like media content editing/restoration, as well as in diverse domains such as robotics learning.
In this talk, we'll explore the origins of diffusion models and gain insight into their mechanism as the solving of differential equations (DEs) (Song et al. ICLR 2020). Building on this view, we introduce FP-Diffusion (Lai et al. ICML 2023), which improves the diffusion model by aligning it with its underlying mathematical structure, the Fokker-Planck (FP) equation.
Additionally, the link between diffusion models and DE solving, which typically takes hundreds to a thousand steps, reveals a key limitation: slow sampling. Motivated by this, we'll introduce the Consistency Trajectory Model (CTM) (Kim & Lai et al. ICLR 2024), a method that enables one-step generation with diffusion models while preserving high fidelity and diversity. If time permits, we'll delve into controllable generation using pre-trained diffusion models, showcasing their utility in tasks such as media restoration and user-specified applications, and exploring their practical use in industry.