This podcast offers a comprehensive exploration of multi-modal generative AI. We examine the two dominant families of techniques, multi-modal large language models (MLLMs) and diffusion models, covering their probabilistic modeling procedures, multi-modal architecture designs, and advanced applications in image and video large language models as well as text-to-image and text-to-video generation.
We look at how these models are used for text-to-image and text-to-video generation, then turn to future directions: unified models, controllable generation, and lightweight multi-modal AI.
#genai #levelup #level9 #learn #generativeai #ai #aipapers #podcast #deeplearning #machinelearning #multimodal