📋 Project Overview
A from-scratch implementation of Denoising Diffusion Probabilistic Models (DDPM) designed for deep understanding of diffusion model mathematics and architecture.
🎯 Problem Definition & Goals
- Problem: Most open diffusion model implementations are either too complex to learn from or so simplified that the underlying mathematics is obscured.
- Goal 1: Implement DDPM from scratch, following the original paper (Ho et al., 2020).
- Goal 2: Extend to advanced techniques including DDIM and Classifier-Free Guidance.
- Goal 3: Create maintainable, testable code for diffusion model research.
⚙️ Key Features & Contributions
- Multiple Model Architectures: Implemented both UNet and Diffusion Transformer (DiT).
- Config-Based Management: All hyperparameters managed via YAML config files.
- Advanced Sampling: DDIM for 10-50x faster sampling, plus Classifier-Free Guidance (CFG) for conditional generation.
- Apple Silicon Support: Automatic MPS detection for GPU acceleration.
- Comprehensive Testing: Unit tests via pytest.
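The MPS support mentioned above boils down to a small device-selection helper. A minimal sketch (the function name here is illustrative, not necessarily the one used in this repo):

```python
import torch

def get_device() -> torch.device:
    """Pick the best available accelerator: CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # The MPS backend is available on Apple Silicon with recent PyTorch builds.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

Models and tensors are then moved with `.to(get_device())`, so the same training script runs unmodified on CUDA, Apple Silicon, or CPU.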
🔧 Technical Challenges & Solutions
- Variance Scheduling: Understanding how the choice of beta schedule shapes the forward noising process and, ultimately, generation quality.
- Timestep Embedding: Implementing sinusoidal embeddings correctly so the network can condition on the diffusion timestep.
- Memory Efficiency: Sampling over long diffusion chains caused memory pressure.
- DiT Integration: Adapting a transformer backbone (patch embedding, timestep conditioning) to image generation.
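The first two challenges can be made concrete. Below is a minimal sketch of the linear beta schedule from the DDPM paper and a standard transformer-style sinusoidal timestep embedding (function names are illustrative, not necessarily those used in this repo):

```python
import math
import torch

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; alpha_bar_t = prod_s(1 - beta_s) defines q(x_t | x_0)."""
    betas = torch.linspace(beta_start, beta_end, T)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    return betas, alphas_cumprod

def timestep_embedding(t, dim):
    """Sinusoidal embedding of integer timesteps, as in the Transformer paper."""
    half = dim // 2
    # Geometric frequency ladder from 1 down to 1/10000.
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```

Because `alphas_cumprod` decays monotonically toward zero, samples at large `t` approach pure Gaussian noise, which is exactly the property the reverse process relies on.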
📈 Results & Learnings
- Successful Generation: Trained models generate diverse, high-quality MNIST digits.
- Sampling Speed: DDIM achieves comparable quality with 10x fewer steps.
- Educational Value: Comprehensive learning resource with detailed comments.
- Key Learning: A deep understanding of the DDPM mathematics and its connection to score-based generative models.
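The DDIM speed-up reported above comes from a deterministic update that can skip timesteps. A minimal sketch of one DDIM step (eta = 0) together with the CFG noise-mixing rule, assuming a generic `model(x, t, cond)` noise-prediction interface not taken from this repo:

```python
import torch

def cfg_eps(model, x, t, cond, guidance_scale):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    eps_uncond = model(x, t, None)
    eps_cond = model(x, t, cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0) from noise level abar_t to abar_prev."""
    # Predict x_0 from the current sample and the predicted noise.
    x0_pred = (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
    # Jump directly to the earlier noise level, reusing the same noise direction.
    return abar_prev.sqrt() * x0_pred + (1 - abar_prev).sqrt() * eps
```

Because the update is deterministic, the sampler can step through a short subsequence of timesteps (e.g. 50 instead of 1000), which is where the reported 10x speed-up comes from.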