🔒 Internal project — source code and detailed outputs are confidential. Metrics and architecture described here are based on personal notes from the internship period.
📋 Project Overview
Built a multi-agent pipeline that quantitatively analyzes drama scripts for storyline quality, character appeal, and commercial potential to support production decisions. Developed by a 4-person team and deployed for actual script review during the alpha-beta-gamma test period at CJ AI Center.
🎯 Problem Definition & Goals
- Problem: Manual review of 1–4 episodes per script was expensive and slow, with reviewer bias creating inconsistent evaluations and bottlenecks that risked overlooking valuable scripts.
- Goal 1: Automate script analysis with objective metrics through a multi-agent system.
- Goal 2: Robustly parse diverse script formats (PDF, HWP, DOCX) with high Korean text accuracy.
- Goal 3: Improve OCR quality (WER) and scene-classification quality (F1) to a level viable for real production use.
⚙️ Key Features & Contributions
- Multi-Format Document Parser: Preprocessing modules to convert PDF, HWP, DOCX scripts into structured text, with a Korean OCR benchmark dataset for validation.
- VLM Integration (Qwen2.5-VL-7B): Replaced traditional OCR with a Vision-Language Model to handle complex multi-column layouts and Korean text, eliminating column-ordering errors.
- Scene Analysis Agent: Scene-level strength/weakness classifier with CoT prompting, enabling the LLM to reason about contextual nuance and subtext.
- AWS On-Demand GPU Deployment: Designed a boto3-driven on-demand EC2 provisioning flow — pre-initialized EBS volume mounts, AMI-based startup, and region-aware GPU capacity selection — eliminating idle GPU costs for this periodic-use workload.
- LangChain Orchestration: Connected Parser, Analyzer, and Evaluator agents via LangChain with state-based workflow and conditional parallel execution.
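The on-demand provisioning flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the AMI IDs, instance type, tag values, and region list are placeholders, and the real flow also mounted pre-initialized EBS volumes after launch.

```python
"""Sketch of boto3-driven on-demand EC2 provisioning with region-aware
fallback. All identifiers here are hypothetical placeholders."""


# Regions to try in priority order when GPU capacity is scarce (illustrative).
REGION_PRIORITY = ["ap-northeast-2", "us-west-2", "us-east-1"]


def build_launch_params(ami_id: str, instance_type: str = "g5.xlarge") -> dict:
    """Build run_instances kwargs for an AMI with the environment baked in,
    so instance boot skips any slow S3 download."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # Tag instances so a cleanup job can terminate them after each run,
        # keeping the workload strictly on-demand (no idle GPU cost).
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "workload", "Value": "script-analysis"}],
        }],
    }


def launch_in_first_available_region(ami_by_region: dict) -> str:
    """Try regions in priority order; return the instance id of the first
    successful launch. Capacity errors fall through to the next region."""
    import boto3  # imported lazily so the pure helper above needs no AWS deps

    for region in REGION_PRIORITY:
        if region not in ami_by_region:
            continue
        ec2 = boto3.client("ec2", region_name=region)
        try:
            resp = ec2.run_instances(**build_launch_params(ami_by_region[region]))
            return resp["Instances"][0]["InstanceId"]
        except ec2.exceptions.ClientError:
            continue  # e.g. InsufficientInstanceCapacity: try the next region
    raise RuntimeError("no region had GPU capacity")
```

Keeping the launch parameters in a pure function makes the capacity-fallback logic easy to test without touching AWS.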
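The Parser → Analyzer → Evaluator flow ran on LangChain in the actual project; the state-based pattern with conditional parallel execution can be sketched in plain Python (agent bodies, state keys, and the scene-count threshold below are illustrative stand-ins, not the real implementation):

```python
"""Stdlib-only sketch of a state-based multi-agent workflow: each agent
reads and extends a shared state dict, and the analyzer fans out in
parallel only when the scene count warrants it."""
from concurrent.futures import ThreadPoolExecutor


def parser_agent(state: dict) -> dict:
    # Stand-in for the VLM-based parser: split the script into scenes.
    state["scenes"] = [s.strip() for s in state["raw_script"].split("\n\n") if s.strip()]
    return state


def analyzer_agent(state: dict) -> dict:
    # Stand-in for the CoT scene classifier: toy strength/weakness rule.
    def classify(scene: str) -> str:
        return "strength" if len(scene) > 20 else "weakness"

    # Conditional parallelism: fan out only when there are many scenes.
    if len(state["scenes"]) > 4:
        with ThreadPoolExecutor() as pool:
            state["labels"] = list(pool.map(classify, state["scenes"]))
    else:
        state["labels"] = [classify(s) for s in state["scenes"]]
    return state


def evaluator_agent(state: dict) -> dict:
    # Aggregate scene labels into a single script-level score.
    state["score"] = state["labels"].count("strength") / max(len(state["labels"]), 1)
    return state


def run_pipeline(raw_script: str) -> dict:
    state = {"raw_script": raw_script}
    for agent in (parser_agent, analyzer_agent, evaluator_agent):
        state = agent(state)
    return state
```

Passing one mutable state object through the chain is what lets downstream agents reuse upstream outputs without bespoke interfaces between each pair.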
🔧 Technical Challenges & Solutions
- Korean OCR — WER >20%: Traditional OCR (PaddleOCR) showed high error rates and column-ordering failures on multi-column script layouts.
- Solution: Switched to Qwen2.5-VL-7B for document-structure-aware parsing, cutting WER from 20% to 7%.
- Scene Classification — F1 0.2: Sparse training data and over-segmented genre-specific prompts caused the model to overfit to surface patterns rather than understand narrative nuance.
- Solution: Benchmarked reasoning models with strong Korean performance (DeepSeek-R1, c4ai-command-a) and simplified prompts to let the model reason freely, improving F1 from 0.2 to 0.5.
- AWS Cold Start & Cost: On-premises deployment failed, and a naive AWS setup with S3 downloads at boot caused 20–30 minute cold starts.
- Solution: Pre-initialized EBS volumes, AMI snapshots, and uv-based environment setup cut startup time significantly while removing idle GPU costs.
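The WER numbers quoted above follow the standard definition: word-level edit distance divided by the reference word count. A minimal implementation of that metric (mine, not the project's benchmark code) looks like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance (substitutions +
    deletions + insertions) divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

Because it tokenizes on whitespace, results for Korean depend on consistent spacing in both transcripts; character-level variants are common when spacing itself is noisy.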
📈 Results & Learnings
- OCR Accuracy: WER reduced from 20% → 7% via VLM-based parsing, securing reliable input data for downstream agents.
- Classification Quality: Scene-level F1 score improved from 0.2 → 0.5 through model selection and prompt simplification.
- Production Deployment: Pipeline used in actual script review during the alpha-beta-gamma test period. On-demand GPU structure eliminated idle costs for this periodic workload.
- Key Learnings: Data quality and simple prompting outperform over-engineered constraints. Infrastructure details (EBS mount stability, AMI readiness, regional GPU capacity) matter as much as model performance for real deployments.