📋 Project Overview
Developed a specialized Korean medical-domain LLM by curating comprehensive healthcare datasets and fine-tuning the rtzr/ko-gemma-2-9b-it model. The project focused on building a model capable of answering Korean medical licensing examination questions while adhering to healthcare domain ethics and safety guidelines. This work contributed to the KorMedMCQA benchmark research.
🎯 Problem Definition & Goals
- Problem: Existing LLMs lack specialized knowledge of the Korean medical domain and perform poorly on healthcare licensing examinations and medical question-answering tasks.
- Goal 1: Curate and construct high-quality Korean medical datasets from multiple authoritative sources.
- Goal 2: Fine-tune a Korean LLM to achieve competitive performance on KorMedMCQA benchmark.
- Goal 3: Ensure the model provides ethically sound and medically accurate responses.
⚙️ Key Features & Contributions
- Multi-Source Dataset Curation: Aggregated data from KorMedMCQA, MedExpQA (translated), UltraMed, COD (Clinical Observation Data), and Asan Medical Center disease dictionary.
- Data Quality Control: Implemented rigorous filtering and validation processes to ensure medical accuracy and remove potentially harmful content.
- Fine-tuning Pipeline: Developed efficient training pipeline using HuggingFace Transformers with LoRA and QLoRA for parameter-efficient fine-tuning.
- Evaluation Framework: Created comprehensive evaluation suite covering multiple medical domains including clinical knowledge, diagnostics, and treatment recommendations.
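To illustrate the fine-tuning pipeline above, here is a minimal sketch of how a KorMedMCQA-style multiple-choice item could be formatted into a prompt/completion pair for supervised fine-tuning. The field names (`question`, `options`, `answer`) and the prompt wording are assumptions for illustration, not the actual dataset schema or training template used in the project.

```python
# Hypothetical sketch: turning one multiple-choice medical QA item into a
# prompt/completion pair for supervised fine-tuning. Field names are assumed.

def format_mcqa_prompt(item: dict) -> dict:
    """Format an MCQA item as an instruction prompt plus gold completion."""
    letters = "ABCDE"
    options = "\n".join(
        f"{letters[i]}. {opt}" for i, opt in enumerate(item["options"])
    )
    prompt = (
        # "Read the following medical question and choose the best answer."
        "다음 의료 문제를 읽고 가장 적절한 답을 고르세요.\n\n"
        f"{item['question']}\n{options}\n정답:"
    )
    completion = f" {letters[item['answer']]}"
    return {"prompt": prompt, "completion": completion}

example = {
    "question": "성인의 정상 안정 시 심박수 범위는?",
    "options": ["20-40회/분", "60-100회/분", "120-160회/분", "180-200회/분"],
    "answer": 1,  # index of the correct option
}
pair = format_mcqa_prompt(example)
```

In practice these pairs would be tokenized and fed to a Trainer with a LoRA/QLoRA adapter attached; keeping the prompt template fixed across training and evaluation helps the model learn a consistent answer format.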
🔧 Technical Challenges & Solutions
- Data Scarcity: Limited Korean medical training data is available. Addressed by crawling disease dictionaries, translating English medical datasets, and augmenting existing Korean sources.
- Domain Adaptation: General-purpose LLMs struggled with medical terminology. Applied continued pre-training on medical corpus before task-specific fine-tuning.
- Ethical Considerations: Medical responses require careful handling of safety-critical information. Implemented response filtering and added appropriate disclaimers for medical advice.
- Evaluation Consistency: Medical QA evaluation needed domain expertise. Used KorMedMCQA benchmark with standardized scoring methodology for reliable comparison.
📈 Results & Learnings
- Benchmark Performance: Achieved significant improvement over baseline on KorMedMCQA benchmark, demonstrating effective domain adaptation.
- Published Model: Released fine-tuned model on HuggingFace for community use and further research.
- Research Contribution: Contributed to the KorMedMCQA paper accepted for publication, advancing Korean medical NLP research.
- Key Learning: Gained expertise in domain-specific LLM fine-tuning, medical NLP challenges, and the importance of responsible AI development in healthcare applications.