📋 Project Overview
Developed a specialized Korean medical-domain LLM by curating comprehensive healthcare datasets and fine-tuning the rtzr/ko-gemma-2-9b-it model. The project focused on building a model capable of answering Korean medical licensing examination questions while adhering to healthcare domain ethics and safety guidelines. This work contributed to the KorMedMCQA benchmark research.
🎯 Problem Definition & Goals
- Problem: Existing LLMs lack specialized knowledge of the Korean medical domain and perform poorly on healthcare licensing examinations and medical question-answering tasks.
- Goal 1: Curate and construct high-quality Korean medical datasets from multiple authoritative sources.
- Goal 2: Fine-tune a Korean LLM to achieve competitive performance on KorMedMCQA benchmark.
- Goal 3: Ensure the model provides ethically sound and medically accurate responses.
⚙️ Key Features & Contributions
- Multi-Source Dataset Curation: Aggregated data from KorMedMCQA, MedExpQA (translated), UltraMed, COD (Clinical Observation Data), and Asan Medical Center disease dictionary.
- Data Quality Control: Implemented rigorous filtering and validation processes to ensure medical accuracy and remove potentially harmful content.
- Fine-tuning Pipeline: Developed efficient training pipeline using HuggingFace Transformers with LoRA and QLoRA for parameter-efficient fine-tuning.
- Evaluation Framework: Created comprehensive evaluation suite covering multiple medical domains including clinical knowledge, diagnostics, and treatment recommendations.
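To illustrate the fine-tuning pipeline above, here is a minimal sketch of how a KorMedMCQA-style multiple-choice item could be formatted into a prompt/completion pair for supervised fine-tuning. The field names (`question`, `options`, `answer`) and the prompt wording are assumptions for illustration, not the actual dataset schema or training template used in the project.

```python
# Hypothetical sketch: turning one multiple-choice medical QA item into a
# prompt/completion pair for supervised fine-tuning. Field names are assumed.

def format_mcqa_prompt(item: dict) -> dict:
    """Format an MCQA item as an instruction prompt plus gold completion."""
    letters = "ABCDE"
    options = "\n".join(
        f"{letters[i]}. {opt}" for i, opt in enumerate(item["options"])
    )
    prompt = (
        # "Read the following medical question and choose the best answer."
        "다음 의료 문제를 읽고 가장 적절한 답을 고르세요.\n\n"
        f"{item['question']}\n{options}\n정답:"
    )
    completion = f" {letters[item['answer']]}"
    return {"prompt": prompt, "completion": completion}

example = {
    "question": "성인의 정상 안정 시 심박수 범위는?",
    "options": ["20-40회/분", "60-100회/분", "120-160회/분", "180-200회/분"],
    "answer": 1,  # index of the correct option
}
pair = format_mcqa_prompt(example)
```

In practice these pairs would be tokenized and fed to a Trainer with a LoRA/QLoRA adapter attached; keeping the prompt template fixed across training and evaluation helps the model learn a consistent answer format.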
🔧 Technical Challenges & Solutions
- Data Scarcity: Limited Korean medical training data is available. Addressed by crawling disease dictionaries, translating English medical datasets, and augmenting existing Korean sources.
- Domain Adaptation: General-purpose LLMs struggled with medical terminology. Applied continued pre-training on medical corpus before task-specific fine-tuning.
- Ethical Considerations: Medical responses require careful handling of safety-critical information. Implemented response filtering and added appropriate disclaimers for medical advice.
- Evaluation Consistency: Medical QA evaluation needed domain expertise. Used KorMedMCQA benchmark with standardized scoring methodology for reliable comparison.
📈 Results & Learnings
- Benchmark Performance: Achieved significant improvement over baseline on KorMedMCQA benchmark, demonstrating effective domain adaptation.
- Published Model: Released fine-tuned model on HuggingFace for community use and further research.
- Research Contribution: Contributed to the KorMedMCQA paper accepted for publication, advancing Korean medical NLP research.
- Key Learning: Gained expertise in domain-specific LLM fine-tuning, medical NLP challenges, and the importance of responsible AI development in healthcare applications.