trainDPO
Functions
- trainDPO.train_dpo — Trains a Direct Preference Optimization (DPO) model with optional LoRA (Low-Rank Adaptation).
Module Contents
- trainDPO.train_dpo(sample_size: int, beta: float, harmless_ratio: float, save_path: str, use_lora: bool = False)
Trains a Direct Preference Optimization (DPO) model with optional LoRA (Low-Rank Adaptation).
- Parameters:
sample_size (int) – Number of samples to use for training.
beta (float) – DPO regularization strength; it scales the implicit KL penalty that keeps the policy close to the reference model (lower values permit larger deviations).
harmless_ratio (float) – Proportion of harmless to helpful data in the dataset.
save_path (str) – Directory to save the trained model.
use_lora (bool) – If True, fine-tune with LoRA adapters for parameter-efficient training instead of updating all model weights. Defaults to False.
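The implementation of train_dpo is not shown here, but the role of beta is easiest to see in the per-example DPO objective it scales. The sketch below is a minimal, dependency-free illustration of that loss (not the module's actual code); the log-probability arguments would come from the policy and frozen reference models:

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin is how much more the policy prefers the chosen response
    over the rejected one, relative to the reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically plain logistic loss; larger beta penalizes deviation
    # from the reference model more sharply.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))


# When the policy matches the reference, the loss starts at log(2):
print(dpo_loss(-1.0, -2.0, -1.0, -2.0, beta=0.1))  # ≈ 0.6931
# Widening the chosen-vs-rejected margin lowers the loss:
print(dpo_loss(-0.5, -3.0, -1.0, -2.0, beta=0.1))
```

A larger beta makes the same margin produce a steeper loss, which in practice keeps the trained policy closer to the reference model.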