trainDPO

Functions

train_dpo(sample_size, beta, harmless_ratio, save_path, use_lora=False)

Trains a Direct Preference Optimization (DPO) model with optional LoRA (Low-Rank Adaptation).

Module Contents

trainDPO.train_dpo(sample_size: int, beta: float, harmless_ratio: float, save_path: str, use_lora: bool = False)

Trains a Direct Preference Optimization (DPO) model with optional LoRA (Low-Rank Adaptation).

Parameters:
  • sample_size (int) – Number of samples to use for training.

  • beta (float) – Regularization strength for the DPO loss; higher values keep the trained policy closer to the reference model.

  • harmless_ratio (float) – Proportion of harmless data relative to helpful data in the training set.

  • save_path (str) – Directory to save the trained model.

  • use_lora (bool) – If True, fine-tune with LoRA (Low-Rank Adaptation) adapters for a more parameter- and memory-efficient training run. Defaults to False.
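
Example:

A minimal usage sketch of the call above; the argument values are illustrative placeholders, not recommended settings:

  from trainDPO import train_dpo

  # Illustrative values; tune sample_size, beta, and harmless_ratio
  # for your dataset and compute budget.
  train_dpo(
      sample_size=2000,         # number of training samples
      beta=0.1,                 # DPO regularization strength
      harmless_ratio=0.5,       # half harmless, half helpful data
      save_path="./dpo_model",  # directory for the trained model
      use_lora=True,            # enable LoRA for lighter fine-tuning
  )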