alignValues
Classes
A class for obtaining the most appropriate lambda values, namely linear coefficients for combining multiple rewards.
Module Contents
- class alignValues.AlignValues(value_list, file_path, c_list=None)
A class for obtaining the most appropriate lambda values, namely linear coefficients for combining multiple rewards.
This class handles the calculation of lambda values using a given set of (prompt, continuation) pairs generated from the reference model, and specified palette (c) if any.
- c
Target palette.
- Type:
torch.Tensor
- value_list
List of values (str) to be aligned.
- Type:
list
- file_path
Path to the JSON file containing (prompt, continuation) pairs.
- Type:
str
- rewards
Tensor of rewards for each value and sample, shape (k, n) where k is the number of values and n is the sample size in file_path.
- Type:
torch.Tensor
- Example 1 (single human value):
>>> c_list = -0.5
>>> value_list = "humor"
>>> file_path = "results/Llama27b-chat-Anthropic-harmless.json"
>>> aligner = AlignValues(value_list, file_path, c_list)
>>> lambda_vals, success = aligner.optimize_lambda()
>>> print(f"Optimized lambda values: {lambda_vals}")
>>> print(f"Optimization success: {success}")
- Example 2 (multiple human values):
>>> c_list = [-1.016, -2.508, -1.214, -0.139, 0.848, 0.521, -1.375]
>>> value_list = "all"
>>> file_path = "results/Llama27b-chat-Anthropic-harmless.json"
>>> aligner = AlignValues(value_list, file_path, c_list)
>>> lambda_vals, success = aligner.optimize_lambda()
>>> print(f"Optimized lambda values: {lambda_vals}")
>>> print(f"Optimization success: {success}")
- Command-line usage:
>>> python alignValues.py --c_list=-0.5 --value_list="humor" --file_path="results/Llama27b-chat-Anthropic-harmless.json" optimize_lambda
>>> python alignValues.py --c_list=-1.016,-2.508,-1.214,-0.139,0.848,0.521,-1.375 --value_list="all" --file_path="results/Llama27b-chat-Anthropic-harmless.json" optimize_lambda
- file_path
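The phrase "linear coefficients for combining multiple rewards" can be illustrated with a small sketch. It assumes the rewards are laid out as described above, one row per value and one column per sample, and uses plain Python lists in place of torch.Tensor. The helper name combine_rewards and the numbers are hypothetical and not part of alignValues.

```python
from typing import List


def combine_rewards(rewards: List[List[float]], lambda_vals: List[float]) -> List[float]:
    """Return, for each sample j, the combined reward sum_i lambda_i * rewards[i][j].

    `rewards` has shape (k, n): k values (rows) and n samples (columns).
    """
    if len(rewards) != len(lambda_vals):
        raise ValueError("need exactly one lambda per value (row of rewards)")
    n = len(rewards[0])
    return [
        sum(lam * row[j] for lam, row in zip(lambda_vals, rewards))
        for j in range(n)
    ]


# Two values (k = 2) and three samples (n = 3); the numbers are made up.
rewards = [
    [1.0, 0.0, -1.0],
    [0.5, 2.0, 1.0],
]
combined = combine_rewards(rewards, [2.0, 1.0])
print(combined)  # [2.5, 2.0, -1.0]
```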
- optimize_lambda(lambda_init=None, optimize_indices=None, verbose=True)
Optimize lambda values for the given palette and rewards.
This method uses gradient descent to find optimal lambda values that maximize the dual objective function.
- Parameters:
lambda_init (list, optional) – Initial lambda values. Defaults to None.
optimize_indices (list, optional) – Indices of lambda values to optimize. Defaults to None.
verbose (bool, optional) – Whether to print detailed information during optimization. Defaults to True.
- Returns:
- A tuple containing:
list: Optimized lambda values.
bool: True if optimization was successful, False otherwise.
- Return type:
tuple
- _dual_objective(lambda_vals)
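The body of _dual_objective is not shown here, so the following is only a sketch of what such an objective and its optimization could look like. It assumes the common dual of a KL-regularized problem with constraints "expected reward i is at least c_i", namely g(lambda) = lambda · c - log mean_j exp(lambda · r_j) with lambda >= 0. The function names, step size, and iteration count are hypothetical. It is written as projected gradient ascent, which is equivalent to gradient descent on the negated objective.

```python
import math
from typing import List


def _scores(lam: List[float], rewards: List[List[float]]) -> List[float]:
    """Per-sample combined reward lambda . r_j for rewards of shape (k, n)."""
    n = len(rewards[0])
    return [sum(l * row[j] for l, row in zip(lam, rewards)) for j in range(n)]


def dual_objective(lam: List[float], rewards: List[List[float]], c: List[float]) -> float:
    """Assumed dual: lambda . c - log(mean_j exp(lambda . r_j))."""
    s = _scores(lam, rewards)
    m = max(s)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in s) / len(s))
    return sum(l * ci for l, ci in zip(lam, c)) - log_z


def dual_gradient(lam: List[float], rewards: List[List[float]], c: List[float]) -> List[float]:
    """Gradient of the assumed dual: c_i minus the exp-weighted mean of reward i."""
    s = _scores(lam, rewards)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    total = sum(w)
    return [
        ci - sum(wj * rij for wj, rij in zip(w, row)) / total
        for ci, row in zip(c, rewards)
    ]


def maximize_dual(rewards: List[List[float]], c: List[float],
                  steps: int = 500, lr: float = 0.5) -> List[float]:
    """Projected gradient ascent: take a step, then clip lambda at zero."""
    lam = [0.0] * len(rewards)
    for _ in range(steps):
        grad = dual_gradient(lam, rewards, c)
        lam = [max(0.0, l + lr * g) for l, g in zip(lam, grad)]
    return lam


# One value, two samples with rewards 0 and 1, target level c = 0.75.
# The stationary point satisfies e^lambda / (1 + e^lambda) = 0.75, i.e. lambda = ln 3.
lam = maximize_dual([[0.0, 1.0]], [0.75])
print(lam)
```

When the target level is already met by the unweighted sample mean (for instance c = 0.25 in the toy data above), the gradient at zero is negative and the projection keeps lambda at 0.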
- sequential_optimize_lambda(lambda_init=None)
Sequentially optimize lambda for each human value.
This method aligns each value sequentially, storing the obtained lambda values. It starts with lambda_init = None if not provided. Future support may replace optimize_indices = [idx] with block-wise updates.
- Parameters:
lambda_init (list, optional) – Initial lambda values. Defaults to None.
- Returns:
Optimized lambda values after sequential optimization.
- Return type:
list
Note
This function can be considered as a full-lambda optimization with freezing of values not currently being aligned.
Example
>>> aligner = AlignValues("all", "results/opt1.3b-Anthropic-harmless.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> optimized_lambda = aligner.sequential_optimize_lambda()
>>> print(f"Sequentially optimized lambda: {optimized_lambda}")
- Command-line usage:
>>> python alignValues.py --c_list=2.513,-0.967,0.937,0.876,0.434,-3.337 --value_list="all" --file_path="results/opt1.3b-Anthropic-harmless.json" sequential_optimize_lambda
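The note above describes sequential alignment as a full-lambda optimization in which every entry except the current one is frozen. A minimal sketch of that idea follows. It assumes a generic objective to maximize, a finite-difference derivative, lambda >= 0, and a zero start when lambda_init is None; all helper names and the toy objective are hypothetical.

```python
from typing import Callable, List, Optional

Objective = Callable[[List[float]], float]


def optimize_one_coordinate(objective: Objective, lam: List[float], idx: int,
                            steps: int = 200, lr: float = 0.1,
                            eps: float = 1e-6) -> List[float]:
    """Ascend `objective` along coordinate `idx` only; all other entries stay frozen."""
    lam = list(lam)
    for _ in range(steps):
        up, down = list(lam), list(lam)
        up[idx] += eps
        down[idx] -= eps
        # Central finite difference approximates the partial derivative.
        grad = (objective(up) - objective(down)) / (2 * eps)
        lam[idx] = max(0.0, lam[idx] + lr * grad)  # keep lambda non-negative
    return lam


def sequential_optimize(objective: Objective, k: int,
                        lambda_init: Optional[List[float]] = None) -> List[float]:
    """Visit each of the k coordinates once, in order, reusing earlier results."""
    lam = [0.0] * k if lambda_init is None else list(lambda_init)
    for idx in range(k):
        lam = optimize_one_coordinate(objective, lam, idx)
    return lam


# Toy concave objective with its maximum at lambda = (1, 2).
def toy_objective(lam: List[float]) -> float:
    return -(lam[0] - 1.0) ** 2 - (lam[1] - 2.0) ** 2


result = sequential_optimize(toy_objective, k=2)
print(result)
```

For a separable objective like the toy one, a single pass already reaches the joint optimum; when the coordinates interact, one pass is generally only an approximation.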
- sequential_optimize_lambda_multiround(round: int = 5)
Run sequential_optimize_lambda for multiple rounds.
This method performs multiple rounds of sequential lambda optimization, using the result of each round as the initial value for the next.
- Parameters:
round (int, optional) – Number of optimization rounds to perform. Defaults to 5.
- Returns:
Final optimized lambda values after all rounds.
- Return type:
list
Example
>>> aligner = AlignValues("all", "results/opt1.3b-Anthropic-harmless.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> final_lambda = aligner.sequential_optimize_lambda_multiround(round=5)
>>> print(f"Final optimized lambda: {final_lambda}")
- Command-line usage:
>>> python alignValues.py --c_list=2.513,-0.967,0.937,0.876,0.434,-3.337 --value_list="all" --file_path="results/opt1.3b-Anthropic-harmless.json" sequential_optimize_lambda_multiround
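Feeding each round's result into the next matters when the lambda entries interact. The sketch below shows that loop in isolation. The one_round callable stands in for a single sequential pass; toy_round is a hypothetical example that exactly maximizes -(l0 - 1)^2 - (l1 - 1)^2 - l0*l1 one coordinate at a time. The parameter is named rounds here to avoid shadowing the Python built-in round.

```python
from typing import Callable, List, Tuple


def multiround(one_round: Callable[[List[float]], List[float]],
               lambda_init: List[float],
               rounds: int = 5) -> Tuple[List[float], List[List[float]]]:
    """Apply `one_round` repeatedly, using each result as the next starting point."""
    lam = list(lambda_init)
    history = [list(lam)]
    for _ in range(rounds):
        lam = one_round(lam)
        history.append(list(lam))
    return lam, history


def toy_round(lam: List[float]) -> List[float]:
    """One sequential pass on a coupled toy objective (joint optimum at 2/3, 2/3)."""
    lam = list(lam)
    lam[0] = max(0.0, 1.0 - lam[1] / 2.0)  # best l0 with l1 frozen
    lam[1] = max(0.0, 1.0 - lam[0] / 2.0)  # best l1 with the new l0 frozen
    return lam


final_lambda, history = multiround(toy_round, [0.0, 0.0], rounds=5)
for i, lam in enumerate(history):
    print(i, lam)
```

In this toy problem the first round gives (1.0, 0.5) and later rounds move both entries steadily toward 2/3, which is the behaviour that repeated rounds are meant to exploit.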
- find_pareto_by_interpolation(c_low, c_high)
Automatically find the feasible palette c on the line between c_low and c_high that is closest to the Pareto frontier.
This method uses linear interpolation to search for a feasible solution between two given constraint vectors.
- Parameters:
c_low (list or float) – Lower bound constraint vector or single value.
c_high (list or float) – Upper bound constraint vector or single value.
- Returns:
The interpolation factor (rho) of the feasible solution if found, None otherwise.
- Return type:
float or None
Example
>>> aligner = AlignValues("all", "results/basemodel-dataset.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> rho = aligner.find_pareto_by_interpolation([2.513, -0.967, 0.937, 0.876, 0.434, -3.337],
...                                            [2.534, -0.613, 1.268, 0.876, 0.434, -3.337])
>>> print(f"Feasible solution found at rho = {rho}")
- Command-line usage:
>>> python alignValues.py --c_low=2.513,-0.967,0.937,0.876,0.434,-3.337 --c_high=2.534,-0.613,1.268,0.876,0.434,-3.337 --value_list="all" --file_path="results/basemodel-dataset.json" find_pareto_by_interpolation
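The interpolation search can be sketched as follows. This is not the library's implementation: it assumes c(rho) = c_low + rho * (c_high - c_low) with rho in [0, 1], that feasibility is lost once and for all as rho grows, and that a feasibility check is available as a callable. The names largest_feasible_rho and is_feasible are hypothetical, bisection is one possible search strategy, and the sketch handles list inputs only.

```python
from typing import Callable, List, Optional


def interpolate(c_low: List[float], c_high: List[float], rho: float) -> List[float]:
    """Point on the segment from c_low (rho = 0) to c_high (rho = 1)."""
    return [lo + rho * (hi - lo) for lo, hi in zip(c_low, c_high)]


def largest_feasible_rho(c_low: List[float], c_high: List[float],
                         is_feasible: Callable[[List[float]], bool],
                         tol: float = 1e-3) -> Optional[float]:
    """Bisect for the largest rho whose interpolated palette is still feasible."""
    if not is_feasible(interpolate(c_low, c_high, 0.0)):
        return None  # even the lower end is infeasible
    if is_feasible(interpolate(c_low, c_high, 1.0)):
        return 1.0  # the whole segment is feasible
    lo, hi = 0.0, 1.0  # invariant: lo is feasible, hi is infeasible
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_feasible(interpolate(c_low, c_high, mid)):
            lo = mid
        else:
            hi = mid
    return lo


# Toy feasibility rule (made up): a palette is feasible if its entries sum to at most 1.
def toy_feasible(c: List[float]) -> bool:
    return sum(c) <= 1.0


rho = largest_feasible_rho([0.0, 0.0], [2.0, 2.0], toy_feasible)
print(rho)
```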
- find_pareto_by_oneValue(value_to_enhance: str)
Automatically find the feasible palette c closest to the Pareto frontier by greedily increasing one particular human value.
This method uses binary search to find the maximum feasible value for a specific constraint while keeping others constant.
- Parameters:
value_to_enhance (str) – The name of the value to be enhanced.
- Returns:
The maximum feasible value found for the enhanced constraint.
- Return type:
float
- Raises:
ValueError – If the specified value is not in the list of supported values.
Example
>>> aligner = AlignValues("all", "results/basemodel-dataset.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> max_value = aligner.find_pareto_by_oneValue("gpt2-helpful")
>>> print(f"Maximum feasible value for 'gpt2-helpful': {max_value}")
- Command-line usage:
>>> python alignValues.py --c_list=2.513,-0.967,0.937,0.876,0.434,-3.337 --value_list="all" --value_to_enhance="gpt2-helpful" --file_path="results/basemodel-dataset.json" find_pareto_by_oneValue
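The binary search over a single constraint can be sketched as below. It assumes a feasible starting palette, a known upper bound for the enhanced entry, and a feasibility check supplied as a callable; the names max_feasible_level, upper, and is_feasible are hypothetical. An unknown value name raises ValueError through list.index, mirroring the documented behaviour.

```python
from typing import Callable, List


def max_feasible_level(values: List[str], c: List[float], value_to_enhance: str,
                       upper: float, is_feasible: Callable[[List[float]], bool],
                       tol: float = 1e-3) -> float:
    """Binary-search the largest feasible c[idx] in [c[idx], upper], others fixed."""
    idx = values.index(value_to_enhance)  # raises ValueError for an unknown name

    def with_level(level: float) -> List[float]:
        trial = list(c)  # copy so the caller's palette is never modified
        trial[idx] = level
        return trial

    lo, hi = c[idx], upper
    if not is_feasible(with_level(lo)):
        raise RuntimeError("sketch assumes the starting palette is feasible")
    if is_feasible(with_level(hi)):
        return hi
    while hi - lo > tol:  # invariant: lo is feasible, hi is infeasible
        mid = (lo + hi) / 2.0
        if is_feasible(with_level(mid)):
            lo = mid
        else:
            hi = mid
    return lo


# Toy feasibility rule (made up): the two levels may sum to at most 1.25.
def toy_feasible(c: List[float]) -> bool:
    return c[0] + c[1] <= 1.25


best = max_feasible_level(["a", "b"], [0.0, 0.5], "a", upper=2.0,
                          is_feasible=toy_feasible)
print(best)
```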
- _save_results_to_text(optimized_lambda, success, save_prefix='results/alignValues')
Save the optimization results to a text file.
This method appends the results of lambda optimization to a text file, including the file path, constraint levels, values, and optimized lambda values.
- Parameters:
optimized_lambda (list) – List of optimized lambda values.
success (bool) – True if optimization was successful, False otherwise.
save_prefix (str, optional) – Prefix for the save file path. Defaults to 'results/alignValues'.
Example
>>> aligner = AlignValues("all", "results/model-data.json", [0.5, 1.0])
>>> optimized_lambda, success = aligner.optimize_lambda()
>>> aligner._save_results_to_text(optimized_lambda, success)
Results have been appended to results/alignValues.txt
- _save_results_to_csv(optimized_lambda, dirichlet_lambda, save_prefix='results/alignValues')
Save the optimization results to a CSV file.
This method appends the results of lambda optimization to a CSV file, including the file path, constraint levels, values, optimized lambda values, and Dirichlet reference lambda values.
- Parameters:
optimized_lambda (list) – List of optimized lambda values.
dirichlet_lambda (list) – List of Dirichlet reference lambda values.
save_prefix (str, optional) – Prefix for the save file path. Defaults to 'results/alignValues'.
Example
>>> aligner = AlignValues("all", "results/model-data.json", [0.5, 1.0])
>>> optimized_lambda, _ = aligner.optimize_lambda()
>>> dirichlet_lambda = [0.3, 0.7]  # Example Dirichlet reference values
>>> aligner._save_results_to_csv(optimized_lambda, dirichlet_lambda)
Results have been appended to results/alignValues.csv
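Appending one result row per run can be sketched with the standard csv module. The column names, the comma-joined encoding of list fields, and the function name append_results_csv are assumptions for illustration; only the ".csv" suffix added to the prefix and the set of recorded fields are taken from the description above. The demo writes into a temporary directory so it leaves no files behind.

```python
import csv
import os
import tempfile
from typing import List


def append_results_csv(save_prefix: str, file_path: str, c: List[float],
                       values: List[str], optimized_lambda: List[float],
                       dirichlet_lambda: List[float]) -> str:
    """Append one row to '<save_prefix>.csv', writing a header for a new file."""
    out_path = f"{save_prefix}.csv"
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    is_new = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # Hypothetical column names; the real file layout is not documented here.
            writer.writerow(["file_path", "c", "values",
                             "optimized_lambda", "dirichlet_lambda"])
        writer.writerow([
            file_path,
            ",".join(str(x) for x in c),
            ",".join(values),
            ",".join(str(x) for x in optimized_lambda),
            ",".join(str(x) for x in dirichlet_lambda),
        ])
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "results", "alignValues")
    path = append_results_csv(prefix, "results/model-data.json", [0.5, 1.0],
                              ["value_a", "value_b"], [0.1, 0.2], [0.3, 0.7])
    print(f"Results have been appended to {os.path.basename(path)}")
```

Opening the file in append mode keeps earlier runs intact, so repeated calls build up a table that can be compared across palettes.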
- gen_rand_MAP_lambda(num_lambda: int, scaling_MAX: float, save_prefix: str = 'rand_MAP_lambda')
Generate random MAP lambda values with constraints.
This method generates random lambda values by drawing each c_i randomly between the current c_i and the maximum reward corresponding to value i. It modifies the c values, recalculates lambda, and returns a list of lambda values constrained by scaling_MAX.
- Parameters:
num_lambda (int) – Number of valid lambda values to generate.
scaling_MAX (float) – Maximum allowed L1 norm for the generated lambda values.
save_prefix (str, optional) – Prefix for the save file path. Defaults to 'rand_MAP_lambda'.
- Returns:
- A tuple containing:
list: Generated lambda values that satisfy the constraints.
float: Success rate of lambda generation attempts.
- Return type:
tuple
Example
>>> aligner = AlignValues("all", "results/model-data.json", [0.5, 1.0, 1.5])
>>> lambdas, success_rate = aligner.gen_rand_MAP_lambda(10, 5.0)
>>> print(f"Generated {len(lambdas)} lambda values with a success rate of {success_rate:.2%}")
Note
This method temporarily modifies the instance’s c values during execution but restores them to their original values before returning.
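The sampling procedure described for gen_rand_MAP_lambda amounts to rejection sampling, sketched below under several assumptions: each c_i is drawn uniformly, the lambda solver is passed in as a callable that returns None on failure, and a cap on attempts prevents an endless loop. The names gen_rand_lambdas, solve_lambda, and max_attempts are hypothetical. Instead of modifying and restoring shared state, the sketch passes each trial palette to the solver directly.

```python
import random
from typing import Callable, List, Optional, Tuple


def gen_rand_lambdas(c: List[float], max_rewards: List[float],
                     solve_lambda: Callable[[List[float]], Optional[List[float]]],
                     num_lambda: int, scaling_max: float,
                     rng: random.Random,
                     max_attempts: int = 10_000) -> Tuple[List[List[float]], float]:
    """Draw random palettes and keep the lambdas whose L1 norm is within scaling_max."""
    accepted: List[List[float]] = []
    attempts = 0
    while len(accepted) < num_lambda and attempts < max_attempts:
        attempts += 1
        # Draw each c_i between its current level and the maximum reward of value i.
        trial_c = [rng.uniform(ci, mi) for ci, mi in zip(c, max_rewards)]
        lam = solve_lambda(trial_c)
        if lam is not None and sum(abs(x) for x in lam) <= scaling_max:
            accepted.append(lam)
    success_rate = len(accepted) / attempts if attempts else 0.0
    return accepted, success_rate


# Toy solver (made up): lambda is simply twice the palette.
def toy_solver(trial_c: List[float]) -> List[float]:
    return [2.0 * x for x in trial_c]


lambdas, success_rate = gen_rand_lambdas(
    c=[0.0, 0.0], max_rewards=[1.0, 1.0], solve_lambda=toy_solver,
    num_lambda=5, scaling_max=2.0, rng=random.Random(0),
)
print(f"Generated {len(lambdas)} lambda values with a success rate of {success_rate:.2%}")
```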