alignValues

Classes

AlignValues

A class for obtaining the most appropriate lambda values, namely linear coefficients for combining multiple rewards.

Module Contents

class alignValues.AlignValues(value_list, file_path, c_list=None)

A class for obtaining the most appropriate lambda values, namely linear coefficients for combining multiple rewards.

This class calculates lambda values from a given set of (prompt, continuation) pairs generated by the reference model and, if one is specified, a target palette (c).

c

Target palette.

Type:

torch.Tensor

value_list

List of values (str) to be aligned.

Type:

list

file_path

Path to the JSON file containing (prompt, continuation) pairs.

Type:

str

rewards

Tensor of rewards for each value and sample, with shape (k, n), where k is the number of values and n is the number of samples in file_path.

Type:

torch.Tensor
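
To make the (k, n) layout of rewards concrete, the sketch below combines per-value rewards into one scalar reward per sample, using lambda values as linear coefficients. It uses plain Python lists rather than torch.Tensor, and the name combine_rewards and all numbers are illustrative assumptions, not part of AlignValues.

```python
def combine_rewards(lambda_vals, rewards):
    """Return sum_i lambda_i * rewards[i][j] for every sample j.

    rewards is a k x n nested list: one row per value, one column per sample.
    """
    k, n = len(rewards), len(rewards[0])
    if len(lambda_vals) != k:
        raise ValueError("need one lambda per value (row of rewards)")
    return [sum(lambda_vals[i] * rewards[i][j] for i in range(k)) for j in range(n)]


# Two values (k = 2) scored on three samples (n = 3); made-up numbers.
rewards = [[0.2, -0.1, 0.5],
           [1.0, 0.0, -1.0]]
combined = combine_rewards([2.0, 0.5], rewards)  # one combined reward per sample
```
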

Example 1 (single human value):
>>> c_list = -0.5
>>> value_list = "humor"
>>> file_path = "results/Llama27b-chat-Anthropic-harmless.json"
>>> aligner = AlignValues(value_list, file_path, c_list)
>>> lambda_vals, success = aligner.optimize_lambda()
>>> print(f"Optimized lambda values: {lambda_vals}")
>>> print(f"Optimization success: {success}")
Example 2 (multiple human values):
>>> c_list = [-1.016, -2.508, -1.214, -0.139, 0.848, 0.521, -1.375]
>>> value_list = "all"
>>> file_path = "results/Llama27b-chat-Anthropic-harmless.json"
>>> aligner = AlignValues(value_list, file_path, c_list)
>>> lambda_vals, success = aligner.optimize_lambda()
>>> print(f"Optimized lambda values: {lambda_vals}")
>>> print(f"Optimization success: {success}")
Command-line usage:
>>> python alignValues.py --c_list=-0.5 --value_list="humor" --file_path="results/Llama27b-chat-Anthropic-harmless.json" optimize_lambda
>>> python alignValues.py --c_list=-1.016,-2.508,-1.214,-0.139,0.848,0.521,-1.375 --value_list="all" --file_path="results/Llama27b-chat-Anthropic-harmless.json" optimize_lambda
optimize_lambda(lambda_init=None, optimize_indices=None, verbose=True)

Optimize lambda values for the given palette and rewards.

This method uses gradient descent to find optimal lambda values that maximize the dual objective function.

Parameters:
  • lambda_init (list, optional) – Initial lambda values. Defaults to None.

  • optimize_indices (list, optional) – Indices of lambda values to optimize. Defaults to None.

  • verbose (bool, optional) – Whether to print detailed information during optimization. Defaults to True.

Returns:

A tuple containing:
  • list: Optimized lambda values.

  • bool: True if optimization was successful, False otherwise.

Return type:

tuple
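
The form of the dual objective is not spelled out on this page. As a rough illustration only, the sketch below assumes it is g(lambda) = lambda . c - log(mean_j exp(lambda . r_j)) with lambda >= 0, and maximizes it by projected gradient ascent in plain Python. The function names, the step size, and the nonnegativity constraint are all assumptions, so the actual optimize_lambda may differ.

```python
import math


def _scores(lam, rewards):
    # lambda-weighted reward of each sample j
    k, n = len(rewards), len(rewards[0])
    return [sum(lam[i] * rewards[i][j] for i in range(k)) for j in range(n)]


def dual_objective(lam, rewards, c):
    """Assumed dual: lam . c - log(mean_j exp(lam . r_j))."""
    s = _scores(lam, rewards)
    m = max(s)  # shift for a numerically stable log-mean-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in s) / len(s))
    return sum(l * ci for l, ci in zip(lam, c)) - log_z


def tilted_mean(lam, rewards):
    """Mean reward per value under weights proportional to exp(lam . r_j)."""
    s = _scores(lam, rewards)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    total = sum(w)
    return [sum(wj * rj for wj, rj in zip(w, row)) / total for row in rewards]


def maximize_dual(rewards, c, lr=0.5, steps=2000):
    lam = [0.0] * len(c)
    for _ in range(steps):
        mean = tilted_mean(lam, rewards)
        # gradient of the assumed dual is c - tilted_mean; clip at zero
        lam = [max(0.0, l + lr * (ci - mi)) for l, ci, mi in zip(lam, c, mean)]
    return lam


toy_rewards = [[-1.0, 0.0, 1.0, 2.0]]  # k = 1 value, n = 4 samples (made up)
toy_c = [1.0]                          # target level above the base mean 0.5
lam_opt = maximize_dual(toy_rewards, toy_c)
```

Under this assumed form, a positive optimal lambda makes the tilted mean reward match the target level c, which gives a quick sanity check on any solver output.
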

_dual_objective(lambda_vals)
sequential_optimize_lambda(lambda_init=None)

Sequentially optimize lambda for each human value.

This method aligns the values one at a time, storing the lambda value obtained at each step. If lambda_init is not provided, it defaults to None. A future version may replace the single-index update (optimize_indices = [idx]) with block-wise updates.

Parameters:

lambda_init (list, optional) – Initial lambda values. Defaults to None.

Returns:

Optimized lambda values after sequential optimization.

Return type:

list

Note

This method can be viewed as a full-lambda optimization in which the lambda entries for values not currently being aligned are frozen.

Example

>>> aligner = AlignValues("all", "results/opt1.3b-Anthropic-harmless.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> optimized_lambda = aligner.sequential_optimize_lambda()
>>> print(f"Sequentially optimized lambda: {optimized_lambda}")

Command-line usage:
>>> python alignValues.py --c_list=2.513,-0.967,0.937,0.876,0.434,-3.337 --value_list="all" --file_path="results/opt1.3b-Anthropic-harmless.json" sequential_optimize_lambda
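
The "freeze everything except one index" idea can be sketched on a toy problem. The code below is not the AlignValues implementation: it runs gradient ascent on a made-up concave objective, f(lam) = -(lam0 - 1)^2 - (lam1 - 2)^2 - lam0 * lam1, and the helpers ascend, sequential_ascend, and toy_grad are hypothetical names chosen for illustration.

```python
def ascend(grad, lam, optimize_indices, lr=0.1, steps=500):
    """Gradient ascent that updates only the entries in optimize_indices."""
    lam = list(lam)
    for _ in range(steps):
        g = grad(lam)
        for i in optimize_indices:
            lam[i] += lr * g[i]  # all other entries stay frozen
    return lam


def sequential_ascend(grad, k, lambda_init=None):
    """Optimize one coordinate at a time, in index order."""
    lam = [0.0] * k if lambda_init is None else list(lambda_init)
    for idx in range(k):
        lam = ascend(grad, lam, [idx])
    return lam


def toy_grad(lam):
    # gradient of f(lam) = -(lam0 - 1)^2 - (lam1 - 2)^2 - lam0 * lam1
    return [-2.0 * (lam[0] - 1.0) - lam[1],
            -2.0 * (lam[1] - 2.0) - lam[0]]


seq_lambda = sequential_ascend(toy_grad, k=2)
```

On this toy objective a single pass ends near [1.0, 1.5], while the joint maximizer is [0.0, 2.0], so one sequential pass is generally only an approximation of a full joint optimization.
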

sequential_optimize_lambda_multiround(round: int = 5)

Run sequential_optimize_lambda for multiple rounds.

This method performs multiple rounds of sequential lambda optimization, using the result of each round as the initial value for the next.

Parameters:

round (int, optional) – Number of optimization rounds to perform. Defaults to 5.

Returns:

Final optimized lambda values after all rounds.

Return type:

list

Example

>>> aligner = AlignValues("all", "results/opt1.3b-Anthropic-harmless.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> final_lambda = aligner.sequential_optimize_lambda_multiround(round=5)
>>> print(f"Final optimized lambda: {final_lambda}")
Command-line usage:
>>> python alignValues.py --c_list=2.513,-0.967,0.937,0.876,0.434,-3.337 --value_list="all" --file_path="results/opt1.3b-Anthropic-harmless.json" sequential_optimize_lambda_multiround
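
Why repeat the sequential pass? A toy illustration, under the assumption that each pass exactly maximizes a concave objective over one coordinate at a time: for f(lam) = -(lam0 - 1)^2 - (lam1 - 2)^2 - lam0 * lam1, each coordinate update has a closed form, and warm-starting each round from the previous result moves the iterate toward the joint maximizer [0, 2]. The names one_round and multiround are hypothetical and stand in for the methods documented above.

```python
def one_round(lam):
    """One sequential pass over both coordinates, each solved in closed form."""
    lam = list(lam)
    lam[0] = 1.0 - lam[1] / 2.0  # argmax over lam[0] with lam[1] frozen
    lam[1] = 2.0 - lam[0] / 2.0  # argmax over lam[1] with lam[0] frozen
    return lam


def multiround(num_rounds=5, lambda_init=None):
    lam = [0.0, 0.0] if lambda_init is None else list(lambda_init)
    for _ in range(num_rounds):
        lam = one_round(lam)  # warm start from the previous round's result
    return lam


after_one = multiround(num_rounds=1)
after_five = multiround(num_rounds=5)
```

In this example the distance to the joint maximizer shrinks by a factor of four per round after the first, which is the kind of behavior that motivates running several rounds.
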
find_pareto_by_interpolation(c_low, c_high)

Automatically find the feasible palette c on the line between c_low and c_high that is closest to the Pareto frontier.

This method uses linear interpolation to search for a feasible solution between two given constraint vectors.

Parameters:
  • c_low (list or float) – Lower bound constraint vector or single value.

  • c_high (list or float) – Upper bound constraint vector or single value.

Returns:

The interpolation factor (rho) of the feasible solution if found, None otherwise.

Return type:

float or None

Example

>>> aligner = AlignValues("all", "results/basemodel-dataset.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> rho = aligner.find_pareto_by_interpolation([2.513, -0.967, 0.937, 0.876, 0.434, -3.337],
...                                            [2.534, -0.613, 1.268, 0.876, 0.434, -3.337])
>>> print(f"Feasible solution found at rho = {rho}")
Command-line usage:
>>> python alignValues.py --c_low=2.513,-0.967,0.937,0.876,0.434,-3.337 --c_high=2.534,-0.613,1.268,0.876,0.434,-3.337 --value_list="all" --file_path="results/basemodel-dataset.json" find_pareto_by_interpolation
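
The search strategy is not detailed beyond "linear interpolation", so the following is only one possible reading: scan rho on a grid from 1 down to 0, form c = c_low + rho * (c_high - c_low), and return the first rho whose palette passes a feasibility check. The is_feasible callable, the grid size num_steps, and the function name are hypothetical; in practice the check would presumably involve solving for lambda at the candidate c. The sketch handles list inputs only.

```python
def largest_feasible_rho(c_low, c_high, is_feasible, num_steps=10):
    """Return the largest grid value of rho with a feasible palette, else None."""
    for step in range(num_steps, -1, -1):  # start nearest to c_high
        rho = step / num_steps
        c = [lo + rho * (hi - lo) for lo, hi in zip(c_low, c_high)]
        if is_feasible(c):
            return rho
    return None


# Toy feasibility rule (made up): a palette is feasible if its entries sum to at most 1.
rho_found = largest_feasible_rho([0.0, 0.0], [1.0, 1.0], lambda c: sum(c) <= 1.0)
rho_none = largest_feasible_rho([0.0, 0.0], [1.0, 1.0], lambda c: False)
```

Scanning from rho = 1 downward needs no monotonicity assumption, at the cost of one feasibility check per grid point.
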
find_pareto_by_oneValue(value_to_enhance: str)

Automatically find the feasible palette c closest to the Pareto frontier that is reached by greedily increasing one particular human value.

This method uses binary search to find the maximum feasible value for a specific constraint while keeping others constant.

Parameters:

value_to_enhance (str) – The name of the value to be enhanced.

Returns:

The maximum feasible value found for the enhanced constraint.

Return type:

float

Raises:

ValueError – If the specified value is not in the list of supported values.

Example

>>> aligner = AlignValues("all", "results/basemodel-dataset.json", [2.513, -0.967, 0.937, 0.876, 0.434, -3.337])
>>> max_value = aligner.find_pareto_by_oneValue("gpt2-helpful")
>>> print(f"Maximum feasible value for 'gpt2-helpful': {max_value}")
Command-line usage:
>>> python alignValues.py --c_list=2.513,-0.967,0.937,0.876,0.434,-3.337 --value_list="all" --value_to_enhance="gpt2-helpful" --file_path="results/basemodel-dataset.json" find_pareto_by_oneValue
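
A minimal sketch of the binary search described above, assuming that the current palette is feasible and that feasibility is monotone in the enhanced entry (if a level is infeasible, so is every higher level). The is_feasible callable, the upper search bound, and the tolerance are hypothetical parameters introduced for illustration, and the real method looks the entry up by value name rather than by index.

```python
def max_feasible_level(c, idx, upper, is_feasible, tol=1e-4):
    """Binary-search the largest feasible c[idx] in [c[idx], upper], others fixed."""
    lo, hi = c[idx], upper  # invariant: lo is feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        trial = list(c)
        trial[idx] = mid
        if is_feasible(trial):
            lo = mid  # mid works, so search higher
        else:
            hi = mid  # mid fails, so search lower
    return lo


# Toy feasibility rule (made up): feasible while c[0] + c[1] <= 1.2.
def toy_feasible(c):
    return c[0] + c[1] <= 1.2


best = max_feasible_level([0.0, 0.5], idx=0, upper=2.0, is_feasible=toy_feasible)
```
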
_save_results_to_text(optimized_lambda, success, save_prefix='results/alignValues')

Save the optimization results to a text file.

This method appends the results of lambda optimization to a text file, including the file path, constraint levels, values, and optimized lambda values.

Parameters:
  • optimized_lambda (list) – List of optimized lambda values.

  • success (bool) – True if optimization was successful, False otherwise.

  • save_prefix (str, optional) – Prefix for the save file path. Defaults to 'results/alignValues'.

Example

>>> aligner = AlignValues("all", "results/model-data.json", [0.5, 1.0])
>>> optimized_lambda, success = aligner.optimize_lambda()
>>> aligner._save_results_to_text(optimized_lambda, success)
Results have been appended to results/alignValues.txt
_save_results_to_csv(optimized_lambda, dirichlet_lambda, save_prefix='results/alignValues')

Save the optimization results to a CSV file.

This method appends the results of lambda optimization to a CSV file, including the file path, constraint levels, values, optimized lambda values, and Dirichlet reference lambda values.

Parameters:
  • optimized_lambda (list) – List of optimized lambda values.

  • dirichlet_lambda (list) – List of Dirichlet reference lambda values.

  • save_prefix (str, optional) – Prefix for the save file path. Defaults to 'results/alignValues'.

Example

>>> aligner = AlignValues("all", "results/model-data.json", [0.5, 1.0])
>>> optimized_lambda, _ = aligner.optimize_lambda()
>>> dirichlet_lambda = [0.3, 0.7]  # Example Dirichlet reference values
>>> aligner._save_results_to_csv(optimized_lambda, dirichlet_lambda)
Results have been appended to results/alignValues.csv
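
The append-only behavior can be pictured with the standard csv module. This is a sketch under assumptions, not the actual file layout: the column names, the three-decimal formatting, and the helper append_result_row are all hypothetical.

```python
import csv
import os
import tempfile


def append_result_row(csv_path, file_path, c, values, optimized_lambda, dirichlet_lambda):
    """Append one result row, writing a header first if the file is new."""
    write_header = not os.path.exists(csv_path)

    def fmt(xs):
        return ",".join(f"{x:.3f}" for x in xs)

    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            # hypothetical column names
            writer.writerow(["file_path", "c", "values",
                             "optimized_lambda", "dirichlet_lambda"])
        writer.writerow([file_path, fmt(c), ",".join(values),
                         fmt(optimized_lambda), fmt(dirichlet_lambda)])


# Usage: two appends produce one header row plus two data rows.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "alignValues.csv")
    append_result_row(path, "results/model-data.json", [0.5, 1.0],
                      ["humor", "positive"], [0.12, 0.34], [0.3, 0.7])
    append_result_row(path, "results/model-data.json", [0.6, 1.1],
                      ["humor", "positive"], [0.20, 0.40], [0.3, 0.7])
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
```
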
gen_rand_MAP_lambda(num_lambda: int, scaling_MAX: float, save_prefix: str = 'rand_MAP_lambda')

Generate random MAP lambda values with constraints.

This method generates random lambda values by drawing each c_i randomly between the current c_i and the maximum reward corresponding to value i. It modifies the c values, recalculates lambda, and returns a list of lambda values constrained by scaling_MAX.

Parameters:
  • num_lambda (int) – Number of valid lambda values to generate.

  • scaling_MAX (float) – Maximum allowed L1 norm for the generated lambda values.

  • save_prefix (str, optional) – Prefix for the save file path. Defaults to 'rand_MAP_lambda'.

Returns:

A tuple containing:
  • list: Generated lambda values that satisfy the constraints.

  • float: Success rate of lambda generation attempts.

Return type:

tuple

Example

>>> aligner = AlignValues("all", "results/model-data.json", [0.5, 1.0, 1.5])
>>> lambdas, success_rate = aligner.gen_rand_MAP_lambda(10, 5.0)
>>> print(f"Generated {len(lambdas)} lambda values with a success rate of {success_rate:.2%}")

Note

This method temporarily modifies the instance’s c values during execution but restores them to their original values before returning.
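
The generation procedure reads like rejection sampling, which can be sketched as follows. The solve_lambda callable (standing in for the lambda optimization at a given c), the max_attempts cap, and the seed argument are hypothetical additions for illustration. Unlike the method documented above, this sketch builds a fresh c for each draw instead of modifying and restoring shared state.

```python
import random


def gen_rand_lambda(c, max_rewards, solve_lambda, num_lambda, scaling_max,
                    max_attempts=10_000, seed=0):
    """Draw random palettes, solve for lambda, keep those with L1 norm <= scaling_max."""
    rng = random.Random(seed)
    accepted, attempts = [], 0
    while len(accepted) < num_lambda and attempts < max_attempts:
        attempts += 1
        # each c_i is drawn between the current c_i and the max reward of value i
        c_rand = [rng.uniform(ci, mi) for ci, mi in zip(c, max_rewards)]
        lam = solve_lambda(c_rand)  # may return None if no solution is found
        if lam is not None and sum(abs(x) for x in lam) <= scaling_max:
            accepted.append(lam)
    success_rate = len(accepted) / attempts if attempts else 0.0
    return accepted, success_rate


# Toy solver (made up): pretend lambda is simply twice the palette.
lambdas, success_rate = gen_rand_lambda(
    c=[0.0, 0.0], max_rewards=[1.0, 1.0],
    solve_lambda=lambda c_rand: [2.0 * x for x in c_rand],
    num_lambda=5, scaling_max=2.0)
```

Here the success rate is the number of accepted lambda vectors divided by the number of draws attempted, which is one plausible reading of "success rate of lambda generation attempts".
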