rewardProcessor

Classes

RewardProcessor

Processes reward values for model output evaluation.

Module Contents

class rewardProcessor.RewardProcessor(values_to_evaluate=None, values_to_align=None, file_path=None, batch_size=32)

Processes reward values for model output evaluation.

This class is used to evaluate and align different values (such as diversity and coherence) for a set of generated model outputs.

values_to_evaluate

List of values to evaluate.

Type:

Optional[List[str]]

values_to_align

List of values to align.

Type:

Optional[List[str]]

file_path

Path to the JSON file containing generated outputs.

Type:

str

batch_size

Batch size for processing rewards.

Type:

int

values_to_align_str = None
values_to_evaluate_str = None
file_path
batch_size
add_reward(value, basemodel_for_perplexity=None)

Adds a specific reward to the dataset in a non-invasive manner.

Parameters:
  • value (str) – The reward type to add (e.g., “diversity”, “perplexity”).

  • basemodel_for_perplexity (Optional[str]) – Base model required for “perplexity” value. Defaults to None.

Raises:

ValueError – If value is “perplexity” and basemodel_for_perplexity is not provided.

# Example usage to add a reward, often run via parallel PBS job submissions to accelerate computation:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json")
>>> reward_processor.add_reward(value="humor")
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --value="humor" add_reward
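
# For intuition, the sketch below shows one way a non-invasive reward addition could work: the JSON
# file is read, a new per-record reward field is added, and all existing fields are left untouched.
# The record schema ("prompt", "output") and the score_reward helper are assumptions for illustration,
# not the module's actual schema or API.
import json

def add_reward_sketch(file_path, value, score_reward):
    """Hypothetical sketch: append a '<value>' reward to each record without touching existing fields."""
    with open(file_path) as f:
        records = json.load(f)                    # assumed: a list of dicts, one per generated output
    for record in records:
        # score_reward is a hypothetical callable mapping (prompt, output) -> float
        record[value] = score_reward(record["prompt"], record["output"])
    with open(file_path, "w") as f:
        json.dump(records, f, indent=2)           # write back in place; only the new key is added
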
quantile_transform_single_c(c_list)

Transforms a list of c values into quantiles.

Parameters:

c_list (List[float]) – List of c values to transform.

Returns:

List of quantile values.

Return type:

List[float]
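
# For intuition, a minimal re-implementation sketch of an empirical quantile transform is shown below;
# the module's own convention (rank definition, tie handling) may differ.
import numpy as np

def quantile_transform_sketch(c_list):
    c = np.asarray(c_list, dtype=float)
    ranks = np.argsort(np.argsort(c))           # 0-based rank of each c value within the list
    return ((ranks + 1) / len(c)).tolist()      # fractional ranks in (0, 1]

quantile_transform_sketch([-1.2, 0.3, -0.5, 2.1])   # [0.25, 0.75, 0.5, 1.0]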

assess_original_value(evaluation_mode=False)

Assesses the original level of each value in the dataset.

Parameters:

evaluation_mode (bool) – If True, calculates quantiles; otherwise, calculates only the average. Defaults to False.

# Example usage to get realized values or c-levels under the original model:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_evaluate="all")
>>> reward_processor.assess_original_value()
# As a natural follow-up, one could define custom c_align values, e.g., setting c to a 20% improvement on the first four values:
>>> import numpy as np
>>> c_noalign = [-1.239, -2.731, -1.437, -0.362, 0.848, 0.521, -1.375]
>>> c_align = [x + np.log(1.25) for x in c_noalign[:4]] + c_noalign[4:]
>>> print(f"c_align: {','.join(f'{v:.3f}' for v in c_align)}")  # c_align: -1.016,-2.508,-1.214,-0.139,0.848,0.521,-1.375
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_evaluate="all" assess_original_value
_assess_postalignment_singlevalue(singlevalue_to_evaluate, lam, debug=True)

Assesses a single value’s alignment level after applying alignment weights; the post-alignment level is approximated by reweighting the data stored in the file, which was originally generated from the pre-alignment distribution.

Parameters:
  • singlevalue_to_evaluate (str) – The value to evaluate alignment for.

  • lam (Union[float, List[float]]) – Alignment weights.

  • debug (bool) – If True, prints debugging information. Defaults to True.

Returns:

Estimated alignment level for the single value.

Return type:

float
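
# Conceptually, a post-alignment level can be estimated from pre-alignment samples by exponential
# tilting with self-normalized importance weights. The sketch below is a minimal illustration of that
# idea, assuming the rewards are available as NumPy arrays; it is not the module's actual implementation.
import numpy as np

def estimate_postalignment_level(rewards_to_align, lam, rewards_to_evaluate):
    # rewards_to_align:    (n, d) rewards of the values being aligned, sampled pre-alignment
    # lam:                 (d,) alignment weights
    # rewards_to_evaluate: (n,) reward of the single value whose level is assessed
    logits = np.asarray(rewards_to_align, dtype=float) @ np.atleast_1d(lam)
    logits -= logits.max()                      # subtract max for numerical stability
    w = np.exp(logits)
    w /= w.sum()                                # self-normalized importance weights
    return float(w @ np.asarray(rewards_to_evaluate, dtype=float))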

assess_postalignment_multivalue(lam=None, k=100, scaling=1.0, scaling_MAX=1)

Applies _assess_postalignment_singlevalue across multiple values and alignments. If lam is not given, the c level is assessed using k random lam vectors drawn from the probability simplex and multiplied by the scaling factor (a sketch of this sampling appears after the parameter list below); if scaling < 0, the scaling factor itself is selected randomly from a range. If lam is given, k and scaling are ignored and the given lam is used directly.

Parameters:
  • lam (Optional[Union[float, List[float]]]) – Fixed alignment weights. Defaults to None.

  • k (int) – Number of random samples for Monte Carlo. Defaults to 100.

  • scaling (float) – Scaling factor for random lambda. Defaults to 1.0.

  • scaling_MAX (int) – Maximum scaling for random lambda. Defaults to 1.
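
# As a rough illustration of the random-lambda scheme described above, the following sketch draws lam
# uniformly from the probability simplex and applies the scaling rule. The exact sampling scheme and
# range used by the module may differ; this is an assumption for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def sample_lambda(d, scaling=1.0, scaling_MAX=1.0):
    lam = rng.dirichlet(np.ones(d))              # uniform draw from the d-dimensional probability simplex
    if scaling < 0:
        scaling = rng.uniform(0.0, scaling_MAX)  # scaling < 0: pick the scaling factor at random
    return scaling * lam

lam_samples = [sample_lambda(2, scaling=-1, scaling_MAX=1) for _ in range(100)]  # k Monte Carlo draws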

# Example 1: post-alignment assessment of multiple values

>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_align="humor,harmless", lam=[0.41, 0.37], values_to_evaluate="all")
>>> reward_processor.assess_postalignment_multivalue()
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --lam=0.41,0.37 --values_to_evaluate="all" assess_postalignment_multivalue
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --lam=0.41,0.37 --values_to_evaluate="humor,harmless" assess_postalignment_multivalue

# Example 2: Pareto frontier study with random lambda (often used in conjunction with plot_pareto.py to visualize the Pareto frontier)

>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_align="humor,harmless", values_to_evaluate="all", scaling=-1)
>>> reward_processor.assess_postalignment_multivalue()
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --values_to_evaluate="all" --scaling=-1 assess_postalignment_multivalue