rewardProcessor

Classes

RewardProcessor

Processes reward values for model output evaluation.

Module Contents

class rewardProcessor.RewardProcessor(values_to_evaluate=None, values_to_align=None, file_path=None, batch_size=32)

Processes reward values for model output evaluation.

This class is used to evaluate and align different values (such as diversity and coherence) for a set of generated model outputs.

values_to_evaluate

List of values to evaluate.

Type:

Optional[List[str]]

values_to_align

List of values to align.

Type:

Optional[List[str]]

file_path

Path to the JSON file containing generated outputs.

Type:

str

batch_size

Batch size for processing rewards.

Type:

int

values_to_align_str = None
values_to_evaluate_str = None
file_path
batch_size
add_reward(value, basemodel_for_perplexity=None)

Adds a specific reward to the dataset in a non-invasive manner.

Parameters:
  • value (str) – The reward type to add (e.g., “diversity”, “perplexity”).

  • basemodel_for_perplexity (Optional[str]) – Base model required for “perplexity” value. Defaults to None.

Raises:

ValueError – If value is “perplexity” and basemodel_for_perplexity is not provided.

# Example usage to add a reward, often run via parallel PBS job submissions to accelerate computation:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json")
>>> reward_processor.add_reward(value="humor")
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --value="humor" add_reward
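
# For intuition, the sketch below shows one way a non-invasive reward addition could work: the JSON
# file is read, a new per-record reward field is added, and all existing fields are left untouched.
# The record schema ("prompt", "output") and the score_reward helper are assumptions for illustration,
# not the module's actual schema or API.
import json

def add_reward_sketch(file_path, value, score_reward):
    """Hypothetical sketch: append a '<value>' reward to each record without touching existing fields."""
    with open(file_path) as f:
        records = json.load(f)                    # assumed: a list of dicts, one per generated output
    for record in records:
        # score_reward is a hypothetical callable mapping (prompt, output) -> float
        record[value] = score_reward(record["prompt"], record["output"])
    with open(file_path, "w") as f:
        json.dump(records, f, indent=2)           # write back in place; only the new key is added
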
quantile_transform_single_c(c_list)

Transforms a list of c values into quantiles.

Parameters:

c_list (List[float]) – List of c values to transform.

Returns:

List of quantile values.

Return type:

List[float]
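
# For intuition, a minimal re-implementation sketch of an empirical quantile transform is shown below;
# the module's own convention (rank definition, tie handling) may differ.
import numpy as np

def quantile_transform_sketch(c_list):
    c = np.asarray(c_list, dtype=float)
    ranks = np.argsort(np.argsort(c))           # 0-based rank of each c value within the list
    return ((ranks + 1) / len(c)).tolist()      # fractional ranks in (0, 1]

quantile_transform_sketch([-1.2, 0.3, -0.5, 2.1])   # [0.25, 0.75, 0.5, 1.0]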

assess_original_value(evaluation_mode=False)

Assesses the original level of each value in the dataset.

Parameters:

evaluation_mode (bool) – If True, calculates quantiles; otherwise, calculates only the average. Defaults to False.

# Example usage to get realized values or c-levels under the original model:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_evaluate="all")
>>> reward_processor.assess_original_value()
# As a natural follow-up, one could define custom c_align values, e.g., setting c to a 20% improvement on the first four values:
>>> import numpy as np
>>> c_noalign = [-1.239, -2.731, -1.437, -0.362, 0.848, 0.521, -1.375]
>>> c_align = [x + np.log(1.25) for x in c_noalign[:4]] + c_noalign[4:]
>>> print(f"c_align: {','.join(f'{v:.3f}' for v in c_align)}")  # c_align: -1.016,-2.508,-1.214,-0.139,0.848,0.521,-1.375
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_evaluate="all" assess_original_value
_assess_postalignment_singlevalue(singlevalue_to_evaluate, lam, debug=True)

Assesses a single value’s alignment level after applying alignment weights; the post-alignment level is approximated by reweighting the data stored in the file, which was originally generated from the pre-alignment distribution.

Parameters:
  • singlevalue_to_evaluate (str) – The value to evaluate alignment for.

  • lam (Union[float, List[float]]) – Alignment weights.

  • debug (bool) – If True, prints debugging information. Defaults to True.

Returns:

Estimated alignment level for the single value.

Return type:

float
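
# Conceptually, a post-alignment level can be estimated from pre-alignment samples by exponential
# tilting with self-normalized importance weights. The sketch below is a minimal illustration of that
# idea, assuming the rewards are available as NumPy arrays; it is not the module's actual implementation.
import numpy as np

def estimate_postalignment_level(rewards_to_align, lam, rewards_to_evaluate):
    # rewards_to_align:    (n, d) rewards of the values being aligned, sampled pre-alignment
    # lam:                 (d,) alignment weights
    # rewards_to_evaluate: (n,) reward of the single value whose level is assessed
    logits = np.asarray(rewards_to_align, dtype=float) @ np.atleast_1d(lam)
    logits -= logits.max()                      # subtract max for numerical stability
    w = np.exp(logits)
    w /= w.sum()                                # self-normalized importance weights
    return float(w @ np.asarray(rewards_to_evaluate, dtype=float))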

assess_postalignment_multivalue(lam=None, k=100, scaling=1.0, scaling_MAX=1)

Applies _assess_postalignment_singlevalue across multiple values and alignments. If lam is not given, the c level is assessed using k random lam vectors drawn from the probability simplex and multiplied by the scaling factor (a sketch of this sampling appears after the parameter list below); if scaling < 0, the scaling factor itself is selected randomly from a range. If lam is given, k and scaling are ignored and the given lam is used directly.

Parameters:
  • lam (Optional[Union[float, List[float]]]) – Fixed alignment weights. Defaults to None.

  • k (int) – Number of random samples for Monte Carlo. Defaults to 100.

  • scaling (float) – Scaling factor for random lambda. Defaults to 1.0.

  • scaling_MAX (int) – Maximum scaling for random lambda. Defaults to 1.
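
# As a rough illustration of the random-lambda scheme described above, the following sketch draws lam
# uniformly from the probability simplex and applies the scaling rule. The exact sampling scheme and
# range used by the module may differ; this is an assumption for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def sample_lambda(d, scaling=1.0, scaling_MAX=1.0):
    lam = rng.dirichlet(np.ones(d))              # uniform draw from the d-dimensional probability simplex
    if scaling < 0:
        scaling = rng.uniform(0.0, scaling_MAX)  # scaling < 0: pick the scaling factor at random
    return scaling * lam

lam_samples = [sample_lambda(2, scaling=-1, scaling_MAX=1) for _ in range(100)]  # k Monte Carlo draws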

# Example 1: post-alignment assessment of multiple values

>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_align="humor,harmless", lam=[0.41, 0.37], values_to_evaluate="all")
>>> reward_processor.assess_postalignment_multivalue()
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --lam=0.41,0.37 --values_to_evaluate="all" assess_postalignment_multivalue
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --lam=0.41,0.37 --values_to_evaluate="humor,harmless" assess_postalignment_multivalue

# Example 2: Pareto frontier study with random lambda (often used in conjunction with plot_pareto.py to visualize the Pareto frontier)

>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_align="humor,harmless", values_to_evaluate="all", scaling=-1)
>>> reward_processor.assess_postalignment_multivalue()
# Command-line usage:
$ python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --values_to_evaluate="all" --scaling=-1 assess_postalignment_multivalue