rewardProcessor
Classes
- RewardProcessor – Processes reward values for model output evaluation.
Module Contents
- class rewardProcessor.RewardProcessor(values_to_evaluate=None, values_to_align=None, file_path=None, batch_size=32)
Processes reward values for model output evaluation.
This class evaluates and aligns different values (such as diversity and coherence) for a set of generated model outputs.
- values_to_evaluate
List of values to evaluate.
- Type:
Optional[List[str]]
- values_to_align
List of values to align.
- Type:
Optional[List[str]]
- file_path
Path to the JSON file containing generated outputs.
- Type:
str
- batch_size
Batch size for processing rewards.
- Type:
int
- values_to_align_str = None
- values_to_evaluate_str = None
- file_path
- batch_size
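- # Minimal instantiation sketch (the file path mirrors the examples below; only arguments listed in the signature above are used):
>>> from rewardProcessor import RewardProcessor
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_evaluate="all", batch_size=32)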
- add_reward(value, basemodel_for_perplexity=None)
Adds a specific reward to the dataset in a non-invasive manner.
- Parameters:
value (str) – The reward type to add (e.g., “diversity”, “perplexity”).
basemodel_for_perplexity (Optional[str]) – Base model required for “perplexity” value. Defaults to None.
- Raises:
ValueError – If value is “perplexity” and basemodel_for_perplexity is not provided.
- # Example usage to add a reward, often via submitting parallel PBS job files to accelerate computation:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json")
>>> reward_processor.add_reward(value="humor")
- # Command-line usage:
>>> python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --value="humor" add_reward
- quantile_transform_single_c(c_list)
Transforms a list of c values into quantiles.
- Parameters:
c_list (List[float]) – List of c values to transform.
- Returns:
List of quantile values.
- Return type:
List[float]
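- # Illustrative sketch of a rank-based quantile transform (an assumption about the internals, not necessarily the method's exact implementation; tie handling may differ):
>>> import numpy as np
>>> c_list = [-1.2, 0.3, -0.5, 2.1]
>>> ranks = np.argsort(np.argsort(c_list))  # rank of each entry within c_list
>>> quantiles = ((ranks + 1) / len(c_list)).tolist()  # map ranks to (0, 1]
>>> print(quantiles)
[0.25, 0.75, 0.5, 1.0]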
- assess_original_value(evaluation_mode=False)
Assesses the original level of each value in the dataset.
- Parameters:
evaluation_mode (bool) – If True, calculates quantiles; otherwise, calculates only the average. Defaults to False.
- # Example usage to get realized values or c-levels under the original model:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_evaluate="all")
>>> reward_processor.assess_original_value()
- # As a natural follow-up, one could define custom c_align values, e.g., set c to a 25% improvement (a shift of log(1.25)) on the first four values:
>>> import numpy as np
>>> c_noalign = [-1.239, -2.731, -1.437, -0.362, 0.848, 0.521, -1.375]
>>> c_align = [x + np.log(1.25) for x in c_noalign[:4]] + c_noalign[4:]
>>> print(f"c_align: {','.join(f'{v:.3f}' for v in c_align)}")
c_align: -1.016,-2.508,-1.214,-0.139,0.848,0.521,-1.375
- # Command-line usage:
>>> python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_evaluate="all" assess_original_value
- _assess_postalignment_singlevalue(singlevalue_to_evaluate, lam, debug=True)
Estimates a single value’s alignment level after applying alignment weights, approximated by re-weighting the outputs stored in the file, which were generated from the pre-alignment distribution.
- Parameters:
singlevalue_to_evaluate (str) – The value to evaluate alignment for.
lam (Union[float, List[float]]) – Alignment weights.
debug (bool) – If True, prints debugging information. Defaults to True.
- Returns:
Estimated alignment level for the single value.
- Return type:
float
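- # Hedged usage sketch; this helper is private and normally invoked via assess_postalignment_multivalue, so the direct call below is for illustration only:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_align="humor,harmless")
>>> c_hat = reward_processor._assess_postalignment_singlevalue(singlevalue_to_evaluate="humor", lam=[0.41, 0.37], debug=False)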
- assess_postalignment_multivalue(lam=None, k=100, scaling=1.0, scaling_MAX=1)
Applies _assess_postalignment_singlevalue across multiple values and alignments. If lam is not given, the c level is assessed using k random lam vectors drawn from the probability simplex and multiplied by the scaling factor; if scaling < 0, the scaling factor is itself drawn at random from a range. If lam is given, k and scaling are ignored and the given lam is used.
- Parameters:
lam (Optional[Union[float, List[float]]]) – Fixed alignment weights. Defaults to None.
k (int) – Number of random samples for Monte Carlo. Defaults to 100.
scaling (float) – Scaling factor for random lambda. Defaults to 1.0.
scaling_MAX (int) – Maximum scaling for random lambda. Defaults to 1.
- # Example 1: post-alignment assessment of multiple values:
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_align="humor,harmless", values_to_evaluate="all")
>>> reward_processor.assess_postalignment_multivalue(lam=[0.41, 0.37])
- # Command-line usage:
>>> python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --lam=0.41,0.37 --values_to_evaluate="all" assess_postalignment_multivalue
>>> python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --lam=0.41,0.37 --values_to_evaluate="humor,harmless" assess_postalignment_multivalue
- # Example 2: Pareto frontier study with random lambda (often used in conjunction with plot_pareto.py to visualize the Pareto frontier):
>>> reward_processor = RewardProcessor(file_path="results/Llama27b-chat-Anthropic-harmless.json", values_to_align="humor,harmless", values_to_evaluate="all")
>>> reward_processor.assess_postalignment_multivalue(scaling=-1)
- # Command-line usage:
>>> python rewardProcessor.py --file_path="results/Llama27b-chat-Anthropic-harmless.json" --values_to_align="humor,harmless" --values_to_evaluate="all" --scaling=-1 assess_postalignment_multivalue
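- # Sketch of the random-lambda scheme described above (the Dirichlet draw and the scaling behavior are assumptions about the internals, shown only to illustrate sampling lambda vectors from a scaled probability simplex):
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> k, n_align_values, scaling = 100, 2, 1.0
>>> lam_samples = rng.dirichlet(np.ones(n_align_values), size=k) * scaling  # k lambda vectors on the scaled simplex
>>> lam_samples.shape
(100, 2)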