# Model Inference

## Usage

To specify the model used for a particular Unit's execution, pass the model name to the via() method. By default, Verdict uses litellm to infer the model's connection parameters from the short name you pass to via(). Refer to the LiteLLM Python SDK docs for more information; in particular, take note of the provider API keys you may need to set.

...
>> JudgeUnit().via('gpt-4o-mini') # be sure to set your OPENAI_API_KEY
>> JudgeUnit().via('claude-3-5-sonnet-20241022') # be sure to set your ANTHROPIC_API_KEY
>> JudgeUnit().via('deepinfra/meta-llama/Meta-Llama-3.1-8B-Instruct') # be sure to set your DEEPINFRA_API_KEY
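
The provider API keys noted in the comments above are read from environment variables by litellm. One way to set them (the key values below are placeholders) is before constructing your pipeline:

import os

# litellm reads provider credentials from standard environment variables
os.environ["OPENAI_API_KEY"] = "sk-..."        # for OpenAI models (e.g., gpt-4o-mini)
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..." # for Anthropic models
os.environ["DEEPINFRA_API_KEY"] = "..."        # for DeepInfra-hosted models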

In addition, we support hosted vLLM endpoints.

from verdict.model import vLLMModel

model = vLLMModel(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=".../v1",  # your vLLM server's OpenAI-compatible base URL
    api_key="..."
)

...
>> JudgeUnit().via(model)

The .via() directive cascades down to all sub-Units unless a sub-Unit has its own .via() directive applied.

from verdict import Layer
from verdict.common.cot import CoTUnit
from verdict.common.judge import JudgeUnit

ensemble = CoTUnit().via('gpt-4o') \
    >> Layer(
        JudgeUnit(explanation=True).via('o1')  # overrides the Layer-level model
            >> JudgeUnit(),                    # inherits the Layer-level model
        3                                      # repeat the inner pipeline 3 times
    ).via('gpt-4o-mini')

ensemble.materialize().plot()
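
In this example, the CoTUnit runs on gpt-4o, the first JudgeUnit in each of the three Layer copies runs on o1 (its own .via() takes precedence), and the second JudgeUnit inherits gpt-4o-mini from the Layer-level .via().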

## Retries

We bypass all litellm/instructor/provider-client retry mechanisms and roll our own retry logic. Simply pass the retries parameter to the via() method and Verdict will automatically retry all recoverable inference-time errors.

>> JudgeUnit().via('gpt-4o-mini', retries=3)

## Inference Parameters

All other keyword arguments passed to the via() method are forwarded to the model's completion method. This is where you can specify standard inference parameters such as temperature, max_tokens, etc.

>> JudgeUnit().via('gpt-4o-mini', temperature=1.2)
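
Inference parameters can be combined with retries in the same via() call. For example (the max_tokens value here is just illustrative):

...
>> JudgeUnit().via('gpt-4o-mini', retries=3, temperature=0.9, max_tokens=512)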

## Rate Limiting

Refer to the Rate Limiting section.

## Advanced

### Model Selection Policy

As an extension of the retry logic, Verdict also supports a general model selection precedence policy. This is useful, for example, when a weaker model suffices in the majority of cases but you want to fall back to a more capable model in the event of failure (e.g., Unit#validate does not succeed).

from verdict.model import ModelSelectionPolicy

...
>> JudgeUnit().via(ModelSelectionPolicy.from_names([
    #        model,  retries,  inference parameters
    ('gpt-4o-mini',        1,  {'temperature': 1.2}),
    (     'gpt-4o',        3,  {'temperature': 0.9}),
]))
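
Here, each execution first tries gpt-4o-mini (with a single retry and temperature 1.2); if it still fails, for instance because Unit#validate does not succeed, Verdict falls back to gpt-4o (with up to three retries and temperature 0.9).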

### Prefix/Prompt Caching

We find that many model providers cache prefixes/prompts by default, even when structured decoding fails. This effectively poisons the cache and causes retries to have no effect. To alleviate this, we prepend a 10-character random alphabetic nonce to each prompt for all ProviderModels.
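
Conceptually, the nonce behaves like the sketch below. This is an illustration of the idea only, not Verdict's internal implementation.

import random
import string

def with_nonce(prompt: str) -> str:
    # prepend a 10-character random alphabetic nonce so that each retry
    # presents a distinct prefix to the provider's prompt cache
    nonce = "".join(random.choices(string.ascii_letters, k=10))
    return f"{nonce} {prompt}"

Disable the nonce by passing use_nonce=False to the Model constructor: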

from verdict.model import ProviderModel

model = ProviderModel(
    "gpt-4o-mini",
    use_nonce=False  # disable the prompt nonce
)

...
>> JudgeUnit().via(model)