# Model Inference
## Usage
To specify the model to use for a particular `Unit`'s execution, simply pass the model name to the `via()` method. By default, Verdict will use `litellm` to infer the model's connection parameters from the short name passed to `via()`. Refer to the LiteLLM Python SDK docs for more information. In particular, take note of the provider API keys you may need to set.
```python
...
  >> JudgeUnit.via('gpt-4o-mini')  # be sure to set your OPENAI_API_KEY
  >> JudgeUnit.via('claude-3')  # be sure to set your ANTHROPIC_API_KEY
  >> JudgeUnit.via('deepinfra/meta-llama/Meta-Llama-3.1-8B-Instruct')  # be sure to set your DEEPINFRA_API_KEY
```
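LiteLLM reads these provider credentials from environment variables. As a minimal sketch (the variable names match the providers shown above; the key values are placeholders), you can set them before constructing your pipeline:

```python
import os

# Set only the keys for the providers your pipeline actually uses.
os.environ["OPENAI_API_KEY"] = "sk-..."         # for OpenAI models, e.g. gpt-4o-mini
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # for Anthropic models, e.g. claude-3
os.environ["DEEPINFRA_API_KEY"] = "..."         # for DeepInfra-hosted models
```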
In addition, we support hosted vLLM endpoints.
```python
from verdict.model import vLLMModel

model = vLLMModel(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=".../v1",
    api_key="...",
)
```
```python
...
  >> JudgeUnit.via(model)
```
The `.via()` directive will cascade down to all sub-`Unit`s unless they have had their own `.via()` directive applied.
```python
from verdict import Layer
from verdict.common.cot import CoTUnit
from verdict.common.judge import JudgeUnit

ensemble = CoTUnit().via('gpt-4o') \
    >> Layer(
        JudgeUnit(explanation=True).via('o1')
            >> JudgeUnit(),
        3,
    ).via('gpt-4o-mini')

ensemble.materialize().plot()
```
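In this pipeline, the `CoTUnit` runs on gpt-4o, the first `JudgeUnit` in each of the three `Layer` copies runs on o1, and the trailing `JudgeUnit`, which has no `via()` of its own, inherits gpt-4o-mini from the `Layer`'s `via()` directive.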
## Retries
We bypass all `litellm`/`instructor`/provider-client retry mechanisms and roll our own retry logic. Simply pass the `retries` parameter to the `via()` method and Verdict will automatically retry all recoverable inference-time errors.
```python
>> JudgeUnit.via('gpt-4o-mini', retries=3)
```
## Inference Parameters
All other keyword arguments passed to the `via()` method will be passed to the model's `completion` method. This is where you can specify any standard inference parameters such as `temperature`, `max_tokens`, etc.
```python
>> JudgeUnit.via('gpt-4o-mini', temperature=1.2)
```
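These inference parameters can be combined with `retries` in a single `via()` call; the values below are purely illustrative:

```python
>> JudgeUnit.via(
    'gpt-4o-mini',
    retries=3,        # Verdict-level retry count
    temperature=0.7,  # forwarded to the model's completion method
    max_tokens=512,   # forwarded to the model's completion method
)
```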
## Rate Limiting
Refer to the Rate Limiting section.
## Advanced
### Model Selection Policy
As an extension of the retry logic, Verdict also supports a general model selection precedence policy. This can be useful, for example, when a weaker model suffices in the majority of cases, but you want to fall back to a more capable model in the event of failure (e.g., `Unit#validate` does not succeed).
```python
...
  >> JudgeUnit.via(ModelSelectionPolicy.from_names([
      # (model        , retries, inference parameters)
      ('gpt-4o-mini'  ,    1   , {'temperature': 1.2}),
      ('gpt-4o'       ,    3   , {'temperature': 0.9}),
  ]))
```
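Entries are tried in order: requests first go to gpt-4o-mini with its listed retry budget and inference parameters, and fall back to gpt-4o only if the earlier entry fails.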
### Prefix/Prompt Caching
We find that many model providers cache prefixes/prompts by default, even when structured decoding fails. This effectively poisons the cache and causes retries to have no effect. To alleviate this, we add a 10-character random alpha nonce at the start of each prompt for all `ProviderModel`s. Disable this by passing `use_nonce=False` to the `Model` constructor.
```python
from verdict.model import ProviderModel

model = ProviderModel(
    "gpt-4o-mini",
    use_nonce=False,
)
```
```python
...
  >> JudgeUnit.via(model)
```