Completion Parameters

Two layers of parameters:

  • CompletionParameters: normalized generation intent (temperature, max tokens, stop, response format, ...). NOT a wire shape: each provider's build_request maps it to its own request body, so the same params drive any provider identically.
  • NodeCompletionParameters: per-request behavior around the call (system prompt override, JSON repair, retry, cost tracking, caching, the wrapped CompletionParameters).

Pass NodeCompletionParameters to complete; passing None uses the defaults.
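To make the layering concrete, here is a minimal, self-contained sketch of the pattern. It does not use minillmlib: GenParams, CallParams, and resolve are hypothetical stand-ins for the two layers and for how a complete-style function might treat None, and the library's real fields and defaults may differ.

```rust
/// Hypothetical stand-in for the normalized generation layer.
#[derive(Debug, Clone, Default, PartialEq)]
struct GenParams {
    max_tokens: Option<u32>,
    temperature: Option<f32>,
}

/// Hypothetical stand-in for the per-request layer; it wraps the first layer.
#[derive(Debug, Clone, Default, PartialEq)]
struct CallParams {
    params: GenParams,
    system_prompt: Option<String>,
    parse_json: bool,
}

/// Resolve an optional argument the way a `complete`-style function might:
/// `None` falls back to the type's `Default` implementation.
/// (Assumption for illustration; not the library's actual code.)
fn resolve(params: Option<&CallParams>) -> CallParams {
    params.cloned().unwrap_or_default()
}
```

Calling resolve(None) yields CallParams::default(), while resolve(Some(&custom)) returns a copy of the caller's settings, including the wrapped generation parameters.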

CompletionParameters

#![allow(unused)]
fn main() {
use minillmlib::CompletionParameters;

let params = CompletionParameters::new()
    .with_max_tokens(512)
    .with_temperature(0.7)
    .with_stop(vec!["END".to_string()]);
}

| Field | Meaning |
|---|---|
| max_tokens | Provider emits its own key (max_completion_tokens, max_tokens, Anthropic's required max_tokens) |
| temperature, top_p, top_k | Sampling |
| frequency_penalty, presence_penalty, repetition_penalty | Penalties |
| stop | Stop sequences (Anthropic stop_sequences) |
| seed | Reproducibility |
| response_format | Force JSON output (with_json_response()) |
| reasoning | Extended-thinking effort/budget |
| tools, tool_choice | Tool definitions (OpenAI-shaped, passed through) |
| extra | Provider-specific keys (the honest escape hatch, e.g. OpenRouter routing) |
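The key mapping in the table can be illustrated with a small standalone sketch. This is not the library's build_request: Provider, GenParams, and build_body are hypothetical names, values are rendered as strings to avoid a JSON dependency, and the fallback of 1024 for Anthropic's required max_tokens is an assumption made only for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical provider families, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    OpenAiLike,
    Anthropic,
}

/// Hypothetical normalized parameters (a small subset of the table).
#[derive(Debug, Clone, Default)]
struct GenParams {
    max_tokens: Option<u32>,
    temperature: Option<f32>,
    stop: Vec<String>,
}

/// Map the normalized parameters to provider-specific body keys.
/// Values are rendered as strings; a real request body would be JSON.
fn build_body(p: &GenParams, provider: Provider) -> BTreeMap<String, String> {
    let mut body = BTreeMap::new();
    match provider {
        Provider::OpenAiLike => {
            // Optional here, so the key is omitted when unset.
            if let Some(n) = p.max_tokens {
                body.insert("max_completion_tokens".to_string(), n.to_string());
            }
        }
        Provider::Anthropic => {
            // Required by Anthropic; 1024 is an assumed fallback for this sketch.
            let n = p.max_tokens.unwrap_or(1024);
            body.insert("max_tokens".to_string(), n.to_string());
        }
    }
    if let Some(t) = p.temperature {
        body.insert("temperature".to_string(), t.to_string());
    }
    if !p.stop.is_empty() {
        let key = match provider {
            Provider::OpenAiLike => "stop",
            Provider::Anthropic => "stop_sequences",
        };
        // Comma-joined for display; a real body would carry a JSON array.
        body.insert(key.to_string(), p.stop.join(","));
    }
    body
}
```

The same GenParams value produces a stop_sequences key for Provider::Anthropic and a stop key for Provider::OpenAiLike, which is the sense in which the parameters are not a wire shape.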

NodeCompletionParameters

#![allow(unused)]
fn main() {
use minillmlib::{CompletionParameters, NodeCompletionParameters};

let params = NodeCompletionParameters::new()
    .with_params(CompletionParameters::new().with_max_tokens(200))
    .with_system_prompt("You are concise.")  // prepend if the thread has no system message
    .expecting_json()                         // parse + repair the response as JSON
    .with_force_prepend("Answer: ")           // make the model continue from this prefix
    .with_cost_tracking(true);                // request usage and fire the cost callback
}

| Builder | Meaning |
|---|---|
| with_params(..) | The wrapped CompletionParameters |
| with_system_prompt(..) | Prepend a system message if absent |
| with_format_kwargs(..) / with_format_kwarg(k, v) | Fill {placeholder}s thread-wide at call time |
| with_parse_json(true) / expecting_json() | Repair the response as JSON |
| with_force_prepend(..) | Prime the assistant turn so the model continues it |
| with_cache(true) | Auto-mark the whole prefix for caching (see Caching) |
| with_cost_tracking(true) | Request and report usage/cost |
| with_token_price(..) | Per-request price override |
| retry, exp_back_off, back_off_time, max_back_off | Retry policy |
| crash_on_refusal, crash_on_empty_response | Reject empty / no-JSON responses |
| timeout_secs | Total deadline (non-streaming) or idle timeout (streaming) |
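As a rough illustration of how the four retry fields could interact, the sketch below implements a capped exponential backoff. The field semantics are assumptions, not confirmed by the library: retry is taken as the number of retries after the first attempt, back_off_time as the base delay in seconds, exp_back_off as a flag that doubles the delay per retry, and max_back_off as the cap in seconds. RetryPolicy, delay_before_retry, and with_retry are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical retry policy mirroring the field names in the table.
#[derive(Debug, Clone)]
struct RetryPolicy {
    retry: u32,         // assumed: retries allowed after the first attempt
    exp_back_off: bool, // assumed: double the delay on each retry
    back_off_time: f64, // assumed: base delay in seconds
    max_back_off: f64,  // assumed: upper bound on the delay in seconds
}

/// Delay to wait before retry number `attempt` (0-based), capped at `max_back_off`.
fn delay_before_retry(policy: &RetryPolicy, attempt: u32) -> Duration {
    let secs = if policy.exp_back_off {
        policy.back_off_time * 2.0_f64.powi(attempt as i32)
    } else {
        policy.back_off_time
    };
    Duration::from_secs_f64(secs.min(policy.max_back_off))
}

/// Run `op` until it succeeds or the retries are exhausted.
/// `sleep` is injected so the loop can be exercised without real waiting.
fn with_retry<T, E>(
    policy: &RetryPolicy,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error.
            Err(e) if attempt >= policy.retry => return Err(e),
            Err(_) => {
                sleep(delay_before_retry(policy, attempt));
                attempt += 1;
            }
        }
    }
}
```

In real use the sleep argument would be std::thread::sleep (or an async equivalent); passing a closure that records the delays instead makes the schedule easy to inspect.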