Completion Parameters
Two layers of parameters:
CompletionParameters: normalized generation intent (temperature, max tokens, stop, response format, ...). NOT a wire shape: each provider'sbuild_requestmaps it to its own request body, so the same params drive any provider identically.NodeCompletionParameters: per-request behavior around the call (system prompt override, JSON repair, retry, cost tracking, caching, the wrappedCompletionParameters).
You pass NodeCompletionParameters to complete; None means defaults.
CompletionParameters
#![allow(unused)] fn main() { use minillmlib::CompletionParameters; let params = CompletionParameters::new() .with_max_tokens(512) .with_temperature(0.7) .with_stop(vec!["END".to_string()]); }
| Field | Meaning |
|---|---|
max_tokens | Provider emits its own key (max_completion_tokens, max_tokens, Anthropic's required max_tokens) |
temperature, top_p, top_k | Sampling |
frequency_penalty, presence_penalty, repetition_penalty | Penalties |
stop | Stop sequences (Anthropic stop_sequences) |
seed | Reproducibility |
response_format | Force JSON output (with_json_response()) |
reasoning | Extended-thinking effort/budget |
tools, tool_choice | Tool definitions (OpenAI-shaped, passed through) |
extra | Provider-specific keys (the honest escape hatch, e.g. OpenRouter routing) |
NodeCompletionParameters
#![allow(unused)] fn main() { use minillmlib::{CompletionParameters, NodeCompletionParameters}; let params = NodeCompletionParameters::new() .with_params(CompletionParameters::new().with_max_tokens(200)) .with_system_prompt("You are concise.") // prepend if the thread has no system message .expecting_json() // parse + repair the response as JSON .with_force_prepend("Answer: ") // make the model continue from this prefix .with_cost_tracking(true); // request usage and fire the cost callback }
| Builder | Meaning |
|---|---|
with_params(..) | The wrapped CompletionParameters |
with_system_prompt(..) | Prepend a system message if absent |
with_format_kwargs(..) / with_format_kwarg(k, v) | Fill {placeholder}s thread-wide at call time |
with_parse_json(true) / expecting_json() | Repair the response as JSON |
with_force_prepend(..) | Prime the assistant turn so the model continues it |
with_cache(true) | Auto-mark the whole prefix for caching (see Caching) |
with_cost_tracking(true) | Request and report usage/cost |
with_token_price(..) | Per-request price override |
retry, exp_back_off, back_off_time, max_back_off | Retry policy |
crash_on_refusal, crash_on_empty_response | Reject empty / no-JSON responses |
timeout_secs | Total deadline (non-streaming) or idle timeout (streaming) |