Cost Tracking
The library tracks usage and cost per request, and is honest about when a cost is actually known.
Token buckets
Input tokens are split into three disjoint, additive buckets so caching is priced correctly across every provider's differing wire conventions:
uncached_input_tokens: full-price prompt tokens,cache_read_tokens: served from a warm cache (cheap),cache_write_tokens: written to the cache this request (a premium).
Total input is the sum of the three; cost is a clean weighted sum, no subtraction.
Resolution: never a fake $0
Every reported CostInfo carries a CostResolution:
| Resolution | Meaning |
|---|---|
Resolved | The USD cost is authoritative (native, or tokens × a configured TokenPrice) |
Unpriced | Tokens are real, but no native cost and no TokenPrice was set. cost is 0.0 but must NOT be treated as a free request. Set a TokenPrice to resolve it. |
Unknown | Cost could not be determined at all (no usage, and any out-of-band query failed) |
Check resolution before trusting cost.
A callback per completion
#![allow(unused)] fn main() { use minillmlib::{ChatNode, GeneratorInfo, NodeCompletionParameters, CompletionParameters, CostInfo}; use std::sync::{Arc, Mutex}; async fn run() -> minillmlib::Result<()> { let gen = GeneratorInfo::openrouter("google/gemini-2.5-flash-lite"); let total = Arc::new(Mutex::new(0.0)); let sink = total.clone(); let params = NodeCompletionParameters::new() .with_params(CompletionParameters::new().with_max_tokens(200)) .with_cost_tracking(true) .with_cost_callback(move |info: CostInfo| { // info.cost, .prompt_tokens, .completion_tokens, // .cache_read_tokens, .cache_write_tokens, .resolution *sink.lock().unwrap() += info.cost; }); let root = ChatNode::root("You are helpful."); root.add_user("Hi").complete(&gen, Some(¶ms)).await?; println!("total spent: {}", *total.lock().unwrap()); Ok(()) } }
Enforced tracking via CompletionContext
When you want cost reporting to be structurally guaranteed (not opt-in per call),
wrap the generator in a CompletionContext and use complete_tracked. It always
reports cost through the context's async callback, and on a cancelled or
usage-less stream it resolves out-of-band (e.g. OpenRouter's /generation query)
or reports Unknown, rather than silently booking $0.
#![allow(unused)] fn main() { use minillmlib::{CompletionContext, CostInfo, AsyncCostCallback, CompletionMeta, GeneratorInfo, ChatNode}; use std::sync::Arc; async fn run() -> minillmlib::Result<()> { let generator = GeneratorInfo::openrouter("m"); let callback: AsyncCostCallback = Arc::new(|cost: CostInfo, _meta: CompletionMeta| { Box::pin(async move { // persist `cost` to your DB / metering here let _ = cost; }) }); let ctx = CompletionContext::new(generator, serde_json::json!({}), callback, "https://app", "App"); let root = ChatNode::root("You are helpful."); let _answer = root.add_user("Hi").complete_tracked(&ctx, None).await?; Ok(()) } }
For streaming, complete_streaming_tracked returns a TrackedStream that settles
cost when it finishes or is cancelled (use cancel().await for a reliable
settle; a plain drop is best-effort).